rockyourdata

Datapalooza
This is a Datapalooza Crowdchat focused on identifying data application development best practices
   8 years ago
IBM Analytics
Q #7: What are the principal issues that prevent #datascientists from being effective modelers?
Anshuman Singh
Data munging! It's pure evil!
Piyush Malik
A lack of business-domain understanding and an inability to sift through data challenges impact #DataScientist effectiveness.
Zeydy Ortiz, PhD
To be effective, #DataScientists need to focus first and foremost on the biz problem: establish the question that needs to be answered with data. This step is often overlooked. @IBMbigdata
jameskobielus
I agree with Anshuman Singh: the data-munging (discovery, acquisition, and preparation) workload keeps modelers from devoting more time to exploring the data and to building, testing, iterating on, and refining their models.
Anshuman Singh
Unavailability of well-labeled datasets!
Thalamus
"Data science" means nothing. Everyone is a data scientist if they do science with data. Modelers are modelers, statisticians are statisticians, scientists are scientists. It's often wrongly assumed that their Venn diagram is one big circle.
Adarsh Mohan
Duration of effective modeling sessions, lack of awareness of the motives behind what they are being asked, and identification of classifiers.
IBM Analytics
Q #2: What mix of skills and aptitudes makes a #datascientist proficient at modeling?
IBM Analytics
Please post your reply here
Zeydy Ortiz, PhD
Skills: analytical. Gotta love the numbers! @IBMbigdata
jameskobielus
Clearly, a data scientist must be good with numbers: stats, math, probabilities. But they should also be adept at critical thinking: distilling problems down to variables that can be measured and tested through statistical means.
Thalamus
Curiosity and patience, but also open-mindedness when it comes to methods and programming languages.
Lisa B
A data scientist needs to be a good programmer
John Furrier
This is a great question, one that is always evolving and, methinks, will continue to evolve.
Data4Decisions
A data scientist should be critical, curious, and patient. @IBMbigdata
Anshuman Singh
The ability to answer creative, crucial, interesting, and insightful questions, combined with sufficient technical and programming skills.
Thalamus
Being able to tell when the results don't make sense is pretty important in data science. You're the (human) adult in the room.
IBP
It's important to know the business. @IBMbigdata
Sharon Polsky
A healthy dose of skepticism, a readiness to question the status quo, and sensitivity to the reality that it's not just "data": it's people whose lives will be affected by data scientists' decisions.
Adarsh Mohan
Here are the areas with which data scientists must become familiar: linear algebra, basic statistics, linear and logistic regression, data mining, predictive modeling, cluster analysis, association rules, market-basket analysis, decision trees,
Praveen Anantharaman
Nimble at grasping the business context and the solution.
jameskobielus
A data scientist has to be skilled at "feature engineering" (i.e., identifying the independent and dependent variables to build into stat models). They should also have an agile mind for adjusting the feature mix as they build and test alternative models.
John Furrier
Data wrangling, whether by writing some basic code or through an interface. Managing data is key.
John Furrier
.@IBMbigdata I've predicted that data science will move to analysts and that programming won't be required in the future (soon).
jameskobielus
A good data scientist should never love stat, math and data for its own sake, but rather as a means toward the end of ascertaining correlations and other patterns that it all may reveal.
Lisa B
A data scientist has to have an enquiring and creative mind and be highly driven
Anshuman Singh
Data Scientist = Data-structures manipulator + Scientific-degree holder
IBM Analytics
Time for question #3 - take a look at the top of your screen
John Furrier
@jameskobielus Cloudera cofounder Jeff Hammerbacher had the best line comparing an athlete to a data scientist: love for the sport is what makes someone great.
John Furrier
A #techathlete in data science is like a "gym rat" or "data rat": always hanging around, never leaving the gym :-)
jameskobielus
I don't see question #3 at the top of my screen.
Zeydy Ortiz, PhD
We have seen a rapid move to self-serve analytics, @furrier. However, using those tools without the skills and knowledge is not very useful. Exhibit #1: Excel.
IBM Analytics
Q #5: What would be the ideal workbench for #datascience modeling?
IBM Analytics
Please post your replies here
Anshuman Singh
The bench used by Tony Stark!!
John Furrier
I love this topic
jameskobielus
The ideal #datascience modeling workbench should incorporate Spark for modeling, a deep statistical algorithm library (machine learning, NLP, etc.), support for R, Scala, and Python development, and access to a hybrid "data lake"
John Furrier
.@IBMbigdata The workbench notion is changing fast, especially now that the required updates to data sets can be real time: API aggregation, data wrangling... A new CMS or tooling is needed big time. I live this problem every day.
Zeydy Ortiz, PhD
My wishlist: team communication tools, shared and private repositories of data, source control for data and models, and, while we're at it, lots of memory and storage capacity. @IBMbigdata
jameskobielus
The ideal #datascience modeling workbench would automate most of the data discovery, acquisition, and preparation tasks, enabling modelers to focus on fast model development, testing, and deployment.
John Furrier
I see data integration plus a WYSIWYG tool being a really big opportunity.
Anshuman Singh
Other than Jarvis: a prototyping language (Python or R), an ML library with distributed computing (@Spark), visualizations (@Bokeh), and IBM Watson!
Kunal Deep
It should be dynamic in nature and as visual as possible. This ensures a greater understanding of the approach.
Thalamus
R, Python, Spark are essential for sure, but I also find a good notebook and a pen are often forgotten tools by even the best modelers.
Zeydy Ortiz, PhD
.@hungry_thalamus ...and colored markers too!
IBM Analytics
Time for Question #6
John Furrier
I'm very interested in workbench solutions for big data; if you get more info, please send it to me.
John Furrier
I'd love to do a workbench crowdchat topic - go deep in the weeds on how to
Adarsh Mohan
R, Scala, Python, a machine learning library, and other deep statistical libraries, such as NLP.
IBM Analytics
Q #3: What's the best approach for identifying and cultivating talented modelers?
IBM Analytics
Please post your reply here
IBM Analytics
Waiting for your valuable input!
Adarsh Mohan
There are a number of modelling approaches for handling high-leverage observations or violations of assumptions. Models can be parametric (e.g., regression with Huber-White sandwich variance estimators) as well as non-parametric (e.g., quantile regression).
jameskobielus
Start from your current power analysts in various business units: marketing, ops, etc. There's probably a lot of statistical analysis talent, coupled with deep domain expertise, that you can nurture into high-powered data scientists
Lisa B
Look inside your company before recruiting from outside. The people within have already met certain criteria.
Thalamus
In 2016, it's easy to spot good modelers by what they've already done. Cultivating them requires teamwork with developers, engineers, and people in the field gathering data. They also need the freedom to explore new methods.
jameskobielus
Explore various data-scientist engagement tactics--centers of excellence, online challenges, contests, hackathons, community events--to see what hot talent shows up and what type of work they can do.
Zeydy Ortiz, PhD
I highly recommend giving them the chance to participate in Data4Good projects.
Lisa B
Look for university graduates, they are eager to get started and have received the latest advice and training
jameskobielus
Establish internship programs in which smart people from a wide range of higher-education institutions and programs can show what they're capable of. They'll bring passion, commitment, and fresh ideas to your data science practice.
jameskobielus
Don't fall into the trap of thinking that the best and brightest data scientists are those who come from the "top universities." Statistical smarts are probably distributed across the entire higher-education landscape.
Sharon Polsky
Short term: Look for candidates in non-traditional places, and follow the model of GCHQ's recruitment puzzles.
Longer term: Let kids be kids. Allow them to fail so they develop resilience and self esteem. Nurture their curiosity and imagination so they grow to become creative, confident and engaged thinkers.
Zeydy Ortiz, PhD
In a corporate environment, I recommend job rotation
IBM Analytics
Time for question #4 - please look at the top of your screen for Question #4
Piyush Malik
Curious minds with an aptitude for data and analytics, especially young college hires given job-shadowing opportunities.