sparkbizapps

Open Analytics & Data Science
Join us to engage with leading experts to discuss the power of open data science
IBM Analytics
Q2 : What role does Apache Spark play in open data science and analytics?
IBM Analytics
Please post your queries here
IBM Analytics
Waiting for your replies/comments
John Furrier
realtime is big benefit
John Furrier
in the moment analytics
Kirk Borne
Spark is part of the Apache open source ecosystem - so, anyone can commit updates to it. Hence, any improvements are shared with the whole ODS community
Zeydy Ortiz, PhD
There is huge value in an open source analytics platform - (1) portability of solutions
jameskobielus
Spark is the predominant open-source modeling tool for data science: machine learning, streaming analytics, graph analysis.
IBM Analytics
Tell us your view on the role of #ApacheSpark in DataScience
Zeydy Ortiz, PhD
Another value: easier collaboration (common framework)
Craig Brown, Ph.D.
Data Science uses Apache Spark to perform investigative analytics. Spark provides an interactive statistical environment to perform ad-hoc, exploratory analytics in order to gain insights and answer questions.
jameskobielus
Spark is, as Kirk states, part of the open Apache ecosystem of code for data science, big data, cognitive computing, etc., including Hadoop, Kafka, and others.
Galvanize Aaron
Rapid prototyping!
Bob E. Hayes
Quick analytics helps you fail faster / learn quicker.
Keith Carlsen
Spark will emerge as the "operating system" for #analytics. Analytics needs parallelism for speed. Spark provides the basic infrastructure. #sparkbizapps
jameskobielus
Here's my recent IBM Big Data & Analytics Hub blog on Spark and R in the open analytics ecosystem. http://www.ibmbigdat...
John Furrier
spark enables data science to be relevant around what's happening now
Tamara McCleary
Open source is critical, especially in the world of data science. Spark offers real collaboration for everyone.
Kirk Borne
Jim K. is right on! Spark has great components: http://www.slideshar... http://image.slidesh...
Keith Carlsen
#Spark is beginning to understand what the #IBMz #mainframe has known for decades - resource management is critical to performance. #sparkbizapps
Kirk Borne
@furrier Hi John! Great to see you.
jameskobielus
Spark is the common development tool around which open data science communities can converge. Its open-source ubiquity and real-time/in-memory/interactive strengths make it well-suited to embedded IoT analytics and other key apps.
Zeydy Ortiz, PhD
Spark in particular is making it easier to do the analysis w/o having to think about the mechanics of Map/Reduce
Kirk Borne
Some folks think Spark is in competition with Hadoop. Not true! See my "Lord of the Things" article: https://www.mapr.com...
Craig Brown, Ph.D.
As Apache Spark evolves via the Apache ecosystem, so does the toolset for Data Science. Data Science evolves as well, innovating the statistical models produced.
Kirk Borne
@DrZeydy is exactly right -- Spark looks like a fighter jet next to Map/Reduce (a horseless carriage) https://www.mapr.com...
IBM Analytics
Time for Question #3 on the top of your screen
jameskobielus
@TamaraMcCleary Here's my recent blog on collaboration and workflow in open data science initiatives involving Spark and R. http://www.ibmbigdat...
8 Path Solutions
.@IBMbigdata Unlike other tools in the Hadoop family, which tend to focus on technology (i.e. queuing messages), Apache Spark is an important tool for open data science because data scientists can leverage HDFS, which is growing in popularity
Kirk Borne
At @BoozAllen, we are focusing on Agile Analytics, Fast Iterative Prototyping, MVP (Minimum Viable Product), etc. = #DataOps ...http://www.oreilly.c...
jameskobielus
@craigbrownphd I forgot to say "welcome," Craig. I always enjoy your contributions.
IBM Analytics
Q3 : What are the most important components in today’s open data science and analytics stack?
IBM Analytics
Please post your replies here
IBM Analytics
Gentle reminder to post your replies here
IBM Analytics
Waiting to hear from you on Q3
Kirk Borne
Important components of Open #DataScience include the Apache ecosystem (Spark, Hadoop, Drill, etc.) plus R, Python, Jupyter, and microservices & containers (Docker...)
jameskobielus
Most important open-source DS language is R (with Python & Scala not far behind). Most important modeling tool is Spark. Most important open data refinement platform: Hadoop. Most important open distributed pub/sub commit log platform: Kafka.
Kirk Borne
All sorts of cool tools from Apache: Spark, Accumulo, Kafka, Drill, and Streaming Analytics = http://www.cakesolut...
Craig Brown, Ph.D.
Spark, R and Hadoop are essential. Most of the other components just enhance what can be delivered by #DataScience
IBM Analytics
Let us keep this great conversation going! Please chime in with your replies
jameskobielus
Most important open-source machine-learning libraries include SystemML, TensorFlow, and Mahout.
Dave Vellante
there should be some security components in the "stack"
Craig Brown, Ph.D.
Machine Learning, to Data Science, is like the icing on the cake. You get functionality out of the statistical model, bringing it to life
Dave Vellante
All this data dramatically expands the security threat - increases the stakes around IP
Kirk Borne
Excellent point @jameskobielus ! -- the various open source #MachineLearning libraries are great! http://bit.ly/1PMY20... #abdsc
Kirk Borne
@dvellante Hello Dave!! ... great to see so many great folks here!
Ali Khanafer
"best" depends on your application. #R is my favorite.
IBM Analytics
Time for question number 4 : Q#4
Dave Vellante
@KirkDBorne ditto Kirk - nice gathering of data zealots!
8 Path Solutions
The most important components will depend on the role of the data scientist. A data scientist working on product development will find Python more valuable than R and APIs more useful than relational databases.
jameskobielus
Most important open-data sources? Hard to say, but here's an excellent recent list that Bernard Marr compiled: http://www.forbes.co...
Tamara McCleary
@dvellante Absolutely agree Dave. #Cybersecurity is critical w/ the exponential growth of #BigData #Cloud & #IoT devices. #sparkbizapps
Kirk Borne
@8pathsolutions Right!! Python for operational development (#DataProducts) and R for exploration, discovery, validation,...
Kirk Borne
Here is my big list of #OpenData repositories, benefits, and characteristics: http://rocketdatasci...
Cairo Qween
@dvellante I totally agree! Having all of this data is great, but we are so vulnerable now! It is a bit frightening when you think about it in depth.