IBM Analytics
Q2: What role does Apache Spark play in open data science and analytics?
IBM Analytics
Please post your queries here
IBM Analytics
Waiting for your replies/comments
John Furrier
Real-time is a big benefit
John Furrier
In-the-moment analytics
Kirk Borne
Spark is part of the Apache open source ecosystem - so, anyone can commit updates to it. Hence, any improvements are shared with the whole ODS community
Zeydy Ortiz, PhD
There is huge value in an open source analytics platform - (1) portability of solutions
jameskobielus
Spark is the predominant open-source modeling tool for data science: machine learning, streaming analytics, graph analysis.
IBM Analytics
Tell us your view on the role of #ApacheSpark in DataScience
Zeydy Ortiz, PhD
Another value: easier collaboration (common framework)
Craig Brown, Ph.D.
Data Science uses Apache Spark to perform investigative analytics. Spark provides an interactive statistical environment to perform ad-hoc, exploratory analytics in order to gain insights and answer questions.
jameskobielus
Spark is, as Kirk states, part of the open Apache ecosystem of code for data science, big data, cognitive computing, etc., including Hadoop, Kafka, and others.
Galvanize Aaron
Rapid prototyping!
Bob E. Hayes
Quick analytics helps you fail faster / learn quicker.
Keith Carlsen
Spark will emerge as the "operating system" for #analytics. Analytics needs parallelism for speed. Spark provides the basic infrastructure. #sparkbizapps
jameskobielus
Here's my recent IBM Big Data & Analytics Hub blog on Spark and R in the open analytics ecosystem. http://www.ibmbigdat...
John Furrier
Spark enables data science to be relevant around what's happening now
Tamara McCleary
Open source is critical, especially in the world of data science. Spark offers real collaboration for everyone.
Kirk Borne
Jim K. is right on! Spark has great components: http://www.slideshar... http://image.slidesh...
Keith Carlsen
#Spark is beginning to understand what the #IBMz #mainframe has known for decades - resource management is critical to performance. #sparkbizapps
Kirk Borne
@furrier Hi John! Great to see you.
Tamara McCleary
@furrier Great to see you John. #sparkbizapps
jameskobielus
Spark is the common development tool around which open data science communities can converge. Its open-source ubiquity and its real-time/in-memory/interactive strengths make it well-suited to embedded IoT analytics and other key apps.
Zeydy Ortiz, PhD
Spark in particular is making it easier to do the analysis w/o having to think about the mechanics of Map/Reduce
Kirk Borne
Some folks think Spark is in competition with Hadoop. Not true! See my "Lord of the Things" article: https://www.mapr.com...
Craig Brown, Ph.D.
As Apache Spark evolves via the Apache ecosystem, so does the toolset for Data Science. Data Science evolves as well, innovating the statistical models produced.
Kirk Borne
@DrZeydy is exactly right -- Spark looks like a fighter jet next to Map/Reduce (a horseless carriage) https://www.mapr.com...
IBM Analytics
Time for Question #3 on the top of your screen
jameskobielus
@TamaraMcCleary Here's my recent blog on collaboration and workflow in open data science initiatives involving Spark and R. http://www.ibmbigdat...
8 Path Solutions
.@IBMbigdata Unlike other tools in the Hadoop family, which tend to focus on technology (i.e. queuing messages), Apache Spark is an important tool for open data science because data scientists can leverage HDFS, which is growing in popularity.
Kirk Borne
At @BoozAllen, we are focusing on Agile Analytics, Fast Iterative Prototyping, MVP (Minimum Viable Product), etc. = #DataOps ...http://www.oreilly.c...
jameskobielus
@craigbrownphd I forgot to say "welcome," Craig. I always enjoy your contributions.