
Cloud versions of Apache Spark offer simplicity

Spark Summit 2015 brought announcements that data scientists and other users will soon have two cloud-hosted options for running the big data framework. This podcast takes a deeper look.

The big news coming out of the Spark Summit 2015 conference in San Francisco was the general availability release of Databricks Inc.'s cloud-hosted implementation of the Apache Spark framework and big data processing engine.

In this edition of the Talking Data podcast, we discuss why the news attracted the attention of so many data engineers and data scientists and how it relates to the future of data processing.

There are some obvious benefits to running a big data framework like Spark in the cloud. Users can be sure their jobs are running on the latest hardware, they don't have to invest in technology that will soon be obsolete, and the vendor takes care of some of the more technically thorny aspects of managing a Spark cluster. Spark is sometimes thought of as a technology for data management professionals, but the availability of a fully managed implementation means data scientists and other data analysts can now dive in and start using it without having to rely too heavily on application developers.

This podcast also takes a look at an announcement by IBM that it is investing heavily in Spark. As with the Databricks news, IBM said it will offer a hosted version of Spark on its cloud platform. It also promised to train more than 1 million data scientists and data engineers on Spark through partnerships with UC Berkeley's AMPLab -- where the technology was created -- as well as several training services companies and massive open online course websites.

Some industry watchers have suggested that the IBM announcement threatens to overshadow the release of Databricks Cloud, which would be ironic given that the Databricks team is led by the engineers who originally developed the Spark framework. Certainly, IBM brings immense resources to Spark. But the most significant aspect of its announcement may simply be that IBM has put its stamp of approval on the computing platform, signaling that it believes in the long-term viability of Apache Spark. Listen to the podcast to hear more analysis of the key developments at Spark Summit 2015.

Ed Burns is site editor of SearchBusinessAnalytics. Follow him on Twitter: @EdBurnsTT.
