Cloud versions of Apache Spark offer simplicity

Spark Summit 2015 brought announcements that data scientists and other users will soon have two cloud-hosted options to run the big data framework. This podcast takes a deeper look.

The big news coming out of the Spark Summit 2015 conference in San Francisco was the general availability release of Databricks Inc.'s cloud-hosted implementation of the Apache Spark framework and big data processing engine.

In this edition of the Talking Data podcast, we discuss why the news attracted the attention of so many data engineers and data scientists and how it relates to the future of data processing.

There are some obvious benefits to running a big data framework like Spark in the cloud. Users can be sure they're running their jobs on the latest hardware, they don't have to invest in technology that will soon be obsolete, and the vendor takes care of some of the more technically thorny aspects of managing a Spark cluster. Spark is sometimes thought of as a technology for data management professionals, but the availability of a fully managed implementation means data scientists and other data analysts can now dive in and start using it without having to rely too heavily on application developers.

This podcast also takes a look at an announcement by IBM that it is investing heavily in Spark. Similar to the Databricks news, IBM said it will offer a hosted version of Spark on its cloud platform. It also promised to train more than 1 million data scientists and data engineers on Spark through partnerships with UC Berkeley's AMPLab -- where the technology was created -- as well as several training services companies and massive open online course (MOOC) websites.

Some industry watchers have speculated that the IBM announcement threatens to overshadow the release of Databricks Cloud, which would be ironic given that the Databricks team is led by the engineers who originally developed the Spark framework. Certainly, IBM brings immense resources to Spark. But the most significant aspect of its announcement may simply be that IBM has put its stamp of approval on the computing platform, signaling that it believes in the viability of Apache Spark going forward. Listen to the podcast to hear more analysis of the key developments at Spark Summit 2015.

Ed Burns is site editor of SearchBusinessAnalytics. Email him at eburns@techtarget.com and follow him on Twitter: @EdBurnsTT.

This was last published in June 2015
