Spark analytics libraries draw in data scientists

In this podcast, find out why Apache Spark isn't just for data engineers anymore. Data scientists are finding the Spark big data framework useful for analytics activities.

Apache Spark is often thought of as a tool for the data engineers in the back rooms. After all, its sweet spot generally is considered to be processing data, and it's closely associated with Hadoop, which can make it appealing for data management activities.

But evidence shows data scientists are increasingly embracing the tool for their own work. A new survey from the cloud-hosted Spark vendor Databricks Inc., whose founders include some of the programmers who developed the original open source Spark analytics engine, shows the technology is being used to build business intelligence reports, recommendation engines and other applications that are traditionally the domain of data scientists.

Survey respondents cited advanced analytics functions, such as Spark's library of machine learning algorithms, as the top reason for implementing Spark. And while the developer-centric programming language Scala saw a reduction in Spark users, the more general-purpose and analyst-friendly Python was the fastest-growing language used in the Spark environment.

Listen to this podcast to hear more about how the Spark analytics computing framework is gaining favor outside of the data management community. We also delve deeper into Python to see why it is becoming so popular among data scientists. A recent SearchBusinessAnalytics article looked at how The New York Times is using Python because skills in the language tend to overlap with general computing skills. That overlap means analysts who know Python can, once they finish a data analysis, go on to build a data product such as an app or Web portal. Python might not be as stats-specific as the R programming language, but it does offer some similar functionality for analysis.

Ed Burns is site editor of SearchBusinessAnalytics. Email him at [email protected] and follow him on Twitter: @EdBurnsTT.
