Apache Spark is often thought of as a tool for the data engineers in the back rooms. After all, its sweet spot is generally considered to be data processing, and its close association with Hadoop can make it appealing for data management activities.
But evidence shows data scientists are increasingly embracing the tool for their own work. A new survey from the cloud-hosted Spark vendor Databricks Inc., whose founders include some of the programmers who developed the original open source Spark analytics engine, shows the technology is being used to build business intelligence reports, recommendation engines and other products that traditionally fall within the data scientist's domain.
Advanced analytics functions, such as Spark's library of machine learning algorithms, were cited as the top reason for implementing Spark. And while the developer-centric programming language Scala saw a decline in use among Spark users, the more general-purpose and analyst-friendly Python was the fastest-growing language in the Spark environment.
Listen to this podcast to hear more about how the Spark analytics computing framework is gaining favor outside of the data management community. We also delve deeper into Python to see why it is becoming so popular among data scientists. A recent SearchBusinessAnalytics article looked at how The New York Times is using Python because it tends to overlap with general computing skills. That overlap means analysts who know Python can, once they're done with a data analysis, build a data product like an app or Web portal. Python might not be as stats-specific as the R programming language, but it does offer some similar functionality in the analysis realm.