Big data initiatives get huge boost from new technologies
Big data initiatives can help companies improve operational efficiency, create new revenue and gain a competitive advantage. But traditional data processing often can't deal with the mountains of structured, semi-structured and unstructured data that needs to be mined for value. That leaves big data initiatives hungry for new tools and technologies to ease and speed data processing and predictive analytics functions.
In this e-book, get insight on useful tools for big data projects. The first chapter provides real-world examples of organizations using SQL-on-Hadoop engines to simplify the process of querying and analyzing Hadoop data. The second defines Spark -- including its capabilities and limitations -- and offers advice on deploying, managing and using the big data processing engine. And the third chapter focuses on using the open source R analytical programming language and commercial tools such as SAS and IBM SPSS to run analytical applications against Hadoop data sets.
CHAPTERS AVAILABLE FOR FREE ACCESS
Despite all the attention it gets, Hadoop's use as a framework for supporting big data processing has been limited to programmers with specific skills. Enter SQL, the standard programming language for relational databases. The integration of SQL tools is speeding Hadoop's performance and opening the door to more developers and data analysts who are well-versed in SQL. Users can choose from more than a dozen SQL-on-Hadoop open source and commercial tools, but most of these technologies are immature and have kinks to work out. And since many are specialized, it's important to understand their optimal uses.
In this e-book chapter, Executive Editor Craig Stedman presents real-world examples of several organizations using SQL-on-Hadoop engines to simplify the process of querying and analyzing Hadoop data. He also delves into the IT challenges companies are facing and how they're resolving them. These businesses range from healthcare analytics providers to marketers to auto insurers to online dating services. Despite the diverse array of SQL-on-Hadoop users, one general theme prevails: Integrating SQL tools with Hadoop has definitely rejuvenated the elephant. One technical architect sums up SQL's influence this way: "Really all [the developers] understood was SQL. … So we were able to develop a lot more, a lot faster, because they were using the SQL syntax they were familiar with."
A fire is catching in the world of big data processing. Since it was first introduced by The Apache Software Foundation a few years ago, the Spark processing engine has been moving throughout the big data ecosystem, latching onto users ripe for change the way a wildfire takes to some vegetation better than others.
A major appeal of Spark -- which is often paired with the Hadoop framework -- is its speed, especially compared to MapReduce, another Hadoop partner. Of course, the processing engine's youth also means it has areas in need of improvement. Some users struggle to stay on top of Spark updates because the other tools they pair it with don't have the latest version. But the benefits of Spark outweigh the inefficiencies for many users. "It's a stable [technology], and I have no hesitation at all about deploying it," said Peter Crossley, director of product architecture and technology at Webtrends Inc., one of the many users profiled in this e-book chapter written by Executive Editor Craig Stedman.
Webtrends was an early adopter of Spark and recently expanded the processing engine's role in its big data operations. Other users, like cloud software vendor Xactly Corp., are newer to the technology but are already seeing its benefits. One thing is for sure: Spark has caught the attention of companies that want to process information fast.
Big data environments based on technologies such as Hadoop and Spark are being deployed more widely -- and the same goes for advanced analytics tools that can help organizations make effective use of the data flooding into those systems. In fact, predictive analytics software was the top choice for planned business intelligence and analytics investments by respondents to a TechTarget survey.
And in many cases, deployments of advanced analytics software to support big data applications aren't a one-and-done thing. Macy's uses more than a half-dozen tools to meet different application needs as part of the retailer's big data analytics program. The technology roster includes statistical analysis, predictive modeling and machine learning tools that Macy's couldn't do without. "Because of the volume of data, there's just no humanly possible way to analyze it [manually]," said Seetha Chakrapany, the company's director of marketing analytics and customer relationship management systems.
Macy's is just one of six organizations featured in this e-book chapter by Executive Editor Craig Stedman. Progressive Casualty Insurance Co. is another. The insurer's data and analytics business leader, Pawan Divakarla, said the capabilities provided by advanced analytics tools are "huge" in enabling Progressive to manage a program for awarding discounts on auto insurance policies to safe drivers based on analysis of operational data collected from their vehicles.
But there are issues to contend with along the way, from the complexity of developing predictive models and machine learning algorithms to the challenge of sharing analytics results with business executives. Find out how Macy's, Progressive and others have overcome the hurdles and made advanced analytics against pools of big data work successfully.