Even with the expanding array of advanced analytics tools now available, analytics teams still face plenty of challenges in developing big data applications and getting useful information from them.
For starters, building predictive models and machine learning applications is a difficult, time-consuming task that typically involves testing out a large number of data variables and algorithms before finding a combination that produces the desired analytics results. The degree of difficulty ratchets up further as the big data analytics process becomes more complex, such as in deep learning and artificial intelligence initiatives, noted Andy Feng, vice president in charge of Yahoo's big data and machine learning architecture.
"The innovation cycle is too long," Feng said. "You need to try a lot of different things, and that process can be complicated." Based in Sunnyvale, Calif., Yahoo runs more than 300 applications, including a growing set of machine learning programs, in a 40-cluster big data environment that's built around Hadoop and other technologies associated with the open source distributed processing framework.
Pharmaceuticals maker and healthcare services provider Merck & Co. Inc. is another big machine learning user -- for example, one application analyzes large amounts of diverse data collected from blood pressure monitors and other wearable devices as part of health management programs. Merck's machine learning platform "is really doing all the heavy lifting" in parsing and analyzing the data, said Murali Kaundinya, innovation engineering director at the Kenilworth, N.J., company.
However, to make using the platform less daunting for data analysts, the engineering team built an abstraction layer that hides the technical complexity of the algorithm development process from them. "There are way too many options -- you really want someone to curate the machine learning libraries and turn that into a platform or a service," Kaundinya said at Hadoop Summit 2016 in San Jose, Calif. "The idea is to dumb it down so [the analysts] can get their job done much faster."
Shared semantics ease analytics efforts
Macy's Inc. has taken a similar path, creating a virtual semantic data layer on top of its Hadoop data store. That gives the Cincinnati-based retailer's business intelligence and analytics teams a common framework to use in developing queries, reports and predictive models with data quality, consistency and governance checks built in upfront, said Seetha Chakrapany, director of marketing analytics and customer relationship management (CRM) systems at Macy's.
Before the semantic layer was put in place, the big data analytics process was getting bogged down in data engineering and preparation work, particularly as analysts started looking to run more complex queries, Chakrapany said. In addition, it was hard for analysts to collaborate on projects.
During a Hadoop Summit session, Chakrapany pointed to the relative immaturity of Hadoop and many of the open source data management and analytics technologies surrounding it as another issue that can hamper big data analytics applications.
"A lot of these tools are still not fully mature," he warned. "You need to accept the fact that there are instances where things are not going to be smooth." Chakrapany added, though, that the level of technical instability Macy's has experienced isn't a showstopper "if you have an open mind and know that this [process] is for the greater good" overall.
Sold on improving access to data
At eBay Inc., figuring out how to make the results of analytics applications available to corporate executives and other business users in an easily accessible way was a six-year effort, involving a succession of steps that didn't fully fit the bill.
The online auction company generates 50 TB of new data for analysis daily, processing it in a combination of three back-end systems: a Hadoop cluster, a Teradata data warehouse and a custom warehouse jointly developed with Teradata. For analytics, eBay uses SAS, R, MicroStrategy, Tableau and other tools. More than 300 data analysts and 5,000 business users have access to the environment, said Alex Liang, director of data programs, products, architecture and strategy at eBay, which is based in San Jose.
Over the years, those people have created a lot of analytics information, including more than 10,000 reports in Tableau and 5,000 in MicroStrategy. The number of database tables containing user data sets has also topped the 10,000 mark. With so much data for users to plumb through, in different places, "it was almost impossible to find the right metrics in a report" to answer a specific business question, Liang said.
To try to remedy that, eBay first set up a wiki in 2009, aiming to foster more internal collaboration on analytics. It followed that with a data hub modeled on elements of Pinterest and Facebook, then tried other tacks, including a moderated analytics discussion forum. But the analytics platform was still disjointed and hard for users to navigate, according to Liang.
Finally, in 2014, eBay deployed a new hub application based on self-service data discovery, search and collaboration software from Alation Inc., with data governance capabilities also built in to further help users find information and ensure that it's trustworthy. Liang said the move replaced an IKEA-like model of do-it-yourself data assembly with a governed, self-service approach that's more user-friendly -- and more effective. Now, he added, the message to users is straightforward: "Go use analytics."
That's the same kind of mentality Macy's is trying to foster internally through its investments in big data management and analytics technologies. With the right tools tied to Hadoop and related data processing platforms, the big data analytics process can be a big contributor to better business decision-making in an organization, Chakrapany said. "You don't want to just see Hadoop as a cheap storage solution. Its value is much higher than that."