ORLANDO, Fla. -- While “big data” and what SAS Institute Inc. calls “high-performance analytics” have emerged as the buzzwords at SAS’ Analytics 2011 Conference Series, attendees said a major analytics hurdle can still be summed up in two words: data quality.
The challenge is not a revelation. A 2010 survey from the U.K.-based Business Application Research Center, or BARC, found
“In large organizations, what I see is that you have various lines of business in an organization that collect data differently,” said Michael Click, a database marketing analyst for Atlanta-based SunTrust Banks Inc. “When they try to centralize it for that all-encompassing analysis … matching up feeds of data can be challenging.”
Click said he’s seeing a push for data centralization, which could help remove the data silos he and his co-worker Adam Lewis agree can become problematic.
“It allows you to get a single version of the truth,” said Lewis, also a database marketing analyst for SunTrust. “So you don’t have to spend time explaining why one set of numbers might be slightly different than others.”
In fact, Lewis said, if numbers from different groups don’t align at the beginning of a conversation, the discussion might as well be over.
“Even small differences can mean losing confidence,” said Click, agreeing. “That can become the story rather than what the numbers are telling you.”
Click and Lewis weren’t the only ones to place data quality at the top of the challenges list.
“One we always have and continue to have is data quality,” said Paulo Costa, director of advanced analytics for San Jose, Calif.-based Cisco Systems Inc. “You know the saying, ‘garbage in, garbage out.’ ”
For example, Cisco’s business-to-business market uses business data from Short Hills, N.J.-based Dun & Bradstreet. While the data from Dun & Bradstreet is improving, work still needs to be done, Costa said.
“Data needs to be worthwhile,” he said, “and matching external and internal sources can be problematic.”
Problematic and time-consuming. Costa estimates that 80% of an analyst’s time is spent on data preparation alone. “After you have clean data, that’s when the fun starts,” he said.
While organizations continue to face data quality concerns, the most celebrated vendor product releases are centered on the catchier, bigger buzz of big data.
Earlier this month, Microsoft announced a new partnership with Hortonworks, a Yahoo spinoff devoted to Hadoop development. Two weeks before that, Oracle announced its new NoSQL Database Enterprise Edition, a big data appliance meant to run on Hadoop.
The latest release came out of SAS’ Analytics 2011 Conference Series. Forgoing the Hadoop route, SAS plans to release a new platform for “high-performance analytics.” Developed in partnership with Teradata and EMC Greenplum, the appliance combines in-database and in-memory analytics to deliver faster processing without the need for heavy data lifting.
Not all businesses, though, rank data quality or even big data as an issue of serious concern these days. One attendee -- who asked to remain anonymous, citing his company’s policy -- said he recognizes bad data can be a problem, but one that will eventually be solved. A more pressing issue, he said, is preparing for how the banking industry will evolve as the economy turns around.
“A big question is how we reposition our risk analytics team when risk is no longer a key issue anymore,” he said.
A member of the risk analytics team at a regional bank headquartered in the Midwest, which added staff after the financial collapse, he said the economy will eventually bounce back, and that could lead financial institutions to downgrade the important role that he and other risk analysts play.