Half the work for data scientists and analysts is developing an analytical report or model that works. A good portion of the rest of their work is convincing people on the business side to actually use the data product.
But even if they clear those first hurdles, problems can still arise down the line. Failing to consider the scalability of an analytics architecture at an early stage can cause a popular data tool to bog down, diminishing its usability just as business-side staff are adopting it.
This was the experience of Cox Automotive Media Group before implementing a Hadoop cluster three years ago. Speaking at the SAS Analytics Experience conference in Las Vegas last week, Shawn Hushman, vice president of decision sciences and valuations at the company, said Cox was using IBM Netezza as an analytical database -- but eventually it just couldn't keep up.
"We were running SQL queries that would run for three days and then fail," he said. The issue was that they were running out of space and processing power in the database, but the cost of the appliance made it unfeasible to add more. Basic data management processes were taking up all the compute resources, which left none for analytics queries.
Analysts shouldn't have to worry about data processing
Hushman and his team have since implemented a Hadoop cluster that handles the data processing work of ingesting and storing data, while a separate SAS server handles analytics. This has freed up compute resources in the analytics architecture and allowed the company to do more advanced machine learning, such as real-time scoring of site visitors to determine whether they should be given a promotional offer. It has also helped more traditional data models, such as tools that measure visitor engagement or predict vehicle values, scale up.
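To make the real-time scoring idea concrete, here is a minimal sketch of what scoring a site visitor for a promotional offer might look like. This is an illustration, not Cox Automotive's actual system: the feature names, weights and threshold are hypothetical, standing in for coefficients that would come from a model trained offline.

```python
import math

# Hypothetical coefficients, as if produced by an offline-trained
# logistic regression model (not Cox Automotive's actual model).
WEIGHTS = {"pages_viewed": 0.4, "minutes_on_site": 0.2, "returning_visitor": 1.1}
BIAS = -3.0
OFFER_THRESHOLD = 0.5  # show the offer when predicted probability exceeds this

def score_visitor(features):
    """Return the predicted probability that a visitor responds to an offer."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function

def should_show_offer(features):
    """Decide in real time whether this visitor gets the promotional offer."""
    return score_visitor(features) > OFFER_THRESHOLD

# An engaged returning visitor scores high; a one-page drive-by does not.
engaged = {"pages_viewed": 8, "minutes_on_site": 12, "returning_visitor": 1}
casual = {"pages_viewed": 1, "minutes_on_site": 1, "returning_visitor": 0}
```

The point of the architectural split is that a lightweight scoring call like this can run on every page view, because the heavy lifting of ingesting and preparing the data happens elsewhere on the Hadoop cluster.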
"I don't want my team worrying about whether the data is being processed correctly or tracking down failures," Hushman said. "I want them thinking about how to make what we do better."
But even though tackling scalability at an early stage can make everything easier down the road, that doesn't mean it's simple. Jeff Parkinson, vice president of customer operations at Dow Jones, said that when he started thinking about how to modernize the organization's data infrastructure, there was pressure to deliver more tangible products first.
Make the benefits of infrastructure clear
Parkinson said data infrastructure is "invisible" to upper management at Dow Jones, which owns publications including The Wall Street Journal, Barron's and MarketWatch. This made it difficult to get funding when he wanted to move from a collection of seven legacy databases, including a mainframe with an "antiquated" data model, to a more modern, centralized cloud infrastructure. He found that most people wanted accessible data visualizations. It took some convincing to get management to understand that visualization tools should be the last thing added, and that setting up the data pipelines would lay the groundwork for future scalable success.
"We could do good [visualizations] once, paint a pretty picture for them, but without that core we couldn't do it again," he said.
Parkinson was eventually able to convince management to fund a project built on Amazon Redshift. Dow Jones uses the cloud database to consolidate customer data in a single location and SAS software to analyze it.
Customer service has been the biggest beneficiary of the new analytics architecture. The team can now optimize customer interactions by modeling which customers are likely to respond to specific offers. It can view subscribers by region and run models to determine whether continuing daily newspaper delivery makes sense, or identify trial subscribers who might respond to marketing campaigns aimed at converting them into full subscribers.
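The trial-conversion use case boils down to ranking subscribers by some engagement signal and targeting the ones above a threshold. The sketch below illustrates that shape with a deliberately crude rule-based score; the field names, weights and threshold are hypothetical, and Dow Jones's actual models would be trained on historical response data rather than hand-set.

```python
from dataclasses import dataclass

@dataclass
class Subscriber:
    region: str
    on_trial: bool
    articles_read_per_week: int
    newsletters_opened: int

def conversion_score(s):
    """Crude engagement score; a real propensity model would be trained on data."""
    return 0.1 * s.articles_read_per_week + 0.05 * s.newsletters_opened

def trial_campaign_targets(subscribers, threshold=0.5):
    """Trial subscribers engaged enough that a conversion campaign may pay off."""
    return [s for s in subscribers if s.on_trial and conversion_score(s) >= threshold]

subs = [
    Subscriber("Northeast", True, 9, 4),   # engaged trial reader -> targeted
    Subscriber("Midwest", True, 1, 0),     # barely active trial -> skipped
    Subscriber("West", False, 12, 6),      # already a full subscriber
]
targets = trial_campaign_targets(subs)
```

Because the customer data now sits in one centralized database, a query like this can run over the full subscriber base instead of being stitched together from seven legacy systems.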
Parkinson said the centralized database has allowed his team to put these tools in the hands of marketers and customer service departments. "We started in a very bad spot, and very quickly we were able to turn it around," he said.