As Lenovo built up its internal analytics architecture during the past four years, the PC maker tapped the Amazon Web Services cloud for basic data warehousing and business intelligence applications. However, for higher-level predictive analytics, it kept things on premises -- until now.
In an effort to speed up the analysis of marketing and internet clickstream data, Lenovo plans to move much of its advanced analytics work to the cloud as well. The expanded cloud data analytics strategy will add real-time processing capabilities that aren't supported by the company's on-premises Hadoop cluster, which was designed to handle batch jobs for data ingestion and analysis.
The ultimate goal is to enable more targeted online advertising and email marketing that improves both the customer experience and Lenovo's bottom-line results, said Marc Gallman, its senior manager of big data. Gallman, who works at Lenovo's U.S. headquarters in Morrisville, N.C., heads an analytics team that also is responsible for deploying and managing the company's big data architecture.
It can take up to five hours to get clickstream records and other relevant data into the on-premises cluster for analysis, Gallman said. That amount of latency hampers the effectiveness of Lenovo's follow-up advertising aimed at prospective buyers who have visited its website.
Too much advertising information
"Customers are seeing more ads than necessary," Gallman said, noting that the company's advertising costs are higher than they optimally should be as a result. In addition, Lenovo runs the risk of irritating people by hitting them with multiple online ads, including ones that might not be relevant to them.
With the cloud-based approach, Gallman hopes to reduce the processing latency to an hour or less so that yes-or-no decisions on ad placement can be made on a much timelier basis -- and with greater precision.
The cloud data analytics push will increase Lenovo's use of the Amazon Web Services (AWS) platform, which has primarily involved the Amazon Redshift data warehouse and Amazon Simple Storage Service (S3) up to this point. Gallman said his team has been looking at various AWS technology options in planning the cloud expansion, including the Amazon Athena query service, the Amazon Machine Learning analytics engine and the Amazon Elastic MapReduce big data platform built around Hadoop and related technologies.
Lenovo's analytics architecture, called LUCI Sky (an acronym standing for Lenovo Unified Customer Intelligence), was designed as a hybrid cloud and on-premises environment in 2013. But the business need for faster data analysis offers an increased opportunity to take advantage of the cloud for analytics, Gallman said, pointing to a combination of expected cost, flexibility and reliability benefits.
"You can stand up a small, persistent cluster and have it be dynamic as the workload increases," he explained.
Gallman also said that it's easier to incorporate a variety of tools into cloud analytics applications to get the best technology fit, and that he can better "sleep at night when things run in the cloud" because of the high availability promised by platform vendors like AWS.
Not all clear in the cloud
There are some issues to contend with, though. Choosing between the available AWS technologies for storing, processing and analyzing data in the cloud isn't a simple process, according to Gallman.
"You have to be skilled in your team to optimize the cloud setup," he said, cautioning that making the wrong choices could wipe out the anticipated cost savings compared to on-premises deployments.
He also said that, for security purposes, the big data team doesn't store any personally identifiable information about customers in the cloud. And Gallman doesn't plan to move all of the team's applications to the cloud. The on-premises cluster, based on Hortonworks' Hadoop distribution, will continue to be used as a development and test platform. Lenovo, which uses integration tools from Talend to pull data into the analytics architecture, will likely also continue to run some production jobs in batch mode on the cluster when they don't require rapid results.
Cloud data analytics is also on the rise at Autodesk Inc., a developer of design and engineering software based in San Rafael, Calif. Autodesk has built an AWS-based big data architecture that stores analytics data in S3 for processing primarily with the Apache Spark engine, said Mark Eaton, the company's enterprise architect. Data analysts and business users can then access the S3 data, as well as BI dashboards and reports, via a data virtualization layer created with tools from Denodo Technologies.
Eaton said the cloud-based architecture, dubbed the Autodesk Data Platform, currently includes data on things such as the software subscriptions held by customers and their use of the company's web-based products. He cited the same kind of benefits from using the cloud for analytics that Gallman did: lower costs, easier scaling of systems and increased agility in adopting new technologies.
Autodesk is even linking its on-premises data centers directly to the AWS data centers it uses, a move meant to enable sub-millisecond latency between them, according to Eaton.
"We're in bed with AWS and looking to spend more time there with them," he said. "The question has changed internally from 'why cloud?' to 'why not cloud?'"