You want to get started with a big data project at your company, but you're unfamiliar with Hadoop and you're unsure your project will deliver value. Relax. Many organizations are struggling to implement Hadoop for a variety of reasons. "The Current State of Hadoop in the Enterprise," a report by the International Institute for Analytics sponsored by SAS, offers a handy list of five steps to maximize the value of a Hadoop big data project for your organization. It's a great start. Here are some further considerations based on those recommendations:
1. Identify and define use cases that deliver competitive advantage and are strategic in nature.
First, choose your target. Let's say you want to study customer behavior. Your focus should be on new data types that are not currently being studied in other initiatives, such as an enterprise data warehouse. It's likely you will want to examine clickstream data, which tells you how customers are behaving online, and social media data, which tells you what people are saying about your brand.
Make sure your Hadoop project has a high profile and can deliver measurable value—such as more sales or repeat customers—fairly quickly. This will help justify your project and pave the way for future projects.
A good way to help identify and define use cases is the SAS Business Analytic Modernization Assessment (BAMA) service. Meant to help broaden the use of analytics in an organization, the BAMA is a workshop that facilitates conversation between IT and business units. Both sides work collaboratively to understand the key challenges with their current and future analytical processes.
2. Evaluate if and how Hadoop fits into your existing data and analytics architecture.
For many organizations, business intelligence and analytics projects such as data warehouses have been going on for decades. Even though the data storage cost of Hadoop might be significantly less than your data warehouse, it's a mistake to scrap your warehouse investment for the sake of undertaking the same efforts in Hadoop. While Hadoop is ideal for storing things like sensor data, it's not so good for real-time processing of a small number of records. Analytics expert Tom Davenport says many companies are storing large quantities of new data types in Hadoop clusters, and then moving that data to an enterprise data warehouse as needed for production applications.[1]
Let's assume you have done an assessment and focused one of your Hadoop implementations on customer behavior. Next you'll need to evaluate where the data supporting that behavioral analysis lives. The cost of using a traditional data warehouse for storage of clickstream data can skyrocket. Hadoop can store large amounts of data at a reasonable cost, but that is not the end of the story. To achieve your organization's objective of better understanding customer behavior, you'll need powerful analytics able to exploit the customer clickstream data now stored in Hadoop clusters.
3. Augment Hadoop with data management, data discovery and analytics to deliver value.
Once you've established the need to use Hadoop for your largest, fastest-moving data, you'll need tools to manage, manipulate and analyze that data. But those tools must be able to keep pace with the volume and speed of the data itself.
Let's say you're storing sensor data in Hadoop. What are you doing with it? Alone it may not tell you much, but if you can join it with third-party data to build an analytics-based table, you may glean some valuable insights. This can pay dividends where mechanical devices are involved—for example, an analyst could predict where failures in an aircraft might occur so that maintenance can be performed to keep planes in the air, increasing revenue and saving money. That kind of bottom-line benefit is important for your project to be successful.
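To make the idea of an analytics-based table concrete, here is a minimal sketch in Python. It is not a Hadoop job or a SAS workflow; the aircraft IDs, vibration readings, component ages and thresholds are all made-up assumptions for illustration. It simply shows the shape of the join: summarize the sensor data per aircraft, attach third-party reference data, and flag candidates for maintenance.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sensor readings: (aircraft_id, vibration_level)
sensor_readings = [
    ("A1", 0.2), ("A1", 0.3),
    ("A2", 0.9), ("A2", 1.1),
    ("A3", 0.4),
]

# Hypothetical third-party data: aircraft_id -> component age in years
component_age = {"A1": 2, "A2": 11, "A3": 5}


def build_analytics_table(readings, reference):
    """Join per-aircraft sensor averages with reference data (inner join)."""
    grouped = defaultdict(list)
    for aircraft_id, value in readings:
        grouped[aircraft_id].append(value)

    table = []
    for aircraft_id in sorted(grouped):
        if aircraft_id not in reference:
            continue  # no matching reference row, so skip this aircraft
        table.append({
            "aircraft_id": aircraft_id,
            "avg_vibration": mean(grouped[aircraft_id]),
            "component_age_years": reference[aircraft_id],
        })
    return table


def flag_for_maintenance(table, vibration_limit=0.8, age_limit=10):
    """Return aircraft whose average vibration and component age both exceed
    the (illustrative, assumed) limits."""
    return [
        row["aircraft_id"]
        for row in table
        if row["avg_vibration"] > vibration_limit
        and row["component_age_years"] > age_limit
    ]


analytics_table = build_analytics_table(sensor_readings, component_age)
flagged = flag_for_maintenance(analytics_table)
print(flagged)
```

In a real deployment the same join would typically run where the data lives, for example as a Hive query, and the simple thresholds would be replaced by a predictive model.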
Streamlining your overall time to value will help you further realize the power of Hadoop. How can you do this? Be sure you can access and load your data—in Hadoop or elsewhere—as quickly as you need. Explore billions of rows of data in seconds, and work with your data inside Hadoop—without the need to move the data to a separate analytical platform. Ensuring high efficiency for your analytical process top-to-bottom is the key to delivering value from your Hadoop implementation.
4. Re-evaluate your data integration and data governance needs.
Remember that the results of your data analytics project may be used to determine major business strategies. Data integration and governance are as important as ever. You need to know where the data came from and that it's clean. Data governance takes it a step beyond technology to incorporate people and processes. Find a technology partner such as SAS that has years of experience bringing IT and business divisions together and helping to develop data standards suited to your particular organizational culture. Your data governance practices should enable you to have a high level of confidence that when the data is manipulated, the results will have value and they will be auditable.
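As a rough illustration of what "knowing where the data came from and that it's clean" can look like in practice, the following Python sketch tags each record with its source and keeps an audit trail of rejected records. The `Record` fields, the validation rules and the sample values are hypothetical; real governance rules come from your own data standards.

```python
from dataclasses import dataclass


@dataclass
class Record:
    customer_id: str
    page: str
    source: str  # lineage: which system produced this record (assumed field)


def validate(record):
    """Return a list of problems found in one record (empty if clean)."""
    problems = []
    if not record.customer_id.strip():
        problems.append("missing customer_id")
    if not record.page.startswith("/"):
        problems.append("page is not a path")
    if not record.source:
        problems.append("missing source")
    return problems


def split_clean_and_rejected(records):
    """Separate clean records from rejected ones, keeping the reasons
    for each rejection so the outcome can be audited later."""
    clean, audit_log = [], []
    for record in records:
        problems = validate(record)
        if problems:
            audit_log.append((record, problems))
        else:
            clean.append(record)
    return clean, audit_log


# Made-up sample data for illustration only
records = [
    Record("c1", "/home", "web_logs"),
    Record("", "/cart", "web_logs"),
    Record("c2", "checkout", ""),
]
clean, audit_log = split_clean_and_rejected(records)
for record, problems in audit_log:
    print(record, problems)
```

Keeping the rejection reasons alongside the rejected records, rather than silently dropping them, is what makes the result auditable.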
5. Assess skills/talent gaps early and develop a plan to mitigate those gaps before deployment.
Big data is still a relatively new field, and the skills required to manage a project effectively can be surprisingly scarce. Productive use of Hadoop requires expertise in tools and frameworks such as Sqoop, Hive, Pig and MapReduce.
You should also determine whether a data scientist is needed to make sense of the big data project and connect it with your business's mission and strategy. It may be that a traditional business analyst can fill the need. For example, with an intuitive interface like the one included in SAS Data Loader for Hadoop, a user can acquire, discover, transform, cleanse, integrate and deliver data without being an expert in Sqoop, Hive or Pig. But if you do hire a data scientist, it makes sense to let them focus on the tasks for which they are best equipped, such as modeling, rather than writing MapReduce code. Ultimately, organizations that get the best results have a firm grasp of the skills needed, and a plan to fill any gaps, before they embark on a Hadoop project.
[1] “Three Big Benefits of Big Data Analytics,” SAS, 2014