By providing access to broader sets of information, big data can help maximize the analytical insights data analysts and business users generate. Successful big data analytics applications uncover trends and patterns that enable better decision making, point to new revenue opportunities and keep companies ahead of their business rivals. But first, organizations often need to enhance their existing IT infrastructure and data management processes to support the scale and complexity of a big data architecture.
Hadoop systems and NoSQL databases have become key tools for managing big data environments. In many cases, though, businesses are utilizing their existing data warehouse infrastructure, or a combination of the new and old technologies, to manage the big data flowing into their systems.
Whatever type of big data technology stack a company deploys, there are some common considerations that must be addressed to ensure it will provide an effective framework for big data analysis efforts. Before getting started on big data projects, it's crucial to look at the, er, bigger picture of the new data requirements they entail. Let's examine four of the considerations that need to be taken into account.
Data accuracy. Data quality issues are certainly no stranger to BI and data management professionals. Many BI and analytics teams struggle to ensure the validity of data and convince business users to trust in the accuracy and reliability of information assets. The widespread use of spreadsheets as personalized analytics repositories, or spreadmarts, can contribute to a lack of trust in data: The ability to store and manipulate analytics data in Excel creates an environment that supports self-service analysis capabilities but might not inspire other users to act confidently on the findings. Data warehouses, coupled with data integration and data quality tools, can help instill that confidence by providing standardized processes for managing BI and analytics data. But a big data implementation adds to the degree of difficulty due to increased data volumes and a wider variety of data types, particularly when a mix of structured and unstructured data is involved. Assessing data quality measures and upgrading them as needed to handle those larger and more varied data sets is vital to the successful implementation and usage of a big data analytics framework.
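To make "assessing data quality measures" more concrete, here is a minimal Python sketch of two common measures, completeness and validity, computed over a handful of records. The field names, the sample readings and the non-negative rule are hypothetical and chosen only for illustration; real checks depend on the data set and the tools in use.

```python
from typing import Any, Callable, Iterable, Mapping

MISSING = (None, "")  # values treated as "not filled in" (an assumption)


def completeness(records: Iterable[Mapping[str, Any]], field: str) -> float:
    """Fraction of records in which `field` is present and not missing."""
    records = list(records)
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in MISSING)
    return filled / len(records)


def validity(
    records: Iterable[Mapping[str, Any]],
    field: str,
    is_valid: Callable[[Any], bool],
) -> float:
    """Fraction of non-missing values of `field` that pass `is_valid`."""
    values = [r[field] for r in records if r.get(field) not in MISSING]
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)


# Hypothetical sample: four meter readings, one with a missing value,
# one with a blank identifier and an impossible negative reading.
sample = [
    {"meter_id": "A1", "kwh": 12.5},
    {"meter_id": "A2", "kwh": None},
    {"meter_id": "", "kwh": -3.0},
    {"meter_id": "A4", "kwh": 7.0},
]

kwh_completeness = completeness(sample, "kwh")           # 3 of 4 filled
kwh_validity = validity(sample, "kwh", lambda v: v >= 0)  # 2 of 3 non-negative

print(f"kwh completeness: {kwh_completeness:.2f}")
print(f"kwh validity:     {kwh_validity:.2f}")
```

Tracking scores like these per field, before and after new data sources are added, is one simple way to see whether existing quality measures still hold up as volumes and data types grow.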
Storage fit. One of the core demands of data warehousing is the ability to process and store large data sets. But not all data warehouses are created equal in that regard. Some are optimized for complex query processing, while others aren't. And in many big data applications, the addition of unstructured data and the increased velocity at which data is created and collected compared to transactional systems make augmenting a data warehouse with Hadoop or NoSQL technologies a necessity. For an organization looking to capture and analyze big data, storage capacity alone isn't enough; what matters is where the data is best placed so it can be transformed into useful information and made available to data scientists and other users.
Query performance. Big data analytics depends on the ability to process and query complex data in a timely fashion. A good example is a company that developed a data warehouse to maintain data collected from energy usage meters. During product evaluations, one vendor's system was able to process 7 million records in 15 minutes, while another's topped out at 300,000 records in the same amount of time. Identifying the right infrastructure to support fast data availability and high-performance querying can make the difference between success and failure.
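The gap between the two systems in that evaluation is easier to judge as throughput. The short Python calculation below uses only the figures quoted above (7 million and 300,000 records, each in 15 minutes); the function and variable names are illustrative.

```python
def records_per_second(records: int, minutes: float) -> float:
    """Average throughput over the measured window."""
    return records / (minutes * 60)


WINDOW_MINUTES = 15

fast = records_per_second(7_000_000, WINDOW_MINUTES)  # about 7,778 records/s
slow = records_per_second(300_000, WINDOW_MINUTES)    # about 333 records/s
speedup = fast / slow                                 # about 23.3x

# Time the slower system would need for the same 7 million records,
# assuming its throughput stays constant (a simplifying assumption).
slow_minutes_for_7m = 7_000_000 / slow / 60           # 350 minutes

print(f"fast system: {fast:,.0f} records/s")
print(f"slow system: {slow:,.0f} records/s")
print(f"speedup:     {speedup:.1f}x")
print(f"slow system on 7M records: {slow_minutes_for_7m:.0f} minutes")
```

Put that way, a load the faster system finishes in a quarter of an hour would occupy the slower one for nearly six hours, provided its rate does not degrade further at larger volumes.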
Scalability. With growing data volumes and variety in many organizations, a big data platform can't be built without the future in mind. It's imperative to think ahead and ask whether the big data technologies being evaluated can scale to the levels that will be required going forward. That extends beyond storage capacity to include performance as well, particularly in companies that are looking at data from social networks, sensors, system log files and other non-transactional sources as extensions of their business data.
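One rough way to "think ahead" is to project data volume under compound growth and see when it would outrun the capacity being evaluated. The Python sketch below does that; the starting volume, growth rate and capacity are hypothetical placeholders, not figures from any real deployment, and constant annual growth is itself a simplifying assumption.

```python
from typing import Optional


def projected_volume(current_tb: float, annual_growth: float, years: int) -> float:
    """Volume after `years` of compound growth (0.40 means 40% per year)."""
    return current_tb * (1 + annual_growth) ** years


def years_until_full(
    current_tb: float,
    annual_growth: float,
    capacity_tb: float,
    max_years: int = 50,
) -> Optional[int]:
    """First whole year in which projected volume reaches capacity.

    Returns None if capacity is not reached within `max_years`
    (for example, when growth is zero or negative).
    """
    for year in range(max_years + 1):
        if projected_volume(current_tb, annual_growth, year) >= capacity_tb:
            return year
    return None


# Hypothetical inputs: 50 TB today, 40% annual growth, 200 TB platform ceiling.
headroom_years = years_until_full(current_tb=50, annual_growth=0.40, capacity_tb=200)
print(f"Capacity reached in year: {headroom_years}")
```

The same projection can be repeated for query load or ingest rate, since the scalability question covers performance as well as storage.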
Analyzing diverse and complex data sets requires a robust and resilient big data architecture. By considering these four factors when planning projects, organizations can determine whether what they already have in-house can handle the rigors of big data analytics applications or whether additional software, hardware and data management processes are required to achieve their big data goals.
Lyndsay Wise is president and founder of WiseAnalytics, a research and analysis firm that focuses on business intelligence deployments at small and midsize businesses. Wise has more than 10 years of experience in business systems analysis, software selection and implementation of enterprise applications. Email her at firstname.lastname@example.org.