This is a two-part series on in-database analytics
- Understanding in-database analytics technology: Benefits, uses and ROI
- Is in-database analytics an emerging business intelligence (BI) trend?
In-database analytics is an emerging practice that experts say can significantly cut the cost and time of complex, data-intensive analytic processes.
The first part of this two-part series on in-database analytics covers how the practice works and how it differs from more traditional analytical methodologies. We will also look at recent and developing market trends, such as vendor support and the emergence of open standards, as well as in-database analytics' potential for revolutionizing advanced analytics and business intelligence (BI). The second part of the series will discuss the paybacks of in-database analytics and how to realize them, as well as potential deployment challenges.
Breaking down in-database analytics
According to a November 2009 Forrester Research Inc. report, titled "In-Database Analytics: Heart of the Predictive Enterprise," the practice is far from bleeding edge. In fact, in-database analytics is the latest instance of a longstanding approach in which developers embed application logic into data warehouse and database systems.
In a traditional setup, predictive analytics, data mining and other compute-intensive analytic functions are part of separate applications or data marts, each typically with its own system, set of data, analytic tools and programmers. As a result, "a lot of people spend a lot of time shepherding data out of a database, profiling it, transforming into a format a particular analytic tool can digest, and moving it to where analysts can use it," says Neil Raden, president of Santa Barbara, Calif.-based consultancy Hired Brains Inc.
In contrast, with in-database analytics, predictive analysis, data mining and other analytic functions run inside the same centralized enterprise data warehouse (EDW) that holds the data. This eliminates I/O-intensive extract, transform and load (ETL) operations that can consume as much as 75% of cycle time in predictive analytics applications. It also enables developers to exploit powerful data warehouse platform technologies, such as parallel processing.
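To make the contrast concrete, here is a minimal sketch in Python using the standard library's sqlite3 module as a stand-in for a data warehouse. The table name, columns and sample rows are invented for illustration and do not come from the Forrester report or any vendor product. The first function copies every row out of the database before computing an average, as a separate analytic tool would; the second asks the database engine to compute the same result where the data lives.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory database (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount REAL)")
    rows = [(1, 20.0), (1, 40.0), (2, 10.0), (2, 30.0), (2, 50.0), (3, 5.0)]
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
    conn.commit()
    return conn


def average_by_extract(conn):
    """Traditional style: move every row to the application, then aggregate."""
    totals = {}
    query = "SELECT customer_id, amount FROM transactions"
    for customer_id, amount in conn.execute(query):
        running_sum, count = totals.get(customer_id, (0.0, 0))
        totals[customer_id] = (running_sum + amount, count + 1)
    return {cid: s / n for cid, (s, n) in totals.items()}


def average_in_database(conn):
    """In-database style: the engine aggregates; only the results move."""
    query = (
        "SELECT customer_id, AVG(amount) "
        "FROM transactions GROUP BY customer_id"
    )
    return dict(conn.execute(query).fetchall())


if __name__ == "__main__":
    conn = build_demo_db()
    print(average_by_extract(conn))
    print(average_in_database(conn))
```

Both functions return the same averages, but the second transfers one row per customer rather than one row per transaction. On a table of six rows the difference is invisible; the data movement Raden describes becomes costly only at warehouse scale.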
Empowering the enterprise data warehouse
In-database analytics is one of several recent developments that have made advanced analytics an increasingly important, and affordable, element of corporate BI initiatives.
First, a precipitous drop in storage, computing and memory prices has helped fuel the emergence of scaled-down, low-cost database engines, data warehousing platforms and appliances.
Second, many of these platforms support leading-edge computing technologies that enable compute-intensive applications like advanced analytics to run more efficiently. For example, 64-bit memory addressing enables the large volumes of data needed for predictive analytics to reside in main memory instead of on disk, which eliminates time-consuming I/O transfers. Parallel processing enables multiple analytic processes to run at the same time. Virtualization enables companies to allocate computing resources to analytic and database querying functions on a prioritized and as-needed basis.
Another key factor is the recent emergence of two industry standards for advanced analytics. MapReduce, a vendor-neutral programmability framework for complex information types, is gaining traction among data warehouse and advanced analytics software vendors, according to Forrester.
The second standard, Hadoop, defines an open analytic processing pushdown workflow model and distributed analytic object-file store. It has growing support from database, data warehouse and cloud computing platform vendors, Forrester said.
Once these standards gain broad support from leading players, businesses will have far more flexibility in choosing (and migrating between) data warehousing platforms.
At least as important, MapReduce and Hadoop can work with unstructured as well as structured data residing in a database. This will be critical for the next generation of analytic applications, which will be mining the complex patterns in diverse and distributed information generated by Web 2.0 applications, social networking, clickstream analysis and the like, said Forrester analyst James Kobielus, lead author of the report.
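For readers unfamiliar with the map-and-reduce style of processing, the following single-process Python sketch shows the general idea on a toy clickstream. It is an illustration only: the function names and the "user page" record format are assumptions made for this example, and the code does not use Hadoop or any vendor's API. A real deployment would distribute the map and reduce steps across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (page, 1) pair for each click record of the form 'user page'."""
    _user, page = record.split()
    yield page, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over a list of records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    clicks = ["u1 /home", "u2 /home", "u1 /pricing"]
    print(run_job(clicks))
```

Because each record is mapped independently and each key is reduced independently, the work can in principle be split across many processors, which is what makes the approach attractive for large volumes of loosely structured data.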
Who can benefit from in-database analytics?
In-database analytics is potentially useful to many types of organizations pursuing advanced analytics. It's well-suited for activities such as targeted response marketing, dynamic pricing analysis, and fraud detection and prevention, according to analysts. It can also help executives who need to know how best to allocate R&D money or funding for security upgrades, or who need to create a business plan that reacts to projected market changes over the next five years.
Still, it isn't for everybody, Raden warned. He recommended that companies considering whether to deploy in-database analytics, either as an in-house development platform or to support commercial applications, ask themselves the following questions:
- Do you have any problems that lend themselves to predictive modeling?
- Is the necessary data readily available, consistent, accurate and supported?
- Above all, do you have the corporate culture and the will to make use of the results you obtain?
In other words: "Are you willing to let math algorithms in a computer lead you to do things you didn't conceive of yourself, and go against what you're already doing?" Raden asked. "A lot of companies aren't there yet."
Elisabeth Horwitt is a freelance writer.