Statistical Analysis in Business Intelligence and Data Warehousing

Organizations can provide and get value from statistical analysis without an in-house consultant with a PhD in statistics.

This article originally appeared on the BeyeNETWORK.

I recently came across an ESPN.com article by Greg Garber, who challenged whether certain “self-evident” concepts in the National Football League (and football in general) were really “true.” Some of these “truths” are concepts taught by football coaches across the country and touted by sports announcers every weekend from August until February. However, when Garber looked closer at the statistical analysis of the data behind these “truths,” he found that they were usually not the “truths” that many coaches and broadcasters would have us believe.

Many readers of this article will probably think this is another of my “lies, damn lies, and statistics” rants. Instead, this will be a “truth, hard truth and statistical analysis” rant...

Is the glass half full or half empty?
I believe you can make anything look good or bad with a set of associated statistics or numbers. For example, did you know that Peyton Manning of the Indianapolis Colts is on pace to throw 30% fewer touchdowns this year than last? Should Manning be faced with a 30% pay cut to match his falling production? Of course not! The Colts have already won 13 games this season and are considered one of the favourites to win the Super Bowl.

I also believe that you can gain insight by applying solid statistical analysis to those same numbers, particularly when you include larger data sets, such as those available in well-established business intelligence and data warehousing organizations. These insights can help confirm the assumptions behind business decisions or prevent false assumptions from being applied.

Don’t let the truth interfere with a good story…
In the spring of 1993, an NFL team’s marketing vice president proudly told me that he did not need a marketing research firm to tell him what or how to sell to his fans. Those fans would simply tell him by coming to the games or not. Today, I believe that he knew what he was talking about. This man had over 10 years of experience with his job, his product and his city. However, he was only dealing with a maximum of 700,000 sales events (game tickets) a year, spread over 10 individual Sundays in a five-month period and a limited geographic area. Cingular, by contrast, has more than 50 million nationwide subscribers across two separate product catalogs: Cingular and the old AT&T Wireless. Using the same “statistical” analysis as the NFL team's marketing vice president would not be advisable for today’s average telecom service provider. That does not mean that today’s telecom marketing vice president lacks a good feel for marketing, but it would be wise to refine and reinforce those decisions with valid statistical analysis.

Now, I am not advising that every marketing decision or business rule be vetted through statistical analysis using every bit of knowledge in the enterprise data warehouse. That is the very definition of “analysis paralysis.”

However, I recommend using the field of statistical analysis to:

  • Confirm and/or validate the “gut” feelings of many marketing vice presidents.
  • Provide the basis for rules engines that are becoming more popular in credit analysis applications for new customers or fraud analysis of existing customers.

This is where business intelligence and data warehousing organizations can provide value for various business areas. And this can all be done without a PhD in statistics.

The 80/20 Rule
As many of you know, the Pareto Rule is often called the “80/20 rule.” The Pareto Rule holds that approximately 80% of an outcome can be caused, affected or influenced by 20% of the effort exerted on it. For example, approximately 80% of all network events (e.g., call detail records, IP detail records) come from roughly 20% of the possible network event types: the most common types of events. The remaining 20% of traffic comes from the other 80% of possible event types: the outliers.
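As a rough illustration of how you might check the 80/20 rule against your own data, the Python sketch below counts events by type and reports the share of traffic produced by the most common 20% of types. The function name `top_share`, the event type labels and the counts are all hypothetical, invented for this example; they do not come from any real network.

```python
from collections import Counter


def top_share(event_types, top_fraction=0.2):
    """Return the share of all events produced by the most common event types."""
    counts = Counter(event_types)
    if not counts:
        raise ValueError("no events to analyze")
    # Size of the "top" group of event types, never fewer than one type.
    n_top = max(1, round(len(counts) * top_fraction))
    top_total = sum(count for _, count in counts.most_common(n_top))
    return top_total / sum(counts.values())


# Hypothetical sample: 10 event types, 100 events in total.
events = (
    ["voice_call"] * 50 + ["sms"] * 30
    + ["data_session"] * 5 + ["voicemail"] * 4 + ["mms"] * 3
    + ["roaming_call"] * 3 + ["call_forward"] * 2
    + ["conference_call"] * 1 + ["fax"] * 1 + ["video_call"] * 1
)

share = top_share(events)
print(f"Top 20% of event types account for {share:.0%} of events")
```

In a real data warehouse, the list of events would come from a query against the event detail tables, and the result will rarely be exactly 80%. The point is only to see whether a small set of event types dominates the volume.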

Everything is 42
In Douglas Adams’ classic books, the answer to the question of life, the universe and everything is “42.” Essentially, this means that you must know what you are asking before you ask it. This is the key with statistical analysis tools. If you do not know what question you are asking, you probably will not understand the answers, or you might use them incorrectly. When you are talking about millions of customers and their billions of interactions with your company, misunderstanding the data can be an expensive proposition.

This is where those pesky consultants come in…once more. Again, I suggest you only “invite” them in with VERY specific objectives. These objectives should be to:

  • Answer specific questions.
  • Accomplish this in a specific timeframe.
  • Document the entire process before you write them a check/issue a purchase order.

If the consultants have done their job correctly and you have worked to learn from them, your business intelligence or data warehousing organization is now armed with the appropriate knowledge. This knowledge should continue to provide roughly 80% of the consultants’ value for about 20% of their cost.

New and Improved
With the current generation of database management systems and business intelligence toolsets, business intelligence and data warehousing organizations are now provided with statistical analysis tools. In the past, these tools were reserved for consultants with PhDs in statistics. But as I noted above, you might not get the answers you want if you do not understand the tools or what they are trying to accomplish. If your consultants have properly documented their processes, a statistics textbook or online training should be enough to fill that gap.


Final Thoughts
Many data mining and statistics professionals will justifiably be offended by the nature of this article. I am not trying to dislodge the data mining and statistics professionals from the telecom enterprise... Companies like Oracle, Microsoft and Business Objects are already trying to do that by including “canned” statistical analysis packages with their software.

Instead, I want to show that business intelligence and data warehousing organizations without an in-house PhD in statistics can still provide value to the business. These organizations can also emphasize the importance of the data in their data warehouses, rather than having to pay consultants.

John Myers

John has more than 10 years of information technology and consulting experience in positions including business intelligence subject-matter expert, technical architect and systems integrator. Over the past eight years, he has gained a wealth of business and information technology consulting experience in the telecommunications industry. John specializes in business intelligence/data warehousing and systems integration solutions. John may be contacted by email at John.Myers@BlueBuffaloGroup.com.
