SEATTLE -- Growing up, I was almost painfully envious of my great-grandfather. He had worked on aeroplanes in the early, uncertain days of flight. He and his fellow pioneers weren't following the rules; they were forging them. A hundred years later, those of us privileged to work in BI are doing the same. Sure, in 20 years' time BI will be WUT (Well Understood Technology), but for now we get the fun of experimenting, trying different approaches and watching processes evolve.
At this week's BI conference, Microsoft announced new BI products that seemed to have the potential to fundamentally change the way we do BI. To find out more, I talked to Tom Casey, general manager for SQL Server Business Intelligence at Microsoft.
He said that the technologies are due for release in the first half of calendar year 2010 in a release of SQL Server code-named Kilimanjaro. He was at pains to point out that this is not the next major release of SQL Server (that's not expected until about 2011) -- Kilimanjaro will be a minor release of SQL Server that is focused on a new set of BI capabilities, one of which is Madison.
What is Microsoft's Madison?
When Microsoft bought data warehouse appliance vendor DATAllegro I was, I admit, confused. I couldn't see how a data warehouse appliance fit into the Microsoft BI stack. What I hadn't realized was that the DATAllegro technology is highly modularized -- the scalability technology is essentially database engine-agnostic. Since acquiring DATAllegro, Microsoft has bolted that technology onto SQL Server. The result for the BI world is that if you are using SQL Server as your data warehouse engine, the realistic upper limit on data volume has just been raised from, say, 20 terabytes (TB) to 200 TB. Your mileage may vary, but we are still talking about an order of magnitude improvement. That's Madison.
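To make the "engine-agnostic scalability" idea concrete, here is a minimal sketch of the shared-nothing, massively parallel architecture that appliances like DATAllegro use: rows are hash-partitioned across independent nodes, each node answers the query over its own slice, and the partial results are merged. The node count and the sales data are invented for illustration; this is the general technique, not Microsoft's implementation.

```python
# Sketch of a shared-nothing (MPP) data warehouse: partition the data
# across nodes, then scatter a query to every node and gather the
# partial results. All names and figures here are illustrative.

NUM_NODES = 4

def partition(rows, key):
    """Distribute rows across nodes by hashing the partition key."""
    nodes = [[] for _ in range(NUM_NODES)]
    for row in rows:
        nodes[hash(row[key]) % NUM_NODES].append(row)
    return nodes

def scatter_gather_sum(nodes, column):
    """Each node computes a partial sum; the controller adds them up."""
    partials = [sum(row[column] for row in node) for node in nodes]
    return sum(partials)

# Invented sample data: 1,000 sales rows spread over 7 regions.
sales = [{"region": f"r{i % 7}", "amount": i} for i in range(1000)]
nodes = partition(sales, "region")
total = scatter_gather_sum(nodes, "amount")
print(total)  # -> 499500, the same answer as summing the unpartitioned table
```

Because each node scans only its own slice, adding nodes raises the practical ceiling on data volume, which is the order-of-magnitude effect described above.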
What is Microsoft's Gemini?
Gemini is essentially an add-in to Excel that allows very large sets of data to be manipulated. Why Excel? Simply because it's a familiar environment for most people and the place where they expect to perform analysis.
But there's more. Not only can Gemini handle very large sets of data, but it can also allow data from disparate sources to be cross analyzed. So, for instance, you might pull in some data from your data warehouse and cross-correlate it with data from the Internet or with data you already hold in Excel.
Let's face it -- Excel doesn't have a great track record of handling big sets of data. So Microsoft has added a new in-memory column store to handle the data. Can Gemini cope? Well, the demo I watched ran on a desktop with 8 GB of RAM and a quad-core processor that cost just under $1,000, and it was handling 100 million rows of data effectively instantaneously. So, I'd take that as a yes.
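Why does a column store help? A toy sketch, with invented column names, shows the two ideas involved: storing each column as its own contiguous array means an aggregate touches only the column it needs, and low-cardinality columns (like product names) compress well with dictionary encoding. Microsoft hasn't published Gemini's internals; this just illustrates the general technique.

```python
# Toy contrast between a row store and a column store.
# Invented data: 100,000 rows of (product, units).

rows = [{"product": p, "units": u}
        for p, u in zip(["widget", "gadget"] * 50_000, range(100_000))]

# Row store: one record object per row.
row_store = rows

# Column store: one flat list per column, with the string column
# dictionary-encoded as small integer codes.
product_dict = sorted({r["product"] for r in rows})   # ["gadget", "widget"]
code_of = {p: i for i, p in enumerate(product_dict)}
product_codes = [code_of[r["product"]] for r in rows]
units_col = [r["units"] for r in rows]

# Aggregating in the column store scans a single contiguous list...
total_units = sum(units_col)
# ...instead of dereferencing every row object:
assert total_units == sum(r["units"] for r in row_store)
```

On real hardware the contiguous scan and the compression are what let a commodity desktop chew through a hundred million rows in memory.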
Clearly, Gemini is not aimed at the professional database (relational or multidimensional) developer -- it is aimed at the business user who needs to perform analysis. So there are no tools for explicitly allowing you to create, for example, a star schema. Instead, Gemini is built for end users and it will, for example, automatically infer relationships between the sets and join them behind the scenes (it also has mechanisms to help deal with nonmatching data).
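Microsoft has not published how Gemini infers those relationships, but a hedged guess at the kind of heuristic involved is easy to sketch: look for the column pair across two data sets whose values overlap most heavily, then join on it, tolerating rows that don't match. The tables, threshold and function names below are all invented for illustration.

```python
# Illustrative heuristic: infer a join key between two column-oriented
# tables by value overlap, then perform a lookup join that tolerates
# non-matching rows. Everything here is hypothetical, not Gemini's code.

def infer_join_key(left, right, overlap_threshold=0.5):
    """Return the (left_col, right_col) pair with the highest value overlap."""
    best, best_score = None, overlap_threshold
    for lcol, lvals in left.items():
        for rcol, rvals in right.items():
            overlap = len(set(lvals) & set(rvals)) / max(len(set(lvals)), 1)
            if overlap > best_score:
                best, best_score = (lcol, rcol), overlap
    return best

# Invented sample data: a warehouse extract and a user's spreadsheet.
warehouse = {"cust_id": [1, 2, 3, 4], "revenue": [100, 250, 80, 40]}
spreadsheet = {"customer": [2, 3, 4, 5], "segment": ["A", "B", "A", "C"]}

key = infer_join_key(warehouse, spreadsheet)
print(key)  # -> ('cust_id', 'customer')

# Join behind the scenes; unmatched rows get None rather than an error,
# a crude stand-in for "mechanisms to deal with nonmatching data".
lookup = dict(zip(spreadsheet["customer"], spreadsheet["segment"]))
joined = [(c, r, lookup.get(c))
          for c, r in zip(warehouse["cust_id"], warehouse["revenue"])]
```

The point is not the algorithm's sophistication but that the end user never writes a join or draws a star schema; the tool picks a plausible relationship and the user corrects it only if it guessed wrong.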
So it's a fascinating technology -- but what is it for, and how will it be used? The honest answer is that nobody currently knows; in five years' time it will either be a forgotten idea or in common usage. But the possibilities are intriguing.
New business intelligence software may enable new possibilities
Currently, business people are constrained by IT. If they want a cube for analysis, they have to work with the IT dudes to create it. Each modification takes time and effort. Gemini effectively allows business users to work rapidly and in a familiar Excel environment. In addition, whatever they create is in a sandbox. It doesn't fundamentally affect the production systems. The users can experiment and prototype until they are happy with the results.
Ah, but what then? Suddenly they are happy and want to roll out an Excel-based system to 5,000 users: Is that going to work? Well, according to Casey, it will, because the idea is that the system can be posted to SharePoint. At that point the IT guys can open it up and transfer the design (relatively easily) to a standard, OLAP-based solution that's as scalable as any other.
Now this is pre-beta code. Can I (or even Microsoft) guarantee that it will all work seamlessly and easily? Of course not. In fact, all of us know from experience that there will be problems, glitches and inconsistencies. Fine, that's the reality of the software world we know and love. But I still think this is a fascinating technology because it has the potential, not to allow us to analyze a bit faster or a bit more accurately, but to change the process. It will allow users to prototype on their own, in an intuitive way, without making the end result impossible to implement. If it can be made to work it has huge potential to change the very way in which we do BI. I think my great-grandfather would have loved it.
Microsoft BI Conference 2008 attendees positive
So far as I am concerned this conference, with its unashamed BI focus, is simply my favorite conference of the year -- an impressive achievement by the organizers after only two years -- and the quality of the speakers is excellent. But that's just my opinion, so as a reality check I spoke to some delegates to find out what they thought.
John Burch, director of management information at Tallahassee Community College in Florida, and Margaret Wingate, systems programmer there, both thought it a great conference. John came last year to explore the idea of BI, was enthused and has since completed a very successful project. This year he brought his entire team; they attended an impressive number of sessions and deemed the whole conference experience thoroughly enjoyable. David Kaiser of Miami Dade College said he found the conference really helpful and that he'd have been happy for it to have been a couple of days longer. So it seems I am not alone. Make a note in your calendar for next year.
About the author: Dr. Mark Whitehorn specializes in the areas of data analysis, data modeling, data warehousing and business intelligence (BI). Based in the U.K., he works as a consultant for a number of national and international companies, designing databases and BI systems. In addition to his consultancy practice, he is a well-recognized commentator on the computer world, publishing articles, white papers and books. He has written nine books on database and BI technology. The first, "Inside Relational Databases" (1997), is now in its third edition and has been translated into three other languages. The most recent is about MDX (a language for manipulating multidimensional data structures) and was co-written with the original architect of the language, Mosha Pasumansky. Mark has also worked as an associate with QA-IQ since 2000. He developed the company's database analysis and design course as well as its data warehousing course.