The need to clarify the context of information is becoming vital as big data and the Internet of Things become ever more important sources in today’s biz-tech ecosystem.
Suddenly, it seems, it’s almost three months since my last blog entry. My apologies to readers: it’s been a busy time with consulting work, slide preparation for a number of upcoming events in Munich, Rome and Singapore over the coming weeks, and a revamp of my website with a cleaner, fresher look and a mobile-friendly layout.
I pick up on a topic that’s close to my heart: the discovery and creation of context around information, triggered by last week’s BBBT appearance of a new startup, Alation, specializing in this very area. It’s a hot topic at present with a variety of new companies and acquisitions making the news over the past 6 to 12 months.
For a number of years now, the IT industry has been besotted with big data. The trend is set to continue as the Internet of Things offers an ever-expanding set of bright, shiny, data-producing baubles. The increasing use of data, in real time and at high volumes, is driving a biz-tech ecosystem where business value and competition depend entirely on the effective use of IT. What the press often misses, and many of the vendors and analysts too, is that such data is meaningless and, thus, close to useless unless its context can be determined or created. Some point to metadata as the solution. However, as I’ve explored at length in my book, “Business unIntelligence”, metadata is really too small a word to cover the topic. I prefer to call it context setting information (CSI), because it’s information rather than data, its role is simply to set the context of other information, and, ultimately, it is indistinguishable from business information: one man’s business information is another woman’s CSI. In order to describe the full extent of context setting information, I introduced m³, the modern meaning model, that relates information to knowledge and meaning, as shown above. A complete explanation of this model is beyond the scope of this blog, so let’s return to Alation and what’s interesting about the product.
Alation CEO, Satyen Sangani, @satyx, posed the question of what it means to be data literate. At a basic level, this is about knowing what a field means, what a table contains or how a column is calculated. Pressing a little further, questions about the source and currency of data, in essence its quality, arise. Social aspects of its use, such as how often it has been used and who uses it for what, complete the picture. Understanding this level of context about data is a vital prerequisite for its meaningful use within the business.
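For readers who think in code, the three levels just described (basic meaning, quality, and social use) can be pictured as a single record per column. What follows is only a minimal sketch in Python: the class `ColumnContext` and all of its field names are my own hypothetical illustration, not Alation's data model or any product's API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ColumnContext:
    """Hypothetical record of the context around one column (illustration only)."""
    table: str
    column: str
    # Basic level: what the field means and how it is calculated.
    meaning: str = ""
    calculation: Optional[str] = None
    # Quality level: where the data came from and how current it is.
    source: Optional[str] = None
    last_refreshed: Optional[date] = None
    # Social level: who has used it, and how often.
    usage_by_user: dict[str, int] = field(default_factory=dict)

    def record_use(self, user: str) -> None:
        """Increment the usage count for one user."""
        self.usage_by_user[user] = self.usage_by_user.get(user, 0) + 1

    def missing_context(self) -> list[str]:
        """Name the levels (meaning, quality, social) that are still empty."""
        gaps = []
        if not self.meaning:
            gaps.append("meaning")
        if self.source is None or self.last_refreshed is None:
            gaps.append("quality")
        if not self.usage_by_user:
            gaps.append("social")
        return gaps
```

A column with a documented meaning but no known source and no recorded users would report gaps at the quality and social levels, which is exactly the situation that makes data hard to use meaningfully.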
When dealing with externally sourced data, where precise meanings of fields or calculations of values are unreliable or unavailable, the social and quality aspects of CSI become particularly important. It is often pointed out that data scientists can spend up to 80% of their time “wrangling” big data (see my last blog on Trifacta). However, what is often missed is that this 80% may be repeated again and again by different data scientists at different times on the same data, because the results of prior thinking and analysis are not easily available for reuse. To address this, Alation goes beyond gathering metadata like schemas and comments from databases and data stores to analyzing documentation from wikis to source code, gathering query and usage data, and linking it all to the identity of people who have created or used the data. Making this CSI available in a collaborative fashion to analysts, stewards and IT enables use cases from discovery and analytics to data optimization and governance.
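To make the idea of gathering query and usage data and linking it to people a little more concrete, here is a rough sketch in Python. It is not how Alation does it; the log format (pairs of user and SQL text), the function name `usage_by_table` and the deliberately naive regular expression are all assumptions of mine, and real SQL would need a proper parser.

```python
import re
from collections import Counter, defaultdict

# Naive pattern: the identifier that follows FROM or JOIN.
# Assumes simple SQL; subqueries, quoted names and CTEs are not handled.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def usage_by_table(query_log):
    """Map each table name to a Counter of the users who queried it.

    query_log is an iterable of (user, sql_text) pairs (a hypothetical format).
    """
    usage = defaultdict(Counter)
    for user, sql in query_log:
        # Count each table once per query, even if it is referenced repeatedly.
        for table in {name.lower() for name in TABLE_REF.findall(sql)}:
            usage[table][user] += 1
    return usage
```

Even a crude tally like this begins to answer the social questions of how often data has been used and by whom, which points the next data scientist to a colleague who has already wrangled the same tables.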
This broad market is red-hot at the moment and rightly so. Big data and the Internet of Things demand a level of context setting previously unheard of. I’ve previously mentioned products in this space, such as Waterline Data Science and Teradata Loom. A challenge they all face is how to define a market that does not carry the baggage of old failed or difficult initiatives such as metadata management, data governance or information quality. Don’t get me wrong, these are all vital initiatives; they have just received very bad press over the years. In addition, there is a strong need to move from perceived IT-centric approaches to something much more business driven. Might I suggest context setting information as a convenient and clarifying category?