Unstructured data is a generic label for any corporate information that is not stored in a database. It can be textual or non-textual. Textual unstructured data includes email messages, PowerPoint presentations, Word documents, content in collaboration software and instant messages. Non-textual unstructured data includes JPEG images, MP3 audio files and Flash video files.
If left unmanaged, the sheer volume of unstructured data generated each year within an enterprise can be costly to store. Unmanaged data can also become a liability if information cannot be located during a compliance audit or lawsuit. The information contained in unstructured data is not always easy to find. Locating it requires that electronic documents, hard-copy documents and other media be scanned so that a search application can parse out concepts based on the words used in specific contexts. This technique is called semantic search, and it is commonly offered as part of enterprise search products.
In customer-facing businesses, the information contained in unstructured data can be analyzed to improve customer relationship management and relationship marketing. As social media applications like Twitter and Facebook go mainstream, unstructured data is expected to grow much faster than structured data. The "IDC Enterprise Disk Storage Consumption Model" report, released in fall 2009, projected that transactional data would grow at a compound annual growth rate (CAGR) of 21.8%, compared with a projected 61.7% CAGR for unstructured data.
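To give a sense of what those two growth rates mean in practice, the following minimal Python sketch compounds each rate over several years. The five-year horizon, the starting volume of 1.0 and the function name `project_volume` are illustrative assumptions, not figures or terms from the IDC report.

```python
def project_volume(initial: float, cagr: float, years: int) -> float:
    """Compound an initial data volume at a fixed annual growth rate."""
    return initial * (1 + cagr) ** years


# CAGR figures quoted from the IDC report cited in this article.
TRANSACTIONAL_CAGR = 0.218
UNSTRUCTURED_CAGR = 0.617

# Assumption: a five-year horizon, chosen only for illustration.
YEARS = 5

# Start both data types at the same relative volume of 1.0.
transactional = project_volume(1.0, TRANSACTIONAL_CAGR, YEARS)
unstructured = project_volume(1.0, UNSTRUCTURED_CAGR, YEARS)

print(f"Transactional data after {YEARS} years: {transactional:.2f}x")
print(f"Unstructured data after {YEARS} years: {unstructured:.2f}x")
```

Under these assumptions, transactional data grows to roughly 2.7 times its starting volume over five years, while unstructured data grows to roughly 11 times its starting volume.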