Unstructured data is information, in many different forms, that doesn't hew to conventional data models and thus typically isn't a good fit for a mainstream relational database. Thanks to the emergence of alternative platforms for storing and managing such data, it is increasingly prevalent in IT systems and is used by organizations in a variety of business intelligence and analytics applications.Content Continues Below
Traditional structured data, such as the transaction data in financial systems and other business applications, conforms to a rigid format to ensure consistency in processing and analyzing it. Sets of unstructured data, on the other hand, can be maintained in formats that aren't uniform, freeing analytics teams to work with all of the available data without necessarily having to consolidate and standardize it first. That enables more comprehensive analyses than would otherwise be possible.
Types of unstructured data
One of the most common types of unstructured data is text. Unstructured text is generated and collected in a wide range of forms, including Word documents, email messages, PowerPoint presentations, survey responses, transcripts of call center interactions, and posts from blogs and social media sites.
Other types of unstructured data include images, audio and video files. Machine data is another category, one that's growing quickly in many organizations. For example, log files from websites, servers, networks and applications -- particularly mobile ones -- yield a trove of activity and performance data. In addition, companies increasingly capture and analyze data from sensors on manufacturing equipment and other internet of things (IoT) connected devices.
In some cases, such data may be considered to be semi-structured -- for example, if metadata tags are added to provide information and context about the content of the data. The line between unstructured and semi-structured data isn't absolute, though; some data management consultants contend that all data, even the unstructured kind, has some level of structure.
Unstructured data analytics
Because of its nature, unstructured data isn't suited to transaction processing applications, which are the province of structured data. Instead, it's primarily used for BI and analytics. One popular application is customer analytics. Retailers, manufacturers and other companies analyze unstructured data to improve customer relationship management processes and enable more-targeted marketing; they also do sentiment analysis to identify both positive and negative views of products, customer service and corporate entities, as expressed by customers on social networks and in other forums.
Predictive maintenance is an emerging analytics use case for unstructured data. For example, manufacturers can analyze sensor data to try to detect equipment failures before they occur in plant-floor systems or finished products in the field. Energy pipelines can also be monitored and checked for potential problems using unstructured data collected from IoT sensors.
Analyzing log data from IT systems highlights usage trends, identifies capacity limitations and pinpoints the cause of application errors, system crashes, performance bottlenecks and other issues. Unstructured data analytics also aids regulatory compliance efforts, particularly in helping organizations understand what corporate documents and records contain.
Unstructured data techniques and platforms
Analyst firms report that the vast majority of new data being generated is unstructured. In the past, that type of information often was locked away in siloed document management systems, individual manufacturing devices and the like -- making it what's known as dark data, unavailable for analysis.
But things changed with the development of big data platforms, primarily Hadoop clusters, NoSQL databases and the Amazon Simple Storage Service (S3). They provide the required infrastructure for processing, storing and managing large volumes of unstructured data without the imposition of a common data model and a single database schema, as in relational databases and data warehouses.
A variety of analytics techniques and tools are used to analyze unstructured data in big data environments. Text analytics tools look for patterns, keywords and sentiment in textual data; at a more advanced level, natural language processing technology is a form of artificial intelligence that seeks to understand meaning and context in text and human speech, increasingly with the aid of deep learning algorithms that use neural networks to analyze data. Other techniques that play roles in unstructured data analytics include data mining, machine learning and predictive analytics.
Learn how big data analytics tools use both structured and unstructured data
Learn more about Lexmark's Perceptive ECM tools