Predictive analytics is a form of advanced analytics that uses both new and historical data to forecast activity, behavior and trends. It involves applying statistical analysis techniques, analytical queries and automated machine learning algorithms to data sets to create predictive models that place a numerical value -- or score -- on the likelihood of a particular event happening.
Predictive analytics software applications use variables that can be measured and analyzed to predict the likely behavior of individuals, machinery or other entities. For example, an insurance company is likely to take into account potential driving safety variables, such as age, gender, location, type of vehicle and driving record, when pricing and issuing auto insurance policies.
Multiple variables are combined into a predictive model capable of assessing future probabilities with an acceptable level of reliability. The software relies heavily on advanced algorithms and methodologies, such as logistic regression models, time series analysis and decision trees.
Predictive analytics has grown in prominence alongside the emergence of big data systems. As enterprises have amassed larger and broader pools of data in Hadoop clusters and other big data platforms, they have created increased data mining opportunities to gain predictive insights. Heightened development and commercialization of machine learning tools by IT vendors has also helped expand predictive analytics capabilities.
Marketing, financial services and insurance companies have been notable adopters of predictive analytics, as have large search engine and online services providers. Predictive analytics is also commonly used in industries such as healthcare, retail and manufacturing.
Business applications for predictive analytics include targeting online advertisements, analyzing customer behavior to determine buying patterns, flagging potentially fraudulent financial transactions, identifying patients at risk of developing particular medical conditions and detecting impending parts failures in industrial equipment before they occur.
The predictive analytics process and techniques
Predictive analytics requires a high level of expertise with statistical methods and the ability to build predictive data models. As a result, it's typically the domain of data scientists, statisticians and other skilled data analysts. They're supported by data engineers, who help to gather relevant data and prepare it for analysis, and by software developers and business analysts, who help with data visualization, dashboards and reports.
Data scientists use predictive models to look for correlations between different data elements in website clickstream data, patient health records and other types of data sets. Once the data collection has occurred, a statistical model is formulated, trained and modified as needed to produce accurate results. The model is then run against the selected data to generate predictions. Full data sets are analyzed in some applications, but in others, analytics teams use data sampling to streamline the process. The data modeling is validated or revised as additional information becomes available.
The predictive analytics process isn't always linear, and correlations often present themselves where data scientists aren't looking. For that reason, some enterprises are filling data scientist positions by hiring people who have academic backgrounds in physics and other hard science disciplines. In keeping with the scientific method, these workers are comfortable going where the data leads them. Even if companies follow the more conventional path of hiring data scientists trained in math, statistics and computer science, having an open mind about data exploration is a key attribute for effective predictive analytics.
Once predictive modeling produces actionable results, the analytics team can share them with business executives, usually with the aid of dashboards and reports that present the information and highlight future business opportunities based on the findings. Functional models can also be built into operational applications and data products to provide real-time analytics capabilities, such as a recommendation engine on an online retail website that points customers to particular products based on their browsing activity and purchase choices.
Beyond data modeling, other techniques used by data scientists and experts engaging in predictive analytics may include:
- text analytics software to mine text-based content, such as Microsoft Word documents, email and social media posts;
- classification models that organize data into preset categories to make it easier to find and retrieve; and
- deep neural networking, which can emulate human learning and automate predictive analytics.
Applications of predictive analytics
Online marketing is one area in which predictive analytics has had a significant business impact. Retailers, marketing services providers and other organizations use predictive analytics tools to identify trends in the browsing history of a website visitor to personalize advertisements. Retailers also use customer analytics to drive more informed decisions about what types of products the retailer should stock.
Predictive maintenance is also emerging as a valuable application for manufacturers looking to monitor a piece of equipment for signs that it may be about to break down. As the internet of things (IoT) develops, manufacturers are attaching sensors to machinery on the factory floor and to mechatronic products, such as automobiles. Data from the sensors is used to forecast when maintenance and repair work should be done in order to prevent problems.
IoT also enables similar predictive analytics uses for monitoring oil and gas pipelines, drilling rigs, windmill farms and various other industrial IoT installations. Localized weather forecasts for farmers based partly on data collected from sensor-equipped weather data stations installed in farm fields is another IoT-driven predictive modeling application.
A wide range of tools is used in predictive modeling and analytics. IBM, Microsoft, SAS Institute and many other software vendors offer predictive analytics tools and related technologies supporting machine learning and deep learning applications.
In addition, open source software plays a big role in the predictive analytics market. The open source R analytics language is commonly used in predictive analytics applications, as are the Python and Scala programming languages. Several open source predictive analytics and machine learning platforms are also available, including a library of algorithms built into the Spark processing engine.
Analytics teams can use the base open source editions of R and other analytics languages or pay for the commercial versions offered by vendors such as Microsoft. The commercial tools can be expensive, but they come with technical support from the vendor, while users of pure open source releases must troubleshoot on their own or seek help through open source community support sites.