Predictive analytics is a form of advanced analytics that uses both new and historical data to forecast future activity, behavior and trends. It involves applying statistical analysis techniques, analytical queries and automated machine learning algorithms to data sets to create predictive models that place a numerical value, or score, on the likelihood of a particular event happening.
Predictive analytics software applications use variables that can be measured and analyzed to predict the likely behavior of individuals, machinery or other entities. For example, an insurance company is likely to take into account potential driving safety variables such as age, gender, location, type of vehicle and driving record when pricing and issuing auto insurance policies. Multiple variables are combined into a predictive model capable of assessing future probabilities with an acceptable level of reliability. The software relies heavily on advanced algorithms and methodologies such as logistic regression, time series analysis and decision trees.
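As a rough illustration of how several variables can be combined into a single likelihood score, the following Python sketch applies the logistic function to a weighted sum of driver attributes. The variable names and coefficients are hypothetical and chosen only for illustration; a real insurer would estimate them from historical claims data.

```python
import math

# Hypothetical coefficients for illustration only; they are not fitted to real data.
COEFFICIENTS = {
    "intercept": -2.0,
    "under_25": 0.8,      # 1 if the driver is under 25, else 0
    "prior_claims": 0.6,  # number of previous claims on record
    "sports_car": 0.4,    # 1 if the vehicle is a sports car, else 0
}

def claim_probability(under_25: int, prior_claims: int, sports_car: int) -> float:
    """Combine the variables into a score between 0 and 1 with the logistic function."""
    z = (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["under_25"] * under_25
        + COEFFICIENTS["prior_claims"] * prior_claims
        + COEFFICIENTS["sports_car"] * sports_car
    )
    return 1.0 / (1.0 + math.exp(-z))

# Two example drivers: one with no risk factors, one with several.
low_risk = claim_probability(under_25=0, prior_claims=0, sports_car=0)
high_risk = claim_probability(under_25=1, prior_claims=2, sports_car=1)
print(f"low-risk driver score:  {low_risk:.3f}")
print(f"high-risk driver score: {high_risk:.3f}")
```

The score is a probability-like number, so a higher value means the model considers the event more likely under the assumed weights.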
Predictive analytics has grown in prominence alongside the emergence of big data systems. As enterprises have amassed larger and broader pools of data in Hadoop clusters and other big data platforms, they have gained more opportunities to mine that data for predictive insights. Heightened development and commercialization of machine learning tools by IT vendors have also helped expand predictive analytics capabilities.
Marketing, financial services and insurance companies have been notable adopters of predictive analytics, as have large search engine and online services providers. Predictive analytics is also commonly used in industries such as healthcare, retail and manufacturing. Business applications for predictive analytics include targeting online advertisements, flagging potentially fraudulent financial transactions, identifying patients at risk of developing particular medical conditions and detecting impending parts failures in industrial equipment before they occur.
The predictive analytics process
Predictive analytics requires a high level of expertise with statistical methods and the ability to build predictive data models. As a result, it's typically the domain of data scientists, statisticians and other skilled data analysts. They're supported by data engineers, who help to gather relevant data and prepare it for analysis, and by software developers and business analysts, who help with data visualization, dashboards and reports.
Data scientists use predictive models to look for correlations between different data elements in website clickstream data, patient health records and other types of data sets. Once the data to be analyzed is collected, a statistical model is formulated, trained and modified as needed to produce accurate results; the model is then run against the selected data to generate predictions. Full data sets are analyzed in some applications, but in others, analytics teams use data sampling to streamline the process. The predictive model is validated or revised as additional data becomes available.
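The collect, sample, train and validate cycle described above can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the data set is synthetic (generated from an assumed linear relationship plus noise), the model is a one-variable least-squares line, and the helper names fit_line and mean_abs_error are made up for this sketch.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Synthetic "full" data set: y is roughly 2*x + 1 plus noise (an assumption for illustration).
full_data = []
for _ in range(1000):
    x = rng.uniform(0, 10)
    full_data.append((x, 2.0 * x + 1.0 + rng.gauss(0, 1)))

# Data sampling: analyze a random subset instead of every record.
sample = rng.sample(full_data, 200)

# Hold out part of the sample so the model is validated on data it has not seen.
train, holdout = sample[:160], sample[160:]

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_abs_error(points, slope, intercept):
    """Average absolute gap between predicted and observed values."""
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)

slope, intercept = fit_line(train)
holdout_error = mean_abs_error(holdout, slope, intercept)
print(f"fitted model: y = {slope:.2f} * x + {intercept:.2f}")
print(f"mean absolute error on holdout data: {holdout_error:.2f}")
```

If the holdout error were much worse than the training error, that would be a signal to revise the model or gather more data before relying on its predictions.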
The predictive analytics process isn't always linear, and correlations often present themselves where data scientists aren't looking. For that reason, some enterprises are filling data scientist positions by hiring people who have academic backgrounds in physics and other hard science disciplines and, in keeping with the scientific method, are comfortable going where the data leads them. Even if companies follow the more conventional path of hiring data scientists trained in math, statistics and computer science, an open mind on data exploration is a key attribute to have for effective predictive analytics.
Once predictive modeling produces actionable results, the analytics team shares them with business executives, usually with the aid of dashboards and reports that present the information and highlight future business opportunities based on the findings. Functional models can also be built into operational applications and data products to provide real-time analytics capabilities, such as a recommendation engine on an online retail website that points customers to particular products based on their browsing activity and purchase choices.
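To make the recommendation engine example more concrete, here is a small Python sketch that suggests products based on how often they were bought together in past orders. The purchase histories and the recommend function are hypothetical, and co-occurrence counting is only one simple technique; production recommenders are usually far more elaborate.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories, one set of products per past order.
baskets = [
    {"tent", "sleeping bag", "lantern"},
    {"tent", "sleeping bag"},
    {"tent", "sleeping bag", "lantern"},
    {"sleeping bag", "camp stove"},
    {"tent", "camp stove"},
]

# Count how often each pair of products appears in the same order.
co_counts = {}
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        co_counts.setdefault(a, Counter())[b] += 1
        co_counts.setdefault(b, Counter())[a] += 1

def recommend(viewed, already_owned=(), top_n=2):
    """Return up to top_n products most often bought together with `viewed`."""
    candidates = co_counts.get(viewed, Counter())
    # Sort by descending count, then by name so ties resolve predictably.
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked if item not in already_owned][:top_n]

print(recommend("tent"))
print(recommend("tent", already_owned={"sleeping bag"}))
```

Because the counts are precomputed, a lookup like this is cheap enough to run while a customer is browsing, which is what makes it suitable for embedding in an operational application.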
Applications of predictive analytics
Online marketing is one area in which predictive analytics has had a significant business impact. Retailers, marketing services providers and other organizations use predictive analytics tools to identify trends in a website visitor's browsing history and personalize advertisements accordingly. Retailers also use customer analytics to make more informed decisions about what types of products to stock.
Predictive maintenance is emerging as a valuable application for manufacturers looking to monitor a piece of equipment for signs that it may be about to break down. As the internet of things (IoT) develops, manufacturers are attaching sensors to machinery on the factory floor and to mechatronic products, such as automobiles. Data from the sensors is used to forecast when maintenance and repair work should be done in order to prevent problems.
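One simple way to turn sensor readings into a maintenance forecast is to fit a trend line and estimate when it will reach a failure threshold. The Python sketch below assumes evenly spaced vibration readings and a known failure threshold; both the readings and the threshold value are invented for illustration, and real deployments typically use richer models than a straight line.

```python
def steps_until_threshold(readings, threshold):
    """Fit a least-squares line to evenly spaced readings and estimate how many
    more time steps remain before the trend reaches `threshold`.
    Returns None when the readings are not trending upward."""
    n = len(readings)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(readings) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, readings))
    slope = sxy / sxx
    if slope <= 0:
        return None  # no upward trend, so no crossing is forecast
    intercept = mean_y - slope * mean_x
    crossing_step = (threshold - intercept) / slope
    # Steps remaining after the most recent reading (index n - 1), never negative.
    return max(0.0, crossing_step - (n - 1))

# Hypothetical vibration readings (mm/s) taken once per day from a motor sensor.
vibration = [5.0, 5.5, 6.0, 6.5, 7.0]
remaining = steps_until_threshold(vibration, threshold=10.0)
print(f"estimated days until the threshold is reached: {remaining}")
```

A result such as this would let a maintenance team schedule work before the estimated crossing date rather than waiting for a breakdown.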
IoT also enables similar predictive analytics uses for monitoring oil and gas pipelines, drilling rigs, wind farms and various other industrial IoT installations. Localized weather forecasts for farmers, based partly on data collected from sensor-equipped weather stations installed in farm fields, are another IoT-driven predictive modeling application.
Analytics tools and techniques
A wide range of tools and techniques is used in predictive modeling and analytics. IBM, Microsoft, the SAS Institute and many other software vendors offer predictive analytics tools, including machine learning software and related technologies supporting deep learning applications.
In addition, open source software plays a big role in the predictive analytics market. The open source R analytics language is commonly used in predictive analytics applications, as are the Python and Scala programming languages. Several open source predictive analytics and machine learning platforms are also available, including a library of algorithms built into the Spark processing engine.
Analytics teams can use the base open source editions of R and other analytics languages or pay for commercial versions offered by vendors such as Microsoft. The commercial tools can be expensive, but they come with technical support from the vendor, while users of pure open source releases are typically on their own when working through problems with the technology.