Citizen data science is on the rise as modern analytics tools empower an increasingly diverse community of data analysts to implement predictive models. Yet a byproduct of the democratization of predictive analytics is confusion about which algorithms to use for what, and whether a particular type of predictive model will perform better than others.
While the available tools provide a rich palette of analytics methods to work with, the plethora of choices can stun inexperienced users into a near-catatonic state as they ponder where to begin with predictive analytics applications. As a result, it's helpful to devise a set of processes and best practices that can guide analysts to match the right methods, algorithms and models to the business challenges with which they're presented.
Here are some guidelines on what that might look like at a high level:
- Specify the business opportunity and objectives, describe the desired analytics outcomes, and set measurable goals for success. This will also help to define performance metrics that can be used later to evaluate how well different analytics methods work.
- Identify the methods that can be used to do the required analytics work. In some cases, it may make sense to consider and, ultimately, employ multiple methods.
- Review and choose the types of algorithms that can be used to implement the selected analytics method(s).
- Design and build predictive models based on the chosen algorithms. Make sure they can be compared against the performance metrics defined in the first step.
- Apply the different models, assess their performance and select the ones that can best lead to the intended outcomes.
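The last two steps boil down to scoring every candidate model with the same predefined metric and keeping the best performer. The Python sketch below shows that pattern under stated assumptions. A "model" is just any function from a feature vector to a label, the metric is plain accuracy on held-out data, and the data, feature names and threshold-rule models are all hypothetical stand-ins for whatever an analytics tool would actually produce.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A "model" is any function that maps a customer's feature vector to a predicted label.
Model = Callable[[Sequence[float]], str]
Example = Tuple[Sequence[float], str]


def accuracy(model: Model, holdout: List[Example]) -> float:
    """Share of held-out examples the model labels correctly."""
    correct = sum(1 for features, label in holdout if model(features) == label)
    return correct / len(holdout)


def select_best(
    models: Dict[str, Model],
    holdout: List[Example],
    metric: Callable[[Model, List[Example]], float] = accuracy,
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate with the same predefined metric; return the top name and all scores."""
    scores = {name: metric(model, holdout) for name, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical held-out data: (monthly_visits, avg_basket_value) -> segment label.
holdout: List[Example] = [
    ((2.0, 20.0), "occasional"),
    ((3.0, 55.0), "occasional"),
    ((12.0, 80.0), "frequent"),
    ((15.0, 60.0), "frequent"),
]

# Two toy candidate models built from hypothetical threshold rules.
models: Dict[str, Model] = {
    "visits_rule": lambda f: "frequent" if f[0] >= 10 else "occasional",
    "basket_rule": lambda f: "frequent" if f[1] >= 50 else "occasional",
}

best, scores = select_best(models, holdout)
print(best, scores)  # → visits_rule {'visits_rule': 1.0, 'basket_rule': 0.75}
```

Passing the metric in as a parameter keeps the comparison honest: whatever success measure was set in the first step is applied identically to every candidate.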
Better business through analytics
This approach to predictive analytics applications can be illustrated by an example. Let's consider an e-commerce company that wants to boost its profits by growing sales to existing customers. The objectives might be to increase both the number of items bought by individual customers and the average amount they spend overall in purchase transactions.
A typical strategy to accomplish those goals involves using a recommendation engine to try to influence customers to add items to their online cart as they shop. The online retailer can incorporate a variety of analytics methods into its recommendation engine to assign similar customers to groups, so the engine can suggest products that they might be inclined to buy.
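Once customers have been grouped, one simple way a recommendation engine might use the groups is to suggest items that a customer's group peers bought but the customer has not. The Python sketch below assumes the group assignments already exist (for example, from a classification or clustering step); the `recommend` function, customer names and purchase histories are hypothetical and only illustrate the idea.

```python
from collections import Counter
from typing import Dict, List, Set


def recommend(
    customer: str,
    groups: Dict[str, int],
    purchases: Dict[str, Set[str]],
    k: int = 2,
) -> List[str]:
    """Suggest up to k items bought by other members of the customer's group but not by the customer."""
    own = purchases.get(customer, set())
    counts: Counter = Counter()
    for other, group in groups.items():
        if other != customer and group == groups[customer]:
            counts.update(purchases.get(other, set()) - own)
    # Rank by how many group peers bought the item; break ties alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical group assignments and purchase histories.
groups = {"ana": 0, "ben": 0, "cho": 0, "dev": 1}
purchases = {
    "ana": {"coffee"},
    "ben": {"coffee", "chocolate", "mugs"},
    "cho": {"chocolate", "tea"},
    "dev": {"tools"},
}
print(recommend("ana", groups, purchases))  # → ['chocolate', 'mugs']
```

Items bought by customers in other groups (here, `dev`'s tools) are never suggested, which is the point of grouping similar customers first.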
For example, classification maps the characteristics and behaviors of customers to predefined categories. There are various classification algorithms, including ones for nearest-neighbor, decision-tree, rule-based and Bayesian classification; many analytics tools let users build different classification models whose outputs can be compared while developing predictive analytics applications.
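Nearest-neighbor classification is the easiest of these to show in a few lines: a new customer gets the category held by the majority of the most similar, already-labeled customers. The Python sketch below is a minimal k-nearest-neighbor classifier using Euclidean distance (`math.dist`, Python 3.8+). The features, categories and sample customers are hypothetical, and it assumes the features are already on comparable scales; real data would normally be normalized first.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_classify(
    query: Sequence[float],
    labeled: List[Tuple[Sequence[float], str]],
    k: int = 3,
) -> str:
    """Assign the query customer to the majority category among its k nearest labeled customers."""
    # Euclidean distance; assumes features are already on comparable scales.
    nearest = sorted(labeled, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical customers: (age, annual spend in hundreds of dollars) -> predefined category.
labeled = [
    ((25, 4.0), "bargain_hunter"),
    ((30, 3.5), "bargain_hunter"),
    ((28, 5.0), "bargain_hunter"),
    ((45, 20.0), "premium"),
    ((50, 25.0), "premium"),
    ((40, 18.0), "premium"),
]
print(knn_classify((27, 4.5), labeled))   # → bargain_hunter
print(knn_classify((47, 22.0), labeled))  # → premium
```

Swapping in a decision-tree or Bayesian classifier would change only how the category is chosen; the inputs (customer features) and outputs (predefined categories) stay the same, which is what makes the models comparable.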
Alternatively, clustering uses machine learning algorithms to assign customers to separate groups by calculating similarity scores based on a set of parameters, such as age, income bracket and educational background. For a customer database, k-means clustering is a commonly used method. It partitions customers into a preset number (k) of groups: the algorithm begins by selecting a virtual center point for each group and assigns every customer to the group with the closest center point.
The algorithm then recomputes each group's center point as the average of the customers currently assigned to it and reassigns every customer to whichever center is now closest. These two steps repeat until no customer changes groups, after which the recommendation engine can be put into action.
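The assign-then-recompute loop can be sketched in plain Python as standard (Lloyd's) k-means. This is a minimal illustration, not a production implementation: it assumes customers are already described by numeric, comparably scaled features (categorical attributes such as educational background would need encoding first), and it picks k random customers as the initial centers, though smarter seeding such as k-means++ is common in practice. The sample data is hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, ...]


def centroid(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points: List[Point], k: int, max_iter: int = 100, seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means: alternate assignment and center-update steps until no point changes group."""
    rng = random.Random(seed)
    centers: List[Point] = rng.sample(points, k)  # k existing customers serve as the initial centers
    assignment = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each customer joins the group whose center is closest.
        new_assignment = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_assignment == assignment:
            break  # no customer changed groups, so the clustering has converged
        assignment = new_assignment
        # Update step: move each center to the mean of the customers now assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the previous center if a group ends up empty
                centers[j] = centroid(members)
    return assignment, centers


# Hypothetical customers described by two already-scaled numeric features.
customers = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = kmeans(customers, k=2)
print(labels)  # the first three and last three customers land in separate groups
```

The group numbers themselves are arbitrary (they depend on the random starting centers), so downstream code should treat them as labels, not rankings.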
Data affinity by association
Association rule mining is another analytics method that looks for relationships among different data attributes. In predictive customer analytics, it produces rules for market basket analysis, which finds products that frequently appear together in online shopping carts and identifies purchase patterns that can trigger recommendations to shoppers.
Let's say an association rule indicates that 70% of the time when customers purchase gourmet coffee and chocolate on the e-commerce company's website, they also buy designer paper dessert plates. This gives the recommendation engine a potential match to act on: when a customer adds coffee and chocolate to a cart, it can suggest the dessert plates.
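The "70% of the time" figure corresponds to what association rule mining usually calls the rule's confidence: among carts containing the rule's left side (coffee and chocolate), the share that also contain its right side (plates). Support is the share of all carts containing the whole itemset. The Python sketch below scores that single rule on a hypothetical order history built to reproduce the 70% figure; a full miner such as Apriori or FP-Growth would also enumerate candidate itemsets, which this sketch does not do. The 0.6 trigger threshold is an assumption.

```python
from typing import List, Set


def support(itemset: Set[str], baskets: List[Set[str]]) -> float:
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def confidence(antecedent: Set[str], consequent: Set[str], baskets: List[Set[str]]) -> float:
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    with_antecedent = [basket for basket in baskets if antecedent <= basket]
    if not with_antecedent:
        return 0.0
    hits = sum(1 for basket in with_antecedent if consequent <= basket)
    return hits / len(with_antecedent)


# Hypothetical order history: 10 carts hold coffee and chocolate, and 7 of those also hold plates.
baskets = (
    [{"coffee", "chocolate", "plates"} for _ in range(7)]
    + [{"coffee", "chocolate"} for _ in range(3)]
    + [{"coffee"} for _ in range(5)]
    + [{"plates", "napkins"} for _ in range(5)]
)
rule_conf = confidence({"coffee", "chocolate"}, {"plates"}, baskets)
print(rule_conf)                                            # → 0.7
print(support({"coffee", "chocolate", "plates"}, baskets))  # → 0.35

# Trigger the suggestion when the cart matches the rule's left side (0.6 is an assumed threshold).
cart = {"coffee", "chocolate"}
if {"coffee", "chocolate"} <= cart and rule_conf >= 0.6:
    print("suggest: plates")
```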
To test the effectiveness of these different methods, the company builds predictive models based on them and integrates the models into its operational environment for trial runs. Ideally, each will spur additional product purchases that can be measured over a short period. The model that generates the largest increase in revenue, or several models if more than one performs well, can then be retained for general deployment as part of the recommendation engine and for continued development and refinement.
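One way to express "largest increase in revenue" is relative lift in mean revenue per session over a control group that saw no recommendations. The Python sketch below compares three trial runs that way, picks a single winner and, alternatively, keeps every model that clears a minimum lift. All revenue figures, model names and the 8% threshold are hypothetical assumptions chosen for illustration.

```python
from statistics import mean
from typing import Dict, List


def revenue_lift(trial: List[float], control: List[float]) -> float:
    """Relative increase in mean revenue per session over a control group that saw no recommendations."""
    baseline = mean(control)
    return (mean(trial) - baseline) / baseline


# Hypothetical per-session revenue from a short trial run.
control = [40.0, 50.0, 45.0, 55.0]
trials: Dict[str, List[float]] = {
    "classification": [48.0, 52.0, 50.0, 58.0],
    "clustering": [50.0, 55.0, 49.0, 62.0],
    "association_rules": [47.0, 51.0, 46.0, 56.0],
}

lifts = {name: revenue_lift(revenue, control) for name, revenue in trials.items()}
winner = max(lifts, key=lifts.get)
# Alternatively keep every model that clears an assumed minimum lift of 8%.
keep = [name for name, lift in lifts.items() if lift >= 0.08]
print(winner, keep)  # → clustering ['classification', 'clustering']
```

With samples this small the differences could easily be noise; a real trial would randomize which shoppers see which model and check that the lift is statistically meaningful before declaring a winner.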
Using the approach outlined here, data analysts in an organization will gain greater awareness of the different types of machine learning algorithms, and can also learn which ones work best for particular predictive analytics applications. Ideally, the end result will be increased agility in assessing analytics requirements and matching the right methods to emerging business needs and opportunities.