
# How to evaluate the viability of a data mining project


This is an excerpt from Chapter 2, "Business Objectives," from the book Commercial Data Mining: Processing, Analysis and Modeling for Predictive Analytics Projects, by David Nettleton. Nettleton is a consultant and academic researcher with more than 25 years of IT experience, primarily in databases and data analysis. In this chapter, he explains how to set up a data analysis project for success by identifying specific business objectives in the planning stage. He goes on to discuss other considerations related to data availability, project scope and data quality. Using simple mathematical formulas, Nettleton also describes how to estimate the effectiveness of a data mining project prior to execution.

A commercial data analysis project that lives up to its expectations will probably do so because sufficient time was dedicated at the outset to defining the project's business objectives. What is meant by business objectives? The following are some examples:

• Reduce the loss of existing customers by 3%.
• Increase contract signings from new customers by 2%.
• Increase sales from cross-selling products to existing customers by 5%.
• Predict the television audience share with a probability of 70%.
• Predict, with a precision of 75%, which clients are most likely to contract a new product.
• Identify new categories of clients and products.
• Create a new customer segmentation model.

The first three examples quantify the desired improvement as a specific percentage.

In the fourth and fifth examples, an absolute value is specified for the desired precision of the data model. In the final two examples the desired improvement is not quantified; instead, the objective is expressed in qualitative terms.

### Criteria for choosing a viable project

This section enumerates some main issues and poses some key questions relevant to evaluating the viability of a potential data mining project. The checklists of general and specific considerations provided here form the basis for the rest of the chapter, which gives a more detailed specification of benefit and cost criteria and applies those definitions to two case studies.

### Evaluation of potential commercial data analysis projects -- General considerations

This excerpt is from the book Commercial Data Mining: Processing, Analysis and Modeling for Predictive Analytics Projects, by David Nettleton. Published by Morgan Kaufmann Publishers, Burlington, Massachusetts. ISBN 9780124166585. Copyright 2014, Elsevier BV.

The following is a list of questions to ask when considering a data analysis project:

• Is data available that is consistent and correlated with the business objectives?
• What is the capacity for improvement with respect to the current methods? (The greater the capacity for improvement, the greater the economic benefit.)
• Is there an operational business need for the project results?
• Can the problem be solved by other techniques or methods? (If the answer is no, the return on the project will be greater.)
• Does the project have a well-defined scope? (If this is the first instance of a project of this type, reducing the scale of the project is recommended.)

### Evaluation of viability in terms of available data -- Specific considerations

The following list provides specific considerations for evaluating the viability of a data mining project in terms of the available data:

• If part or all of the data does not exist, can processes be defined to capture or obtain it?
• What is the coverage of the data with respect to the business objectives?
• What is the availability of a sufficient volume of data over a required period of time, for all clients, product types, sales channels and so on? (The data should cover all the business factors to be analyzed and modeled. The historical data should cover the current business cycle.)
• Is it necessary to evaluate the quality of the available data in terms of reliability? (The reliability depends on the percentage of erroneous data and incomplete or missing data. The ranges of values must be sufficiently wide to cover all cases of interest.)
• Are people available who are familiar with the relevant data and the operational processes that generate the data?
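The data-quality question above can be made concrete by scoring each variable on a 0-to-1 scale, as the chapter does later. The sketch below is one minimal, hypothetical way to do that: reliability is taken as the fraction of records that are neither missing nor outside a plausible range. The sample values and the valid range are illustrative assumptions, not from the book.

```python
# Sketch: estimating per-variable reliability as the fraction of records
# that are present and within a plausible range. Data and ranges here
# are hypothetical illustrations.

def reliability(values, valid_range):
    """Fraction of values that are non-missing and inside valid_range."""
    lo, hi = valid_range
    ok = sum(1 for v in values if v is not None and lo <= v <= hi)
    return ok / len(values)

# One missing value and one out-of-range value (age 142) reduce the score.
ages = [34, 51, None, 27, 142, 45]
print(round(reliability(ages, (0, 120)), 2))  # 4 of 6 records usable -> 0.67
```

In practice this check would be run variable by variable and aggregated into a single reliability score for the data set, as the next section describes.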

### Factors that influence project benefits

**Assigning a value for percent improvement**

The percentage improvement should always be considered against the current precision of an existing index as a baseline. The new precision objective should also not get lost in the error bars of the current precision: if the current precision carries an error margin of +/-3% in its measurement or calculation, the target must exceed the baseline by more than that margin.
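The error-bar caveat above reduces to a one-line arithmetic check. The figures below (a 70% baseline with a +/-3% margin) are hypothetical examples:

```python
# Check that a target precision clears the error bars of the current
# precision. All figures are hypothetical.

def clears_error_margin(current, target, margin):
    """True if the target lies above the current precision plus its error margin."""
    return target > current + margin

# Baseline 70% +/- 3%: a 72% objective is lost in the noise,
# while 75% is a measurable improvement.
print(clears_error_margin(0.70, 0.72, 0.03))  # False
print(clears_error_margin(0.70, 0.75, 0.03))  # True
```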

Several factors influence the benefits of a project. A qualitative assessment of current functionality is required first: what is the current grade of satisfaction with how the task is being done? A value between 0 and 1 is assigned, where 1 is the highest grade of satisfaction and 0 the lowest. The lower the current grade of satisfaction, the greater the potential improvement and, consequently, the benefit.

The potential quality of the result (the evaluation of future functionality) can be estimated from three aspects of the data: coverage, reliability and correlation:

• The coverage or completeness of the data, assigned a value between 0 and 1, where 1 indicates total coverage.
• The quality or reliability of the data, assigned a value between 0 and 1, where 1 indicates the highest quality. (Both the coverage and the reliability are normally measured variable by variable, giving a total for the whole data set. Good coverage and reliability for the data help to make the analysis a success, thus giving a greater benefit.)
• The correlation of the data with the business objective, that is, its degree of dependence, can be measured statistically. A correlation is typically measured as a value from -1 (total negative correlation) through 0 (no correlation) to 1 (total positive correlation). For example, if the business objective is for clients to buy more products, the correlation would be calculated between each customer variable (age, time as a customer, zip code of postal address, etc.) and the customer's sales volume.
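As one concrete way of measuring the correlation described above, the sketch below computes Pearson's r between a customer variable and sales volume. The variable names and data are hypothetical illustrations; in practice any standard statistics library provides this calculation.

```python
# Sketch: Pearson's correlation coefficient (r, from -1 through 0 to 1)
# between a customer variable and sales volume. Data is hypothetical.
from math import sqrt

def pearson(xs, ys):
    """Pearson's r between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

years_as_customer = [1, 3, 5, 7, 9]
sales_volume = [120, 250, 310, 460, 500]
print(round(pearson(years_as_customer, sales_volume), 2))  # strong positive: 0.99
```

A value this close to 1 would mark the variable as strongly correlated with the objective; near 0 it would contribute little to the model.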

Once individual values for coverage, reliability and correlation are acquired, an estimation of the future functionality can be obtained using the formula:

Future functionality = (correlation + reliability + coverage)/3

An estimation of the possible improvement is then determined by calculating the difference between the current and the future functionality, thus:

Estimated improvement = Future functionality - Current functionality
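The two formulas can be sketched directly in code. The scores below are hypothetical illustrations on the 0-to-1 scales the text defines:

```python
# The book's two estimation formulas; all input scores are hypothetical
# illustrations on the 0-to-1 scales defined in the text.

def future_functionality(correlation, reliability, coverage):
    """Average of the three data-quality aspects."""
    return (correlation + reliability + coverage) / 3

def estimated_improvement(future, current):
    """Difference between future and current functionality."""
    return future - current

future = future_functionality(correlation=0.9, reliability=0.8, coverage=0.7)
print(round(future, 2))                              # 0.8
print(round(estimated_improvement(future, 0.5), 2))  # 0.3
```

With a current functionality (grade of satisfaction) of 0.5, these illustrative scores would suggest a substantial potential benefit.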

A fourth aspect, volatility, concerns the amount of time the results of the analysis or data modeling will remain valid.

The volatility of the business objective's environment can be defined as a value between 0 and 1, where 0 = minimum volatility and 1 = maximum volatility. High volatility can cause models and conclusions to become out of date quickly with respect to the data; the business objective itself can even lose relevance. Volatility depends on whether the results are applicable over the long, medium or short term with respect to the business cycle.

Note that this a priori evaluation gives an idea of the viability of a data mining project. The quality and precision of the end result will, of course, also depend on how well the project is executed: analysis, modeling, implementation, deployment and so on. The next section, which deals with estimating the cost of the project, includes a factor (expertise) that evaluates the availability of the people and skills necessary to guarantee the a posteriori success of the project.


This was last published in May 2014

