
Data mining strategies and the financial markets

This article is the first in a three-part series discussing issues of project design and project management that, when appropriately addressed, can improve the practitioner's performance potential.

This article originally appeared on the BeyeNETWORK.

In the annals of the various gold rushes, the vast majority of miners made misguided efforts at claiming the potential riches at stake. Many of these individuals exhausted their savings – and spent a significant portion of their lives – in a fruitless search.

Some miners managed to earn a respectable living through directed search, applying their personal knowledge to extract what their claims had to offer.

A few moved beyond the noise of misinformation and identified truly useful knowledge. These few individuals are the stuff of legend.

The financial markets attract the talents of a wide variety of analysts who attempt to utilize any and all techniques to carve out an advantage that will lead to fame and fortune. The market also attracts individuals naïve or foolish enough to expect to stumble across, or be given, the keys to the financial kingdom.

The financial markets have been successfully mined using a variety of techniques. Successful efforts require either hard work or extreme luck. Many who succeed will speak only in general terms about their approach, because of the value of the nuggets of information they have mined. Information is a diminishing resource: the more it is shared, the less value it has.

Speaking in general terms also helps to avoid misunderstandings and unrealistic expectations. For example, one of the techniques employed by the author yielded a profit of $40,000 per contract in the first half of 1996. Trading one contract required a $2,500 margin account.

At first glance, this result seems exceptional. A more complete examination would reveal that the author’s personal money management rules required $25,000 to $50,000 in the margin account to trade one contract. While this would not necessarily be required of other traders, it is a more accurate picture of the author’s trading environment.

Applying the same technique to the same market in the first half of 1995 yielded a profit of $7,500 per contract. While still profitable, this result illustrates how performance varies across time frames. The author would not be surprised to experience a loss in some six-month period using exactly the same technique.
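The arithmetic behind these figures shows why the same result can look either exceptional or merely good, depending on the capital assumed. The sketch below uses only the numbers quoted above. The helper names are hypothetical, and returns are simple, non-annualized ratios of profit to committed capital.

```python
# Figures quoted in the text (profit per contract, first half of each year).
PROFIT_1996 = 40_000
PROFIT_1995 = 7_500
EXCHANGE_MARGIN = 2_500                      # minimum margin to trade one contract
ACCOUNT_LOW, ACCOUNT_HIGH = 25_000, 50_000   # author's money-management range


def return_on_capital(profit: float, capital: float) -> float:
    """Simple, non-annualized return: profit divided by capital committed."""
    return profit / capital


def summarize(profit: float) -> dict:
    """Return on the bare margin versus the author's account-size range."""
    return {
        "on_margin": return_on_capital(profit, EXCHANGE_MARGIN),
        "on_large_account": return_on_capital(profit, ACCOUNT_HIGH),
        "on_small_account": return_on_capital(profit, ACCOUNT_LOW),
    }


for year, profit in (("1996", PROFIT_1996), ("1995", PROFIT_1995)):
    s = summarize(profit)
    print(f"{year}: {s['on_margin']:.0%} on margin, "
          f"{s['on_large_account']:.0%} to {s['on_small_account']:.0%} on the account")
```

Running it prints 1600% on margin versus 80% to 160% on the account for 1996, and 300% versus 15% to 30% for 1995.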

A complete evaluation of this technique would require consideration of a number of other factors as well. For example, the number of trades, the maximum drawdown on the trading account, and the interaction with other available techniques would all be major factors in the author’s evaluation. However, traditional metrics such as the Sharpe ratio, slippage and commissions incurred, or a traditional return-on-investment calculation would not be considered. This is not to say that using these evaluations is wrong; they are simply inconsistent with the author’s priorities.
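Two of the factors named above, the number of trades and the maximum drawdown, are easy to compute from a record of closed trades. The sketch below makes two assumptions. First, drawdown is measured on cumulative closed-trade P&L from a starting equity of zero, so drawdown on open positions, which could be larger, is ignored. Second, the trade list is hypothetical.

```python
from itertools import accumulate


def max_drawdown(trade_pnls):
    """Largest peak-to-trough decline in cumulative P&L, as a positive number."""
    peak = 0.0   # the starting equity counts as the first peak
    worst = 0.0
    for equity in accumulate(trade_pnls):
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def evaluate(trade_pnls):
    """A few of the factors named in the text; deliberately omits the Sharpe ratio."""
    return {
        "trades": len(trade_pnls),
        "net_profit": sum(trade_pnls),
        "max_drawdown": max_drawdown(trade_pnls),
    }


# Hypothetical per-contract results of closed trades, in dollars.
trades = [500, -300, -400, 900, -200]
print(evaluate(trades))
```

Here the account is up $500 overall but at one point sat $700 below its previous high. That difference is the kind the author weighs.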

Well-defined evaluation metrics, established at the inception of a project, are critical to the success of data mining projects. The remaining sections of Part 1 of this series describe some of the project definition issues that today’s data miners must face if they are going to be truly successful. In Part 2, we will turn our attention to issues and decisions relating to the data itself. In Part 3, we will discuss some of the common pitfalls that can be avoided with good project management and look at the author’s experience in the 30-year U.S. Treasury Bond Futures market.

While the topics do not appear to be highly technical, this article reflects the experience of a long-term data miner. At the same time, the principles discussed can help the newcomer distinguish between valuable information and the fool’s gold of noise.

The author has nearly fifteen years of experience applying advanced technology to real problems, ten of them devoted almost exclusively to the financial markets. Much of the work currently described as data mining was simply serious data analysis a few years ago. The techniques continue to advance. The computer technology continues to develop. Our sophistication grows. But the one overwhelming lesson from this experience is that there is no “silver bullet.”

Every technology that manages to emerge from the academic or research sectors has some strong qualities. Expert systems offer efficient processing and a structure for developing and implementing rule-based systems. Neural computing has the potential to capture the information content hidden in the data more accurately, because it can handle nonlinear relationships, non-normal distributions, inconsistencies in the data, and inputs that are not independent.

Genetic algorithms offer the user the ability to locate near-optimal solutions to ill-defined problems. Fuzzy set analysis makes it possible to apply quantitative analysis to variables without rigid boundaries. Chaos theory offers insights into systems whose behavior depends sensitively on initial conditions.

Traditional statistical techniques possess many capabilities that should not be overlooked. This well-developed body of analysis techniques and research design parameters serves many users well. In the financial markets, technical analysis represents the applied research of many professionals.

None of these technologies is a panacea. Each is a valuable tool for the data mining practitioner. The limitations of each technique must be considered as well, but they should not disqualify it from acceptance; rather, they represent a realistic appraisal of its capabilities. The strength of advanced technologies comes from adding capabilities to the practitioner’s toolbox. Knowing when and how to use the tools in the toolbox is what separates the novice from the expert.

This article outlines a data mining methodology and addresses specific concerns for the financial markets. The issues discussed represent the experience of one individual. There are other approaches as well. As discussed next, a good portion of your success is determined by the questions you ask and how well you sift through the answers.

Information Libraries

Data mining practitioners in the financial markets should concentrate on developing a collection of predictable events. By identifying a set of predictable events, each with a known level of reliability, the financial data miner is put in the position of waiting for those events to occur. Such miners become information-based traders. If the identified situations have been carefully constructed and validated, they can offer dramatic rewards.

It is important that the events in the library of tradable situations be significant enough to justify the attention given to them. It is unlikely that a single event with an expected frequency of occurrence of once every five years will justify watching for its expected one point of profit.
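One way to make “significant enough” concrete is to compare events on expected points per year, meaning frequency times average profit per occurrence, alongside their reliability. The sketch below is hypothetical throughout. The field names, the thresholds, the reliability figures, and every event except the frequency and profit of the once-in-five-years, one-point example from the text are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    occurrences_per_year: float
    expected_points: float   # average profit per occurrence, in points
    reliability: float       # fraction of past occurrences that were profitable

    @property
    def points_per_year(self) -> float:
        return self.occurrences_per_year * self.expected_points


def worth_watching(library, min_points_per_year, min_reliability):
    """Keep only events that justify the attention paid to them."""
    return [
        e for e in library
        if e.points_per_year >= min_points_per_year and e.reliability >= min_reliability
    ]


library = [
    Event("once-in-five-years event", 0.2, 1.0, 0.90),  # 0.2 points per year
    Event("hypothetical setup A", 12, 1.5, 0.62),        # 18 points per year
    Event("hypothetical setup B", 30, 0.4, 0.55),        # 12 points per year
]
selected = worth_watching(library, min_points_per_year=5, min_reliability=0.6)
print([e.name for e in selected])
```

Even with high reliability, the rare event earns about 0.2 points a year and is filtered out. Setup B has enough volume but fails the reliability bar.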

At the same time, the larger the library of events, the better our chances of profitably trading their occurrence. The more situations we are willing to act on, the more opportunities we have to trade, and the diversity of those opportunities reduces our risk.
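The risk-reduction claim can be quantified under simplifying assumptions. Suppose each event's outcome has the same standard deviation σ, and every pair of events has the same correlation ρ. Then the standard deviation of the average outcome across n events is σ·√(1/n + (1 − 1/n)ρ). The sketch below assumes those conditions, which real trading situations only approximate. It treats "risk" as the dispersion of the average outcome.

```python
import math


def average_outcome_std(sigma: float, n: int, rho: float = 0.0) -> float:
    """Std dev of the equally weighted average of n outcomes, each with std dev
    sigma and a common pairwise correlation rho (assumed, not estimated)."""
    if n < 1:
        raise ValueError("need at least one event")
    return sigma * math.sqrt(1.0 / n + (1.0 - 1.0 / n) * rho)


for n in (1, 4, 16):
    independent = average_outcome_std(1.0, n)
    correlated = average_outcome_std(1.0, n, rho=0.5)
    print(f"n={n:2d}  independent: {independent:.3f}  rho=0.5: {correlated:.3f}")
```

With independent events, the dispersion halves each time the library quadruples. With ρ = 0.5, it can never fall below σ·√0.5 ≈ 0.707σ, however many events are added. Diversity therefore helps most when the events are genuinely different.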

Many data mining practitioners promote the idea that it is not necessary to possess significant domain knowledge to identify new information. Advanced analysis techniques will do that for you. In a sense they are right. The techniques possess no particular domain expertise. What is important is the skill of the practitioner in applying the technique. Applying the “so what” test is much easier for the individual who can appreciate the question being asked.

Problem Definition – Know What You Want to Do

The first priority of any data mining effort involves defining success. While this sounds obvious, many people do not take the time to define their problem and determine how to measure the results. Those who do make the attempt often use overly simplistic and incomplete definitions. Simply maximizing profit is not enough, because it does not address how much money may be lost before the promised profit materializes. Few traders have the stomach, or the wallet, to ignore the impact of drawdowns. The real-world evaluation criteria for a successful trading system are complex, with many subtleties.
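Here is a minimal illustration of why maximizing profit alone is not enough. The two hypothetical trade sequences below end with the same profit, but one requires sitting through a $15,000 loss first. The numbers are invented. “Worst point” here means the lowest cumulative P&L relative to the starting balance.

```python
from itertools import accumulate


def path_summary(trade_pnls):
    """Final profit and the deepest point below the starting balance."""
    equity = list(accumulate(trade_pnls))
    return {"final_profit": equity[-1], "worst_point": min(0, min(equity))}


system_a = [2_000, 2_000, -1_000, 3_000, 4_000]    # hypothetical
system_b = [-6_000, -9_000, 5_000, 8_000, 12_000]  # hypothetical

for name, pnls in (("A", system_a), ("B", system_b)):
    print(name, path_summary(pnls))
```

A profit-only criterion ranks the two systems as equal. A trader who would abandon system B after its first $15,000 of losses never collects the profit promised at the end.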

Many people approach data mining with an emphasis on the identification of “new” relationships in the data. What is new is only a partial, and sometimes an unimportant, aspect of successful data mining. By far, the more important question is “so what?” – that is, how can this new insight be translated into profit?

If a relationship cannot be applied to your personal goals and work habits, it has no value to you. Vast arrays of techniques exist for trading the financial markets. Because of knowledge, resources, or personal circumstance, many of these techniques can be implemented only by a relatively small number of people and are of no value to the rest of us. That does not diminish the inherent value of these techniques. Who would question the value of a cure for cancer or AIDS, even without personally having either disease?

In the process of data mining, “so what” is a personal evaluation. It is common to see analysis techniques hyped as the universal solution. It is just as common to see techniques bashed as having no value. In most cases, both opinions are correct. The value or lack of value of any technique can only be determined on an individual basis.

Optimization techniques, like genetic algorithms, require the development of an objective function for evaluating alternative scenarios. Building an objective function requires the developer to identify every factor that influences the evaluation of the model and to list all constraints explicitly.

While many practitioners find the development of this objective function tedious, I would recommend it as the first step for every project. An appropriate objective function allows the practitioner to make a clear choice. Most people never make a nickel on R². They do profit by being more effective in meeting real criteria and constraints.
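Here is a sketch of what such an objective function might look like, under assumptions of our own. It applies hard constraints on maximum drawdown and minimum trade count, and it maximizes profit per dollar of drawdown. The candidate models, their statistics, and the limits are hypothetical. R² is carried along only to show that it plays no part in the choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    r_squared: float     # goodness of fit; recorded but not used below
    net_profit: float
    max_drawdown: float
    trades: int


def objective(c: Candidate, max_dd_limit: float, min_trades: int) -> Optional[float]:
    """None if any constraint is violated, else profit per dollar of drawdown."""
    if c.max_drawdown > max_dd_limit or c.trades < min_trades:
        return None
    return c.net_profit / max(c.max_drawdown, 1.0)  # guard against zero drawdown


def best(candidates, max_dd_limit: float, min_trades: int) -> Optional[Candidate]:
    """Highest-scoring candidate that satisfies every constraint, if any."""
    scored = [(objective(c, max_dd_limit, min_trades), c) for c in candidates]
    feasible = [(s, c) for s, c in scored if s is not None]
    return max(feasible, key=lambda sc: sc[0])[1] if feasible else None


candidates = [
    Candidate("A", 0.62, 30_000, 18_000, 40),  # best fit, but drawdown too deep
    Candidate("B", 0.35, 22_000, 6_000, 55),
    Candidate("C", 0.48, 15_000, 5_000, 12),   # too few trades to trust
    Candidate("D", 0.40, 20_000, 8_000, 60),
]
choice = best(candidates, max_dd_limit=15_000, min_trades=30)
print(choice.name)
```

The model with the highest R² is rejected outright. The winner is chosen on criteria the trader actually cares about.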

Most people trade financial instruments to make money. Too many of these same people let other issues get in the way of meeting this objective. If a trader is only willing to consider certain techniques, he severely limits his potential. If he needs to trade all the time, rather than trading what he knows, he is a gambler, not a trader. If he has not taken the time to learn his trade, thinking a key “tip” will make him rich, he is a fool.

As an example, suppose you had perfect knowledge of the closing price of the S&P 500 futures contract, five trading days into the future. Further, you know with 100% reliability that it would be identical to today’s closing price.

Observing that the closing price five days from now is the same as the current price, most individuals assume there is nothing to be done. Nothing could be further from the truth. Knowledge of the future always has value in the financial markets. One thing we know is that the price will not remain unchanged for the entire five days. Given perfect knowledge of a future price, there are several ways we can profit from it.

Some successful traders would wait for prices to change by some fixed amount and then take a position in the market that would be closed for a profit every time the S&P hit the known target price. Other, more aggressive traders would employ the same technique, but would add to their positions as the price of the S&P contract moved further from the target price.
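The first approach can be sketched as a simple rule. Once price has moved a fixed trigger distance away from the known target, take a position toward the target, and close it when the target is reached. This minimal sketch rests on several assumptions. It holds one contract at a time, so it does not model the aggressive variant that adds to positions. Exits are filled exactly at the target with no slippage, even if price gaps through it. The price path and trigger are hypothetical.

```python
def fade_to_target(prices, target, trigger):
    """Realized per-contract profits, in points, from fading moves away from target."""
    profits = []
    position = 0     # +1 long, -1 short, 0 flat
    entry = 0.0
    for price in prices:
        if position == 0:
            if price >= target + trigger:
                position, entry = -1, price    # above target: sell
            elif price <= target - trigger:
                position, entry = 1, price     # below target: buy
        elif (position == 1 and price >= target) or (position == -1 and price <= target):
            profits.append(position * (target - entry))  # assume fill exactly at target
            position = 0
    return profits


# Hypothetical closes; the final close equals the known target, as in the example.
path = [1000, 1003, 1006, 1004, 999, 994, 997, 1001, 1000]
print(fade_to_target(path, target=1000, trigger=5))
```

The start and end prices are identical, yet the rule captures two six-point round trips. Knowing the destination is enough to profit from the journey.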

Another group of traders would be trading options contracts on this certain knowledge of future events. The owner of an option has the right, but not the obligation, to either purchase (a call option) or sell (a put option) the financial instrument the option is based on. Options contracts are valid for a limited period of time. Therefore, a component of the cost is based on time. All other things being equal, the option is worth less in the future than it is worth today. Given perfect information, we should sell either, or both, types of options immediately, and in as large a quantity as possible. Five days from now, when the price of the S&P comes back to the current level, the options can be repurchased for less than the current price, giving us a profit.
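The time-decay argument can be checked numerically. The article does not name a pricing model. The sketch below swaps in the Black-76 formula for European options on futures. It assumes that volatility and interest rates stay fixed while a week passes with the futures price unchanged. The index level, strike, volatility, and rate are hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black76(futures: float, strike: float, years: float, rate: float, vol: float):
    """Black-76 values of a European call and put on a futures contract."""
    sd = vol * math.sqrt(years)
    d1 = (math.log(futures / strike) + 0.5 * sd * sd) / sd
    d2 = d1 - sd
    disc = math.exp(-rate * years)
    call = disc * (futures * norm_cdf(d1) - strike * norm_cdf(d2))
    put = disc * (strike * norm_cdf(-d2) - futures * norm_cdf(-d1))
    return call, put


# Hypothetical at-the-money options: 20% volatility, 5% rate.
F = K = 1000.0
call_now, put_now = black76(F, K, 30 / 365, 0.05, 0.20)
call_later, put_later = black76(F, K, 23 / 365, 0.05, 0.20)  # a week later, same price
profit_per_pair = (call_now + put_now) - (call_later + put_later)
print(f"sell at {call_now + put_now:.2f}, buy back at {call_later + put_later:.2f}")
```

If the price really does return to its current level, both options lose value purely through the passage of time. The seller keeps the difference. The sketch says nothing about the risk of being wrong, which is why the example depends on perfect knowledge.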

For some traders, knowing what not to do can be worth a great deal of money. If the price of 30-year U.S. Treasury bonds were known instead of the S&P, many bankers could save a great deal of money by not incurring the costs of hedging their loan portfolios. This hedging process can best be thought of as a type of insurance against a large move in prices. If we know with certainty that there is no risk of loss, there is no need to insure against one.

Perfect knowledge of the S&P does not meet the “so what” test for a banker. The information is new, but the banker’s only concern is making sure that a change in interest rates does not cause a loss on his mortgage portfolio. For him, knowing the exact closing price of an S&P futures contract has no value.

Even with perfect knowledge, many people would not make money. Faced with the aforementioned situation, the most common reaction would be “I do not trade futures. They are too risky.” Others would note that while they trade futures, they do not trade the S&P. These people are displaying an attitude that is analogous to refusing to bend over and pick up the bag of money lying at their feet.

No approach is “right” or “wrong.” Rather, the individual with the information determines the value of the information learned. Establishing an objective function that takes into account all objectives and constraints is suggested regardless of the techniques employed. Doing so allows for a quantitative metric for comparison of alternative models.

A perfectly profitable approach for one trader will likely be ignored, or misused, by another trader. Data mining is profitable only if the information is evaluated on the basis of an individual’s performance objectives that take all implementation issues into consideration.

Define Your Output – Know What You are Trying to Predict

Many people attempt to predict price. Others attempt to predict direction. What you attempt to predict must correspond to the metrics used for success. The degrees of difficulty for different types of predictions must also be considered.

In trying to predict the direction of the market, three potential outcomes exist in one dimension. The market can move down, remain flat, or move up. While by no means easy, this is the least challenging level of difficulty.

Fuzzy models can be effectively used for this type of representation. The author has found that modeling these categories as one continuous variable with a range of –1 to 1 has been the most reliable approach. By setting thresholds appropriately, boundary locations can be optimized for the particular data set used.
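Here is a minimal sketch of the thresholding idea. A model emits a continuous score in [-1, 1], and two cut points map that score to down, flat, or up. The cut points are chosen by grid search on historical data. Accuracy serves as the selection criterion here only for brevity; the objective function from the problem definition would be a better yardstick. The scores and labels are hypothetical, and this is not the author's model.

```python
def classify(score: float, lower: float, upper: float) -> int:
    """Map a continuous score to -1 (down), 0 (flat), or 1 (up)."""
    if score <= lower:
        return -1
    if score >= upper:
        return 1
    return 0


def accuracy(scores, labels, lower, upper) -> float:
    hits = sum(classify(s, lower, upper) == y for s, y in zip(scores, labels))
    return hits / len(labels)


def fit_thresholds(scores, labels, step: float = 0.05):
    """Grid-search (accuracy, lower, upper) over cut points in [-1, 1]."""
    grid = [round(-1 + i * step, 10) for i in range(int(round(2 / step)) + 1)]
    best = None
    for lo in grid:
        for hi in grid:
            if lo >= hi:
                continue
            acc = accuracy(scores, labels, lo, hi)
            if best is None or acc > best[0]:
                best = (acc, lo, hi)
    return best


# Hypothetical model outputs and the direction actually observed afterward.
scores = [-0.9, -0.6, -0.35, -0.1, 0.05, 0.2, 0.4, 0.7, 0.95, -0.2]
labels = [-1, -1, -1, 0, 0, 0, 1, 1, 1, 0]
acc, lower, upper = fit_thresholds(scores, labels)
print(acc, lower, upper)
```

Keeping the output as one continuous variable means the flat band can be widened or narrowed for each data set without retraining the model.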

Attempting to predict a price target adds a second dimension. Both direction and distance capabilities must be mastered simultaneously. These may be independent decisions generated from separate models. The author has used traditional statistical analysis, technical indicators, chaos analysis, and rule-based representations to develop a model that predicts price targets. These combined techniques offer a synergy effect that is unavailable from using a single technique.

One reason we need to develop an information library of potential situations stems from the tendency of the markets to have different characteristics depending on the direction. Once the expected direction has been determined, then the estimate of distance is developed.
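The two-stage idea can be sketched as follows. A direction call from a separate model (not shown) selects which distance estimate to use. Distances are estimated separately for up and down moves because their characteristics may differ. Using the median of past move sizes is an assumption chosen for simplicity, and the move history is hypothetical.

```python
from statistics import median


def distance_by_direction(past_moves):
    """Typical move size estimated separately for up moves and down moves."""
    ups = [m for m in past_moves if m > 0]
    downs = [-m for m in past_moves if m < 0]
    if not ups or not downs:
        raise ValueError("need past moves in both directions")
    return {1: median(ups), -1: median(downs)}


def price_target(price, direction, distances):
    """Stage two: apply the distance estimate for the predicted direction."""
    if direction == 0:
        return price
    return price + direction * distances[direction]


# Hypothetical history in which down moves tend to be larger than up moves.
moves = [4, 6, 5, -9, -12, 3, -10, 7]
distances = distance_by_direction(moves)
print(price_target(1000, 1, distances), price_target(1000, -1, distances))
```

A single distance estimate pooled across both directions would understate down moves and overstate up moves in this history. Conditioning on direction avoids that blur.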

Many people attempt to predict the closing price of an instrument at some time in the future. Even with all of the resources of the space program, landing a man on the moon requires in-flight adjustments to account for inaccuracies in predicting the movements of physical objects. This problem is at the same level of difficulty as predicting closing prices. Both require accuracy in the same three dimensions: direction, distance, and time. Predicting closing price has the added complexity of modeling a behavioral system, as opposed to a physical system.


Successful mining of data in the financial market requires extensive effort and a diverse set of skills. The potential rewards are significant. This article examined a number of key issues related to defining the project. Trading information, rather than noise, involves developing a well-defined set of skills that meet the needs of the individual trader. A clear understanding of the objective is critical.

As the number of techniques in your library grows, so does the number of opportunities you have to trade. Traders do not have to trade all the time to be successful. They should be driven by the need to reduce trading activity to high-quality opportunities, not by the urgency to take any position in an attempt to justify the time spent watching the market. There is no need to become one of the casualties of the markets.

In Part 2, we will discuss a number of issues related to the data used in performing analysis. Enhancing the information content of your data adds value. Applying diverse techniques, new and old, enhances the data miner’s library.
