This article originally appeared on the BeyeNETWORK.
Data Selection – What Has Information Content
Many people believe they can process almost any data in their mining effort. After all, they reason, if there is a correlation between the movement of a financial instrument and the phase of the moon, why not use that fact?
Unfortunately, many people have lost large sums of money with just that logic. Using techniques that require no knowledge of the problem under analysis is a powerful concept. Many of the advanced technologies available today provide just such a capability. What most people do not understand is that the user needs to have some understanding of the problem to develop and evaluate the models successfully.
The better the information content of the data used for the data mining effort, the better the potential for extracting useful, new information. This is the single most productive area to work in, regardless of the technology employed for your data mining effort. Improving the information content is analogous to mining in a field with a high gold content, as opposed to one with a relatively low percentage of gold. If the miner is talented enough, either can be profitable. However, your odds are better if there is more gold to be found.
You can improve the information content of your data set in three ways. The first, and most common, approach is to add new variables to the data set. Assuming that a new variable captures some aspect of the problem not already included elsewhere, the information content has been increased.
The second approach is what is commonly called data preprocessing. Data preprocessing involves the transformation of the existing data into another form. Typically, this is nothing more than a mathematical manipulation of the data. Traditional technical indicators often do nothing more than preprocess the price data. A moving average, for example, is simply a mathematical manipulation of the underlying price variable.
Why is preprocessing important? Simply put, the transformation of the data by preprocessing may allow us to extract the information content more effectively or more efficiently. There is no reason to believe that any data mining technique would conceive of and implement a simple moving average unless it was specifically told to do so. The information content available in the moving average representation of price is also available directly from price. It is simply harder to find, and harder to extract.
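As a minimal sketch of this kind of preprocessing, the Python below computes a trailing simple moving average from a price series. The function name, the three-bar window, and the closing prices are assumptions made up for illustration; nothing here is derived from real market data.

```python
def simple_moving_average(prices, window):
    """Return the trailing simple moving average of `prices`.

    The first `window - 1` positions have no full window, so they are
    reported as None rather than guessed.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    running_total = 0.0
    for i, price in enumerate(prices):
        running_total += price
        if i >= window:
            running_total -= prices[i - window]  # drop the price leaving the window
        averages.append(running_total / window if i >= window - 1 else None)
    return averages


# Hypothetical closing prices, made up purely for illustration.
closes = [10.0, 11.0, 12.0, 13.0, 12.0, 11.0]
sma_3 = simple_moving_average(closes, window=3)
```

Every value in `sma_3` is computed from `closes` alone, which is the point made above: the transformation adds no new information, it only presents the existing information in a form that may be easier for a modeling technique to use.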
Think of transforming data as shifting the angle at which you are viewing the information content. If you are trying to estimate the height of a building, and only view the building from directly overhead, you are faced with dubious chances of success. However, given one or more other angles from which to view the building, your chances of accurately estimating its height are improved. This is not to say that you are guaranteed a better estimate. Success ultimately depends on skill in applying the information available.
Taken to the extreme, assume you have developed 10,000 different views of the building. After some point, adding a new view contributes little, if any, new information to the analysis. In fact, additional perspectives may actually hinder your capability to make accurate estimates.
The third approach to improving the information content of data involves removing data that does not increase the information content. As with human information processing, we frequently overload our analysis with useless or redundant information. In an attempt to consider all aspects of this “noise” data, we actually reduce the effectiveness of our decision-making capability. By removing this noise from consideration, we may be able to dramatically improve the results of our decision making.
Neural computing offers a unique capability in evaluating the importance of data inputs. The practitioner is well served by understanding that there is no artificial intelligence in the development of a neural computing model. A neural model is a mathematical formula developed by iteratively testing and adjusting the weights in the formula against known results.
Neural models exhibit sensitivity to initial conditions. Initial weights in the model are randomly generated. A different set of starting weights, or a different structure to the model, results in a different formula.
As with any mathematical formula, by comparing the absolute value of the weights associated with the variables comprising the model, the practitioner can determine the relative importance of the variables in the model. By analyzing the variable weighting of a number of models, the practitioner can develop an appreciation for variables that consistently carry significant weight across a number of test models.
Identifying the variables consistently having significant weight and retaining them in future models can improve performance. Likewise, identifying and discarding variables having consistently low absolute weights can eliminate noise from a model. Neural computing, with its sensitivity to initial conditions, is particularly useful for this type of analysis. Traditional techniques that use a deterministic method to develop weights in a single iteration are typically unsuitable for this type of analysis because they consistently develop the same formula.
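The comparison of weights across several models can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a prescribed method: each model is reduced to one weight per input variable (a real network has many connections per input, whose absolute values would first need to be combined), the inputs are assumed to be scaled comparably, and the variable names, weights, and 10% cutoff are all invented.

```python
def rank_inputs(weight_runs, names):
    """Average each input's share of absolute weight across several models.

    weight_runs: one list of input weights per trained model (hypothetical
    layout: a single weight per input variable per model).
    Returns (name, mean_share) pairs, largest share first.
    """
    totals = {name: 0.0 for name in names}
    for weights in weight_runs:
        magnitude = sum(abs(w) for w in weights)
        for name, w in zip(names, weights):
            # Share of this model's total absolute weight carried by the input.
            totals[name] += abs(w) / magnitude
    mean_share = {name: total / len(weight_runs) for name, total in totals.items()}
    return sorted(mean_share.items(), key=lambda item: item[1], reverse=True)


# Made-up weights from three hypothetical training runs that differ only in
# their random starting weights.
names = ["momentum", "volume", "moon_phase"]
weight_runs = [
    [0.9, -0.4, 0.05],
    [-1.1, 0.3, -0.02],
    [0.8, 0.5, 0.10],
]
ranking = rank_inputs(weight_runs, names)
# Arbitrary illustrative cutoff: inputs carrying under 10% of the weight on average.
weak_inputs = [name for name, share in ranking if share < 0.10]
```

Using each input's share of the total absolute weight, rather than the raw weight, keeps one model with unusually large weights from dominating the average. Inputs that land in `weak_inputs` across many runs are the candidates for removal.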
Physical Systems versus Behavioral Systems
Many of the practitioners of data mining have significant formal training in data analysis. Many of these same individuals make no distinction between the types of systems they are modeling.
Physical systems have the advantage of being relatively stable. They also tend to consist of causal, if complex, relationships. In our building example, the building did not change over the time we were developing our different views.
Behavioral systems, on the other hand, are often fuzzy, inconsistent, incomplete, and subject to change. Using the same assumptions and techniques to create models in both environments is an invitation to disaster. It is common for two traders in the S&P pit to have different expectations of the market or the impact of an event on the market.
Mining the financial markets involves modeling a behavioral system. The pattern of price movements is based on a complex system of buyer behaviors. Each participant possesses unique motivations and rationale. The composite behaviors are inconsistent between market participants because the individual participants display inconsistent behavior patterns over time. In short, prediction in this environment can never hope to display a high degree of accuracy. The good news is that being profitable does not require extreme levels of accuracy.
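A short worked example shows why modest accuracy can still be profitable. The expected profit per trade depends on the size of the wins and losses as well as on how often the model is right. The 40% win rate and the two-to-one payoff below are hypothetical figures chosen only to illustrate the arithmetic.

```python
def expectancy(win_rate, average_win, average_loss):
    """Expected profit per trade, in the same units as the win and loss sizes."""
    return win_rate * average_win - (1.0 - win_rate) * average_loss


# Hypothetical figures: right 40% of the time, winners twice the size of losers.
edge = expectancy(win_rate=0.40, average_win=2.0, average_loss=1.0)

# Win rate at which this payoff profile breaks even: loss / (win + loss).
break_even = 1.0 / (2.0 + 1.0)
```

With these assumed numbers the model is wrong more often than it is right, yet `edge` is positive, because any win rate above `break_even` (one in three here) makes money on average.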
Traditional statistical techniques and technical indicators are often unfairly criticized for their inconsistent performance. What is often overlooked is the fact that the problem is more complex than the question being posed.
A stochastic indicator, from technical analysis, can take on a value between 0 and 100. One approach to trading based solely on a stochastic indicator is to buy an instrument when the value is less than 20 and to sell when the value is greater than 80.
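The rule can be sketched as follows. The code assumes the common raw %K form of the stochastic, which places the latest close within the high-low range of the last few bars. The smoothing usually applied in practice is omitted, and the function names, the three-bar period, and the price bars are invented for illustration.

```python
def stochastic_k(highs, lows, closes, period=14):
    """Raw %K: where the close sits within the high-low range of the last `period` bars."""
    values = []
    for i in range(len(closes)):
        if i < period - 1:
            values.append(None)  # not enough history yet
            continue
        highest = max(highs[i - period + 1 : i + 1])
        lowest = min(lows[i - period + 1 : i + 1])
        if highest == lowest:
            values.append(50.0)  # flat range: an assumed neutral reading
        else:
            values.append(100.0 * (closes[i] - lowest) / (highest - lowest))
    return values


def threshold_signal(k, buy_below=20.0, sell_above=80.0):
    """Map a %K reading to the naive threshold rule."""
    if k is None:
        return "wait"
    if k < buy_below:
        return "buy"
    if k > sell_above:
        return "sell"
    return "hold"


# Made-up bars, for illustration only.
highs = [12.0, 12.5, 13.0, 12.0, 11.0]
lows = [11.0, 11.5, 12.0, 10.5, 10.0]
closes = [11.5, 12.4, 12.9, 10.6, 10.2]
signals = [threshold_signal(k) for k in stochastic_k(highs, lows, closes, period=3)]
```

Note that the rule keeps issuing a buy signal while the made-up prices continue to fall in the last two bars. The code knows nothing about whether prices are oscillating in a range or trending.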
A stochastic indicator is often criticized when prices do not reverse as expected. The stochastic becomes a victim of the trader’s lack of understanding. Chaos theory offers a capability for the analysis of strange attractors. Prices tend to oscillate within a strange attractor. Occasionally, they will break out of the attractors and trend in a direction for some time before developing a new attractor.
If the practitioner, or trader, understands that the stochastic indicator works reliably within the attractor, but has little value on the breakouts, two things happen. The practitioner recovers the use of a valuable tool and an understanding of how to use it. And, the practitioner understands the importance of the relationship of price activity and strange attractors.
Correlation versus Causality
It is intriguing to observe how willing people are to overlook the limitations of correlation in data relationships. Having assumed that they have defined their problem appropriately, they are perfectly willing to dump in any type of information and accept any strong correlation that the analysis techniques identify.
Not so long ago, you could call a 900 number to get the current depth of a river in Southeast Asia that was correlated to the S&P. Some people still monitor the phase of the moon.
Many traders argue that price gaps have a degree of significance, and call for a certain type of action. Those who have at least studied the occurrence of price gaps for the S&P generally note that the gaps tend not to close. That is as far as most people go with their analysis.
A lazy data mining practitioner, not looking beyond what the computer provides, may settle for correlation. On deeper analysis of the way the market functions, it becomes apparent that many gaps occur in conjunction with the release of government reports. Since many of the reports are released an hour before the S&P market opens, gaps are common and tend not to close. Price gaps tend to close for the 30-year U.S. Treasury Bond, since the market is open when reports are released.
Regardless of the technique used, from simple observation to the most advanced technologies available, the assumption of causality cannot be made on the basis of correlated observations. In the effort to extend our knowledge through data mining, we cannot allow ourselves to take leaps of intuition.
An area that is often mishandled by practitioners of data mining involves classifying data as an outlier. This encompasses two different issues. The first is data containing errors. Data errors are easily handled: correct them. Caution is still needed in doing so. If decisions were made using the erroneous data and you are using both the data and the resulting decisions in your data mining effort, you must accurately maintain that action/reaction pair.
New information becoming available is not the same as an error in the original data. Many people conducting an analysis of financial instruments update their data as new information becomes available. This can lead to disastrous results when models are developed based on the analysis of historical data. Using the updated data implies knowledge of future events at the time a decision was made. A model developed on this assumption will be less accurate in live use than its historical results suggest.
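One way to guard against this is to store every release with its publication date and to look up only what was known on the decision date. The sketch below is a minimal illustration of that idea. The record layout, the function name, and all of the values and dates are invented.

```python
from datetime import date

# Each record: (period the figure describes, date it was published, value).
# The numbers and dates are invented for illustration.
releases = [
    ("2024-Q1", date(2024, 4, 25), 1.6),  # first estimate
    ("2024-Q1", date(2024, 5, 30), 1.3),  # later revision
    ("2024-Q1", date(2024, 6, 27), 1.4),  # final revision
]


def value_as_of(records, period, decision_date):
    """Return the latest value for `period` published on or before `decision_date`."""
    known = [
        (published, value)
        for rec_period, published, value in records
        if rec_period == period and published <= decision_date
    ]
    if not known:
        return None  # nothing had been released yet
    return max(known)[1]  # the most recent publication wins


seen_by_trader = value_as_of(releases, "2024-Q1", date(2024, 5, 1))
seen_in_hindsight = value_as_of(releases, "2024-Q1", date(2024, 12, 31))
```

A decision made on the first of May could only have used `seen_by_trader`. Training a model on `seen_in_hindsight` for that same date hands it a figure no participant had at the time.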
A true outlier is an event that falls well outside of the normally accepted expectations. In the mining of the financial markets, it is not uncommon for analysts to remove data from consideration because of the magnitude, and the relatively rare occurrence, of the event involved. This is absolutely incorrect. Few people are interested in predicting average behavior. In the markets, the dramatic events contain the most risk and the most profit potential. These events must be accounted for appropriately.
In many instances, such as developing a neural model, these outliers are rare glimpses of information otherwise unattainable. They represent the near unique occurrences that we would like to be able to anticipate and predict. Excluding them from the data used to develop a model precludes learning the activity that caused the radical outcome.
Many data mining practitioners have had disappointing results with traditional statistical techniques for this reason. Most have received statistical training oriented toward developing models of average behavior. In modeling the financial markets, a model of average behavior is inconsistent with our objectives in most cases. Unfortunately, many data miners fail to recognize this fact and conclude that the technique has failed them. The reality is they have developed perfectly valid models for the wrong questions.
Another common mistake in the development of neural and other models is the tendency to maintain the naturally occurring distribution of data in the training set. This may make it nearly impossible to develop an accurate model.
I recently spoke with a modeler who felt that he had identified a particularly reliable scenario. He used 15-minute data intervals for his analysis. By doing so, he felt that the odds of his scenario occurring were about one in 25,000.
Data mining practitioners must understand that this becomes a categorization problem. In evaluating situations as they naturally occur, a neural computing model will learn that it will be right to a remarkable level of statistical significance by always saying the situation has not occurred. In fact, it will be wrong in only one case in 25,000. Most traders would love to boast such accuracy in their trading. Unfortunately, the one case where the model is wrong is the one case that really interests us.
By giving the neural computing development system a uniformly distributed set of output categories to learn from, the system is forced to differentiate between categories. In our example, that would mean an equal number of cases where we wanted to trade compared to the number of inconsequential cases. This does not guarantee accurate learning. It does give the model a chance to identify the significant events instead of assuming the insignificant event occurs in all cases.
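The arithmetic and the rebalancing step can be sketched in Python. The example uses random undersampling of the common category, which is only one of several ways to build a uniformly distributed training set. The function name, the fixed seed, and the stand-in data are assumptions for illustration.

```python
import random


def balance_by_undersampling(examples, labels, seed=0):
    """Keep every rare-class example and an equal-sized random sample of the rest."""
    by_label = {}
    for example, label in zip(examples, labels):
        by_label.setdefault(label, []).append(example)
    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    balanced = []
    for label, group in by_label.items():
        for example in rng.sample(group, smallest):
            balanced.append((example, label))
    rng.shuffle(balanced)  # avoid presenting one class as a single block
    return balanced


# Naturally occurring distribution from the text: one signal per 25,000 bars.
total_bars = 250_000
labels = ["signal" if i % 25_000 == 0 else "no_signal" for i in range(total_bars)]
examples = list(range(total_bars))  # stand-ins for real feature vectors

# Accuracy of a model that always answers "no_signal".
always_no_accuracy = labels.count("no_signal") / total_bars
training_set = balance_by_undersampling(examples, labels)
```

Here `always_no_accuracy` is 99.996%, the figure a model can reach without learning anything. The balanced `training_set` holds the ten signal cases and ten randomly chosen inconsequential cases, so always answering "no" is right only half the time.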
Until recently, many people argued that the movements of prices in the financial markets are random. Many researchers now confirm what practitioners have known for years. Price movements are not random. Few would argue against the existence of trends and trading ranges in the financial markets. The occurrence of these characteristics is sufficient to demonstrate that the markets are not a series of random price movements.
The question then reverts to whether the market is predictable and to what we can use to predict market activity. The logical answer is that predictability depends upon the talent of the predictor. The financial markets have the potential to be predictable. The successful predictor of market changes can reap great rewards. That is why so many people are pursuing data mining efforts.
Based on the characteristics of behavioral systems and the constraints of data mining technology, it is fair to say that the markets are partially predictable. In many cases, there is insufficient information to make a reliable prediction. Where predictive capability exists, there is no certainty. Only the probabilistic reaction to a set of conditions is attainable.
If we accept the tenets of chaos theory that are associated with a dependence upon initial conditions, we must accept that after some finite number of steps into the future, the divergence of complex systems with even small initial differences makes it impossible to predict accurately. The ability to predict, with a high level of precision, what the price of an instrument is going to be becomes more difficult the further out in time we move.
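Dependence on initial conditions is easy to demonstrate with the logistic map, a standard textbook example of a chaotic system. It is used here purely as a toy and is not offered as a model of any market. The starting value, the size of the initial difference, and the number of steps are arbitrary choices.

```python
def logistic_path(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x), a standard toy chaotic system."""
    path = [x0]
    for _ in range(steps):
        path.append(r * path[-1] * (1.0 - path[-1]))
    return path


# Two starting points that differ by one part in ten billion.
path_a = logistic_path(0.2, steps=100)
path_b = logistic_path(0.2 + 1e-10, steps=100)
gap = [abs(a - b) for a, b in zip(path_a, path_b)]

early_gap = max(gap[:6])   # largest difference over the first five steps
late_gap = max(gap[50:])   # largest difference over steps 50 to 100
```

Over the first few steps the two paths are practically indistinguishable. By the later steps the gap between them has grown to the same order as the values themselves, so a forecast made from one path says nothing useful about the other.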
That does not imply that the Efficient Market Hypothesis and Portfolio Theory are correct. It also does not support the Random Walk Theory of market activity. With increasing regularity, diverse markets are shown to exhibit behaviors that fail to support these self-serving proposals. The price activity of almost all instruments is not random and is not normally distributed. The simple existence of trends in prices is sufficient to disprove a normality or randomness assumption. The impact of this reality is beyond the scope of this article, but needs to be understood by the successful trader.
This realistic view of our knowledge is mandatory if we are to trade successfully. Understanding what we can rely on, and what is unreliable, is critical. Having the discipline to act on the reliable, rather than on emotion, is essential to financial survival.
Part three of this series will examine some common problems in financial data mining efforts and review the author’s experience with data mining in 30-year Treasury Bond Futures.