Introduction

Overview

The Evolutionary Grid Machine Lambda (EGM) is a learning machine that learns to select and score the best individuals from a universe of individuals over time. At each of a series of discrete time steps, a universe of individuals is collected. The individuals may be stocks, people, cities, etc., and the discrete time steps may be weeks, days, seconds, years, microseconds, etc.

Each individual followed by the system is given a unique identifier which remains unique across all time periods studied (no two individuals ever have the same identifier). Furthermore, each time period studied is given a unique ascending integer index (i.e. week 1, week 2, etc.). So, for a series of time periods, historical information about groups of individuals is collected for each time period. The historical information collected for each individual for each time period is stored in a Number Vector and includes: the time period index; the unique identifier of the individual; and other numeric information about the individual pertinent to the current investigation. Finally, each individual in each time period is given a numeric "score" which determines the value of the individual in that time period. The "best" individuals have the highest "score" values.
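As an illustration, one such per-period record (the Number Vector) might be modeled as follows. This is a minimal sketch; the class and field names are assumptions for illustration, not part of the EGM itself:

```python
from dataclasses import dataclass

# One historical record: the Number Vector for one individual in one time period.
# Field names here are illustrative assumptions.
@dataclass(frozen=True)
class Record:
    period: int        # the ascending time period index (week 1, week 2, ...)
    entity_id: int     # identifier unique to this individual across all periods
    features: tuple    # other numeric information pertinent to the investigation
    score: float       # the individual's "score" for this period (known in training)

week1 = [
    Record(period=1, entity_id=101, features=(23.45, 25.67), score=3.4),
    Record(period=1, entity_id=102, features=(143.45, 145.27), score=-1.2),
]
# The "best" individual of a period is the one with the highest score.
best = max(week1, key=lambda r: r.score)
```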

During training, the EGM is given historical information for time periods 0 through T for all individuals. The EGM is also given the "score" values for each individual in each training time period from 0 through T. During training the EGM attempts to "learn" any patterns in the available historical data. The machine (EGM) is free to discover static as well as time-varying patterns.

During forward prediction, the EGM is given new information for time period T+1 for all individuals. The EGM is NOT given the "score" values for each individual in the new time period T+1. During prediction the EGM attempts to use any patterns it has learned to select and score the best individuals from the universe of individuals seen in time period T+1. The machine (EGM) is free to select no individuals in time period T+1 (an "I am uncertain" response). Once the machine selects and scores a set of individuals, its accuracy is determined by the least squares error on the selected individuals, by comparing the average actual "score" of the selected individuals (which the machine has never seen) against the average "score" of all individuals in time period T+1, or by other appropriate methods.
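The two accuracy measures mentioned above can be sketched as follows. The function names are hypothetical, and a void ("I am uncertain") estimate is represented by None:

```python
def least_squares_error(estimates, actuals):
    """Mean squared error over the selected (non-void) individuals only."""
    pairs = [(e, a) for e, a in zip(estimates, actuals) if e is not None]
    return sum((e - a) ** 2 for e, a in pairs) / len(pairs)

def selection_edge(estimates, actuals):
    """Average actual score of the selected individuals minus the
    average actual score of ALL individuals in the period."""
    selected = [a for e, a in zip(estimates, actuals) if e is not None]
    return sum(selected) / len(selected) - sum(actuals) / len(actuals)
```

A positive selection edge means the machine's picks scored above the period average, even if the numeric estimates themselves were off.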

Summary

A time series set of vectors, X, together with a set, Y, of scores for the vectors in X, are used to train a learning machine. There is also a testing set, TX and TY, of vectors similar to those in X and Y but for the testing time period (a time period not present in X or Y). After training, the machine is presented with the testing set, TX, and attempts to estimate TY. The learning machine returns a vector, EY, of estimates for TY. Each element in EY is either void (an "I am uncertain" response) or contains a numeric estimate for the corresponding value in TY. All non-void elements in EY are said to be "selected". The learning machine attempts to select the "best" individuals (those with above average "scores" in the testing time period).

The selection mission of the Evolutionary Grid Machine is an important feature distinguishing these learning machines from regression learning machines. The EGM is NOT trying to fit a function to the testing data. Instead the EGM is trying to select the "best" individuals out of the testing data, scoring only within the selected "best" individuals.

In many instances the Evolutionary Grid Machine may not have an exact estimate for the scores of the best individuals in the testing data. However, if the learning machine is able to select the best individuals out of the testing data, then the machine has been successful even if its estimated score is incorrect. Furthermore, if the machine's estimated score is incorrect; but, preserves the ordering of the corresponding actual scores, then the machine has achieved an added layer of success.

Description

Let X be a set of vectors such as the vector x = {e[0] e[1] e[2] ... e[M]}, where each e[m] is an independent element and let Y be a vector of dependent "score" values. Let both X and Y be of length N.

Furthermore, let the zeroth element, e[0], of every vector, in X, contain a non-negative integer value indicating some time span of integral length, for example, if the time span were weeks, a value of 1 would indicate week one, and a value of 10 would indicate week ten, etc. (i.e. the vectors contained in X are time sequenced).

Furthermore, let the first element, e[1], of every vector, in X, contain an integer entity identifier value indicating some unique entity. Thus two vectors x[i] and x[j] having the same entity identifier (x[i].e[1] = x[j].e[1]), are said to refer to the same abstract entity in some unspecified manner.

Furthermore, for any two distinct vectors in X, the following condition may never be true: ((x[i].e[0] == x[j].e[0]) && (x[i].e[1] == x[j].e[1])) (i.e. two vectors cannot exist for the same entity and the same time span).

Finally, any unique entity value may be missing entirely from all vectors in X, or present for some integral time values but missing for other integral time values.
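The uniqueness constraint above is easy to check mechanically. A minimal sketch, with vectors represented as plain lists and e[0], e[1] at indices 0 and 1 (the function name is an assumption):

```python
def check_unique(X):
    """Verify no two vectors in X share the same (time span, entity id) pair.

    Entities may be absent from some or all time spans; only a duplicate
    (e[0], e[1]) combination is an error."""
    seen = set()
    for x in X:
        key = (x[0], x[1])   # (e[0] time span, e[1] entity identifier)
        if key in seen:
            raise ValueError(f"duplicate entity {x[1]} in time span {x[0]}")
        seen.add(key)

X = [
    [0, 1, 23.45], [0, 2, 143.45],   # time span 0: entities 1 and 2
    [1, 1, 24.56],                   # time span 1: entity 2 missing -- allowed
]
check_unique(X)                      # passes; no duplicates
```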

Example

An example, but by no means the only example, of X and Y would be a set of vectors of stock market data taken from the Open, High, Low, Close, and Volume numbers for all NASDAQ traded stocks over a 52 week period. In this example we have the following:

e[0] The sequential index of the current week (from 1 through 52).
e[1] The unique integer identifier of the stock (stocks not traded would not appear).
e[2] The current week opening price.
e[3] The current week high price.
e[4] The current week low price.
e[5] The current week closing price.
e[6] The current week share volume traded.
Y The "score" vector of next week's percentage profit: (next_week_closing_price - current_week_closing_price) / current_week_closing_price.
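The Y score defined above can be computed from consecutive weekly closing prices. A minimal sketch, checked against the Apple closes from the tables that follow (the function name is an assumption):

```python
def next_week_profit(closes):
    """Percentage profit for each week: (next close - this close) / this close.

    The final week has no following close, so its score is undefined (None)."""
    scores = []
    for this_close, next_close in zip(closes, closes[1:]):
        scores.append(round(100.0 * (next_close - this_close) / this_close, 1))
    return scores + [None]

# Apple weekly closing prices from the training and testing tables:
apple_closes = [24.56, 25.40, 25.71, 25.92]
print(next_week_profit(apple_closes))   # [3.4, 1.2, 0.8, None]
```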

Similar examples can be constructed for oil exploration data over time, for the height and weight of individuals over time, etc. However, continuing with our securities example, we train our machine on the market data for four stocks over a three week period, then test it on the fourth week, as follows:

Training Data

Time     ID           Open      High      Low       Close     Vol        Score
x.e[0]   x.e[1]       x.e[2]    x.e[3]    x.e[4]    x.e[5]    x.e[6]     y (next week's profit)

(first week)
0        1 (Apple)    $23.45    $25.67    $23.35    $24.56    19367      3.4%
0        2 (IBM)      $143.45   $145.27   $143.15   $144.96   894676     -1.2%
0        3 (Xerox)    $13.95    $15.27    $13.35    $14.72    56832      4.8%
0        4 (GM)       $57.15    $62.17    $53.65    $62.05    3419647    9.1%

(second week)
1        1 (Apple)    $24.56    $25.38    $22.75    $25.40    12046      1.2%
1        2 (IBM)      $144.96   $144.96   $143.15   $143.23   864023     -3.2%
1        3 (Xerox)    $14.72    $16.12    $14.39    $15.43    59204      3.4%
1        4 (GM)       $62.05    $68.00    $62.05    $67.70    3219382    6.5%

(third week)
2        1 (Apple)    $25.40    $26.98    $24.75    $25.71    22056      0.8%
2        2 (IBM)      $143.23   $143.23   $136.75   $138.64   824093     -4.3%
2        3 (Xerox)    $15.43    $16.45    $15.09    $15.96    61205      -1.4%
2        4 (GM)       $67.70    $75.35    $66.39    $72.10    3619582    7.8%

We train the EGM on the training data shown above. After training, we show the machine the following testing data, TX, and ask it to return an estimate of the next week's profit, TY, for each of the four individuals. We do not show the machine the scores, TY.

Testing Data

Time     ID           Open      High      Low       Close     Vol        Score
tx.e[0]  tx.e[1]      tx.e[2]   tx.e[3]   tx.e[4]   tx.e[5]   tx.e[6]    ty (next week's profit)

(fourth week)
3        1 (Apple)    $25.71    $26.18    $25.55    $25.92    25046      -1.2%
3        2 (IBM)      $138.64   $139.23   $131.15   $132.67   774593     -6.1%
3        3 (Xerox)    $15.96    $16.13    $15.00    $15.73    59205      2.4%
3        4 (GM)       $72.10    $77.87    $71.39    $77.73    3710582    5.8%

Resulting Estimates

After testing, the learning machine returns the following estimated scores, EY.

Estimate   Score
ey         ty

void       -1.2%
-7.6%      -6.1%
1.9%       2.4%
4.9%       5.8%

Ignoring the void element, we calculate the least squares error on the three "selected" individuals as 1.10%. We can also score these estimates in an alternate manner: sorting the estimates in EY also preserves the sort order of their corresponding TY values.
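Both checks can be reproduced directly from the tables above. A minimal sketch, with None standing for the void estimate:

```python
ey = [None, -7.6, 1.9, 4.9]    # estimates returned by the machine
ty = [-1.2, -6.1, 2.4, 5.8]    # actual scores, never shown to the machine

# Least squares (mean squared) error over the selected individuals only.
pairs = [(e, t) for e, t in zip(ey, ty) if e is not None]
lse = sum((e - t) ** 2 for e, t in pairs) / len(pairs)
print(round(lse, 2))           # 1.1 (the 1.10% above)

# Sorting by estimate also sorts the corresponding actual scores.
by_estimate = sorted(pairs)    # tuples sort on the estimate (first element)
actuals_in_estimate_order = [t for _, t in by_estimate]
print(actuals_in_estimate_order == sorted(actuals_in_estimate_order))  # True
```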

FAQ

Question 1: Why must we train the learning machine on collections of individuals in each time period? Why not simply train the machine on each individual separately and perform a normal regression estimate for each individual?

Answer: Many real world estimates cannot be made unless one is aware of the competitive landscape.

For instance, suppose one is estimating the social popularity of students in the senior high school class. We can perform any number of individual regressions correlating high scores for intelligence, appearance, and social skills with social popularity. However, all of these individual regression models are greatly skewed in the case where all students in the senior high school class are male except one student who is female.

Also, suppose one is estimating the financial popularity of our Bank's Certificates of Deposit. We perform any number of individual regressions correlating our Bank's previous Certificates of Deposit with their financial popularity. However, all of these individual regression models are greatly skewed in the case where one of our competitors is advertising an aggressive interest rate two percentage points higher than ours.

Question 2: Why must we allow the learning machine to select individuals in the testing time period? Why not simply have the machine provide normal regression estimates for each individual in the testing time period?

Answer: Many real world estimates cannot be made for all individuals; but, only for a few individuals.

For instance, suppose one is estimating currency conversion rates. Normally these rates have very small random daily changes. However, every so often, a central bank will pre-announce its intention to buy or sell its own currency. In those special cases the learning machine will want to have "no opinion" on most currencies; yet, make an estimate for the currency whose central bank has pre-announced.

Many real world situations do not allow us to accurately estimate the score of the best individuals; but, only to guess which might be the best individuals.

For instance, suppose ten monkeys and one five-year-old human child are taking a simple IQ test (this being all the information we have). We cannot, with the meager information provided, accurately estimate the IQ scores of the contestants after they take the IQ test. However, we can reasonably guess that the human child will have the best IQ score (whatever that score may be).
