This invention relates generally to learning and inference systems, and more particularly to learning and inferring mappings between paths of varying dimensionality.
The field of regression (or regression analysis) deals with learning how to predict one thing from another. For example, regression can be used to predict air conditioner sales from daily temperatures.
A regression function maps points in a cue space (a domain) to points in a target space (a range). The spaces can have different but fixed dimensionalities, and the mapping can be quite complicated. Learning means discovering the proper form of the regression function and estimating its parameters. Regression literature dates back hundreds of years. Popular modern regression methods include parametric curve fitting, neural networks, and non-parametric methods such as radial basis function networks.
Often, there is not enough information in a single point to support a good mapping, but in sequential data nearby points often contain information about each other that can be used to complete the mapping. In this case, one may use a fixed-width context window, e.g., to predict today""s air conditioner sales from daily temperatures over the last week. This still is a point mapping, but the dimensionality of the domain has increased seven-fold.
There may always be useful information outside the context window. For example, it may help to know the temperature trend over the last few months. Then, one performs two regressions, one that combines the trend with today""s reading to make the prediction, and one that uses both inputs to update the trend. Note, both are still point mappings, but by updating the trend, information is carried forward in time. In short, the mapping incorporates foresight. Forward propagation models of this sort include Kalman filters, recurrent neural networks, and tap-delay control systems.
Finally, it may be desired to reconstruct a year""s air-conditioning sales from an entire year""s worth of temperature readings. Here one might want to use both foresight and hindsight (carrying information backward in time). For example, knowledge of August""s temperature readings may be useful in inferring June""s sales, because heavy use of many air conditioners can actually raise outdoor temperatures. Inferring one sequence from another is not a point mapping: The sequences have no fixed length so the dimensionality of the regression function is unknown.
Essentially, it is desired to map between paths in the domain and the range. A path is defined as an arbitrarily long sequence of points, where successive points are not entirely unrelated. In statistics, one says that the points are dependent. Kalman filtering (forward propagation) and smoothing (backward propagation) was formulated for this task. However, Kalman models have a serious limitation. The path in the. target space must evolve linearly, and its relationship to the path in the cue space must also be linear.
Extended Kalman filters attempt to relax the linear evolution constraint, but typically lack a smoothing step or a learning procedure. In addition, extended Kalman filters are generally based on linear approximations to a known non-linear evolution function. Linear approximations cause inference to fail to converge or converge incorrectly, in which case learning and prediction can fail.
Therefore, it is desired to provide a method that can infer a target path from a cue path. It is desired that this method should optimize use of hindsight and foresight. It is also desired that the method can handle targets that evolve in a non-linear manner. Furthermore, it is desired that the method handle non-linear relations between cue and target paths. Lastly, it is desired to provide fast learning and inference procedures that are guaranteed to converge.
Instead of a xe2x80x9ctrendxe2x80x9d variable as in the prior art, the invention uses a discrete vector-valued xe2x80x9chidden statexe2x80x9d variable which carries information both forwards and backwards in time. The learning procedure maximizes the information content of the hidden state variable at every time step.
An inference procedure finds globally optimal settings of the hidden state variable for every time step, carrying context from arbitrarily far away in time. Finally, the inferred path value for each time step incorporates information from the setting of the hidden state variable at every step in the time series. Altogether, the invention integrates information along the entire length of the paths, ensuring a globally consistent mapping between paths of varied, arbitrary dimensionality.
More particularly, the invention provides a method that infers a path in a target space from a path in a cue space. The inferred target path is maximally consistent with the cue path and with the known behavior of the target system, as learned from examples in a training phase. The learning procedure estimates a target state machine, target probability density functions (PDFs) and an occupancy matrix of the state machine from training target paths. The target state machine and the target PDFs comprise a hidden Markov model. The learning procedure minimizes entropy in these models.
Cue PDFs are estimated from training cue paths and the target occupancy matrix obtained from training. When mapping, a new cue path is analyzed using the cue PDFs and the target state machine to produce a new cue occupancy matrix. The new target path is synthesized from the cue occupancy matrix and the target PDFs.