A time-invariant linear dynamical system (LDS) models a process differentially with a time-indexed hidden state vector $x_t$ that evolves according to a system matrix A, input data $u_t$ entering through an input matrix B, and a Gaussian noise source $\gamma \sim \mathcal{N}(0, \Sigma_\gamma)$: $x_{t+1} \leftarrow A x_t + B u_t + \gamma_t$.
A synthetic signal $y_t$ is generated by an output matrix C, a feed-through matrix D, and a Gaussian noise source $\psi \sim \mathcal{N}(0, \Sigma_\psi)$: $y_t \leftarrow C x_t + D u_t + \psi_t$.
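The state update and output rules above can be sketched as a single step function. This is a minimal illustration, not a reference implementation: it uses only the Python standard library, represents matrices as lists of rows, and assumes isotropic noise (one standard deviation per noise source) in place of full covariance matrices. The names `matvec` and `lds_step` are hypothetical.

```python
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lds_step(A, B, C, D, x, u, state_noise_std, output_noise_std, rng):
    """Emit the output for the current state and input, then advance the state.

    Assumption: both noise sources are zero-mean Gaussian with a single
    standard deviation each, a simplification of the general covariances.
    """
    # Output rule: y_t <- C x_t + D u_t + psi_t
    y = [cx + du + rng.gauss(0.0, output_noise_std)
         for cx, du in zip(matvec(C, x), matvec(D, u))]
    # State rule: x_{t+1} <- A x_t + B u_t + gamma_t
    x_next = [ax + bu + rng.gauss(0.0, state_noise_std)
              for ax, bu in zip(matvec(A, x), matvec(B, u))]
    return x_next, y

# Hypothetical example values: 2-D state, 1-D input, 1-D output.
A = [[0.9, -0.2], [0.2, 0.9]]
B = [[1.0], [0.0]]
C = [[1.0, 0.0]]
D = [[0.5]]
x_next, y = lds_step(A, B, C, D, [1.0, 0.0], [2.0], 0.05, 0.01, random.Random(0))
```

Passing a seeded `random.Random` instance keeps the sketch reproducible; setting both standard deviations to zero gives the deterministic part of the model.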
Geometrically, an LDS can be described as follows. The state vector x is a point in a high-dimensional space. The point makes a quasi-regular orbit around the origin, under the influence of the system matrix A, the input data, and noise. The output matrix C projects that point onto a space of output data.
The system matrix A defines a gradient in state space. If the system is stable, then the gradient from every point can be followed to an attractor, typically spiraling in to the origin, but also possibly falling into a limit cycle or a fixpoint. Therefore, noise and the input data play an important role in varying the behavior of the LDS and keeping the dynamics of the system from collapsing to extinction at the origin.
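The role of noise in keeping a stable system alive can be illustrated with a small experiment. For a discrete-time system of this form, stability means that every eigenvalue of A has magnitude less than one. The sketch below assumes a hypothetical 2x2 system matrix built as a scaled rotation, so its eigenvalue magnitude is known by construction, and compares the final state norm with and without a noise source. The function names and parameter values are illustrative only.

```python
import math
import random

def damped_rotation(radius, angle):
    """2x2 system matrix whose eigenvalues have magnitude `radius`."""
    c, s = radius * math.cos(angle), radius * math.sin(angle)
    return [[c, -s], [s, c]]

def final_state_norm(A, x0, steps, noise_std, seed=0):
    """Iterate x <- A x + noise and return the Euclidean norm of the last state."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        x = [sum(a * xi for a, xi in zip(row, x)) + rng.gauss(0.0, noise_std)
             for row in A]
    return math.sqrt(sum(xi * xi for xi in x))

A = damped_rotation(0.98, 0.3)                      # stable: radius below one
quiet = final_state_norm(A, [1.0, 0.0], 500, 0.0)   # no noise: spirals in to the origin
driven = final_state_norm(A, [1.0, 0.0], 500, 0.1)  # noise keeps the orbit from dying out
```

Without noise the norm shrinks by the factor 0.98 at every step, so the state is effectively at the origin after a few hundred steps; with noise the state keeps orbiting at a finite distance.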
With a higher-dimensional state, the system matrix combines with noise to produce richer and more complicated orbits, because with more degrees of freedom, noise has a higher probability of moving the state away from a steady-state attractor. If just one channel or dimension of the state is plotted over time, then the simple cyclic nature of the LDS is quite apparent. However, if one channel is plotted against another, the resulting curve can be quite complex.
The channels of an LDS behave as a set of coupled harmonic oscillators. Each channel has its own resonant frequency but its phase is constantly adjusted via couplings to other channels. The system is forced by noise and the input data, and is typically self-damping. This is a good match to many natural quasi-periodic processes, for example, the semi-pendular motion of body limbs, the sway of tree branches and grasses in the wind, or many sound effects such as a crackling fire, babbling brook, crowd babble, or rustling wind.
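The oscillator view can be made concrete for the simplest case of a single, uncoupled oscillator, which corresponds to one 2x2 block of the system matrix. In the sketch below, the complex eigenvalue of the block gives the per-step damping (its magnitude) and the resonant frequency (its phase). The block construction and the function names are assumptions made for illustration; couplings between channels, which would appear as off-diagonal blocks, are not modeled.

```python
import cmath
import math

def oscillator_block(decay, cycles_per_step):
    """2x2 block that rotates the state by a fixed angle and shrinks it by `decay`."""
    theta = 2.0 * math.pi * cycles_per_step
    c, s = decay * math.cos(theta), decay * math.sin(theta)
    return [[c, -s], [s, c]]

def mode_of_block(block):
    """Recover (decay, cycles_per_step) from one eigenvalue of a 2x2 block."""
    (a, b), (c, d) = block
    trace, det = a + d, a * d - b * c
    # Root of the characteristic polynomial: lambda^2 - trace * lambda + det = 0
    lam = (trace + cmath.sqrt(trace * trace - 4.0 * det)) / 2.0
    return abs(lam), abs(cmath.phase(lam)) / (2.0 * math.pi)

def free_response(block, steps):
    """First channel of the state over time, starting at (1, 0) with no forcing."""
    x = [1.0, 0.0]
    trace = []
    for _ in range(steps):
        trace.append(x[0])
        x = [block[0][0] * x[0] + block[0][1] * x[1],
             block[1][0] * x[0] + block[1][1] * x[1]]
    return trace

block = oscillator_block(decay=0.95, cycles_per_step=0.05)
decay, frequency = mode_of_block(block)
channel = free_response(block, 40)   # a decaying cosine, one cycle every 20 steps
```

Plotted over time, `channel` is the simple damped cycle described above; forcing by noise or input data is what sustains it.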
Remarkably, LDS-modeling also provides very good approximations of controlled motion such as speech articulation, turbulent motion of liquids and vapors, and even the Brownian agitation of particles and crowds. In those cases, the dynamics of the system are only approximately linear, but the LDS approximation can be improved by increasing the dimensionality of the LDS state vector.
Perhaps the most important observation about the widespread applicability of LDS is that regardless of how a natural phenomenon is recorded, e.g., as acoustic signals, images, motion parameters, etc., an LDS is a good model as long as the input data have a representation that highlights an oscillatory nature.
Of special interest are processes whose outputs can be observed as video, audio, or motion capture. The inputs to these processes are unknown, but can be assumed to have a quasi-periodic structure and can be modeled as part of the system. Thus, a simplified time-invariant LDS can be modeled by: $x_{t+1} \leftarrow A x_t + \gamma_t$, and $y_t \leftarrow C x_t + \psi_t$.
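Synthesis from the simplified, input-free model amounts to iterating the two rules. The sketch below is one possible reading, assuming isotropic Gaussian noise and a hypothetical low-dimensional state that is projected onto a larger number of output channels. The matrices, noise levels, and the name `synthesize` are illustrative choices, not values from any estimated model.

```python
import math
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def synthesize(A, C, x0, steps, state_noise_std=0.0, output_noise_std=0.0, seed=0):
    """Generate `steps` output vectors from the input-free LDS."""
    rng = random.Random(seed)
    x = list(x0)
    outputs = []
    for _ in range(steps):
        # Output rule: project the hidden state and add output noise.
        outputs.append([v + rng.gauss(0.0, output_noise_std) for v in matvec(C, x)])
        # State rule: apply the system matrix and add state noise.
        x = [v + rng.gauss(0.0, state_noise_std) for v in matvec(A, x)]
    return outputs

# Hypothetical example: a 2-D state observed through 3 output channels.
r, theta = 0.99, 0.2
A = [[r * math.cos(theta), -r * math.sin(theta)],
     [r * math.sin(theta),  r * math.cos(theta)]]
C = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frames = synthesize(A, C, [1.0, 0.0], steps=200,
                    state_noise_std=0.05, output_noise_std=0.01)
```

In a video setting each output vector would hold the pixel values of one frame, so C would have one row per pixel while the state stays comparatively small.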
LDS and equivalent models, e.g., Kalman filters and autoregressive moving average (ARMA) models, have been used extensively in many fields. The applicability of LDS to texture modeling has been noted anecdotally, but until recently, mathematical and computational hurdles have made it impossible to model high-dimensional linear dynamical systems.
Video modeling was first developed using ARMA models of small image patches by Szummer et al., “Temporal texture modeling,” IEEE International Conference on Image Processing, 1996. A good understanding of system identification methods needed for large-scale modeling was described by Overschee et al., “Subspace algorithms for the stochastic identification problem,” Automatica, 29(3):649-660, March 1993, and De Moor et al., “Applied and Computational Control, Signals and Circuits,” Chapter, ‘Numerical algorithms for subspace state space system identification—An Overview’, pages 247-311, Birkhauser Books, 1999, and more recently by Soatto et al., “Dynamic textures,” Intl. Conf. on Computer Vision, 2001. They described a no-inputs variant of a subspace method, see Doretto et al., “Dynamic data factorization,” Technical Report TR2001-0001, UCLA Computer Science, 2001.
Those varied methods share some limitations. They use weak finite-data approximations that lead to unnecessary reconstruction errors. They can be numerically unstable for short sequences of high-dimensional data, e.g., a video sequence. In practice, they can yield mismatched estimates of the dynamics and noise, which cause the state to collapse to zero within a few hundred frames of synthesis, producing meaningless output data.
In the field of computer graphics, the currently favored methods for texture synthesis work by montaging pieces of the original texture while minimizing error at the seams, e.g., Wei et al., “Fast texture synthesis using tree-structure vector quantization,” Proc., SIGGRAPH 2000, pages 479-488, 2000; Efros et al., “Image quilting for texture synthesis and transfer,” Proc., SIGGRAPH 2001, 2001, and U.S. patent application Ser. No. 09/798,147, “Texture Synthesis and Transfer for Pixel Images,” filed on Mar. 2, 2001 by Efros et al.; and Hertzmann et al., “Image analogies,” Proc., SIGGRAPH 2001, 2001.
Other techniques for texture synthesis include a Markov chain (MC) method of Schodl et al., “Video textures,” Proc., SIGGRAPH 2000, pages 489-498, 2000, and a hidden Markov model (HMM) method of Brand et al., “Style machines,” SIGGRAPH 2000, 2000, and U.S. patent application Ser. No. 09/426,955, “Stylistically Variable State Space Models for Dynamic Systems,” filed by Brand on Oct. 26, 1999. The MC method essentially shuffles the original sequence so that successive samples are similar. The outputs are the inputs, re-ordered but not necessarily dynamically related. Schodl et al. use random walks on the Markov chain to synthesize sequences of reasonably periodic or “loopy” phenomena, e.g., a child on a swing, or a candle flame.
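The re-ordering idea behind the MC method can be sketched as a random walk over frame indices. The version below is a simplified illustration and is not taken from the cited work: it assumes that frames are plain feature vectors, that similarity is Euclidean distance, and that the walk jumps to a frame with probability proportional to exp(-distance / sigma), where the distance is measured from the natural successor of the current frame. The names `frame_distance` and `markov_chain_walk` and the parameter `sigma` are hypothetical.

```python
import math
import random

def frame_distance(a, b):
    """Euclidean distance between two frames given as feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def markov_chain_walk(frames, length, sigma, seed=0):
    """Return a list of frame indices; every output is one of the input frames."""
    rng = random.Random(seed)
    # The last frame has no successor, so it is never chosen as a landing point.
    candidates = list(range(len(frames) - 1))
    i = 0
    order = [i]
    while len(order) < length:
        successor = frames[i + 1]
        dists = [frame_distance(successor, frames[j]) for j in candidates]
        nearest = min(dists)
        # Subtracting the smallest distance keeps at least one weight equal to one.
        weights = [math.exp(-(d - nearest) / sigma) for d in dists]
        i = rng.choices(candidates, weights=weights)[0]
        order.append(i)
    return order

# Hypothetical "loopy" input: two periods of a point moving around a circle.
frames = [[math.cos(2.0 * math.pi * k / 8), math.sin(2.0 * math.pi * k / 8)]
          for k in range(16)]
order = markov_chain_walk(frames, length=50, sigma=0.1)
```

A small `sigma` makes the walk follow the original ordering and jump only between nearly identical frames; a larger `sigma` allows more abrupt jumps. In either case the output contains only frames from the input, which is the limitation noted above.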
The hidden Markov model defines hidden states roughly corresponding to subsequences, and an objective function giving the most likely observation sequence with respect to a hidden state sequence. Brand et al. perform a random walk on the hidden state and then solve the resulting objective function for the most dynamically and configurally probable sequence. They use this to synthesize motion-capture dance performances from random walks on the HMM. HMM modeling offers dynamical modeling and sample generativity, i.e., the synthetic samples are linear combinations of the original samples, but it runs the risk of synthesizing invalid configurations. Moreover, analysis, i.e., estimating the HMM, and synthesis are time consuming and data-intensive batch-mode operations.
Therefore, it is desired to provide a method which deals with a particularly hard modeling case, namely, the case where the behavior of the data-generating system itself is modulated by an unpredictable independent process, so that the outputs of the system are “random variations on a theme.” In addition, the data have hidden structure necessitating latent variable models, and the data are, of course, biased and noisy.
In addition, it is desired to provide a method that incorporates solutions to the problems stated above. The method needs to yield efficient processes for determining robust, adaptable models for complex dynamical systems, and the models should allow synthesis and control of complex textured signals with high levels of detail.