The operation of a dynamical system is usually described by a set of equations that specify the time dependence and evolution of the state of the system under the influence of control actions. At any given time, the dynamical system has a state given by a vector of real numbers, which can be represented as a point in an appropriate state space. Small changes in the state of the dynamical system correspond to small changes in the numbers. Continuous dynamical systems usually operate according to a set of differential equations.
The invention is concerned with automatic control and scheduling of arbitrary non-linear dynamical systems in continuous state spaces with arbitrary transition functions that are controlled by a set of discrete control actions.
Example dynamical systems include robots, vehicles, heating, ventilation, and air conditioning (HVAC) systems, power generators, and household appliances. Typically, the systems are operated by motors that have a relatively small number of discrete settings, e.g., on and off, or the number of possible settings can be reasonably limited, e.g., by setting a thermostat only at integer degrees.
The state of such systems is typically a real-valued vector $x$ in a continuous state space $X$ of the dynamical system. The control actions $a$ of a set $A$ are discrete. The dynamics of the control system can be described by the following set of equations: $x_{k+1} = f(x_k, a_k)$, where $x_k$ is the state of the system at time $t_k$, $a_k$ is the control action applied at time $t_k$, $f$ is an arbitrary non-linear transition function, and the system evolves in discrete time such that $t_k = k\Delta t$ for a selected interval $\Delta t$. A sequence of actions $a_0, a_1, a_2, \ldots$ must be selected such that a measure of performance is optimized. For example, an HVAC system can be optimized by bringing an environment to a desired temperature gradually, with minimum expenditure of energy.
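As an illustration of the state-update equation above, the following minimal Python sketch rolls a state forward under a sequence of discrete actions. The transition function `f`, its coefficients, the outside temperature, and the action set {0, 1} (heater off or on) are hypothetical choices made only for this example; they are not taken from the description.

```python
import math

def f(x, a, dt=1.0):
    """Hypothetical non-linear transition for a room temperature x (degrees).

    a is a discrete action: 0 = heater off, 1 = heater on.
    All coefficients below are illustrative assumptions.
    """
    outside = 5.0                          # assumed outside temperature
    leak = 0.1 * (x - outside)             # heat lost to the outside
    heat = 2.0 * a * math.exp(-0.02 * x)   # heater output, non-linear in x
    return x + dt * (heat - leak)

def rollout(x0, actions):
    """Apply x_{k+1} = f(x_k, a_k) for each action and return all visited states."""
    states = [x0]
    for a in actions:
        states.append(f(states[-1], a))
    return states

trajectory = rollout(15.0, [1, 1, 0, 1])
print(trajectory)
```

The list returned by `rollout` contains one more entry than the action sequence, because it includes the initial state.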
One performance measure is a cumulative cost $J$ over $K$ steps: $J = \sum_{k=0}^{K} g(x_k, a_k) + h(x_K)$, where $g$ is a selected operating cost, and $h$ is a terminal cost associated with the final state $x_K$.
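The cumulative cost can be evaluated directly for any candidate action sequence. The sketch below follows the formula literally: the operating cost g is summed for k = 0 through K, and the terminal cost h is applied to the state reached at step K. The toy functions `f`, `g`, and `h`, the target value 3, and the binary action set are assumptions made for illustration. The exhaustive search at the end is only a naive baseline that shows how the number of candidate sequences grows exponentially with the horizon.

```python
from itertools import product

def cumulative_cost(f, g, h, x0, actions):
    """Return J = sum_{k=0}^{K} g(x_k, a_k) + h(x_K) for actions a_0..a_K."""
    K = len(actions) - 1
    x = x0
    total = 0.0
    for k, a in enumerate(actions):
        total += g(x, a)      # operating cost at step k
        if k < K:
            x = f(x, a)       # advance the state only up to x_K
    return total + h(x)       # terminal cost on the final state x_K

# Hypothetical toy problem: each "on" action raises the state by one unit.
f = lambda x, a: x + a            # transition (assumed)
g = lambda x, a: a                # energy spent per step (assumed)
h = lambda x: (x - 3) ** 2        # penalty for missing the target 3 (assumed)

J = cumulative_cost(f, g, h, 0, [1, 1, 1, 0])

# Naive baseline: try all |A|^(K+1) sequences and keep the cheapest.
best_J = min(cumulative_cost(f, g, h, 0, seq) for seq in product([0, 1], repeat=4))
print(J, best_J)
```

Even for this tiny example the search visits 16 sequences; with more actions or a longer horizon the count quickly becomes impractical, which is why structured methods are of interest.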
General methods for solving this optimization problem for arbitrary functions f, g, and h do not exist; only solutions for special cases are known. For example, in a linear quadratic regulator (LQR), the control a is real-valued, f is linear, and g and h are quadratic in the state x and the control a. However, in the general case, the function f is not linear, and the cost functions g and h are not quadratic in the state and control. In such cases, the optimal control can be found by numerical methods.
Another method of describing the evolution of a dynamical system in time is to represent it as a Markov decision process (MDP). The MDP is described by a four-tuple (S, A, R, P), where S is a finite set of states s; A is a finite set of actions a; R is a reward function such that R(s, a) represents the reward (respectively, cost) if action a is taken in state s; and P is a Markovian transition model where P(s′|s, a) represents the probability of ending up in state s′ if action a is taken in state s.
As in the case above, the goal is to find a sequence of actions a0, a1, a2, . . . that optimizes a performance measure defined in terms of the cumulative reward R(s, a). Methods for finding such an optimal sequence of actions exist for an arbitrary transition model P(s′|s, a).
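One standard method for a finite MDP is value iteration, sketched below for a two-state example. The description does not name a particular solver, so this choice, the discount factor `gamma`, and the toy states, rewards, and transition probabilities are all assumptions made for illustration.

```python
def value_iteration(S, A, R, P, gamma=0.9, tol=1e-9, max_iter=10000):
    """Value iteration for a finite MDP (S, A, R, P).

    R[(s, a)] is the reward; P[(s, a)] maps each next state s2 to P(s2 | s, a).
    gamma is an assumed discount factor, not part of the four-tuple.
    """
    def q(s, a, V):
        # Expected discounted return of taking action a in state s.
        return R[s, a] + gamma * sum(p * V[s2] for s2, p in P[s, a].items())

    V = {s: 0.0 for s in S}
    for _ in range(max_iter):
        delta = 0.0
        for s in S:
            best = max(q(s, a, V) for a in A)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {s: max(A, key=lambda a: q(s, a, V)) for s in S}
    return V, policy

# Hypothetical two-state heating example.
S = ["cold", "warm"]
A = ["off", "on"]
R = {("cold", "off"): 0.0, ("cold", "on"): -0.5,
     ("warm", "off"): 1.0, ("warm", "on"): 0.2}
P = {("cold", "off"): {"cold": 1.0},
     ("cold", "on"): {"warm": 0.9, "cold": 0.1},
     ("warm", "off"): {"warm": 0.7, "cold": 0.3},
     ("warm", "on"): {"warm": 1.0}}

V, policy = value_iteration(S, A, R, P)
print(policy)
```

The returned policy maps each state to an action, so the optimal action sequence is obtained by looking up the policy at each state the system visits.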
However, a major distinction between an MDP and a set of differential equations that describes a continuous-state-space dynamical system is that the state space of an MDP is discrete; that is, the system can be in only one of a limited number of discrete states at any given time. It is thus desired to convert a given continuous-state-space dynamical system into a Markov decision process (MDP) with a discrete state space, so that an optimal control sequence can be found for the MDP and, in turn, for the continuous-state-space system.