A Train Automatic Stopping Controller (TASC) is an integral part of an Automatic Train Operation (ATO) system. The TASC performs automatic braking to stop a train within a predetermined range of positions. ATO systems are of particular importance for train systems where train doors need to be aligned with platform doors; see the related Application, and Di Cairano et al., “Soft-landing control by control invariance and receding horizon control,” American Control Conference (ACC), pp. 784-789, 2014.
However, the transient performance of the train, i.e., the trajectory to the predetermined position, can be adversely affected by uncertainties in the dynamic constraints used to model the train. These uncertainties can be attributed to the train mass, brake actuator time constants, and track friction. In many applications, estimating the uncertainties ahead of time (offline) is not possible due to numerous factors, such as expensive operational downtime, the time-consuming nature of the task, and the fact that certain parameters, such as mass and track friction, vary during operation of the train.
Therefore, the parameter estimation should be performed online (in real time) and in closed loop, that is, while the ATO system operates. A major challenge for closed-loop estimation of dynamic systems is that the objectives of the control problem conflict with those of the parameter estimation problem, also called the identification or learning problem.
The control objective is to regulate the behavior of a dynamic system by rejecting input and output disturbances, and to satisfy the dynamic system constraints. The identification objective is to determine the actual values of the dynamic system parameters, which is done by comparing the actual behavior of the dynamic system with its expected behavior. That amounts to analyzing how the system reacts to the disturbances.
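The comparison of actual and expected behavior can be illustrated with a minimal sketch. It assumes a scalar model y = phi * theta with one unknown parameter theta and noise-free measurements, and it uses a recursive least-squares update; the model, the update rule, and all function names are assumptions chosen for illustration and are not taken from the cited references.

```python
def rls_update(theta_hat, p, phi, y):
    """One scalar recursive least-squares step (illustrative assumption).

    theta_hat: current parameter estimate
    p:         current scalar covariance of the estimate
    phi:       regressor (how strongly the parameter shows up in the output)
    y:         measured output
    """
    error = y - phi * theta_hat              # actual minus expected behavior
    gain = p * phi / (1.0 + phi * p * phi)   # zero whenever phi is zero
    theta_hat = theta_hat + gain * error
    p = p - gain * phi * p
    return theta_hat, p


def estimate(theta_true, regressors, theta0=0.0, p0=100.0):
    """Run the update over a sequence of regressors with noise-free outputs."""
    theta_hat, p = theta0, p0
    for phi in regressors:
        y = phi * theta_true                 # hypothetical noise-free measurement
        theta_hat, p = rls_update(theta_hat, p, phi, y)
    return theta_hat, p
```

In this sketch, `estimate(2.0, [1.0] * 5)` moves the estimate close to the true value, whereas `estimate(2.0, [0.0] * 5)` leaves both the estimate and its covariance unchanged, because a zero regressor carries no information about the parameter.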
Hence, the action of the control that cancels the effects of the disturbances makes the identification more difficult. On the other hand, letting the disturbances act uncontrolled to excite the dynamic system, which improves parameter estimation, makes a subsequent application of the control more difficult, because the disturbances may have moved the behavior of the system significantly away from the desired behavior, and recovery may be impossible.
For instance, the TASC may compensate for uncertain parameters such as friction and mass through traction and brake actions, so that the train stops precisely at the desired location regardless of whether the train parameters are correctly estimated. Thus, the dynamic system representing the train behaves close to what is expected, and the estimation algorithm does not see a major difference between the desired behavior and the actual behavior of the train. Hence, it is difficult for the estimation algorithm to estimate the unknown parameters. Moreover, even if the train behavior is close to the desired and expected behaviors, this may be achieved only by large actions of the TASC on the brakes and traction, which result in unnecessary energy consumption and in jerk, which compromises ride quality.
On the other hand, letting the train dynamic system operate without control for some time may result in differences between the expected and actual behavior with subsequent good estimation, but when the control is re-engaged the train behavior may be too far from the desired one for the latter to be recovered, or it may cost an excessive amount of energy and jerk to recover.
Finally, in general there is no guarantee that the external disturbances cause enough effect on the train behavior to allow for correct estimation of the parameters, due to their random and uncontrolled nature. That is, it is not guaranteed that the external disturbances persistently excite the train system.
Therefore, it is desired to precisely stop the train within a predetermined range of positions, while estimating the actual train system parameters to improve performance metrics, such as minimal jerk, energy, or time, by continuously updating the model in real time. To this end, a system and method are needed for combined estimation and control that achieve:
(i) correct and fast estimation of the system parameters;
(ii) satisfaction of the system constraints, including before the parameters are correctly estimated; and
(iii) optimization of a performance criterion.
To ensure system parameter estimation, constraint satisfaction, and performance optimization, a model predictive control (MPC) with a dual objective can be designed; see the related application Ser. No. 14/285,811; Genceli et al., “New approach to constrained predictive control with simultaneous model identification,” AIChE Journal, vol. 42, no. 10, pp. 2857-2868, 1996; Marafioti et al., “Persistently exciting model predictive control using FIR models,” International Conference Cybernetics and Informatics, no. 2009, pp. 1-10, 2010; Rathouský et al., “MPC-based approximate dual controller by information matrix maximization,” International Journal of Adaptive Control and Signal Processing, vol. 27, no. 11, pp. 974-999, 2013; Heirung et al., “An MPC approach to dual control,” 10th International Symposium on Dynamics and Control of Process Systems (DYCOPS), 2013; Heirung et al., “An adaptive model predictive dual controller,” Adaptation and Learning in Control and Signal Processing, vol. 11, no. 1, pp. 62-67, 2013; and Weiss et al., “Robust dual control MPC with guaranteed constraint satisfaction,” Proceedings of IEEE Conference on Decision and Control, Los Angeles, Calif., December 2014.
In part, the performance of the parameter estimation depends on whether the effect of external actions on the system is sufficiently visible, that is, whether the system is persistently excited and sufficient information is measured. Thus, to obtain fast estimation of the system parameters, the action of the dual MPC is selected to trade off system excitation against control objective optimization. To achieve the desired tradeoff between regulation and identification, an optimization cost function $J$ can be expressed as
\[ J = J_c + \gamma\,\psi(U), \qquad (1) \]
where $J$ is a linear combination of the control-oriented cost $J_c$ and the term $\psi(U)$, which is the residual uncertainty (or, conversely, the gained information) due to applying a sequence of inputs $U$, and $\gamma$ is a weighting function of an estimation error that trades off between the control and learning objectives. Optimizing cost function (1) subject to the system constraints results in an active learning method in which the controller generates inputs to regulate the system, while exciting the system to measure the information required for estimating the system parameters.
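The tradeoff in cost function (1) can be sketched numerically. The example below is only an illustration under stated assumptions: the control cost is taken as the squared input effort, the residual uncertainty is taken as the negative logarithm of a scalar information quantity r0 + sum(u^2), and the optimization is replaced by a search over a few candidate input sequences. None of these choices, nor the function names, come from the cited references.

```python
import math


def control_cost(U):
    """Jc: squared input effort (hypothetical control-oriented cost)."""
    return sum(u * u for u in U)


def residual_uncertainty(U, r0=1.0):
    """psi(U): negative log of a scalar information quantity (hypothetical)."""
    return -math.log(r0 + sum(u * u for u in U))


def dual_cost(U, gamma):
    """J = Jc + gamma * psi(U), as in cost function (1)."""
    return control_cost(U) + gamma * residual_uncertainty(U)


def best_sequence(candidates, gamma):
    """Pick the candidate input sequence with the smallest dual cost."""
    return min(candidates, key=lambda U: dual_cost(U, gamma))


# Three hypothetical input sequences, from no excitation to strong excitation.
CANDIDATES = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

With `gamma = 0.0` the search returns the zero-input sequence, since only control effort is penalized. With `gamma = 2.0` it returns the moderately exciting sequence, and with `gamma = 10.0` it returns the most exciting one, which shows how the weight shifts the selected inputs from regulation toward learning.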
The weighting function should favor learning over regulation when the estimated value of the unknown parameters is unreliable. As more information is obtained and the estimated value of the unknown parameters becomes reliable, control should be favored over learning, by decreasing the value of function γ.
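The text does not prescribe a particular form for the weighting function. One hypothetical choice, shown here only as a sketch, is a saturating function of the estimation error that approaches zero as the error vanishes and approaches a maximum weight when the error is large. The function name, the parameters `gamma_max` and `error_scale`, and the functional form are all assumptions.

```python
def learning_weight(estimation_error, gamma_max=10.0, error_scale=1.0):
    """Hypothetical weighting function gamma of the estimation error.

    Large error  -> weight near gamma_max (learning favored).
    Small error  -> weight near zero      (regulation favored).
    """
    e2 = estimation_error ** 2
    return gamma_max * e2 / (e2 + error_scale ** 2)
```

Under these assumptions the weight decreases monotonically as the estimate improves, so the learning term in the cost fades out without any explicit switching logic.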
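Possible definitions of $\psi(U)$ include
\[ \psi(U) = \sum_{i=1}^{\Gamma} \operatorname{trace}(P_i), \qquad (2a) \]
\[ \psi(U) = -\log\det(R_\Gamma), \qquad (2b) \]
\[ \psi(U) = \lambda_{\min}(R_\Gamma - R_0), \text{ and} \qquad (2c) \]
\[ \psi(U) = \sum_{i=1}^{v} \exp(-R_{ii}), \qquad (2d) \]
where $P$ is the covariance matrix of the unknown parameters, trace returns the sum of the elements on the main diagonal of $P$, $R$ is the information matrix of the unknown parameters ($R = P^{-1}$), $\Gamma$ is a learning time horizon, $v$ is the number of unknown parameters, $\lambda_{\min}$ denotes the smallest eigenvalue, and det and exp represent the determinant and the exponential function, respectively.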
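To make the measures (2a) to (2d) concrete, the following sketch evaluates them for two unknown parameters. It assumes that the information matrix is updated as R_i = R_{i-1} + phi_i phi_i^T for a sequence of regressors phi_i (which would depend on the inputs U), that P_i is the inverse of R_i, and that (2d) is evaluated on the diagonal of the final matrix. These assumptions and the helper names are illustrative only.

```python
import math


def outer_add(R, phi):
    """Return R + phi phi^T for a 2x2 matrix R and a length-2 regressor phi."""
    return [[R[r][c] + phi[r] * phi[c] for c in range(2)] for r in range(2)]


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def inv2(M):
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]


def lambda_min2(M):
    """Smallest eigenvalue of a symmetric 2x2 matrix."""
    mean = 0.5 * (M[0][0] + M[1][1])
    radius = math.sqrt((0.5 * (M[0][0] - M[1][1])) ** 2 + M[0][1] ** 2)
    return mean - radius


def information_measures(R0, regressors):
    """Evaluate (2a)-(2d) over a horizon with one regressor per step."""
    R = R0
    trace_sum = 0.0
    for phi in regressors:
        R = outer_add(R, phi)                # assumed information update
        P = inv2(R)                          # covariance at this step
        trace_sum += P[0][0] + P[1][1]
    gained = [[R[r][c] - R0[r][c] for c in range(2)] for r in range(2)]
    return {
        "2a": trace_sum,
        "2b": -math.log(det2(R)),
        "2c": lambda_min2(gained),
        "2d": sum(math.exp(-R[i][i]) for i in range(2)),
    }
```

For example, starting from the identity matrix, the regressors `[1, 0]` and `[0, 1]` excite both parameter directions, while repeating `[1, 0]` twice leaves the second direction unexcited, which appears as a zero smallest eigenvalue of the gained information in measure (2c).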
Unfortunately, all measures in (2a-2d) are non-convex in the decision variable U. This turns a conventional convex control problem into a non-convex nonlinear programming problem for which convergence to a global optimum cannot be guaranteed. Furthermore, the weighting function γ has a significant effect on the control input U. It is known that the reference generation problem can be converted to a convex problem. For example, Rathouský et al. use an approach based on conducting the reference generation optimization over a Γ-step learning time horizon, which includes Γ−1 previous input steps and uses only a single step in the future.
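The non-convexity can be seen in a one-dimensional illustration. The sketch below assumes a scalar information quantity r0 + u^2 for a single input u, so that the log-determinant measure reduces to -log(r0 + u^2); this reduction is an assumption made for illustration. A convex function never exceeds the average of its endpoint values at the midpoint, and the check below tests exactly that property.

```python
import math


def psi(u, r0=1.0):
    """Scalar stand-in for the log-determinant measure (illustrative)."""
    return -math.log(r0 + u * u)


def violates_convexity(f, a, b):
    """True if f at the midpoint exceeds the average of f at the endpoints."""
    return f(0.5 * (a + b)) > 0.5 * (f(a) + f(b))
```

Here `violates_convexity(psi, -1.0, 1.0)` is true, because the measure is largest at u = 0 and decreases in both directions, whereas a convex quadratic effort cost such as `lambda u: u * u` passes the same check.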
Heirung et al., “An adaptive model predictive dual controller,” use $\sum_{i=1}^{v} \exp(-R_{ii})$ as a measure of information about the system parameters. That function is used to augment the model predictive cost function. However, to avoid the problems introduced by the non-convexity of that information measure, the minimization of the term is considered over a 1-step learning time horizon. That method also provides the necessary condition on the weighting parameter γ to guarantee that the generated reference provides sufficient excitation to learn the system parameters. However, the use of a 1-step learning time horizon prevents optimization of the overall system performance, which in general requires a longer time horizon.
Another method provides an approximate solution for simultaneous estimation and control, based on dynamic programming for static linear systems with a quadratic cost function; see Lobo et al., “Policies for simultaneous estimation and optimization,” Proceedings of the American Control Conference, June 1999. While the approximate solution can improve the system performance, it cannot be easily applied to dynamic systems, such as ATO systems, and it requires significant computation, which may be too slow or may require hardware that is too expensive for use in ATO.