US 12,169,793 B2
Approximate value iteration with complex returns by bounding
Robert Wright, Sherrill, NY (US); Lei Yu, Vestal, NY (US); and Steven Loscalzo, Vienna, VA (US)
Assigned to The Research Foundation for The State University of New York, Binghamton, NY (US)
Filed by The Research Foundation for The State University of New York, Binghamton, NY (US)
Filed on Nov. 16, 2020, as Appl. No. 17/099,762.
Application 17/099,762 is a continuation of application No. 15/359,122, filed on Nov. 22, 2016, granted, now Pat. No. 10,839,302, issued on Nov. 17, 2020.
Claims priority of provisional application 62/259,911, filed on Nov. 25, 2015.
Claims priority of provisional application 62/259,563, filed on Nov. 24, 2015.
Prior Publication US 2021/0150399 A1, May 20, 2021
Int. Cl. G06N 7/01 (2023.01); G05B 13/02 (2006.01); G05B 15/02 (2006.01); G06N 20/00 (2019.01)
CPC G06N 7/01 (2023.01) [G05B 13/0265 (2013.01); G05B 15/02 (2013.01); G06N 20/00 (2019.01); Y02B 10/30 (2013.01)]
20 Claims
 
1. A method for controlling a system, comprising:
estimating an optimal control policy for the system;
receiving data representing sequential states and associated trajectories of the system, comprising off-policy states and associated off-policy trajectories;
improving the estimate of the optimal control policy by performing at least one approximate value iteration, each approximate value iteration comprising:
estimating an expected value of operation of the system dependent on the estimated optimal control policy;
using a complex return of the received data, biased by the off-policy states, to determine a bound dependent on at least the off-policy trajectories;
using the determined bound to improve the estimate of the expected value of operation of the system; and
updating the estimate of the optimal control policy, dependent on the improved estimate of the expected value of operation of the system using the determined bound; and
employing the updated estimate of the optimal control policy to control the system with an automated controller, wherein the automated controller is configured to automatically alter at least one of the system, and an environment in which the system operates.
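
The iteration recited in claim 1 can be illustrated with a minimal sketch. The claim does not fix a particular complex return, bound, or value representation, so everything below is an assumption made for illustration only: a tabular Q-function, n-step returns computed along the recorded trajectories as the complex return, and the maximum over those n-step returns used as a lower bound on the one-step target. The names `greedy_value`, `bounded_targets`, `bounded_value_iteration`, and `greedy_policy`, the transition tuple layout `(state, action, reward, next_state)`, and the `terminal_states` set are all hypothetical and do not come from the patent.

```python
from collections import defaultdict


def greedy_value(q, state, actions, terminal_states):
    """Estimated value of a state under the greedy policy (0 at terminal states)."""
    if state in terminal_states:
        return 0.0
    return max(q[(state, a)] for a in actions)


def bounded_targets(trajectory, q, actions, terminal_states, gamma, max_n):
    """Return ((state, action), target) pairs for one recorded trajectory.

    Assumption: the target is the maximum over the n-step returns for
    n = 1..max_n. The n = 1 term is the usual one-step backup; the longer
    returns follow the recorded (possibly off-policy) actions and serve
    here as a lower bound on that backup.
    """
    targets = []
    for t, (s, a, _, _) in enumerate(trajectory):
        best = float("-inf")
        discounted_rewards = 0.0
        for n in range(1, max_n + 1):
            step = t + n - 1
            if step >= len(trajectory):
                break
            _, _, r, s_next = trajectory[step]
            discounted_rewards += (gamma ** (n - 1)) * r
            n_step_return = discounted_rewards + (gamma ** n) * greedy_value(
                q, s_next, actions, terminal_states
            )
            best = max(best, n_step_return)
            if s_next in terminal_states:
                break
        targets.append(((s, a), best))
    return targets


def bounded_value_iteration(trajectories, actions, terminal_states,
                            gamma=0.9, n_iters=20, max_n=5):
    """Repeat the bounded backup; a tabular average stands in for regression."""
    q = defaultdict(float)
    for _ in range(n_iters):
        sums, counts = defaultdict(float), defaultdict(int)
        for trajectory in trajectories:
            for key, target in bounded_targets(
                trajectory, q, actions, terminal_states, gamma, max_n
            ):
                sums[key] += target
                counts[key] += 1
        new_q = defaultdict(float)
        for key, total in sums.items():
            new_q[key] = total / counts[key]
        q = new_q
    return q


def greedy_policy(q, states, actions):
    """Updated policy estimate: pick the highest-valued action in each state."""
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}


# Hypothetical data: a three-state chain (terminal state 3) with one
# trajectory that reaches the goal and one that only idles in state 0.
demo_trajectories = [
    [(0, 1, 0.0, 1), (1, 1, 0.0, 2), (2, 1, 1.0, 3)],
    [(0, 0, 0.0, 0), (0, 0, 0.0, 0)],
]
demo_q = bounded_value_iteration(demo_trajectories, [0, 1], {3}, max_n=3)
demo_policy = greedy_policy(demo_q, [0, 1, 2], [0, 1])
```

In this sketch the mapping returned by `greedy_policy` plays the role of the updated policy estimate that an automated controller would then apply. In a non-tabular setting the averaging step would be replaced by fitting a function approximator to the bounded targets, which is again an assumption rather than something the claim prescribes.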