The invention described herein may be manufactured and used by or for the Government of the United States for all governmental purposes without the payment of any royalty.
The field of the invention relates to dynamic programming and more specifically to dynamic programming within a network structure.
Dynamic programming is a process that discovers an optimal trajectory toward a goal by deriving values of states encountered during exploration from values of succeeding states. The various forms of dynamic programming, such as Q-learning, TD-Lambda, value iteration, and advantage learning, often require extensive exploration and large amounts of memory for maintaining values for the vast number of states typically encountered in useful applications.
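By way of illustration only, the derivation of a state's value from the values of succeeding states can be sketched with value iteration, one of the forms named above. The five-state chain, the reward of 1 for reaching the goal, the discount factor `gamma`, and the function name `value_iteration` are all assumptions made for this sketch and do not appear in the specification.

```python
def value_iteration(n_states=5, gamma=0.9, sweeps=50):
    """Value iteration on a hypothetical chain of states; the last state is the goal."""
    goal = n_states - 1
    values = [0.0] * n_states  # one stored value per state (the memory cost noted above)
    for _ in range(sweeps):
        for s in range(n_states):
            if s == goal:
                continue  # the goal is terminal and keeps a value of 0
            candidates = []
            for step in (-1, 1):  # hypothetical actions: move left or move right
                nxt = min(max(s + step, 0), goal)
                reward = 1.0 if nxt == goal else 0.0
                # The value of s is derived from the value of the succeeding state.
                candidates.append(reward + gamma * values[nxt])
            values[s] = max(candidates)
    return values


if __name__ == "__main__":
    print(value_iteration())
```

In this sketch the stored values grow as states lie closer to the goal, so following increasing values traces a trajectory toward it. The table holds one entry per state, which is the memory burden that becomes prohibitive when the number of states is large.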
Function approximators, such as neural networks, are used in the art to mitigate the memory liability associated with most forms of dynamic programming and can afford good performance after experiencing only a small sample of all possible states. The dynamic programming network offers an alternative or collateral strategy by intelligently directing sensors toward regions of interest within a state, processing and retaining only information that contributes to the achievement of an objective. An experienced dynamic programming network may therefore considerably reduce the amount of exploration necessary to arrive at an optimal or good-enough solution.
Current methods for identifying military targets usually attempt to match a target with images stored within a database. Usually the database is quite large and contains real or synthetic images of the same target in as many orientations, configurations, and articulations as possible. Not all variations can be anticipated, and insignificant variations can hinder finding a match. In military applications, searching for a match can take considerably longer than the duration of a mission and, because of the size of the database, processing cannot be performed aboard a tactical aircraft. The dynamic programming network of this invention conserves memory and processes image data with profound speed when implemented on the fine-grained parallel computers for which it was designed.
Sensor management, sensor fusion, and target recognition are seldom integrated well and are at best essentially independent software modules that only exchange data. The few modules known to adapt do so almost independently of the requirements of the other functions. Further, experts generally handcraft these functions so that they are tailored to a specific environment, rendering the functions rigid in their application. The dynamic programming network integrates these functions seamlessly.
The present invention may be accurately described as a dynamic programming network. It cannot be compared directly with known error-backpropagation neural networks, because the error that back-propagates in such neural networks derives from a known desired response, whereas a dynamic programming network must discover an unknown desired response after a lengthy trial-and-error search of states. The present invention allows for the possibility that a dynamic programming network, using a function approximator to maintain the elements' state values, may learn to accept an error or desired response via its sensors.
The dynamic programming network conserves memory and processes image data with profound speed. The method of the invention is not rigid to a specific application but can be used in a wide variety of applications with minimal tailoring.
The dynamic programming network integrates sensor management, sensor fusion, and an application in a seamless structure in which these functions are mutually dependent and develop autonomously and concomitantly with experience. The dynamic programming network autonomously divides these functions into multiple subtasks that it can assign to the processors of a fine-grained parallel computer. As the number of processors available for these subtasks increases, the network may attain its objective more efficiently. This architecture confers the greatest advantage in feature-rich applications such as identification of targets in synthetic aperture radar, visual, and infrared images. The design can be extended, however, to such diverse and general applications as control problems and machine intelligence. For the pattern recognition application described here, the dynamic programming network detects, selects, and identifies features, and patterns comprising those features, via a series of observations rather than by processing all data available in each image, thereby minimizing sensor usage and the volume of data processed. The network remembers similar features contained in many images instead of many images containing similar features, thus conserving memory and facilitating data retrieval.
It is therefore an object of the invention to provide an efficient, memory conserving dynamic programming system and method.
It is another object of the invention to provide an infinitely scalable dynamic programming network.
It is another object of the invention to provide a dynamic programming network that integrates sensor management, sensor fusion and an application in a seamless structure in which these functions are mutually dependent and develop autonomously and concomitantly with experience.
These and other objects of the invention are described in the description, claims and accompanying drawings and are achieved by an efficient, memory conserving, application integrating dynamic programming method comprising the steps of:
establishing a prototype element of a network, said establishing comprising the steps of:
assigning a table or function approximator for maintaining state values;
identifying a method for determining element state based on state values maintained from said assigning step;
applying a process for dynamically programming said element's state values based on succeeding state values resulting from said element's state from said identifying step;
connecting a plurality of elements from said establishing step to form a network;
coupling signal transmitting sensors to elements from said connecting step;
coupling elements from said connecting step to effectors;
maintaining within each element a running average of values for the state of an element in a cycle after such value occurs;
cycling said network by determining the state of all elements and sensors therein, selecting as each element's state the highest running average value from said maintaining step;
sending an output signal to network effectors; and
presenting to said sensors a pattern based on a state that results from effector activity from said sending step.
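The steps recited above may be sketched, in a non-limiting way, as follows. The sketch is an illustration under stated assumptions and is not the implementation of the invention: a single element stands in for the network, a lookup table is used rather than a function approximator, and the class `Element`, the function `run`, the parameters `rate`, `gamma`, and `initial`, the reward signal, and the toy environment are all hypothetical. The initial values are set optimistically so that selecting the highest running average still tries every state at least once.

```python
class Element:
    """Hypothetical prototype element holding a table of running-average state values."""

    def __init__(self, n_states=2, rate=0.5, gamma=0.9, initial=10.0):
        self.n_states = n_states
        self.rate = rate        # weight given to the newest sample in the running average
        self.gamma = gamma      # assumed discount applied to the succeeding value
        self.initial = initial  # optimistic start (assumption) so every state gets tried
        self.table = {}         # sensor signal -> running-average value per state
        self.last = None        # (signal, state) from the previous cycle

    def row(self, signal):
        return self.table.setdefault(signal, [self.initial] * self.n_states)

    def cycle(self, signal, reward):
        values = self.row(signal)
        if self.last is not None:
            # Update the previous state's running average from the value that
            # occurs in the cycle after it (reward plus discounted best value).
            prev_signal, prev_state = self.last
            prev = self.row(prev_signal)
            target = reward + self.gamma * max(values)
            prev[prev_state] += self.rate * (target - prev[prev_state])
        # Select as the element's state the one with the highest running average.
        state = max(range(self.n_states), key=values.__getitem__)
        self.last = (signal, state)
        return state


def run(cycles=20):
    """Cycle one element against a toy environment (assumed for illustration)."""
    element = Element()
    signal, reward = 0, 0.0
    outputs = []
    for _ in range(cycles):
        state = element.cycle(signal, reward)  # sensor signal in, element state out
        outputs.append(state)                  # output signal sent to the effector
        reward = 1.0 if state == 1 else 0.0    # hypothetical objective: effector output 1
        signal = state                         # next pattern results from effector activity
    return outputs


if __name__ == "__main__":
    print(run())
```

In this toy setting the element initially tries both states and then settles on the state whose running average remains highest. A network as recited would connect a plurality of such elements, with sensors and effectors coupled to them.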