The present invention generally relates to artificial intelligence and, more particularly, to a machine capable of developmental learning.
The capability of learning is critical to intelligence. A system without such learning capability generally cannot become more intelligent from experience. Information that is to be learned is generally received through sensors, and the actions of an agent are executed by effectors. With respect to computers, rapid advances have been made in speed, storage capacity, performance-to-price ratio, and installation base, which have resulted in the widespread availability of computers. There now exists the possibility of developing reasonably priced multi-modal human-machine communication systems and multi-modal understanding machines. Breakthroughs in machine understanding of multi-modal information, such as video images, speech, language, and various forms of hand-written or printed text, can lead to numerous, long-awaited applications. Artificial intelligence machines have been developed with task-specific programming, which defines the rules for handling a particular task. For example, robots can be programmed to move from one location to another within a specific section of a building. However, machines based on task-specific programming are generally unable to learn complex tasks or adapt to a changing environment.
The capability to understand what is sensed is key to taking the right action in the right situation. Since humans acquire most of their knowledge through vision, we take vision as an example. It is well known that vision is extremely difficult, especially for tasks such as recognizing objects in more general settings. To recognize objects, one must cope with a wide variety of variation factors, such as lighting, viewing angle, viewing distance, and changes in the object itself (e.g., facial expressions). It is known that learning plays a central role in the development of humans' versatile visual capabilities and that it takes place over a long period of time. Human vision appears to be more a process of learning and recalling than one of relying on an understanding of the physical processes of image formation and object modeling. Furthermore, recognition by humans draws on information sources that are not confined to vision. There is a particular need to integrate different sensing modalities for visual recognition. In humans, visual learning takes place while the recognizer continuously senses the visual world around it and interacts with the environment through its actions, so that a large amount of visual data, processed along with other information, is learned every day.
The current mode of training a recognition system requires humans to manually prepare data and class labels and to train the system offline. For visual recognition, system training may require a class label for each image. Known trained recognition systems are very limited in scope. For example, if such a system is simply trained to recognize an object as an apple, it cannot answer whether the object is a fruit or whether it is round. Moreover, conventional offline batch training processes cannot produce a system that continuously improves itself.
In developing intelligent systems, a task-specific paradigm has been used. The typical steps for building a task-specific system are: 1) start with a given task; 2) a human being attempts to analyze the task; 3) the human being derives a task space representation, which may depend on the tool chosen; 4) the human chooses a computational tool and maps the task space representation to the tool; and 5) the parameters of the tool are determined using one known method or a combination of known methods. Such known methods include: a) knowledge-based methods, which are manually specified using hand-crafted domain knowledge; b) behavior-based methods, as in subsumption architecture and active vision; c) supervised learning methods, which provide estimates using a training procedure, unsupervised learning methods such as clustering techniques, and reinforcement learning methods such as Q-learning; and d) search methods, such as genetic search. Each of these conventional methods searches for the tool's parameters based on a task-specific objective function. Because this paradigm starts with a task and every following step depends on that task, it is referred to as a task-specific paradigm.
Various approaches within this task-specific paradigm have produced impressive results for those tasks whose space is relatively small and relatively clean (or exact), such as machine parts inspection applications in very controlled settings. However, the task-specific approaches face tremendous difficulties for tasks whose space is huge, vague, difficult to fully understand, and difficult to model adequately by hand, such as vision-based recognition of general objects, vision-based autonomous navigation by a mobile robot in unknown indoor and outdoor environments, human-computer interaction via vision, speech, gesture, and human-computer discourse via spoken or written language.
Because they rely on task-specific programming, conventional approaches are unable to provide a general-purpose learning capability that develops over time. The process of learning more skills on the basis of already learned skills is called developmental learning. A fundamental way to address these very challenging issues is to investigate how to automate the training process for a wide variety of cognitive and behavioral tasks, including recognition, information fusion, decision making, and action generation. It is therefore desirable to realize intelligent systems that sense, act, and learn developmentally.
Accordingly, it is an aspect of the present invention to provide a machine and method capable of developmental learning from the machine's environment without requiring task-specific programming. The machine receives various sensor inputs, organizes the information, and provides output control signals to effectors. The method is independent of the task to be executed and is, therefore, a general-purpose learner that learns while performing. The method is general in that virtually any sensors and effectors can be used for each machine, and potentially any cognitive and behavioral capability can be learned. Which sensors and effectors are used will affect the machine's sensing and action capabilities. By continuously interacting with the environment, including interacting with a teacher, the machine can learn directly from the sensory input streams without requiring humans to segment those streams. The system automatically builds multiple-level representations using a generalized Markov random process model. Reward and punishment are also applied to the machine in the context of sensor-based teaching to develop intelligent behavior.
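As an illustration only, the following hypothetical sketch shows how one level of representation might turn a sensory stream into action signals with relative probabilities, and how reward and punishment might reshape them. It assumes a simple k-th order Markov context, in which the state is the last `order` frames. This stands in for the generalized Markov random process model, which is not specified here. The class `MarkovLevelBuilder`, its methods, and the additive reward update are illustrative assumptions, not the method of the present invention.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Hashable, Iterable, Tuple


class MarkovLevelBuilder:
    """Hypothetical level builder: the state is the tuple of the last `order` frames."""

    def __init__(self, order: int = 2, prior: float = 1.0) -> None:
        self.order = order
        self.prior = prior  # assumed default weight for an action never reinforced in a state
        self.history: Deque[Hashable] = deque(maxlen=order)
        # state -> {action: positive weight}
        self.weights: Dict[Tuple, Dict[str, float]] = defaultdict(dict)

    def observe(self, frame: Hashable) -> Tuple:
        """Append a sensory frame and return the current Markov context (state)."""
        self.history.append(frame)
        return tuple(self.history)

    def action_signals(self, state: Tuple, actions: Iterable[str]) -> Dict[str, float]:
        """Return each candidate action with its relative probability in this state."""
        w = {a: self.weights[state].get(a, self.prior) for a in actions}
        total = sum(w.values())
        return {a: v / total for a, v in w.items()}

    def reinforce(self, state: Tuple, action: str, reward: float) -> None:
        """Reward (>0) raises an action's weight; punishment (<0) lowers it, floored above zero."""
        current = self.weights[state].get(action, self.prior)
        self.weights[state][action] = max(current + reward, 1e-6)
```

For example, after observing frames `"a"` and `"b"`, the state is `("a", "b")`. Rewarding `"forward"` in that state makes it more probable than an unrewarded alternative. Keying the weights on a short frame history is a deliberately minimal stand-in for the richer, multiple-level representations described above.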
The machine includes one or more sensors for sensing the machine's environment, one or more effectors for acting on one or more objects in the environment, a sensor-dedicated level builder having one or more level building elements, and a confidence accumulator. The machine and method of the present invention automatically develop learning capability by sensing the environment with the sensors, inputting successive frames of signal information into one or more sensor-dedicated level builders, and producing action signals with the sensor-dedicated level builders, each action signal having a relative probability. The method further includes inputting the action signals to the confidence accumulator, determining the most probable action based on the probabilities of the action signals received by the confidence accumulator, and producing action control signals that control the effectors in response to the determined action. The method advantageously learns while performing. According to a further embodiment of the present invention, the sensor-dedicated level builders produce state output signals, which are integrated to generate integrated action signals that are input to the confidence accumulator. In addition, an average of the action signals may be computed and used to produce the action control signals. To conserve memory, low-priority actions may be forgotten.
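The confidence accumulator step can be pictured with a short sketch. It assumes that each sensor-dedicated level builder emits its action signals as a mapping from action name to relative probability. It also assumes that the accumulator averages these probabilities across builders, treating an action a builder does not mention as probability zero, and then selects the action with the highest average. The "forgetting" step is modeled as keeping only the highest-priority entries up to a fixed capacity. The function names, the dictionary representation, and the capacity rule are hypothetical choices for illustration, not the invention's specified implementation.

```python
from collections import defaultdict
from typing import Dict, List


def accumulate_confidence(action_signals: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the relative probability each level builder assigns to each action.

    An action that a builder does not mention counts as probability 0 for that builder.
    """
    if not action_signals:
        raise ValueError("at least one builder must provide action signals")
    totals: Dict[str, float] = defaultdict(float)
    for signal in action_signals:
        for action, prob in signal.items():
            totals[action] += prob
    n = len(action_signals)
    return {action: total / n for action, total in totals.items()}


def most_probable_action(action_signals: List[Dict[str, float]]) -> str:
    """Pick the action with the highest averaged probability (basis for the control signal)."""
    averaged = accumulate_confidence(action_signals)
    return max(averaged, key=averaged.get)


def forget_low_priority(memory: Dict[str, float], capacity: int) -> Dict[str, float]:
    """Hypothetical forgetting rule: keep only the `capacity` highest-priority entries."""
    kept = sorted(memory.items(), key=lambda kv: kv[1], reverse=True)[:capacity]
    return dict(kept)
```

For instance, if one builder reports `{"turn_left": 0.6, "forward": 0.4}` and another reports `{"turn_left": 0.2, "forward": 0.8}`, the averages are 0.4 and 0.6, so `most_probable_action` returns `"forward"`. Averaging is only one possible fusion rule. A weighted or product-of-probabilities rule would fit the same interface.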
This invention enables developmental learning, including autonomous learning, which is a special mode of developmental learning. The basic requirement of developmental learning is that the machine must be able to learn new tasks in unconstrained domains, and new aspects of each complex task, without reprogramming by humans. The new tasks that the system can learn are not confined to those imaginable at the time the machine is constructed. Therefore, the machine's method must be designed so that it is not task-specific. Since the sensors and effectors of the machine are determined at the time of construction, the method is designed to fit the sensors and effectors of each particular machine. Thus, the method is sensor- and effector-specific.
These and other features, advantages and objects of the present invention will be further understood and appreciated by those skilled in the art by reference to the following specification, claims and appended drawings.