The present invention relates to an architecture, embodied either in hardware or a software agent or in a combination of hardware and software, adapted to implement self-biased conditioning, that is, an architecture adapted to modify or specialise its responses to particular inputs, dependent upon the circumstances, in the case where the architecture determines that the existing response is inappropriate.
Such an architecture is of interest for a number of different reasons. Firstly, by being able to modify existing responses to particular inputs, the architecture can adapt itself to changing circumstances and, thus, render its functions better adapted to the achievement of an operational goal. In other words, the architecture can form the basis of an autonomous agent having the ability to perform unsupervised learning. Secondly, an architecture implementing self-biased conditioning appears to a human user to be exhibiting "social intelligence". (In this document, "social intelligence" means the set of skills or responses which enable an agent to interact appropriately with other agents for purposes including but not limited to the achievement of an operational goal.) Thus, the user will find that he can interact in a more intuitive fashion with the machine or programme embodying the architecture and this interaction will be experienced as more congenial.
Moreover, an architecture implementing self-biased conditioning is particularly well adapted to serve as the basis for new products such as "intelligent interfaces" and devices providing "situated personal assistance" (the personal assistance being "situated" in the sense that this assistance is provided in an appropriate context).
SPA (situated personal assistance) devices can take a wide variety of forms. Consider, for example, the needs of a tourist visiting a museum such as the Louvre and having only a small amount of time. It would be useful for such a tourist to have at his disposal a device, personalised to know his tastes, capable of indicating items in the museum which are worth his attention. In theory such a device could be created by providing a computer device having access to a database storing details of the floor plan of the Louvre and its exhibits (as well as other museums and the like which the tourist may visit) and preprogramming the computer with rules indicating the user's interests. However, such an approach requires a considerable amount of pre-programming and information gathering, both by the device's designer and by the user. An SPA device having an initial criterion for identifying objects of potential interest to the user, and capable of interacting with the user to learn the user's preferences, is much more interesting. The present invention enables such devices to be made.
The present invention is based on an analysis of social intelligence in animals and humans which leads to the identification of a set of rules or requirements which should be met by an architecture which seeks to give the appearance of social intelligence. However, it is not suggested that these rules or the architecture proposed according to the invention directly correspond to any particular structure or function of the human or animal brain.
Traditionally in the field of artificial intelligence (AI), the attempts that have been made to create structures emulating human cognitive processes have ignored the "social" aspect of much of human behaviour. The typical attitude is reflected by the statements "systems are social if their learning process is connected with its social surrounding" and "for many purposes of cognitive simulation, it is of no special significance that thought is social" made in the article "Situated Action: A Symbolic Interpretation" by A. H. Vera and H. A. Simon, in the journal Cognitive Science, 17, (1993), pages 7-48. However, as Vera and Simon also recognise, "All human behaviour is social. First and foremost, it is social because almost all the contents of memory, which provide half of the context of behaviour, are acquired through social processes - processes of learning through instruction and social interaction." Moreover, the importance of the ability to interact appropriately with external agents is seen time and time again in the human and animal kingdom in such activities as foraging, mating, imitation, and expressions and experiences of emotion and sympathy.
Recently, some work in this sphere has been attempted, principally with regard to organisational strategy ("Representing and Using Organizational Knowledge in DAI Systems" by L. Gasser, in "Distributed Artificial Intelligence II", pages 55-78, ed. L. Gasser and M. N. Huhns, 1989, pub. Morgan Kaufmann) or study of group behaviour in the field of behaviour-based AI ("Behaviour-based Artificial Intelligence" by P. Maes, in "From animals to animats" 2, pages 2-10, ed. J-A Meyer, H. L. Roitblat and S. W. Wilson, 1993, the MIT Press).
Considering collective or "social" behaviour in the human and animal kingdom, it can be postulated that such behaviour is based, at least in part, on a set of innate responses or behaviour patterns which are present from birth. For example, the mating behaviour of the fly Drosophila melanogaster appears to be genetically determined: the mutation of a gene controlling mating behaviour can cause reproductive isolation ("Isolation of mating behaviour mutations in Drosophila melanogaster" by E. Nitasaka in "Proceedings of the Third International Meeting of the Society for Molecular Biology and Evolution", 1995, pages 63-64). Moreover, in humans it has been found that damage to a certain portion of the brain leads to defective social behaviour ("Descartes' Error" by A. R. Damasio, 1994, Avon Books).
It can be hypothesised on this basis that social behaviour has two innate requirements:
(1) the ability for agents to observe their own actions and those of other agents, and
(2) the possession of a set of primitive responsive behaviour routines.
A "subsumption" architecture meeting the above requirement (2) has been proposed in the article "A Robust Layered Control System For A Mobile Robot" by R. A. Brooks, IEEE Journal of Robotics and Automation, RA-2(1), pages 14-23, 1986. Moreover, an architecture meeting the above requirements (1) and (2) has been proposed by the present inventor in "Phase Transitions in Instigated Collective Decision Making", in "Adaptive Behaviour", 3(2), 1995, pages 185-223.
However, in general it is not possible to pre-program a set of routines or responses which will be appropriate in all the circumstances which a machine or computer program will encounter in practice. Moreover, in the human and animal kingdom genetic information does not provide behavioural responses appropriate for all circumstances which will be encountered during life.
Observation shows that in the human and animal kingdom there is an additional factor, namely the ability to specialise the innate primitive responses based upon experience and, in particular, based on interactions with others. A typical example of this ability is seen in the behaviour of young vervet monkeys. Initially, young vervet monkeys produce an alarm call when they see any flying birds, including ones which are harmless. However, they quickly learn to emit the alarm call only when flying predators are seen, much too quickly to be explained by the young monkey's own experience. It has been postulated that the young monkeys learn to specialise their responses based on the responses of their older peers who ignore the "false alarms" (see "The ontogeny of vervet monkey alarm calling behaviour: A preliminary report" by R. M. Seyfarth and D. L. Cheney, in "Z. Tierpsychologie", 1980, 54, pages 37-56).
A third requirement for "social intelligence" or appropriate social behaviour can thus be postulated:
(3) the ability to specialise responses originally triggered by primitive responsive behaviour, through interaction with others, and to remember the behavioural pattern as a secondary response.
The creation of a secondary response involves conditioning of the system based upon inputs received from the outside. Now, in order to develop secondary responses from sensory inputs in an unsupervised learning process, the learning agent must attempt to correlate a number of sensory inputs with a number of internal structures so as to extend the knowledge base of the system. However, the computations involved in this correlation process are complicated.
A "focus of attention" method for unsupervised learning has been proposed in "Paying Attention to What's Important: Using Focus of Attention to Improve Unsupervised Learning" by L. N. Foner and P. Maes, in "From animals to animats" 3, pages 256-265, ed. D. Cliff, P. Husbands, J. A. Meyer, and S. W. Wilson, 1994, the MIT Press. This method seeks to make the correlation task manageable by focusing attention on a limited number of factors. The proposed method is based on cognitive selectivity and employs world-dependent, goal-independent and domain-independent strategies.
An associative control process (ACP) for conditioning has been proposed in "Modelling Nervous System Function with a Hierarchical Network of Control Systems that Learn" by Klopf et al, in "From animals to animats" 2, pages 254-261, ed. J-A Meyer, H. L. Roitblat and S. W. Wilson, 1993, the MIT Press. The proposed ACP network includes two kinds of learning mechanisms, drive-reinforcement learning (in reinforcement centres) and motor learning (in motor centres). However, the proposed system lacks an internal driving force inciting the network to undergo conditioning and has no mechanism for focusing attention whereby to reduce the complexity of the correlation processes inherent in conditioning.
Another conditioning system is proposed in "No Bad Dogs: Ethological Lessons for Learning in Hamsterdam" by B. M. Todd and P. Maes, in "From animals to animats" 4, pages 295-304, ed. P. Maes, M. Mataric, J. Meyer, J. Pollack and S. W. Wilson, 1996, the MIT Press. This method involves the use of pre-defined "Behaviours" which are arranged in Behaviour Groups so as to be mutually inhibiting. The Behaviour Groups in their turn are arranged in a loose hierarchical fashion. The resultant structure is very complex.
The article "Reinforcement Learning: A Survey" by L. P. Kaelbling et al., in the "Journal of Artificial Intelligence Research", 1996, 4, pages 237-285, discusses a process whereby an agent can learn behaviour through trial-and-error interactions with a dynamic environment. This is a type of unsupervised learning. In each of a succession of time periods, the agent receives a set of inputs as well as a reinforcement signal (a "reward") and chooses an action. If the action is successful then the "reward" is allotted equally to all of the units which contributed to the choice of the action. The agent seeks to choose the action which increases the long-term sum of values of the reward, that is, of the reinforcement signal. It will be seen that, according to this proposal, the system conditions itself (adapts its responses to external stimuli) without external motivation, seeking merely to maximise an internal measure of the "success" of its action. However, such an approach does not take into account whether or not the conditioned response is "appropriate" or "successful" from the point of view of external agents with which the system interacts.
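By way of illustration only, the equal allotment of a reward among contributing units, and the long-term sum which the agent seeks to increase, might be sketched as follows. The function names, the learning rate and the discount factor `gamma` are assumptions introduced for this sketch; they are not taken from the cited article.

```python
def share_reward(weights, contributing_units, reward, learning_rate=0.1):
    """Return a copy of `weights` in which the reward has been split
    equally among the units that contributed to the chosen action.
    (Hypothetical helper; learning_rate is an assumed parameter.)"""
    updated = dict(weights)
    if not contributing_units:
        return updated
    share = reward / len(contributing_units)
    for unit in contributing_units:
        updated[unit] = updated.get(unit, 0.0) + learning_rate * share
    return updated


def long_term_return(rewards, gamma=0.9):
    """Long-term sum of reinforcement signals. Discounting later
    rewards by gamma is an assumption made for this sketch."""
    total = 0.0
    for step, reward in enumerate(rewards):
        total += (gamma ** step) * reward
    return total
```

It will be noted that nothing in this sketch refers to the reaction of any external agent: the update depends only on the agent's own reward signal, which is the limitation identified above.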
In the opinion of the present inventor, an agent emulating "social intelligence" not only should meet the above requirements (1) and (2) but also should meet a modified version of the third requirement, stated as follows:
(3') the ability to specialise responses originally triggered by primitive responsive behaviour, by reacting to the presence or absence of some expected inputs from the outside (typically from others), and to remember the behavioural pattern as a secondary response.
More especially, the agent or architecture emulating "social intelligence" should be adapted to expect a particular type of input in cases where its response is "appropriate" (or "successful" or "normal") from the point of view of external agents with which it interacts and, depending upon whether or not the expected input is received, to specialise its responses to adapt them to the circumstances.
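The modified third requirement can be illustrated by a minimal sketch in which an innate response initially fires in every context and is then strengthened or weakened per context according to whether the expected input is received. The class name, the `threshold` and `step` parameters and the use of a string as the context are hypothetical choices made for this sketch only; they are not part of the architecture described in this document.

```python
from collections import defaultdict


class SpecialisingResponse:
    """Toy model: an innate response that is specialised per context
    by the presence or absence of an expected external input."""

    def __init__(self, threshold=0.5, step=0.2):
        self.threshold = threshold  # assumed firing threshold
        self.step = step            # assumed adjustment per observation
        # Innate behaviour: full strength in every context at first.
        self.strength = defaultdict(lambda: 1.0)

    def responds(self, context):
        return self.strength[context] >= self.threshold

    def observe(self, context, expected_input_seen):
        # Only a response that was actually produced can be confirmed
        # or contradicted by the expected input.
        if not self.responds(context):
            return
        delta = self.step if expected_input_seen else -self.step
        new_value = self.strength[context] + delta
        self.strength[context] = min(1.0, max(0.0, new_value))
```

In the vervet monkey example, `observe("eagle", True)` would correspond to older peers joining the alarm call, and `observe("pigeon", False)` to peers ignoring it; after a few such observations the sketch continues to respond to the former context but not to the latter.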
The present invention seeks to provide an architecture meeting the above requirements (1), (2) and (3'). By meeting these three requirements, the architecture according to the invention is both self-conditioning, that is, it has an internal driving force promoting conditioning, and selective, in that it performs this conditioning based upon whether or not its existing responses are "appropriate" or "normal".
More particularly, the present invention provides a system implementing self-biased conditioning, comprising: a plurality of sensors; at least one actuator; at least one primary response network receiving an input signal from at least one first sensor and an input signal from at least one second sensor and generating an output signal for activating an actuator, wherein the primary response network comprises: an activation node receiving the input signal from said at least one first sensor and, in response to a first value of said input signal, outputting a trigger signal, at least one motor centre receiving the trigger signal from the activation node and adapted to respond to the trigger signal by generating said output signal for activating said actuator, means for applying positive and negative reinforcement signals to the motor centre whereby to promote or inhibit the response of the motor centre to the trigger signal from the activation node, and at least one expectation node receiving the input signal from the at least one second sensor, said input signal from the second sensor being indicative of whether or not the generation of said output signal for activating the actuator is appropriate, and for generating an output signal indicating when the generation of said output signal for activating the actuator is not appropriate; means for determining, based on an analysis of at least signals output by the expectation node and motor centre of the primary response network, that the response of the motor centre requires promotion or inhibition; and an associative memory generating said positive and negative reinforcement signals based upon the determination made by the determination means.
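Purely as an aid to understanding, the signal flow recited above may be sketched as follows. All class names, numeric thresholds and the string-valued `context` key used to index the associative memory are assumptions introduced for this sketch; the sketch collapses the sensors and actuator into plain function arguments and a boolean return value, and is not to be taken as the implementation of the invention.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ActivationNode:
    trigger_value: float = 1.0  # assumed "first value" of the input

    def trigger(self, first_sensor: float) -> bool:
        return first_sensor == self.trigger_value


@dataclass
class MotorCentre:
    threshold: float = 0.5  # assumed firing threshold

    def respond(self, trigger: bool, reinforcement: float) -> bool:
        # Positive reinforcement promotes, negative inhibits, the response.
        return trigger and (1.0 + reinforcement) >= self.threshold


class ExpectationNode:
    def not_appropriate(self, expected_input_present: bool) -> bool:
        # Signals when the second sensor fails to deliver the expected input.
        return not expected_input_present


def determine(motor_output: bool, not_appropriate: bool) -> Optional[str]:
    """Determination step: decide whether the response needs
    promotion or inhibition; None when the motor centre did not fire."""
    if not motor_output:
        return None
    return "inhibit" if not_appropriate else "promote"


@dataclass
class AssociativeMemory:
    step: float = 0.3  # assumed size of each reinforcement change
    weights: Dict[str, float] = field(default_factory=dict)

    def reinforcement(self, context: str) -> float:
        return self.weights.get(context, 0.0)

    def learn(self, context: str, decision: Optional[str]) -> None:
        if decision is None:
            return
        delta = self.step if decision == "promote" else -self.step
        self.weights[context] = self.weights.get(context, 0.0) + delta


@dataclass
class ConditioningSystem:
    activation: ActivationNode = field(default_factory=ActivationNode)
    motor: MotorCentre = field(default_factory=MotorCentre)
    expectation: ExpectationNode = field(default_factory=ExpectationNode)
    memory: AssociativeMemory = field(default_factory=AssociativeMemory)

    def step(self, first_sensor: float, expected_input_present: bool,
             context: str) -> bool:
        """Run one cycle; returns True if the actuator would be activated."""
        trigger = self.activation.trigger(first_sensor)
        acted = self.motor.respond(trigger, self.memory.reinforcement(context))
        flag = self.expectation.not_appropriate(expected_input_present)
        self.memory.learn(context, determine(acted, flag))
        return acted
```

With the assumed parameter values, repeatedly calling `step(1.0, False, "harmless")` activates the actuator on the first two cycles and suppresses it on the third, while the response in any other context is unaffected. Keeping the learned weights in a separate memory object, rather than inside the response network, reflects the separation between innate and conditionable parts that the architecture relies upon.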
It could be considered that the present invention has a learning mechanism having similarities with the above-discussed ACP networks, but associated with an internal driving force and a "focus of attention" mechanism. Moreover, in the architecture according to the present invention there is a separation of "innate" responses (primary response networks) from the "learnable" or "conditionable" part (the associative memory), which makes it possible to achieve an adaptive modular system. In effect, each primary response network is a building block of an adaptive system.
It could be considered that Brooks' subsumption architecture is an example of modularization for constructing behaviour-based adaptive systems. However, the composition of one behavioural layer in terms of very fine-grained units, known as FSAs, produces a problem in terms of functional decomposition. More particularly, a single behavioural layer is not allowed to change its functional character. By way of contrast, in the self-biased conditioning architecture of the present invention there are primary response networks as basic units of instinctive behaviour. The units themselves do not change at run-time but their functional character changes over time. The nature of self-biased conditioning allows the system to develop a secondary response, which partly corresponds to a functional decomposition, based on a single primary response network. Thus the self-biased conditioning network serves as a better basis for developing adaptive systems, particularly modular ones.
Other advantageous features of embodiments of the invention are recited in the dependent claims appended hereto.
Further features and advantages of the present invention will become clear from the following description of preferred embodiments thereof, given by way of example, illustrated by the accompanying drawings, in which: