1. Field of the Invention
The present invention relates to an information-processing apparatus, an information processing method adopted by the information-processing apparatus and a computer program implementing the information processing method. To put it in detail, the present invention relates to an information-processing apparatus for automatically constructing an FPOMDP (Factored Partially Observable Markov Decision Process) by taking a POMDP (Partially Observable Markov Decision Process) as a basic configuration, relates to an information processing method adopted by the information-processing apparatus and relates to a computer program implementing the information processing method.
2. Description of the Related Art
As one of state-prediction/action-determination methods, a technique of processing applying a POMDP is known. An outline of the POMDP is explained as follows.
The POMDP is carried out as a process applying the following pieces of information:
(a) State space: S
(b) Action space: A
(c) State-transition probability computation function for computing a probability P of a transition from a state S existing at a time T=(t−1) to a state S existing at the next time T=t: T(st, at−1, st−1)=P(st|at−1, st−1).
In this case, symbol st denotes the state S existing at the time T=t, symbol st−1 denotes the state S existing at the time T=(t−1) and symbol at−1 denotes an action A taken at the time T=(t−1). The probability P is thus computed from st−1 representing the state S existing at the time T=(t−1) and at−1 representing the action A taken at the time T=(t−1).
(d) Reward function for computing a reward from st representing the state S existing at the time T=t and at representing the action A taken at the time T=t: R(st, at).
(e) Observation space: Ω
(f) Observation-generation probability computation function for computing a probability P of generation of an observation at the time T=t: O(st, at−1, ot)=P(ot|at−1, st).
In this case, symbol ot denotes an observation generated at the time T=t. The probability P is thus computed from st representing the state S existing at the time T=t and at−1 representing the action A taken at the time T=(t−1).
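The pieces of information (a) through (f) above can be sketched as a minimal POMDP specification in code. The state, action and observation labels and all probability values below are hypothetical illustration values, not part of the original description:

```python
# A minimal sketch of the POMDP tuple (S, A, T, R, Omega, O) described above.
# All labels and probabilities are hypothetical examples for illustration.

S = ["s0", "s1"]          # (a) state space
A = ["a0", "a1"]          # (b) action space
Omega = ["o0", "o1"]      # (e) observation space

# (c) T[(s_prev, a_prev)][s_next] = P(s_t | a_{t-1}, s_{t-1})
T = {
    ("s0", "a0"): {"s0": 0.9, "s1": 0.1},
    ("s0", "a1"): {"s0": 0.2, "s1": 0.8},
    ("s1", "a0"): {"s0": 0.5, "s1": 0.5},
    ("s1", "a1"): {"s0": 0.1, "s1": 0.9},
}

# (d) R[(s, a)] = reward for taking action a in state s
R = {("s0", "a0"): 0.0, ("s0", "a1"): 1.0,
     ("s1", "a0"): 2.0, ("s1", "a1"): 0.0}

# (f) O[(s_next, a_prev)][o] = P(o_t | a_{t-1}, s_t)
O = {
    ("s0", "a0"): {"o0": 0.8, "o1": 0.2},
    ("s0", "a1"): {"o0": 0.7, "o1": 0.3},
    ("s1", "a0"): {"o0": 0.3, "o1": 0.7},
    ("s1", "a1"): {"o0": 0.1, "o1": 0.9},
}

# Each conditional distribution must sum to one.
for table in (T, O):
    for dist in table.values():
        assert abs(sum(dist.values()) - 1.0) < 1e-9
```

Note that, per (c) and (f), the transition distribution is conditioned on the previous state and previous action, while the observation distribution is conditioned on the current state and previous action.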
The POMDP is a process for predicting a state and determining an action by application of the pieces of information described above. For example, the POMDP is applicable to processing to determine an action considered to be optimum from the little information that can be acquired. To put it concretely, the POMDP can be applied to various kinds of action determination processing such as processing to determine an action to be taken by a robot, a simulation making use of a computer, data processing and processing to determine an optimum action in business management.
By referring to FIG. 1, the following description explains processing based on the POMDP as processing for predicting a state and determining an action by application of the pieces of information described above. FIG. 1 is a diagram showing st−1 representing a state S existing at the time T=(t−1), at−1 representing an action A taken at the time T=(t−1), Rt−1 representing a reward R given at the time T=(t−1) and ot−1 representing an observation generated at the time T=(t−1), as well as st representing a state S existing at the time T=t, at representing an action A taken at the time T=t, Rt representing a reward R given at the time T=t and ot representing an observation generated at the time T=t, the time T=t following the time T=(t−1). Every arrow originating from a block serving as a parent and pointing to a block serving as a child indicates that the information and state of the parent have an effect on the information and state of the child.
For example, Rt−1 representing a reward R given at the time T=(t−1) is found as the value of the reward function R (st−1, at−1) from st−1 representing the state S existing at the time T=(t−1) and at−1 representing the action A taken at the time T=(t−1).
ot−1 representing an observation generated at the time T=(t−1) is typically observable information varying with a change in st−1 representing the state S existing at the time T=(t−1).
Each of rewards given at other times such as t, (t+1) and so on is found by making use of the same relation as the reward given at the time T=(t−1). By the same token, each of observations generated at the other times is found by making use of the same relation as the observation generated at the time T=(t−1).
The state-transition probability computation function T(st, at−1, st−1)=P(st|at−1, st−1) given above is a relation between quantities of different times. To be more specific, it is a relation between st representing the state S existing at the time T=t on the one hand, and st−1 representing the state S existing at the time T=(t−1) as well as at−1 representing the action A taken at the time T=(t−1) on the other hand. That is to say, the probability that the state st exists at the time T=t is found from st−1 representing the state S existing at the time T=(t−1) and at−1 representing the action A taken at the time T=(t−1). This relation holds true as a relation between quantities of any consecutive observation times.
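The relation above can be illustrated by a short sketch that propagates a distribution over states one step forward under the transition function T; the probability tables are hypothetical illustration values, not taken from the original:

```python
# Sketch: predicting the state distribution at time t from the distribution
# at time (t-1) and the action a_{t-1}, using P(s_t | a_{t-1}, s_{t-1}).
# The probability values below are hypothetical.

# P(s_t | a_{t-1}, s_{t-1}) indexed as trans[(s_prev, a_prev)][s_next]
trans = {
    ("s0", "a0"): {"s0": 0.9, "s1": 0.1},
    ("s1", "a0"): {"s0": 0.4, "s1": 0.6},
}

def predict(belief, action, trans):
    """P(s_t) = sum over s_{t-1} of P(s_t | a_{t-1}, s_{t-1}) * P(s_{t-1})."""
    result = {}
    for s_prev, p_prev in belief.items():
        for s_next, p_trans in trans[(s_prev, action)].items():
            result[s_next] = result.get(s_next, 0.0) + p_trans * p_prev
    return result

belief_t = predict({"s0": 0.5, "s1": 0.5}, "a0", trans)
# belief_t["s0"] = 0.9*0.5 + 0.4*0.5 = 0.65; belief_t["s1"] = 0.35
```

Because the same relation holds between any two consecutive times, `predict` can be applied repeatedly to carry the state distribution forward step by step.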
As described above, in the POMDP, in an observation domain including uncertainty, a variety of information spaces such as the state space, the action space, the reward space and the observation space are defined. Then, on the basis of the relationships between these information spaces, a state transition in the observation domain including uncertainty is predicted and an action of its own in the observation domain is determined. As typical processing to determine an action, for example, the action with the highest computed reward is determined as the best action.
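As one hypothetical illustration of such action determination, the expected reward of each action under a distribution over states can be compared and the action with the highest value chosen. This is a one-step greedy rule sketched under assumed reward and belief values, not the full POMDP planning procedure, which would also account for future rewards:

```python
# Sketch: one-step greedy action selection under a belief over states.
# Rewards and belief values are hypothetical illustration values.

belief = {"s0": 0.3, "s1": 0.7}          # P(s) at the current time
reward = {("s0", "a0"): 0.0, ("s1", "a0"): 2.0,
          ("s0", "a1"): 1.0, ("s1", "a1"): 0.0}

def best_action(belief, reward, actions):
    """Return the action maximizing sum over s of P(s) * R(s, a)."""
    def expected(a):
        return sum(p * reward[(s, a)] for s, p in belief.items())
    return max(actions, key=expected)

choice = best_action(belief, reward, ["a0", "a1"])
# expected rewards: a0 -> 0.3*0 + 0.7*2 = 1.4, a1 -> 0.3*1 + 0.7*0 = 0.3
```

With these illustration values the rule selects "a0", since its expected reward (1.4) exceeds that of "a1" (0.3).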
It is to be noted that a process to correctly set the relationships between the information spaces such as the state space, the action space, the reward space and the observation space is of importance to process construction processing based on the POMDP. In the process to correctly set the relationships between the information spaces, a BN (Bayesian Network) is used. The BN is a network including a plurality of nodes, relationships between which are defined. Processing to generate a BN and processing to make use of a BN are described in documents such as US Patent Application Publication Nos. 2004/0220892 and 2002/0103793, which explain processing to generate a highly reliable BN correctly defining relationships between nodes included therein.
In the POMDP explained above by referring to FIG. 1, the information spaces applied to the processing to determine an action as information spaces for each time are each processed as an information space including only one element. In this case, the information spaces for each time are the state space, the reward space and the observation space. In the actual environment, on the other hand, a state space obtainable as information and/or an observation space obtainable as information each include a variety of elements (or factors) different from each other in many cases. In the case of the traditional POMDP, a configuration for automatically constructing a POMDP taking these elements (or factors) different from each other into consideration is not implemented.