1. Field of the Invention
The present invention relates to a behavior control apparatus, a behavior control method, and a program. More particularly, the present invention relates to a behavior control apparatus, a behavior control method, and a program that are suitable for controlling autonomous behavior.
2. Description of the Related Art
Machine learning in which a control method for achieving a goal is learned through trial and error, relying only on rewards obtained from an environment, is broadly called “reinforcement learning” (e.g., see Nonpatent Document 1: “Reinforcement Learning”, written by Richard S. Sutton and Andrew G. Barto, translated into Japanese by Sadayoshi Mikami and Masaaki Minagawa, Dec. 20, 2000, First Edition, Published by Morikita Shuppan Co., Ltd.).
In the problem formulation of reinforcement learning, suppose that the Markov property expressed by expression (1) holds in a state space constructed from the measurement results of a sensor that observes the environment; that is, the next state depends only on the present state and action, not on earlier history. Then the state value, which indicates the expected value of future reward, can be derived from the Bellman optimality equation expressed by expression (2). An optimal action can be taken by selecting, in each state, the action having the highest value.
[Expression 1]
$$\Pr\{S_{t+1}=s' \mid S_t=s,\ a_t=a\} \qquad (1)$$
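[Expression 2]
$$V^{*}(s)=\max_{a}\sum_{s'}P^{a}_{ss'}\left[R^{a}_{ss'}+\gamma\,V^{*}(s')\right] \qquad (2)$$
where P^a_{ss'} is the transition probability given by expression (1), R^a_{ss'} is the expected immediate reward for the transition from state s to state s' under action a, and γ (0 ≤ γ < 1) is the discount rate.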
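As an illustration only, the Bellman optimality equation (2) can be solved numerically by value iteration, which repeatedly applies the right-hand side of (2) as an update until the values stop changing, after which the optimal action in each state is the one that maximizes that right-hand side. The sketch below assumes a small, fully known MDP; the two-state model `TOY_MDP` and the helper names `q_value`, `value_iteration`, and `greedy_policy` are hypothetical and are not part of the described invention.

```python
"""Value iteration on a toy MDP: a sketch of solving expression (2).

Assumption (hypothetical): the 2-state, 2-action MDP below is invented
for illustration and is not taken from the invention.
"""
from typing import Dict, List, Tuple

# P[s][a] lists (next_state, probability, reward) triples, i.e. the
# transition probabilities P^a_{ss'} and rewards R^a_{ss'}.
Transitions = Dict[int, Dict[int, List[Tuple[int, float, float]]]]

TOY_MDP: Transitions = {
    0: {0: [(0, 1.0, 0.0)],                  # stay in state 0, no reward
        1: [(1, 0.8, 1.0), (0, 0.2, 0.0)]},  # usually move to state 1
    1: {0: [(0, 1.0, 0.0)],                  # go back to state 0
        1: [(1, 1.0, 2.0)]},                 # stay in state 1, reward 2
}


def q_value(P: Transitions, V: Dict[int, float], s: int, a: int,
            gamma: float) -> float:
    """Right-hand side of (2) for one action: sum_{s'} P [R + gamma V(s')]."""
    return sum(p * (r + gamma * V[s2]) for s2, p, r in P[s][a])


def value_iteration(P: Transitions, gamma: float = 0.9,
                    tol: float = 1e-8) -> Dict[int, float]:
    """Apply the max over actions of (2) until the largest change < tol.

    Values are updated in place (Gauss-Seidel style); this still converges
    to V* because the Bellman optimality operator is a contraction.
    """
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


def greedy_policy(P: Transitions, V: Dict[int, float],
                  gamma: float = 0.9) -> Dict[int, int]:
    """Select, in each state, the action with the highest value."""
    return {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}


if __name__ == "__main__":
    V = value_iteration(TOY_MDP)
    print(V)                          # approx. {0: 18.5366, 1: 20.0}
    print(greedy_policy(TOY_MDP, V))  # {0: 1, 1: 1}
```

In this toy model, staying in state 1 earns 2 per step, so V*(1) = 2 / (1 − 0.9) = 20, and the greedy policy chooses action 1 in both states. Value iteration is only one standard way to solve (2); it requires the transition probabilities to be known, whereas trial-and-error reinforcement learning estimates the values from experienced rewards instead.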