In the general field of predicting decision-making behavior, there are a number of approaches taught in the prior art. Each approach has deficiencies and problems that limit the effectiveness or practicality of those prior art approaches. In this section two general areas related to predicting decision-making behavior are discussed: sequential probabilistic modeling, and efficient goal-based modeling.
The goal of modeling decision-making behavior is to accurately predict the sequential behavior and decisions an agent would choose—e.g., the motions a person would take to grasp an object or the route a driver would take to get from home to work. One common approach to this problem is to directly model the conditional probability of actions based on the situation. In the navigation domain, for example, the situation could be the intersection where the driver is currently located and the driver's intended destination. A Markov model constructed for this setting then predicts the driver's next action (i.e., the road he or she will next travel on) based on the empirical frequencies of what has historically happened in the same situation. A major limitation of this approach is that there is often limited or no historical experience with the exact same situation, so the empirical frequencies are poor estimates of future behavior. Related are memory-based approaches that try to combine previously executed routes to create a suitable route between two different locations. These suffer from the same limitations.
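The empirical-frequency Markov predictor described above, and its failure on situations absent from the history, can be illustrated with a minimal sketch. The function names, the trip format, and the toy road names below are hypothetical and chosen only for illustration; they are not taken from any of the cited systems.

```python
from collections import Counter, defaultdict

def fit_markov_predictor(trips):
    """Count how often each next road followed each (intersection, destination) situation."""
    counts = defaultdict(Counter)
    for destination, steps in trips:
        for intersection, next_road in steps:
            counts[(intersection, destination)][next_road] += 1
    return counts

def predict_next_road(counts, intersection, destination):
    """Return empirical P(next_road | situation), or {} if the situation was never observed."""
    seen = counts.get((intersection, destination))
    if not seen:
        return {}
    total = sum(seen.values())
    return {road: n / total for road, n in seen.items()}

# Hypothetical toy history: (destination, [(intersection, road taken next), ...])
trips = [
    ("work", [("A", "main_st"), ("B", "oak_ave")]),
    ("work", [("A", "main_st"), ("B", "elm_st")]),
    ("work", [("A", "river_rd")]),
]
counts = fit_markov_predictor(trips)
seen_dist = predict_next_road(counts, "A", "work")    # frequencies from three observations
unseen_dist = predict_next_road(counts, "C", "work")  # empty: no history for intersection C
```

The empty result for the unobserved situation is the limitation noted above: with no matching history, the model has nothing on which to base a prediction.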
A large number of systems have been developed that employ this approach for predicting decision-making. In “An information-theoretic framework for personal mobility tracking in PCS networks”, Wireless Networks (2002), Bhattacharya and Das used Markov predictors to develop a probability distribution of transitions from one GSM cell to another. Ashbrook and Starner, in “Using GPS to learn significant locations and predict movement”, Personal and Ubiquitous Computing (2003), take a similar approach to develop a distribution of transitions between learned significant locations from GPS data. Patterson et al., in “Inferring high-level behavior from low-level sensors”, Proc. Ubicomp (2003), used a Bayesian model of a mobile user conditioned on the mode of transportation to predict the user's future location. Mayrhofer extends this existing work in location prediction to context prediction, that is, predicting what situation a user will be in, as discussed in “An architecture for context prediction”, PhD thesis, Johannes Kepler University of Linz (2004), by developing a supporting general architecture that uses a neural gas approach. In “The Neural Network House: An environment that adapts to its inhabitants”, AAAI Spring Symposium (1998), a smart home observed the behaviors and paths of an occupant and learned to anticipate the occupant's needs with respect to temperature and lighting control. The occupant was tracked by motion detectors, and a neural network was used to predict the next room the occupant was going to enter, along with the times at which the occupant would enter and leave the home. Liao et al., in “Learning and inferring transportation routines”, Artificial Intelligence (2007), model transportation decisions using a directed graphical model, a probabilistic framework for representing these conditional probabilities.
An alternate view of modeling decision-making behavior is that the decision-maker is modeled as choosing actions by solving a sequential decision-making or planning problem. In this view, the model need only estimate the parameters of that decision-making problem to predict behavior.
In the inverse optimal control setting (also variously referred to as inverse reinforcement learning, imitation learning, and apprenticeship learning in the field), an agent's behavior (i.e., its trajectory or path, $\zeta$, of states $s_i$ and actions $a_i$) in some planning space is observed by a learner trying to model or imitate the agent. The agent is assumed to be attempting to optimize some function that maps (e.g., linearly) the feature vector of each state, $f_{s_j}$, to a state reward value representing the agent's utility for visiting that state. This function is parameterized by some reward weights, $\theta$. The reward value of a trajectory is simply the sum of state rewards, or, equivalently, the reward weights applied to the path feature counts,
$$f_\zeta = \sum_{s_j \in \zeta} f_{s_j},$$
which are the sum of the state features along the path as shown in Equation (i).
$$\text{reward}(f_\zeta) = \theta^{\top} f_\zeta = \sum_{s_j \in \zeta} \theta^{\top} f_{s_j} \qquad \text{Equation (i)}$$
The agent demonstrates trajectories, $\tilde{\zeta}_i$, and each has an empirical feature count,
$$\tilde{f} = \frac{1}{m} \sum_{i} f_{\tilde{\zeta}_i}$$
based on many ($m$) demonstrated trajectories.
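As a worked illustration of the quantities just defined, the following minimal sketch computes the path feature counts, the trajectory reward of Equation (i), and the empirical feature count over demonstrated trajectories. The two features, the state names, and the weight values are hypothetical and are assumed only for the purpose of the example.

```python
def path_feature_counts(path, state_features):
    """f_zeta: element-wise sum of the feature vectors of the states on the path."""
    dim = len(next(iter(state_features.values())))
    totals = [0.0] * dim
    for state in path:
        for k, value in enumerate(state_features[state]):
            totals[k] += value
    return totals

def reward(theta, feature_counts):
    """Equation (i): inner product of the reward weights with the feature counts."""
    return sum(w * f for w, f in zip(theta, feature_counts))

def empirical_feature_counts(demonstrations, state_features):
    """f_tilde: mean of the path feature counts over the m demonstrated trajectories."""
    m = len(demonstrations)
    per_path = [path_feature_counts(p, state_features) for p in demonstrations]
    return [sum(column) / m for column in zip(*per_path)]

# Hypothetical features per state: [distance_km, is_highway]
state_features = {"s1": [1.0, 0.0], "s2": [2.0, 1.0], "s3": [0.5, 0.0]}
theta = [-1.0, 0.5]                    # hypothetical reward weights
demos = [["s1", "s2"], ["s1", "s3"]]   # m = 2 demonstrated trajectories

f_zeta = path_feature_counts(demos[0], state_features)
path_reward = reward(theta, f_zeta)
# Same value obtained as a sum of per-state rewards, the right-hand side of Equation (i).
per_state_reward = sum(reward(theta, state_features[s]) for s in demos[0])
f_tilde = empirical_feature_counts(demos, state_features)
```

Because the reward is linear in the features, applying the weights to the summed feature counts and summing the per-state rewards give the same value.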
Abbeel & Ng (2004) approach the Inverse Optimal Control problem in “Apprenticeship learning via inverse reinforcement learning”, Proc. ICML, by matching feature expectations (Equation ii) between an observed policy and a learner's behavior; they demonstrate that this matching is sufficient to achieve the same performance as the agent if the agent were in fact solving a decision-making problem with a reward function linear in those features.
$$\sum_{\text{Path } \zeta_i} P(\zeta_i)\, f_{\zeta_i} = \tilde{f} \qquad \text{Equation (ii)}$$
Recovering the agent's exact reward weights is an ill-posed problem: many reward weights, including degenerate ones (e.g., all zeros), make the demonstrated trajectories optimal.
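The feature-matching condition of Equation (ii) and the ill-posedness of weight recovery can both be checked on a small example. The sketch below assumes a hypothetical two-path planning space with made-up feature counts and path probabilities; it is not an implementation of the cited apprenticeship learning algorithm, only a check of the condition that algorithm targets.

```python
def expected_feature_counts(path_probs, path_features):
    """Left side of Equation (ii): sum over paths of P(zeta_i) * f_zeta_i."""
    dim = len(next(iter(path_features.values())))
    expected = [0.0] * dim
    for path, prob in path_probs.items():
        for k, value in enumerate(path_features[path]):
            expected[k] += prob * value
    return expected

def optimal_paths(theta, path_features):
    """Paths whose reward theta^T f_zeta attains the maximum over all paths."""
    rewards = {
        path: sum(w * f for w, f in zip(theta, feats))
        for path, feats in path_features.items()
    }
    best = max(rewards.values())
    return {path for path, r in rewards.items() if abs(r - best) < 1e-12}

# Hypothetical path feature counts and empirical feature count.
path_features = {"short": [3.0, 1.0], "scenic": [5.0, 0.0]}
f_tilde = [3.5, 0.75]

# A learner distribution over paths that satisfies Equation (ii).
learner = {"short": 0.75, "scenic": 0.25}
expected = expected_feature_counts(learner, path_features)
matches = all(abs(e - t) < 1e-9 for e, t in zip(expected, f_tilde))

# Degenerate all-zero weights make every path optimal.
optimal_under_zero = optimal_paths([0.0, 0.0], path_features)
optimal_under_theta = optimal_paths([-1.0, 0.5], path_features)
```

Under the all-zero weights both paths tie for the maximum reward, so any demonstrated trajectory is trivially optimal, which is why the demonstrations alone cannot identify the agent's weights.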
The main limitation of the inverse optimal control approach is that it is designed for prescribing or controlling behavior rather than predicting what behavior will actually occur. In fact, it is common for the actual behavior that occurs to have zero probability of occurring within an inverse optimal control model of behavior.
A specialized approach has been developed for navigation based on the notion of efficiency. The PreDestination model (Krumm & Horvitz 2006) in “Predestination: inferring destinations from partial trajectories”, Proc. Ubicomp, does not model the specific actions of a driver in trying to reach a destination, but instead discretizes the world into a grid of cells. The probability of the driver having an intended destination within a particular cell is based on the efficiency of the driver's route so far in traveling from the driver's starting location to that potential destination cell. If the driver's current location is a very inefficient intermediate cell to travel through to reach the potential destination cell, that destination cell will have a low probability in their model. This approach can predict only the driver's destination; it cannot predict the path the driver will take to reach that destination.
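The intuition behind efficiency-based destination scoring can be sketched as follows. This is a simplified illustration and not the likelihood used in the cited work: it assumes Manhattan distance between grid cells as the travel cost, defines efficiency as the ratio of the direct cost to the cost of traveling via the current cell, and normalizes the scores into probabilities. All function names and coordinates are hypothetical.

```python
def manhattan(a, b):
    """Assumed travel cost between two grid cells given as (x, y) tuples."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def efficiency(start, current, candidate):
    """Direct cost start->candidate divided by the cost when routed through current.

    A value of 1.0 means passing through the current cell adds no detour.
    """
    via = manhattan(start, current) + manhattan(current, candidate)
    if via == 0:
        return 1.0
    return manhattan(start, candidate) / via

def destination_probabilities(start, current, candidates):
    """Normalize the efficiency scores into a distribution over candidate cells."""
    scores = {c: efficiency(start, current, c) for c in candidates}
    total = sum(scores.values())
    if total == 0:
        # No candidate is informative; fall back to a uniform distribution.
        return {c: 1.0 / len(candidates) for c in candidates}
    return {c: s / total for c, s in scores.items()}

start, current = (0, 0), (2, 0)
candidates = [(4, 0), (0, 3), (2, 2)]
probs = destination_probabilities(start, current, candidates)
```

In this toy case the cell at (0, 3) receives the lowest probability, because reaching it through the current cell requires doubling back. The output is a distribution over destinations only, with no prediction of the remaining path, which mirrors the limitation described above.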
Accordingly, there is a need for improved systems and methods for modeling navigational behavior, including prediction of future routes and recommendations of new routes. Those and other advantages of the present invention will be described in more detail hereinbelow.