In many multiagent domains, agents must act in order to provide security against attacks by adversaries. A common issue that agents face in such security domains is uncertainty about the adversaries they may be facing. For example, a security robot may need to make a choice about which areas to patrol, and how often. However, it will not know in advance exactly where a robber will choose to strike. A team of unmanned aerial vehicles (“UAVs”) monitoring a region undergoing a humanitarian crisis may also need to choose a patrolling policy. They must make this decision without knowing in advance whether terrorists or other adversaries may be waiting to disrupt the mission at a given location. It may indeed be possible to model the motivations of types of adversaries the agent or agent team is likely to face in order to target these adversaries more closely. However, in both cases, the security robot or UAV team will not know exactly which kinds of adversaries may be active on any given day.
A common approach for choosing a policy for agents in such scenarios is to model the scenarios as Bayesian games. A Bayesian game is a game in which agents may belong to one or more types; the type of an agent determines its possible actions and payoffs. The distribution of adversary types that an agent will face may be known or inferred from historical data. Usually, these games are analyzed according to the solution concept of a Bayes-Nash equilibrium, an extension of the Nash equilibrium for Bayesian games. However, in many settings, a Nash or Bayes-Nash equilibrium is not an appropriate solution concept, since it assumes that the agents' strategies are chosen simultaneously.
In some settings, one player can commit to a strategy before the other players choose their strategies, and by doing so, attain a higher reward than if the strategies were chosen simultaneously. These scenarios are known as Stackelberg games. In a Stackelberg game, a leader commits to a strategy first, and then a follower (or group of followers) selfishly optimizes its own reward, taking into account the strategy chosen by the leader. For example, the security agent (leader) may first commit to a mixed strategy for patrolling various areas in order to be unpredictable to the robbers (followers). The robbers, after observing the pattern of patrols over time, can then choose which location to rob.
To see the advantage of being the leader in a Stackelberg game, consider a simple game with the payoff table as shown in Table 1, infra. The leader is the row player and the follower is the column player. Here, the leader's payoff is listed first.
TABLE 1
Payoff table for example normal form game.
        c       d
a       2, 1    4, 0
b       1, 0    3, 2
The only Nash equilibrium for this game is when the leader plays a and the follower plays c, which gives the leader a payoff of 2.
However, if the leader commits to a uniform mixed strategy of playing a and b with equal (0.5) probability, the follower's best response is to play d to get an expected payoff of 1 (0 and 2 with equal probability). The leader's payoff would then be 3.5 (4 and 3 with equal probability). In this case, the leader now has an incentive to deviate and choose a pure strategy of a (to get a payoff of 4). However, this would cause the follower to deviate to strategy c as well, resulting in the Nash equilibrium. Thus, by committing to a strategy that is observed by the follower, and by avoiding the temptation to deviate, the leader manages to obtain a reward higher than that of the best Nash equilibrium.
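The arithmetic of this example can be checked with a short script. The following is only a minimal sketch, assuming the payoffs of Table 1 with leader actions a and b (rows) and follower actions c and d (columns); the function names are illustrative and are not taken from any referenced method. For simplicity, the equilibrium check enumerates pure-strategy profiles only.

```python
# Payoffs from Table 1, keyed by (leader action, follower action).
# Each value is (leader payoff, follower payoff).
PAYOFFS = {
    ("a", "c"): (2, 1), ("a", "d"): (4, 0),
    ("b", "c"): (1, 0), ("b", "d"): (3, 2),
}
LEADER_ACTIONS = ("a", "b")
FOLLOWER_ACTIONS = ("c", "d")


def leader_payoff(leader_mix, follower_action):
    """Expected leader payoff for a mixed leader strategy and a pure follower action."""
    return sum(p * PAYOFFS[(l, follower_action)][0] for l, p in leader_mix.items())


def follower_best_response(leader_mix):
    """Return (action, expected payoff) maximizing the follower's expected payoff."""
    def value(f):
        return sum(p * PAYOFFS[(l, f)][1] for l, p in leader_mix.items())
    best = max(FOLLOWER_ACTIONS, key=value)
    return best, value(best)


def pure_nash_equilibria():
    """List pure profiles where neither player gains by deviating unilaterally."""
    result = []
    for l in LEADER_ACTIONS:
        for f in FOLLOWER_ACTIONS:
            leader_ok = all(PAYOFFS[(l, f)][0] >= PAYOFFS[(l2, f)][0] for l2 in LEADER_ACTIONS)
            follower_ok = all(PAYOFFS[(l, f)][1] >= PAYOFFS[(l, f2)][1] for f2 in FOLLOWER_ACTIONS)
            if leader_ok and follower_ok:
                result.append((l, f))
    return result


if __name__ == "__main__":
    uniform = {"a": 0.5, "b": 0.5}
    response, follower_value = follower_best_response(uniform)
    print("pure Nash equilibria:", pure_nash_equilibria())
    print("follower best response to uniform commitment:", response, follower_value)
    print("leader payoff under commitment:", leader_payoff(uniform, response))
```

Under these assumptions the script reports the profile (a, c) as the only pure equilibrium, with a leader payoff of 2, while the uniform commitment induces the follower to play d and yields the leader 3.5.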
A Bayesian Stackelberg game may arise in a security domain because, for example, when patrolling a region, a security robot may have only uncertain knowledge about the different robber types it may face. The problem of choosing an optimal strategy for the leader to commit to in a Bayesian Stackelberg game has been analyzed and found to be NP-hard. This result explains the computational difficulties encountered in solving such games. In particular, methods for finding optimal strategies for non-Bayesian games can be applied to Bayesian Stackelberg games by converting the Bayesian game into a normal-form game by the Harsanyi transformation. However, by transforming the game, the compact structure of the Bayesian game is lost. In addition, some such methods require running a set of multiple linear programs, some of which may be infeasible. If, on the other hand, one wishes to compute the highest-reward Nash equilibrium, newer methods such as MIP-Nash, which use mixed-integer linear programs (MILPs), may be used, since the highest-reward Bayes-Nash equilibrium is equivalent to the corresponding Nash equilibrium in the transformed game. However, as stated above, the compact structure of the Bayesian game is lost. In addition, since the Nash equilibrium assumes a simultaneous choice of strategies, the advantages of being the leader are not considered.
The problem of choosing an optimal strategy for the leader to commit to in a Stackelberg game has been analyzed and found to be NP-hard in the case of a Bayesian game with multiple types of followers. Thus, efficient heuristic techniques for choosing high-reward strategies in these games are an important open issue. Methods for finding optimal leader strategies for non-Bayesian games can be applied to this problem by converting the Bayesian game into a normal-form game by the Harsanyi transformation. If, on the other hand, one wishes to compute the highest-reward Nash equilibrium, new methods using mixed-integer linear programs (MILPs) may be used, since the highest-reward Bayes-Nash equilibrium is equivalent to the corresponding Nash equilibrium in the transformed game. However, by transforming the game, the compact structure of the Bayesian game is lost. In addition, since the Nash equilibrium assumes a simultaneous choice of strategies, the advantages of being the leader are not considered.
Thus, finding more efficient and compact techniques for choosing the optimal strategies for the Bayesian Stackelberg games is an important open issue.
Bayesian Games
A Bayesian game contains a set of N agents, and each agent n must be one of a given set of types θn. For our patrolling domain, two agents are present, the security agent and the robber. θ1 is the set of security agent types and θ2 is the set of robber types. Since there is only one type of security agent, θ1 contains only one element. During the game, the robber knows its type but the security agent does not know the robber's type. For each agent (the security agent or the robber) n, there is a set of strategies σn and a utility function un: θ1×θ2×σ1×σ2→ℝ, where ℝ denotes the set of real numbers.
A Bayesian game can be transformed into a normal-form game using the Harsanyi transformation, as described in J. C. Harsanyi and R. Selten, “A generalized Nash solution for two-person bargaining games with incomplete information,” Management Science, 18(5):80-106, 1972; the entire contents of which are incorporated herein by reference. Once this is done, linear-program (LP)-based methods for finding high-reward strategies for normal-form games can be used to find a strategy in the transformed game; this strategy can then be used for the Bayesian game.
Given that the Harsanyi transformation is a standard concept in game theory, its key properties when applied to Bayesian Stackelberg games are briefly described without detailing the actual transformation. Using the Harsanyi technique involves introducing a chance node that determines the follower's type, thus transforming the leader's incomplete information regarding the follower into imperfect information. The resulting normal-form game matrix generated by the transformation contains the same set of possible actions for the leader as in the original game. However, the set of possible actions for the follower is the cross product of each follower type's set of possible actions. In other words, using the Harsanyi transformation on a Bayesian Stackelberg game results in a normal-form game with the same number of rows as there are leader actions; however, the number of columns increases exponentially, since every combination of actions taken by each follower type is considered as one possible action for the follower in the transformed game.
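The growth in the number of columns can be illustrated with a short sketch. It is not a full statement of the Harsanyi transformation: it assumes that every follower type has the same action set and that each entry of the transformed matrix is the prior-weighted sum of the per-type payoffs. The function name harsanyi_transform, the second follower type, and the prior probabilities are hypothetical values introduced only for illustration.

```python
from itertools import product


def harsanyi_transform(leader_actions, follower_actions, priors, payoffs):
    """Build a normal-form game from per-type payoff tables (illustrative sketch).

    priors[t] is the assumed probability of follower type t.
    payoffs[t][(i, j)] is (leader reward, follower reward) when the leader
    plays i and a follower of type t plays j.
    Returns (columns, matrix): each column is one action per follower type,
    and matrix[(i, column)] holds the prior-weighted expected rewards.
    """
    types = list(priors)
    # One transformed follower action per combination of per-type actions.
    columns = list(product(follower_actions, repeat=len(types)))
    matrix = {}
    for i in leader_actions:
        for col in columns:
            lead = sum(priors[t] * payoffs[t][(i, j)][0] for t, j in zip(types, col))
            foll = sum(priors[t] * payoffs[t][(i, j)][1] for t, j in zip(types, col))
            matrix[(i, col)] = (lead, foll)
    return columns, matrix


if __name__ == "__main__":
    leader_actions = ("a", "b")
    follower_actions = ("c", "d")
    # Hypothetical prior over two follower types.
    priors = {"type1": 0.5, "type2": 0.5}
    payoffs = {
        # Type 1 reuses the payoffs of Table 1.
        "type1": {("a", "c"): (2, 1), ("a", "d"): (4, 0),
                  ("b", "c"): (1, 0), ("b", "d"): (3, 2)},
        # Type 2 payoffs are made up for this illustration.
        "type2": {("a", "c"): (1, 1), ("a", "d"): (2, 0),
                  ("b", "c"): (0, 1), ("b", "d"): (2, 2)},
    }
    columns, matrix = harsanyi_transform(leader_actions, follower_actions, priors, payoffs)
    print("rows:", len(leader_actions), "columns:", len(columns))
    for key, value in sorted(matrix.items()):
        print(key, value)
```

With two follower actions and two types, the transformed game has two rows and four columns; with k types sharing the same two actions it would have 2^k columns, which is the exponential growth described above.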
What is desirable, therefore, are devices and techniques that address such limitations described for the prior art.