There are many situations in which it is desirable to identify people, including people who are speaking, using systems that are at least partly automated. Some existing systems identify speakers using audio. For example, they may use “sound source localization,” which processes the input from multiple microphones at different locations to estimate the direction or directions from which speech originates. Other systems attempt to improve the accuracy of methods such as sound source localization by performing “decision level fusion,” in which data from multiple inputs are combined at the point where decisions about person or speaker detection are made.
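As a rough illustration only (not drawn from this document), the two approaches mentioned above can be sketched in Python: a brute-force time-difference-of-arrival estimate between two microphones stands in for sound source localization, and a weighted combination of per-modality confidence scores stands in for decision level fusion. All function names, parameters, and values below are hypothetical.

```python
import math

def estimate_delay_samples(mic_a, mic_b, max_lag):
    """Estimate the lag (in samples) that best aligns mic_b with mic_a,
    using brute-force cross-correlation over a bounded lag range."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(mic_a):
            j = i + lag
            if 0 <= j < len(mic_b):
                score += a * mic_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def delay_to_angle(lag, sample_rate, mic_spacing_m, speed_of_sound=343.0):
    """Convert an inter-microphone delay into a bearing angle in degrees,
    assuming a far-field source and a two-microphone array."""
    tdoa = lag / sample_rate
    # Clamp to the physically valid range before taking the arcsine.
    x = max(-1.0, min(1.0, tdoa * speed_of_sound / mic_spacing_m))
    return math.degrees(math.asin(x))

def fuse_decisions(scores, weights):
    """Decision-level fusion: combine per-modality confidence scores
    (e.g. an audio-direction score and a face-detection score) by
    weighted averaging at the decision stage."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)
```

For example, a short pulse recorded by a second microphone three samples later than the first yields an estimated lag of 3, which `delay_to_angle` then maps to a bearing; the resulting audio confidence could be fused with a vision-based detection score via `fuse_decisions`.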