Typical systems for determining the gestures of a person from one or more cameras utilize a camera mounted to a finger of the person (e.g., on a ring) or a stereoscopic pair of cameras mounted on a gesture-based human interaction device to track hand motions. Such systems are localized to a single person and, as such, are unable to monitor an entire room in which any of multiple people may be gesturing (e.g., pointing towards a surface). Other typical systems utilize a deformable three-dimensional human model and a camera model that is based on carefully measured intrinsic parameters of the camera to analyze a segmented human silhouette. Such systems are typically unable to compensate for slight environmental differences over time (e.g., lighting changes) or for changes in the positioning or intrinsic parameters of the camera. Accordingly, such systems are prone to incorrectly determining the gestures of a person.