Machine vision systems and methods are frequently used to detect people, objects or activities from imaging data that typically includes still or moving images as well as other information, data or metadata. Such systems and methods are commonly provided in environments where an ever-changing variety of people may be present, and where any number of actions may be occurring. In particular, machine vision systems and methods are commonly applied in industrial or commercial environments for the purpose of detecting and classifying human actions and activities. Such systems and methods may operate by detecting and recognizing a person within one or more environments, tracking movements of the person's arms, legs, head, torso or other body parts, and classifying an action that was performed by the person based on his or her tracked movements.
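By way of illustration only, the detect, track and classify sequence described above may be sketched as follows. The sketch is a toy under stated assumptions and does not represent any particular machine vision system: the `Frame` type, the `track_wrist` and `classify_action` functions, the action labels and the `min_travel` threshold are all hypothetical, and the wrist positions are assumed to be supplied by some upstream detector in normalized coordinates with the vertical axis increasing upward.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # assumed normalized (x, y), with y increasing upward


@dataclass
class Frame:
    """Hypothetical per-frame detection result.

    `wrist` holds the detected wrist position of a person in the frame,
    or None when no person was detected in that frame.
    """
    wrist: Optional[Point]


def track_wrist(frames: List[Frame]) -> List[Point]:
    """Tracking step: keep wrist positions from frames where a person was detected."""
    return [f.wrist for f in frames if f.wrist is not None]


def classify_action(track: List[Point], min_travel: float = 0.2) -> str:
    """Classification step: label an action from net wrist displacement.

    The labels and the threshold are illustrative assumptions, not values
    taken from any real system.
    """
    if len(track) < 2:
        return "unknown"  # too little evidence to classify
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    if max(abs(dx), abs(dy)) < min_travel:
        return "idle"  # wrist barely moved
    if abs(dy) >= abs(dx):
        return "raise_arm" if dy > 0 else "lower_arm"
    return "reach_sideways"


# Example: a person is missed in one frame, then the wrist moves upward.
frames = [Frame((0.5, 0.2)), Frame(None), Frame((0.5, 0.5)), Frame((0.5, 0.8))]
label = classify_action(track_wrist(frames))
```

Even in this simplified form, the classification depends entirely on which frames contain a detected person and on how much of the motion those frames cover.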
The detection and classification of human actions and activities from imaging data by machine vision systems may be complicated, however, by one or more intrinsic or extrinsic factors. For example, machine vision systems typically attempt to recognize actions or activities involving humans by recognizing the movement of limbs or other body parts in a particular fashion, e.g., a particular gait or other type or form of rhythmic or arrhythmic motion. Therefore, in order to recognize an action or activity from a set of imaging data, e.g., one or more still or moving images, such systems must first identify a human within the set of imaging data, and determine whether the human is engaged in an action or an activity, before classifying the action or activity based on the motion of the human. Where a number of imaging devices are provided in one or more scenes of an environment for the purpose of observing actions or activities occurring therein, however, the variations in the conditions of each of the scenes, or in the orientations or configurations of the respective imaging devices provided therein, may lead to erratic or inconsistent results. Additionally, the various portions of the imaging data (e.g., digital images, or clips of digital video data) captured by a given imaging device may fail to cover or include each of the elements associated with a given action, and may thus yield an incomplete or unreliable prediction as to the action observed therein.
Moreover, the accuracy or precision with which a machine vision system detects and classifies a human action or activity may be hindered by the inherently unique characteristics of the human body. For example, no two humans are exactly alike, and each human may perform the same actions or activities in vastly different ways. Therefore, identifying the performance of an action or an activity by different humans typically requires individualized analyses of the respective motions of the respective humans, and such analyses frequently require an extensive amount of processing power, network bandwidth or data storage capacity. Similarly, a single person may perform two or more different tasks using remarkably similar motions of his or her limbs or other body parts. Distinguishing between the discrete tasks in view of such similar motions may further occupy substantial portions of the available power, bandwidth or storage capacity of a computer system.