Human action analysis based on motion patterns, often in the form of video images, plays an important role in visual surveillance and video retrieval applications. Many applications could benefit from the detection and recognition of target actions using video images or specific semantic descriptions. Due to factors such as large individual variations in clothes, accessories, appearance, scale, viewpoint, human body articulation and self-occlusion, and dynamic background clutter, motion pattern analysis remains a very challenging problem and has yet to achieve the accuracy of manual human analysis. In addition, one significant feature of manual human visual recognition that is difficult to automate is that a human performing such recognition can be trained with very few examples, which is a highly desirable capability in motion pattern recognition.
With recent advancements in object recognition, computer analysis has seen significant progress in the object detection domain. However, the successful application of such schemes to 2D object recognition has not yet fully translated into interpreting motion patterns and performing action recognition in the 3D spatio-temporal domain, where performance remains lower than in the 2D domain.
In general, motion patterns appear distinctive and compact, and substantially different actions can often be easily separated (e.g., hitting a baseball versus shooting a basketball). However, many actions share common sub-motions, and substantial parts of the motion patterns of different actions are sometimes quite similar, with only subtle differences. Local-feature-based approaches, which inspect subsets of features at select locations within a motion pattern, are in turn more susceptible to misalignments in feature locations and less effective at differentiating between motion patterns.
Accordingly, the prior art lacks a system and method for creating a classifier configured to utilize both local features and global features within a motion pattern to effectively identify the motion pattern.