1. Field of the Present Invention
The present invention relates to the field of computer vision and machine learning, and more particularly to a foreground action estimating apparatus and a foreground action estimating method.
2. Description of the Related Art
Recognizing the three-dimensional posture of a human body from a two-dimensional image is a hot topic in the field of computer vision and artificial intelligence, and this technique can be applied to various fields such as human-computer interaction, video monitoring, analysis and comprehension of digital information, and the like. However, this technique remains a challenge at present for the following reasons: (1) the loss of depth information in a two-dimensional image makes the inference of three-dimensional information from the two-dimensional image ambiguous, i.e. a plurality of three-dimensional solutions may be consistent with the same image; (2) human images are affected by many factors, such as changes in background, illumination, and clothing, different viewing angles, different postures, and the like, which greatly influence the inference of the three-dimensional posture; and (3) a human posture is formed by the combination and connection of a plurality of joints, and the dimension of the posture space formed by human postures is huge, hence searching the posture space for the optimum posture requires a large amount of calculation.
In terms of technical principle, methods for estimating human posture from a single-view image may be divided into model-based methods and learning-based methods. In a model-based method, a human model composed of the parts of the human body is constructed first, posture estimation is then the process of using the model to search a feature space for the closest matching posture, and the search is generally converted into a nonlinear optimization problem or a probability density estimation problem. Since the dimension of the posture space is huge, this method needs to be combined with tracking in order to obtain good results. Consequently, the quality of the posture estimation depends largely upon the initialization of the model before tracking, and in general these methods also need to obtain the region of each part of the human body in advance. In a learning-based method, the three-dimensional posture of the human body is inferred directly from image features. The image feature used most frequently is human silhouette information, and motion analysis, background modeling, or a combination thereof has been used to obtain reliable silhouette information; however, it is difficult for these methods to separate the human silhouette reliably when the background is complicated. Other features that have been used include torso detection, skin color information, and the like.
At present, most methods depend upon image segmentation or clustering, and thus it is difficult for them to obtain good results when the background is complicated. A. Agarwal has proposed a method of learning a foreground feature from an image feature, in which the human posture feature is modeled using nonnegative matrix factorization so as to extract the foreground feature. This method is more flexible in application since the image segmentation step is avoided. However, the relative influence of the background feature and the foreground feature is not taken into consideration during background feature suppression in this method; hence, some background features are also regarded as foreground features during feature reconstruction, which degrades the background suppression effect.
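By way of illustration only, the nonnegative matrix factorization step mentioned above may be sketched as follows. This is a generic sketch using the well-known multiplicative update rules for the squared-error objective, and is not asserted to be the procedure of A. Agarwal or of the present invention. The function names (nmf, reconstruct, and the helpers), the iteration counts, the random initialization, and the small constant eps are all assumptions introduced for this example. A matrix V whose columns are nonnegative feature vectors is approximated by the product of nonnegative bases W and nonnegative coefficients H, and a new feature vector is then reconstructed from the fixed bases.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def update_h(v, w, h, eps=1e-9):
    """Multiplicative update of H with W fixed: H <- H * (W^T V) / (W^T W H)."""
    wt = transpose(w)
    num = matmul(wt, v)
    den = matmul(matmul(wt, w), h)
    return [[h[k][j] * num[k][j] / (den[k][j] + eps)
             for j in range(len(h[0]))] for k in range(len(h))]


def update_w(v, w, h, eps=1e-9):
    """Multiplicative update of W with H fixed: W <- W * (V H^T) / (W H H^T)."""
    ht = transpose(h)
    num = matmul(v, ht)
    den = matmul(w, matmul(h, ht))
    return [[w[i][k] * num[i][k] / (den[i][k] + eps)
             for k in range(len(w[0]))] for i in range(len(w))]


def nmf(v, rank, iterations=200, seed=0):
    """Factor nonnegative V (features x samples) into W (bases) and H."""
    rng = random.Random(seed)
    n_features, n_samples = len(v), len(v[0])
    # Strictly positive initial values (an assumption of this sketch).
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_features)]
    h = [[rng.random() + 0.1 for _ in range(n_samples)] for _ in range(rank)]
    for _ in range(iterations):
        h = update_h(v, w, h)
        w = update_w(v, w, h)
    return w, h


def reconstruct(w, feature, iterations=200):
    """Fit nonnegative coefficients for one feature vector and return W h."""
    v = [[x] for x in feature]
    h = [[1.0] for _ in range(len(w[0]))]
    for _ in range(iterations):
        h = update_h(v, w, h)
    return [row[0] for row in matmul(w, h)]
```

For example, bases may be learned with w, h = nmf(v, 2) from a matrix v of training feature vectors, after which reconstruct(w, feature) returns the approximation of a new feature vector within the learned bases. Because such a reconstruction is fitted to the whole input vector, any background component that the bases are able to represent is reproduced along with the foreground, which corresponds to the limitation described above.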
Thus, existing object recognition methods and systems generally require that the foreground object first be separated from the background, and it is difficult to obtain a good separation when the background is complicated. Hence, a method capable of modeling both the foreground feature and the background feature so as to achieve better background feature suppression is desired.