The present invention relates generally to visual recognition systems and, more particularly, to a technique for differencing an image.
An interface to an automated information dispensing kiosk represents a computing paradigm that differs from the conventional desktop environment. That is, an interface to an automated information dispensing kiosk differs from the traditional Window, Icon, Mouse and Pointer (WIMP) interface in that such a kiosk typically must detect and communicate with one or more users in a public setting. An automated information dispensing kiosk therefore requires a public multi-user computer interface.
Prior attempts have been made to provide a public multi-user computer interface and/or the constituent elements thereof. For example, a proposed technique for sensing users is described in “Pfinder: Real-time Tracking of the Human Body”, Christopher Wren, Ali Azarbayejani, Trevor Darrell, and Alex Pentland, IEEE 1996. This technique senses only a single user, and addresses only a constrained virtual world environment. Because the user is immersed in a virtual world, the context for the interaction is straight-forward, and simple vision and graphics techniques are employed. Sensing multiple users in an unconstrained real-world environment, and providing behavior-driven output in the context of that environment present more complex vision and graphics problems which are not addressed by this technique.
Another proposed technique is described in “Real-time Self-calibrating Stereo Person Tracking Using 3-D Shape Estimation from Blob Features”, Ali Azarbayejani and Alex Pentland, ICPR January 1996. The implementing system uses a self-calibrating blob stereo approach based on a Gaussian color blob model. The use of a Gaussian color blob model has a disadvantage of being inflexible. Also, the self-calibrating aspect of this system may be applicable to a desktop setting, where a single user can tolerate the delay associated with self-calibration. However, in an automated information dispensing kiosk setting some form of advance calibration would be preferable so as to allow a system to function immediately for each new user.
Other proposed techniques have been directed toward the detection of users in video sequences. The implementing systems are generally based on the detection of some type of human motion in a sequence of video images. These systems are considered viable because very few objects move exactly the way a human does. One such system addresses the special case where people are walking parallel to the image plane of a camera. In this scenario, the distinctive pendulum-like motion of human legs can be discerned by examining selected scan-lines in a sequence of video images. Unfortunately, this approach does not generalize well to arbitrary body motions and different camera angles.
Another system uses Fourier analysis to detect periodic body motions which correspond to certain human activities (e.g., walking or swimming). A small set of these activities can be recognized when a video sequence contains several instances of distinctive periodic body motions that are associated with these activities. However, many body motions, such as hand gestures, are non-periodic, and in practice, even periodic motions may not always be visible for long enough to identify the periodicity.
Another system uses action recognition to identify specific body motions such as sitting down, waving a hand, etc. In this approach, a set of models for the actions to be recognized are stored and an image sequence is filtered using the models to identify the specific body motions. The filtered image sequence is thresholded to determine whether a specific action has occurred or not. A drawback of this system is that a stored model for each action to be recognized is required. This approach also does not generalize well to the case of detecting arbitrary human body motions.
Recently, an expectation-maximization (EM) technique has been proposed to model pixel movement using simple affine flow models. In this technique, the optical flow of images is segmented into one or more independent rigid body motion models of individual body parts. However, for the human body, movement of one body part tends to be highly dependent on the movement of other body parts. Treating the parts independently leads to a loss in detection accuracy.
The above-described proposed techniques either do not allow users to be detected in a real-world environment in an efficient and reliable manner, or do not allow users to be detected without some form of clearly defined user-related motion. These shortcomings present significant obstacles to providing a fully functional public multi-user computer interface. Accordingly, it would be desirable to overcome these shortcomings and provide a technique for allowing a public multi-user computer interface to detect users.
The primary object of the present invention is to provide a technique for differencing an image.
The above-stated primary object, as well as other objects, features, and advantages, of the present invention will become readily apparent from the following detailed description which is to be read in conjunction with the appended drawings.
According to the present invention, a technique for differencing an image is provided. In a first embodiment, the technique can be realized by having a measuring device such as, for example, a digital camera, measure light values in a scene to generate a first image and a second image. The first and the second image can be, for example, a source image and a background image. Next, a processing device such as, for example, a digital computer, separates the first image into first luminance and chrominance components and the second image into second luminance and chrominance components. The processing device then determines a difference between the first and the second luminance components to generate a luminance mask, and determines a difference between the first and the second chrominance components to generate a chrominance mask. The processing device then combines the luminance mask and the chrominance mask to determine a difference between the first and the second images.
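By way of illustration only, the first embodiment can be sketched as follows. The sketch assumes that the images are small nested lists of (R, G, B) tuples, that the luminance and chrominance components are obtained with the ITU-R BT.601 full-range YCbCr conversion, that each difference is an absolute difference compared against a threshold, and that the two masks are combined with a logical OR. None of these choices, nor the names `rgb_to_ycbcr` and `difference_images` or the default threshold values, is specified above; they are assumptions made for the example.

```python
def rgb_to_ycbcr(pixel):
    """Separate one (R, G, B) pixel into luminance Y and chrominance (Cb, Cr).

    Assumption: BT.601 full-range weights; the text does not name a color space.
    """
    r, g, b = pixel
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 + 0.564 * (b - y)
    cr = 128 + 0.713 * (r - y)
    return y, cb, cr


def difference_images(source, background, luma_threshold=30.0, chroma_threshold=15.0):
    """Return (luminance mask, chrominance mask, combined mask) as nested lists of 0/1."""
    luma_mask, chroma_mask, combined = [], [], []
    for src_row, bg_row in zip(source, background):
        luma_row, chroma_row = [], []
        for src_px, bg_px in zip(src_row, bg_row):
            y1, cb1, cr1 = rgb_to_ycbcr(src_px)
            y2, cb2, cr2 = rgb_to_ycbcr(bg_px)
            # Luminance mask: thresholded absolute luminance difference.
            luma_row.append(int(abs(y1 - y2) > luma_threshold))
            # Chrominance mask: thresholded sum of the two color differences.
            chroma_row.append(int(abs(cb1 - cb2) + abs(cr1 - cr2) > chroma_threshold))
        luma_mask.append(luma_row)
        chroma_mask.append(chroma_row)
        # Assumption: a pixel differs if either mask marks it (logical OR).
        combined.append([l | c for l, c in zip(luma_row, chroma_row)])
    return luma_mask, chroma_mask, combined


gray = (100, 100, 100)
background = [[gray, gray], [gray, gray]]
# One pixel becomes brighter, another changes hue at nearly constant brightness.
source = [[(200, 200, 200), gray], [gray, (130, 100, 70)]]
luma, chroma, mask = difference_images(source, background)
print(mask)  # [[1, 0], [0, 1]]
```

In this example the brightened pixel is caught only by the luminance mask and the recolored pixel only by the chrominance mask, which is why the two masks are kept separate until the final combination.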
The first chrominance component may include a first color component and a second color component and the second chrominance component may include a third color component and a fourth color component. The processing device can then determine a difference between the first color component and the third color component to generate a first difference value, and determine a difference between the second color component and the fourth color component to generate a second difference value. The processing device can then combine the first difference value and the second difference value to generate the chrominance mask. The processing device may weight the first difference value differently than the second difference value, or vice versa, when combining the first difference value and the second difference value to generate the chrominance mask.
In a further aspect of the present invention, the processing device can combine the first difference value and the second difference value to generate a third difference value, and then threshold the third difference value to generate the chrominance mask. In such a case, the processing device may weight the first difference value differently than the second difference value, or vice versa, when combining the first difference value and the second difference value to generate the third difference value.
The processing device may also threshold the difference between the first and the second luminance components to generate the luminance mask, and the difference between the first and the second chrominance components to generate the chrominance mask.
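The weighted combination of the two color difference values can be illustrated with a short sketch. It assumes that each chrominance component is a flat list of (color A, color B) pairs, that the difference values are absolute differences, and that the combination is a weighted sum. The function name `chroma_mask`, the weights, and the threshold are hypothetical values chosen for the example.

```python
def chroma_mask(first_chroma, second_chroma, weight_a=1.0, weight_b=1.0, threshold=10.0):
    """Return a 0/1 chrominance mask from two lists of (color_a, color_b) pairs."""
    mask = []
    for (a1, b1), (a2, b2) in zip(first_chroma, second_chroma):
        first_diff = abs(a1 - a2)    # first vs. third color component
        second_diff = abs(b1 - b2)   # second vs. fourth color component
        # Assumption: the difference values are combined as a weighted sum.
        third_diff = weight_a * first_diff + weight_b * second_diff
        mask.append(1 if third_diff > threshold else 0)
    return mask


first = [(128, 128), (140, 128), (128, 140)]
second = [(128, 128), (128, 128), (128, 128)]
print(chroma_mask(first, second))                # [0, 1, 1]
print(chroma_mask(first, second, weight_b=0.5))  # [0, 1, 0]
```

Halving the weight of the second color difference removes the third pixel from the mask, which shows how the weighting can make the mask less sensitive to one color component than to the other.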
In a second embodiment, the technique can be realized by having a processing device such as, for example, a digital computer, obtain a first representation of a plurality of first pixels representing a scene at a first point in time. The first representation can be, for example, a first electrical representation of an image of the scene that is captured by a camera at the first point in time and then digitized to form the plurality of first pixels. The first electrical representation can be stored, for example, as digital data on a tape, disk, or other memory device for manipulation by the processing device.
Each of the first pixels has a first pixel parameter such as, for example, a first particular range of light wavelengths, and a second pixel parameter such as, for example, a second particular range of light wavelengths. The first pixel parameter has a corresponding first value such as, for example, an intensity value of light over the first particular range of light wavelengths. The second pixel parameter has a corresponding second value such as, for example, an intensity value of light over the second particular range of light wavelengths.
The processing device determines a first difference value for each of the first pixels. The first difference value represents a difference between the first value of the first pixel parameter for each of the first pixels and a third value of the first pixel parameter for a corresponding one of a plurality of second pixels representing the scene at a second point in time. The plurality of second pixels may be a digitized version of an image of the scene that is captured by a camera at the second point in time. An electrical representation of the plurality of second pixels can also be stored on the same or another memory device for manipulation by the processing device.
The processing device also determines a second difference value for each of the first pixels. The second difference value represents a difference between the second value of the second pixel parameter for each of the first pixels and a fourth value of the second pixel parameter for a corresponding one of the plurality of second pixels representing the scene at the second point in time.
The processing device further determines a third difference value for each of the first pixels by combining the first difference value for a corresponding one of the plurality of first pixels and the second difference value for a corresponding one of the plurality of first pixels.
The processing device identifies a plurality of third pixels. Each of the third pixels is identified based upon a relation between the third difference value for a corresponding first pixel and a first threshold value. The relation between the third difference value for a corresponding first pixel and the first threshold value can be, for example, whether the third difference value for a corresponding first pixel exceeds the first threshold value.
The processing device produces a second representation of the plurality of third pixels. The second representation represents a difference in the scene at the first point in time as compared to the scene at the second point in time. The second representation can be, for example, a second electrical representation of a mask image that indicates the difference between corresponding pixels in the first and second plurality of pixels. The second electrical representation can also be stored on the same or another memory device for further manipulation by the processing device.
It should be noted that the first difference value for each of the first pixels can be weighted differently than the second difference value for a corresponding one of the first pixels, or vice versa, when combining the first difference value for each of the first pixels and the second difference value for a corresponding one of the first pixels to determine the third difference value for each of the first pixels.
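A minimal sketch of the second embodiment follows. It assumes that each pixel is a pair of intensity values (one per pixel parameter), that the difference values are absolute differences, that they are combined as a weighted sum, and that the relation to the first threshold value is "exceeds". The function name `identify_third_pixels` and the sample data are hypothetical.

```python
def identify_third_pixels(first_pixels, second_pixels, first_threshold,
                          first_weight=1.0, second_weight=1.0):
    """Return (row, column) positions whose combined difference exceeds the threshold."""
    third_pixels = []
    for row, (row_now, row_then) in enumerate(zip(first_pixels, second_pixels)):
        for col, (now, then) in enumerate(zip(row_now, row_then)):
            first_diff = abs(now[0] - then[0])    # first pixel parameter
            second_diff = abs(now[1] - then[1])   # second pixel parameter
            # Third difference value: weighted combination of the two.
            third_diff = first_weight * first_diff + second_weight * second_diff
            if third_diff > first_threshold:
                third_pixels.append((row, col))
    return third_pixels


# One row of two pixels at two points in time.
first_pixels = [[(60, 40), (58, 48)]]
second_pixels = [[(60, 40), (50, 40)]]
print(identify_third_pixels(first_pixels, second_pixels, first_threshold=10))
# [(0, 1)]
```

In this example neither parameter of the second pixel changes by more than the threshold on its own (each changes by 8), yet the combined difference of 16 does, so combining before thresholding can flag changes that are modest in each parameter individually.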
In a third embodiment, the technique can be realized by having a processing device such as, for example, a digital computer, obtain a first representation of a plurality of first pixels representing a scene at a first point in time. The first representation can be, for example, a first electrical representation of an image of the scene that is captured by a camera at the first point in time and then digitized to form the plurality of first pixels. The first electrical representation can be stored, for example, as digital data on a tape, disk, or other memory device for manipulation by the processing device.
Each of the first pixels has a first pixel parameter such as, for example, a first particular range of light wavelengths, and a second pixel parameter such as, for example, a second particular range of light wavelengths. The first pixel parameter has a corresponding first value such as, for example, an intensity value of light over the first particular range of light wavelengths. The second pixel parameter has a corresponding second value such as, for example, an intensity value of light over the second particular range of light wavelengths.
The processing device determines a first difference value for each of the first pixels. The first difference value represents a difference between the first value of the first pixel parameter for each of the first pixels and a third value of the first pixel parameter for a corresponding one of a plurality of second pixels representing the scene at a second point in time. The plurality of second pixels may be a digitized version of an image of the scene that is captured by a camera at the second point in time. An electrical representation of the plurality of second pixels can also be stored on the same or another memory device for manipulation by the processing device.
The processing device also determines a second difference value for each of the first pixels. The second difference value represents a difference between the second value of the second pixel parameter for each of the first pixels and a fourth value of the second pixel parameter for a corresponding one of the plurality of second pixels representing the scene at the second point in time.
The processing device identifies a plurality of third pixels. Each of the third pixels is identified based upon a relation between the first difference value for a corresponding first pixel and a first threshold value. The first threshold value can be, for example, a preselected intensity value of light over the first particular range of light wavelengths. The relation between the first difference value for a corresponding first pixel and the first threshold value can be, for example, whether the first difference value for a corresponding first pixel exceeds the first threshold value.
The processing device also identifies a plurality of fourth pixels. Each of the fourth pixels is identified based upon a relation between the second difference value for a corresponding first pixel and a second threshold value. The second threshold value can be, for example, a preselected intensity value of light over the second particular range of light wavelengths. The relation between the second difference value for a corresponding first pixel and the second threshold value can be, for example, whether the second difference value for a corresponding first pixel exceeds the second threshold value.
The processing device produces a second representation by combining the plurality of third pixels and the plurality of fourth pixels. The second representation represents a difference in the scene at the first point in time as compared to the scene at the second point in time. The second representation can be, for example, a second electrical representation of a mask image that indicates the difference between corresponding pixels in the first and second plurality of pixels. The second electrical representation can also be stored on the same or another memory device for further manipulation by the processing device.
The processing device can also produce a third representation of the plurality of third pixels and a fourth representation of the plurality of fourth pixels. The third representation can be, for example, a third electrical representation of a mask image that indicates a difference in the first pixel parameter between corresponding pixels in the first and second plurality of pixels. The third electrical representation can also be stored on the same or another memory device for further manipulation by the processing device. Similarly, the fourth representation can be, for example, a fourth electrical representation of a mask image that indicates a difference in the second pixel parameter between corresponding pixels in the first and second plurality of pixels. The fourth electrical representation can also be stored on the same or another memory device for further manipulation by the processing device.
The processing device can thereby produce the second representation by combining the third representation and the fourth representation. As before, the second representation represents a difference in the scene at the first point in time as compared to the scene at the second point in time.
The processing device can identify a plurality of fifth pixels. Each of the fifth pixels can be identified by combining a corresponding one of the plurality of third pixels and a corresponding one of the plurality of fourth pixels.
The processing device can thereby produce the second representation from the plurality of fifth pixels. As before, the second representation represents a difference in the scene at the first point in time as compared to the scene at the second point in time.
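The third embodiment, in which each pixel parameter is thresholded separately before the results are combined, can be sketched as follows. The sketch assumes that each pixel parameter is stored as its own plane (a nested list of intensities), that the relation to each threshold is "exceeds", and that corresponding third and fourth pixels are combined with a logical OR. The helper names `threshold_mask` and `combine_masks` and all numeric values are hypothetical.

```python
def threshold_mask(first_plane, second_plane, threshold):
    """Mark with 1 every position whose absolute difference exceeds the threshold."""
    return [[1 if abs(a - b) > threshold else 0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first_plane, second_plane)]


def combine_masks(mask_a, mask_b):
    """Combine two masks position by position (assumption: logical OR)."""
    return [[a | b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]


# First pixel parameter (e.g., luminance) at the first and second points in time.
luma_now = [[100, 150], [100, 100]]
luma_then = [[100, 100], [100, 100]]
# Second pixel parameter (e.g., a blue color channel) at the same two times.
blue_now = [[50, 50], [90, 50]]
blue_then = [[50, 50], [50, 50]]

third_mask = threshold_mask(luma_now, luma_then, threshold=30)   # third pixels
fourth_mask = threshold_mask(blue_now, blue_then, threshold=20)  # fourth pixels
fifth_mask = combine_masks(third_mask, fourth_mask)              # fifth pixels
print(third_mask)   # [[0, 1], [0, 0]]
print(fourth_mask)  # [[0, 0], [1, 0]]
print(fifth_mask)   # [[0, 1], [1, 0]]
```

Because each parameter has its own threshold, a change must be large enough in at least one parameter to be marked; small changes spread across both parameters are not accumulated in this arrangement.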
The first and the second pixel parameters can be, for example, luminance and a single color channel, respectively. The single color channel can be, for example, a blue light color channel or a red light color channel.
Each of the first pixels can have a third pixel parameter such as a third particular range of light wavelengths. The third pixel parameter can thereby have a corresponding fifth value such as an intensity value of light over the third particular range of light wavelengths.
The processing device can thereby determine a third difference value for each of the first pixels. The third difference value can represent a difference between the fifth value of the third pixel parameter for each of the first pixels and a sixth value of the third pixel parameter for a corresponding one of the plurality of second pixels representing the scene at the second point in time.
The processing device can thereby determine a fourth difference value for each of the first pixels by combining the second difference value for a corresponding one of the first pixels and the third difference value for a corresponding one of the plurality of first pixels.
The processing device can thereby identify a plurality of fifth pixels. Each of the fifth pixels can be identified based upon a relation between the fourth difference value for a corresponding first pixel and a third threshold value. The relation between the fourth difference value for a corresponding first pixel and the third threshold value can be, for example, whether the fourth difference value for a corresponding first pixel exceeds the third threshold value.
The processing device can thereby produce the second representation by combining the plurality of third pixels and the plurality of fifth pixels. As before, the second representation represents a difference in the scene at the first point in time as compared to the scene at the second point in time.
The processing device can also produce a third representation of the plurality of third pixels and a fourth representation of the plurality of fifth pixels. The third representation can be, for example, a third electrical representation of a mask image that indicates a difference in the first pixel parameter between corresponding pixels in the first and second plurality of pixels. The third electrical representation can also be stored on the same or another memory device for further manipulation by the processing device. The fourth representation can be, for example, a fourth electrical representation of a mask image that indicates a difference in the second and third pixel parameters between corresponding pixels in the first and second plurality of pixels. The fourth electrical representation can also be stored on the same or another memory device for further manipulation by the processing device.
The processing device can thereby produce the second representation by combining the third representation and the fourth representation. As before, the second representation represents a difference in the scene at the first point in time as compared to the scene at the second point in time.
The processing device can identify a plurality of sixth pixels. Each of the sixth pixels can be identified by combining a corresponding one of the plurality of third pixels with a corresponding one of the plurality of fifth pixels.
The processing device can thereby produce the second representation from the plurality of sixth pixels. As before, the second representation represents a difference in the scene at the first point in time as compared to the scene at the second point in time.
It should be noted that the second difference value for each of the first pixels can be weighted differently than the third difference value for a corresponding one of the first pixels, or vice versa, when combining the second difference value for each of the first pixels and the third difference value for a corresponding one of the first pixels to determine the fourth difference value for each of the first pixels.
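The variant with three pixel parameters can be illustrated per pixel. The sketch assumes that each pixel is a (luminance, color A, color B) triple, that difference values are absolute differences, that the second and third difference values are combined as a weighted sum, and that third and fifth pixels are combined with a logical OR to give the sixth pixels. The function name `classify_pixel`, the thresholds, and the weights are hypothetical.

```python
def classify_pixel(now, then, first_threshold, third_threshold,
                   second_weight=1.0, third_weight=1.0):
    """Return (is_third_pixel, is_fifth_pixel, is_sixth_pixel) for one pixel."""
    first_diff = abs(now[0] - then[0])    # first parameter, e.g. luminance
    second_diff = abs(now[1] - then[1])   # second parameter, e.g. one color channel
    third_diff = abs(now[2] - then[2])    # third parameter, e.g. another color channel
    # Fourth difference value: weighted combination of the two color differences.
    fourth_diff = second_weight * second_diff + third_weight * third_diff
    is_third = first_diff > first_threshold
    is_fifth = fourth_diff > third_threshold
    # Assumption: a sixth pixel is one marked as a third pixel or a fifth pixel.
    return is_third, is_fifth, is_third or is_fifth


then = (100, 128, 128)
print(classify_pixel((150, 128, 128), then, 30, 15))  # (True, False, True)
print(classify_pixel((100, 136, 137), then, 30, 15))  # (False, True, True)
print(classify_pixel((105, 130, 130), then, 30, 15))  # (False, False, False)
```

Setting `second_weight` and `third_weight` to different values makes the fourth difference value more sensitive to one color parameter than to the other, which corresponds to the differential weighting noted above.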