Electronic systems exist for capturing a three-dimensional range image of the environment; stereovision, structured light, and time-of-flight methods are examples of such systems.
Object-based compression systems also exist in the literature. For instance, MPEG-4 is a standard that permits different compression schemes to be used for different objects in a scene.
Various methods also exist for segmenting the objects in an image, but they are intended for two-dimensional images. For example, U.S. Pat. No. 6,404,920 uses the fundamental concept of color perception together with multi-level resolution to perform scene segmentation and object/feature extraction in self-determining and self-calibration modes. The technique uses only a single image to perform object segmentation and then matches the segmented image against libraries of objects in a system database. This method, too, is based on two-dimensional image segmentation.
U.S. Pat. No. 6,404,920 describes a three-dimensional reference image segmenting method and device in which a two-dimensional image of a reference object and shape data of a pattern obtained by transforming that image are stored in a memory, together with depth data of the reference object, as a reference pattern. Using local Fourier transform image data of an input image, supplied by an image transform unit, and the reference data of the reference pattern read from the memory, a deform amount estimating unit calculates the amount of deformation (a displacement vector) required to make the two images coincide as closely as possible. This method also relies on a two-dimensional image for segmentation.
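The idea of estimating a displacement vector from Fourier-domain image data can be illustrated with phase correlation, a standard technique for finding a global translation between two images. This is a swapped-in, simplified technique, not the method of the cited patent: it recovers only one integer cyclic shift rather than a local deformation field, and the function names `dft2` and `estimate_shift` are hypothetical. A naive DFT keeps the sketch dependency-free; a practical version would use an FFT library.

```python
import cmath


def dft2(img, inverse=False):
    """Naive 2-D discrete Fourier transform, O(N^4); adequate for tiny demo images."""
    rows, cols = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for y in range(rows):
                for x in range(cols):
                    phase = sign * 2 * cmath.pi * (u * y / rows + v * x / cols)
                    acc += img[y][x] * cmath.exp(1j * phase)
            # The inverse transform carries the 1/(rows*cols) normalization.
            out[u][v] = acc / (rows * cols) if inverse else acc
    return out


def estimate_shift(ref, moved):
    """Estimate the integer cyclic shift (dy, dx) that maps ref onto moved."""
    rows, cols = len(ref), len(ref[0])
    f_ref, f_mov = dft2(ref), dft2(moved)
    # Normalized cross-power spectrum: keep only the phase difference.
    cross = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            prod = f_mov[u][v] * f_ref[u][v].conjugate()
            mag = abs(prod)
            cross[u][v] = prod / mag if mag > 1e-12 else 0j
    # Its inverse transform peaks at the displacement.
    corr = dft2(cross, inverse=True)
    dy, dx = max(
        ((y, x) for y in range(rows) for x in range(cols)),
        key=lambda p: corr[p[0]][p[1]].real,
    )
    # Indices past the midpoint wrap around to negative shifts.
    if dy > rows // 2:
        dy -= rows
    if dx > cols // 2:
        dx -= cols
    return dy, dx
```

Normalizing the cross-power spectrum discards amplitude, so the inverse transform concentrates into a sharp peak at the shift instead of a broad correlation bump.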
Many other attempts have been made to apply two-dimensional imaging to object segmentation. In U.S. Pat. No. 6,532,302, a sample image is segmented by an image segmentation system that includes a size reduction unit, which reduces the size of the image and, at the same time, fills small gaps between foreground pixels. A connected component analyzer then identifies connected components and their associated minimum bounding rectangles in the reduced image. Next, a target object filter searches the connected components for target objects, using a target object library to identify target objects characterized by parameters such as size, shape, and texture. U.S. Pat. No. 6,389,163 provides a method and apparatus for automatic image segmentation using template-matching filters; it generally segments differing binary textures or structures within an input image by passing one or more structures while removing others.
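The size-reduction and connected-component steps described above can be sketched as follows. This assumes a binary image stored as a list of lists, OR-based block reduction as the gap-filling mechanism, and 4-connectivity; these choices and the function names are illustrative assumptions rather than details of U.S. Pat. No. 6,532,302, and the target object filter is not shown.

```python
from collections import deque


def reduce_binary(img, factor):
    """Downsample a binary image by OR-ing each factor x factor block.

    Any foreground pixel in a block makes the reduced pixel foreground, which
    shrinks the image and closes gaps of fewer than `factor` pixels along a
    row or column.
    """
    rows, cols = len(img), len(img[0])
    out = [[0] * ((cols + factor - 1) // factor)
           for _ in range((rows + factor - 1) // factor)]
    for y in range(rows):
        for x in range(cols):
            if img[y][x]:
                out[y // factor][x // factor] = 1
    return out


def connected_components(img):
    """Find 4-connected foreground regions by breadth-first search.

    Returns one minimum bounding rectangle (top, left, bottom, right) per
    region, in row-major order of each region's first pixel.
    """
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for sy in range(rows):
        for sx in range(cols):
            if not img[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            top, left, bottom, right = sy, sx, sy, sx
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and img[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

On an image whose object is split by a one-pixel gap, the full-resolution analysis reports separate fragments, while the image reduced by a factor of 2 yields a single component spanning both.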
There have also been attempts to segment objects in videos of two-dimensional images. U.S. Pat. No. 6,526,169 describes histogram-based segmentation of an image, in which a frame or picture of a video signal is separated into objects via color moments. A defined area is characterized by its color information in the form of a limited set of color moments representing a color histogram for that area. Based on this set of color moments, objects that belong to various parts of the histogram are identified.
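A compact color-moment descriptor for an area can be sketched as follows. The choice of moments (mean, standard deviation, and skewness per channel) is a common convention assumed here, not necessarily the set used in U.S. Pat. No. 6,526,169; the nearest-prototype matching rule and the names `moment_distance` and `nearest_object` are hypothetical.

```python
import math


def color_moments(pixels):
    """Return [(mean, std, skew), ...] for the three channels of (r, g, b) tuples.

    std is the square root of the second central moment and skew is the
    signed cube root of the third central moment.
    """
    n = len(pixels)
    moments = []
    for c in range(3):
        values = [p[c] for p in pixels]
        mean = sum(values) / n
        m2 = sum((v - mean) ** 2 for v in values) / n
        m3 = sum((v - mean) ** 3 for v in values) / n
        std = math.sqrt(m2)
        skew = math.copysign(abs(m3) ** (1.0 / 3.0), m3)
        moments.append((mean, std, skew))
    return moments


def moment_distance(a, b):
    """L1 distance between two moment sets (hypothetical matching rule)."""
    return sum(abs(x - y) for ma, mb in zip(a, b) for x, y in zip(ma, mb))


def nearest_object(region_pixels, prototypes):
    """Assign a region to the named prototype whose color moments are closest."""
    region = color_moments(region_pixels)
    return min(prototypes, key=lambda name: moment_distance(region, prototypes[name]))
```

Nine numbers summarize an area's color distribution, which is far cheaper to compare than a full histogram.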
Some prior art uses depth information to increase the quality and efficiency of compression algorithms. For instance, “Stereo Imaging in Low Bitrate Video Coding,” by M. A. H. Venter et al., published in COMSIG 1989—Proceedings South Africa Conference [of] Communication Signal Processing, pp. 115–118 (IEEE Jun. 23, 1989), describes video compression techniques that use stereo imaging depth data. In these techniques, Venter et al. generate a “motion vector” from the depth data as a check on the accuracy of the motion vectors generated by a compression-coding algorithm. Venter et al. also propose modeling the three-dimensional shape of the moving object as a further reference check.
Another method, disclosed in “Low Bitrate Video Coding with Depth Compensation,” by J. J. D. van Schalkwyk et al., published in IEEE Proceedings: Vision, Image and Signal Processing, Vol. 141, No. 3, pp. 149–53 (1994), uses depth information from a stereo camera system to separate moving foreground objects from the static background. Motion vectors are then generated by comparing each object's three-dimensional position in the current and previous frames, and these motion vectors are used in the compression process to produce a more accurate representation of the scene.
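Depth-based foreground separation followed by a three-dimensional motion vector can be sketched as below. This is a minimal sketch, not the method of van Schalkwyk et al.: it assumes a single foreground object, a fixed depth threshold (`max_depth`, hypothetical), and pixel coordinates standing in for metric x and y (a real system would back-project pixels through calibrated camera intrinsics). The function names are illustrative.

```python
def foreground_points(depth, max_depth):
    """Collect (x, y, z) for pixels nearer than max_depth.

    x and y are pixel coordinates and z is the measured depth; depth <= 0
    marks pixels with no valid range measurement.
    """
    return [
        (x, y, z)
        for y, row in enumerate(depth)
        for x, z in enumerate(row)
        if 0 < z < max_depth
    ]


def centroid(points):
    """Mean position of a non-empty list of 3-D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def object_motion_vector(prev_depth, curr_depth, max_depth):
    """3-D displacement of the foreground centroid between two depth frames.

    Returns None if either frame has no foreground pixels.
    """
    prev_pts = foreground_points(prev_depth, max_depth)
    curr_pts = foreground_points(curr_depth, max_depth)
    if not prev_pts or not curr_pts:
        return None
    p0, p1 = centroid(prev_pts), centroid(curr_pts)
    return tuple(b - a for a, b in zip(p0, p1))
```

Because the threshold is applied to depth rather than color, the static background drops out regardless of its appearance.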
In another method, described in “Image Sequence Coding Using 3D Scene Models,” by Bernd Girod, published in the Proceedings of SPIE—The International Society for Optical Engineering, Vol. 2308, pp. 1576–1591 (SPIE 1994), depth information is first captured to construct a shape model of a person's head. In video compression mode, a regular two-dimensional camera is used, the shape model is matched to the image, and shape parameters such as translation, rotation, and facial motion parameters are sent to the receiver side for better reconstruction of the images.
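The receiver-side step of model-based coding, placing a stored shape model using transmitted pose parameters, can be sketched as follows. This is an illustrative simplification, not Girod's codec: only a yaw rotation and a translation are modeled, facial motion parameters are omitted, and the names `rotate_y` and `apply_pose` are hypothetical.

```python
import math


def rotate_y(point, yaw):
    """Rotate a 3-D point about the vertical (y) axis by yaw radians."""
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x + s * z, y, -s * x + c * z)


def apply_pose(model, yaw, translation):
    """Place a stored head model using received pose parameters.

    A full pose would also include pitch and roll; facial motion
    parameters are not modeled here.
    """
    tx, ty, tz = translation
    placed = []
    for point in model:
        x, y, z = rotate_y(point, yaw)
        placed.append((x + tx, y + ty, z + tz))
    return placed
```

In this simplified form the sender transmits four numbers per frame (yaw plus a 3-D translation) instead of pixel data, and the receiver reuses the shape model captured once up front.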
U.S. Pat. No. 6,526,169 describes a method that uses three-dimensional data for object-based compression. The method uses the depth-from-focus technique as its three-dimensional sensor and uses histogram-based segmentation to separate the different objects in the scene. This patent does not address any relation to the subjects, i.e., the users of such a system.
In general, image-based segmentation is inherently problematic, since different objects may have the same or similar colors in the image, which can make it impossible to separate them. The present invention proposes the use of three-dimensional data for this purpose, provides ways of applying segmentation to three-dimensional data, and identifies the head of a person in an image. The segmented data can be used in many applications, including but not limited to video compression, video segmentation, videophones, and multimedia instant messaging applications.
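Separating objects by depth rather than by color can be illustrated with simple region growing on a depth map: neighboring pixels join the same region when their depths differ by at most a threshold. This is a generic technique chosen for illustration, not the specific segmentation or head-identification method of the present invention; the threshold `max_step` and the function name are hypothetical.

```python
from collections import deque


def segment_depth(depth, max_step):
    """Label 4-connected regions whose neighboring depths differ by at most max_step.

    Returns a 2-D list of integer labels; pixels with depth <= 0 are treated
    as invalid and keep the label -1.
    """
    rows, cols = len(depth), len(depth[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for sy in range(rows):
        for sx in range(cols):
            if depth[sy][sx] <= 0 or labels[sy][sx] != -1:
                continue
            # Grow a new region outward from this unlabeled seed pixel.
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] == -1
                            and depth[ny][nx] > 0
                            and abs(depth[ny][nx] - depth[y][x]) <= max_step):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels
```

Because the labels depend only on depth continuity, a person and a wall of identical color still receive different labels as long as a depth step separates them.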
The above-mentioned three-dimensional prior art techniques fail to adequately bridge the gap between current video compression techniques and three-dimensional image retrieval techniques. In those techniques, three-dimensional image capture is used either indirectly, to obtain a better prediction scheme, or to check the accuracy of the motion vectors created by the two-dimensional capture mechanism; otherwise, the techniques do not relate to the detection of their subjects. Furthermore, none of these techniques is based on time-of-flight methods, which have performance and practical advantages.