1. Field of the Invention
The present invention relates to a video communication system based on background and object separation, which is capable of separating a background from an object and dynamically synthesizing the separated background and object so that they can be used for video telecommunication in accordance with a user's request or the communication environment.
2. Description of the Prior Art
The development of telecommunication and moving picture compression technologies enables video telecommunication in multimedia environments, in which not only the voice but also the picture of the communicating persons can be transmitted and received. As a result, video telecommunication is now possible using PC cameras, video phones (picture telephones), mobile communication terminals, and the like.
In such video telecommunication environments, a user may be reluctant to disclose his present location to the other party, for example in order to protect his privacy. In such a case, the user should be able either to perform only voice telecommunication without video, or to replace the background scene with a different scene before it is transmitted.
However, it is very difficult, in terms of both speed and accuracy, to change the background scene automatically in real time during video telecommunication. Accordingly, some conventional picture telephones have adopted a technology that sends the other party a still picture specified by the user at the initial stage of the communication and then performs only voice communication.
However, since this technology sends the other party a still picture in place of the communicating person, who never appears on the screen, the other party can only hear the person's voice. This means the loss of an essential function of video telecommunication, namely that the parties communicate while viewing each other. In addition, the other party, who sees only the still picture and not an actual face, may feel some displeasure.
Therefore, there is a need for a technology that is capable of separating the communicating person from the background scene in real time during video telecommunication and of transforming the background scene or changing it into a different scene. When the communicating person appears on the screen but the background scene is changed into a different scene in this way, the above-mentioned problems that occur when the actual face is replaced by a still picture can be solved.
However, automatically changing the background scene into a different scene presupposes a technology for automatically separating a region of interest from the communicated picture in real time. Hereinafter, conventional art for separating an object, in particular a human region, from the background scene will be described.
Among the moving picture standards related to the separation of the object and the background scene, MPEG-4 allows object-based picture compression coding. However, since MPEG-4 itself is a technology for coding objects that have been separated in advance, an object separation technology is a prerequisite for object-based picture compression coding. In environments that require video signals to be compressed and transmitted in real time, such as video telecommunication or video conversation, it is difficult for current technologies to separate the required object from the remaining background fast enough to then compress and code the object and the background.
Therefore, video telecommunication systems adopting MPEG-4 as a standard cannot code in units of objects, but instead compress and transmit the whole picture in the manner of general picture compression. This is referred to as the MPEG-4 simple profile.
On the other hand, technologies for separating the object and the background scene in non-real-time environments have also been suggested. For example, there is a technique for separating a partial region from an image based on color groups. Although this technique can separate the partial region from the image comparatively well, the separated region is only a color-based region, not a meaningful object region, and the separation speed is very slow. It is therefore difficult to apply to technologies, such as video telecommunication, that require the object and the background scene to be separated in real time.
On the other hand, techniques for separating a face from an image have been proposed, given that the object in a video telecommunication image is typically a person. For example, there is a technology for extracting a face region by use of a face template formed by transforming information into the wavelet frequency domain. This technology scans all regions of the image and matches them against the template while adjusting the template size from a minimum to a maximum.
However, this method requires a very long processing time, since the number of template matching operations is very large.
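The scale of this cost can be illustrated with a minimal sketch that counts how many window placements a brute-force multi-scale scan would visit. The function name, the square template, and the `size_step` and `stride` parameters are assumptions made for illustration only; the prior-art method itself is not specified at this level of detail.

```python
def count_template_placements(img_w, img_h, min_size, max_size,
                              size_step=1, stride=1):
    """Count the windows visited by a brute-force multi-scale template scan.

    Hypothetical model: a square template is resized from min_size to
    max_size (in steps of size_step) and slid over the whole image with
    the given stride. Each placement costs one template comparison.
    """
    total = 0
    for size in range(min_size, max_size + 1, size_step):
        if size > img_w or size > img_h:
            break  # the template no longer fits inside the image
        cols = (img_w - size) // stride + 1  # horizontal placements
        rows = (img_h - size) // stride + 1  # vertical placements
        total += cols * rows
    return total


if __name__ == "__main__":
    # Example: a 176x144 frame scanned with templates of 20 to 100 pixels.
    print(count_template_placements(176, 144, 20, 100))
```

Because every placement requires a full comparison against the template, the total grows with both the image area and the number of template sizes, which is consistent with the long processing time noted above.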
In addition, as a face region separation technology, there is a technology for extracting the face region by using the characteristic that the face region falls within the range of human skin color. Generally, human skin color lies within a specific range in a color space. This technology therefore extracts the face region using only the pixels that satisfy such a skin color condition.
However, since the range of human skin color is generally too wide, regions other than the face region in a picture may also be extracted as the face region, which makes precise separation of the face region difficult.
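A skin color condition of this kind can be sketched as a simple per-pixel range test. The sketch below assumes the YCbCr color space and the chrominance bounds Cb in [77, 127] and Cr in [133, 173], which are commonly quoted in the skin detection literature; neither the color space nor the bounds are taken from the technology described here, and all function names are hypothetical.

```python
# Assumed chrominance bounds for skin (illustrative values only).
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)


def rgb_to_cbcr(r, g, b):
    """Convert 8-bit RGB to the Cb and Cr components (full-range BT.601)."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin(pixel):
    """Return True if the (r, g, b) pixel falls inside the assumed skin range."""
    cb, cr = rgb_to_cbcr(*pixel)
    return (CB_RANGE[0] <= cb <= CB_RANGE[1]
            and CR_RANGE[0] <= cr <= CR_RANGE[1])


def skin_mask(image):
    """Build a boolean mask for an image given as rows of (r, g, b) tuples."""
    return [[is_skin(p) for p in row] for row in image]


if __name__ == "__main__":
    image = [
        [(224, 172, 138), (0, 0, 255)],      # skin-like tone, pure blue
        [(180, 120, 90), (255, 255, 255)],   # brownish background, white
    ]
    for row in skin_mask(image):
        print(row)
```

Under these assumed bounds, the brownish background pixel in the example also satisfies the condition, which illustrates why a color range alone tends to pick up regions other than the face.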
While most of the above-described technologies extract the face region from a still picture and do not use characteristics of the moving picture, a technology for extracting the face region by tracking it in the moving picture has also been proposed.
More particularly, once the face region has been extracted, this technology tracks it by means of motion information, which requires fewer processes.
However, since this technology tracks only the rough position at which the face is located, it is difficult to precisely separate the face region from the background scene. Namely, although this technology can track an object (a human face) in the moving picture, it is limited in its ability to separate the object precisely from the background scene.