This invention relates generally to communications within a computer network and more particularly to video image communications and display.
Video imaging refers to the rendering of text and graphics images on a display. Each video image is a sequence of frames, typically displayed at thirty frames per second. Images are transmitted over various high bit rate communications media, such as coaxial cable and Asymmetric Digital Subscriber Line (“ADSL”), as well as over lower bit rate communications media, such as Plain Old Telephone Service (“POTS”), wireless phone service and power line communication networks. Video images may be displayed in black and white, gray scale or color. A 24-bit color video image at 640×480 pixel resolution occupies almost one megabyte per frame, or over a gigabyte per minute of display; lower bit rate communication media are therefore unable to provide real-time display of video images without some improvement.
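The storage figures above follow from simple arithmetic. The sketch below reproduces them, assuming uncompressed 24-bit pixels and the thirty frames per second mentioned in the text; the 56 Kbps comparison rate is only an illustrative POTS modem speed, and the function name is hypothetical.

```python
def raw_video_size(width, height, bits_per_pixel, fps=30):
    """Return (bytes per frame, bytes per minute, bits per second) for uncompressed video."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    bytes_per_minute = bytes_per_frame * fps * 60
    bits_per_second = bytes_per_frame * 8 * fps
    return bytes_per_frame, bytes_per_minute, bits_per_second


frame, minute, bps = raw_video_size(640, 480, 24)
print(frame)    # 921600 bytes, just under one megabyte (1,048,576 bytes)
print(minute)   # 1658880000 bytes, i.e. about 1.66 GB per minute
print(bps)      # 221184000 bits per second
# Compression ratio needed to squeeze this into a 56 Kbps channel:
print(round(bps / 56_000))  # 3950
```

A reduction of several thousand to one is far beyond what lossless coding can achieve, which is why the following paragraphs turn to lossy compression and region-aware techniques.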
One improvement in the throughput of video communications has been the use of video compression to reduce the size of files and packets containing video images represented in digital form, thereby allowing higher resolution video images to be displayed. Video compression can be applied either intraframe (using only information contained in a single frame) or interframe (using information from other frames of the video image). Because humans cannot perceive very small changes in color or movement, compression techniques need not preserve every bit of information. These lossy compression techniques can achieve large reductions in video image size without affecting the perceived quality of the image. Compression techniques alone, however, have not produced the transmission quality required for video applications (e.g., video telephony) on lower bit rate networks.
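One common source of the loss in lossy codecs is quantization: values are mapped onto a coarser set of levels, so fewer distinct symbols must be coded, at the cost of a bounded reconstruction error. The following is a generic uniform scalar quantizer, offered as an illustration only; it is not the quantizer of any particular standard, and the step size of 8 is an arbitrary assumption.

```python
def quantize(values, step):
    """Uniform quantizer: replace each value by the index of its step-sized bin."""
    return [v // step for v in values]


def dequantize(indices, step):
    """Reconstruct each value at the midpoint of its bin (the fine detail is lost)."""
    return [i * step + step // 2 for i in indices]


pixels = [12, 13, 14, 200, 201, 203]    # 8-bit samples need 8 bits each
idx = quantize(pixels, 8)               # indices 0..31 need only 5 bits each
rec = dequantize(idx, 8)
errors = [abs(a - b) for a, b in zip(pixels, rec)]
print(idx)          # [1, 1, 1, 25, 25, 25]
print(rec)          # [12, 12, 12, 204, 204, 204]
print(max(errors))  # 4, never more than half the step size
```

The repeated indices also compress well with a subsequent entropy coder, which is where much of the size reduction comes from in practice.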
MPEG (Moving Picture Experts Group) is an ISO/IEC working group developing international standards for the compression, decompression, and representation of moving pictures and audio. MPEG-4, a part of the standard currently under development, is designed for videophones and multimedia applications. MPEG-4 provides for video services at lower bandwidths of up to 64 kilobits per second. MPEG-4 uses media objects to represent audiovisual content, and media objects can be combined to form compound media objects. MPEG-4 multiplexes and synchronizes the media objects before transmission to provide a higher quality of service. MPEG-4 organizes the media objects hierarchically, with primitive media objects such as still images, video objects, and audio objects at the lowest level. MPEG-4 has a number of primitive media objects that can be used to represent two- or three-dimensional media objects. MPEG-4 also defines a coded representation of objects for text, graphics, synthetic sound, and talking synthetic heads. The visual part of the MPEG-4 standard describes methods for the compression of images and video; it also provides algorithms for random access to all types of visual objects, as well as algorithms for spatial, temporal, and quality scalability and for content-based scalability of textures, images, and video. Algorithms for error robustness and resilience in error-prone environments are also part of the standard. For synthetic objects, MPEG-4 has parametric descriptions of the human face and body, and parametric descriptions for animation streams of the face and body. MPEG-4 also describes static and dynamic mesh coding with texture mapping, and texture coding for view-dependent applications.
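The hierarchical organization of media objects can be pictured as a tree whose leaves are primitive objects and whose inner nodes are compound objects. The sketch below models that idea with hypothetical class names; it does not reproduce MPEG-4's actual scene description syntax, only the primitive/compound nesting described above.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class PrimitiveObject:
    """Leaf of the hierarchy, e.g. a still image, video object or audio object."""
    name: str
    kind: str  # e.g. "still_image", "video", "audio"


@dataclass
class CompoundObject:
    """A media object formed by grouping primitive or other compound objects."""
    name: str
    children: List[Union[PrimitiveObject, "CompoundObject"]] = field(default_factory=list)


def primitives(obj):
    """Walk the hierarchy depth-first and yield the primitive objects at the lowest level."""
    if isinstance(obj, PrimitiveObject):
        yield obj
    else:
        for child in obj.children:
            yield from primitives(child)


speaker = CompoundObject("speaker", [
    PrimitiveObject("face_video", "video"),
    PrimitiveObject("voice", "audio"),
])
scene = CompoundObject("scene", [PrimitiveObject("backdrop", "still_image"), speaker])
print([p.name for p in primitives(scene)])   # ['backdrop', 'face_video', 'voice']
```

Keeping the speaker as its own compound object is what lets a coder treat it separately from the backdrop, for example by giving it a different quality or update rate.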
MPEG-4 supports coding of video objects with spatial and temporal scalability. Scalability allows a decoder to decode only part of a stream and construct images with reduced decoder complexity (reduced quality), reduced spatial resolution, reduced temporal resolution, or equal temporal and spatial resolution but reduced quality. Scalability is desirable when video is sent over heterogeneous networks, or when the receiver cannot display at full resolution (e.g., because of limited power). Robustness in error-prone environments is an important issue for mobile communications. MPEG-4 has tools to address robustness, including resynchronization of the bit stream and the decoder when an error has been detected. Data recovery tools can also be used to recover lost data, and error concealment tools are used to conceal the lost data. MPEG-4 is a general-purpose scheme designed to maximize the video content delivered over communication lines.
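Temporal scalability can be illustrated with a two-layer split: a base layer that is decodable on its own at a reduced frame rate, and an enhancement layer that restores the full rate when it is received. This is a simplified sketch that only shows the layering idea; real MPEG-4 scalable coding predicts enhancement frames from the base layer rather than sending independent frames, and the function names here are hypothetical.

```python
def split_temporal_layers(frames):
    """Even-numbered frames go to the base layer, odd-numbered frames to the enhancement layer."""
    return frames[0::2], frames[1::2]


def decode(base, enhancement=None):
    """Reconstruct the frame sequence from the base layer, adding the enhancement layer if available."""
    if enhancement is None:
        return list(base)  # half the frame rate, lower decoder cost
    out = []
    for i, frame in enumerate(base):
        out.append(frame)
        if i < len(enhancement):
            out.append(enhancement[i])
    return out


frames = [f"frame{i}" for i in range(6)]
base, enh = split_temporal_layers(frames)
print(decode(base))        # ['frame0', 'frame2', 'frame4'], e.g. 15 fps from a 30 fps source
print(decode(base, enh))   # all six frames at the full rate
```

A receiver on a slow link, or one with limited power, simply ignores the enhancement layer.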
Streaming is a technique used for sending audiovisual content in a continuous stream and having it displayed as it arrives. The content is compressed and segmented into a sequence of packets. A user does not have to wait to download a large file before seeing the video or hearing the sound because content is displayed as it arrives, and additional content is downloaded as already downloaded content is displayed. Streaming can be applied to MPEG-4 media objects to enhance a user's audiovisual experience.
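The segment-then-play-while-downloading behavior can be sketched with a generator that produces packets and a player that begins displaying once a small buffer has filled. The `prebuffer` depth and the event log are assumptions made for illustration; a real streaming stack would add timestamps, loss handling and rate adaptation.

```python
from collections import deque


def packetize(content: bytes, packet_size: int):
    """Segment (already compressed) content into numbered packets."""
    for seq, start in enumerate(range(0, len(content), packet_size)):
        yield seq, content[start:start + packet_size]


def play_stream(packets, prebuffer=2):
    """Record when each packet is received and when it is displayed."""
    buffer = deque()
    log = []
    for seq, payload in packets:          # packets arrive one at a time
        log.append(("recv", seq))
        buffer.append((seq, payload))
        if len(buffer) > prebuffer:       # enough buffered: display the oldest packet
            shown, _ = buffer.popleft()
            log.append(("show", shown))
    while buffer:                         # end of stream: drain the buffer
        shown, _ = buffer.popleft()
        log.append(("show", shown))
    return log


log = play_stream(packetize(b"x" * 1000, 200))
print(log)
```

In the log, packet 0 is displayed before the last packet has been received, which is the property that spares the user from waiting for the whole file.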
H.261 is a standard that was developed for the transmission of video at rates that are multiples of 64 Kbps; videophone and videoconferencing are typical applications. The H.261 standard is similar to the JPEG still-image compression standard and uses motion-compensated temporal prediction.
H.263 is a standard designed for very low bit rate coding applications. H.263 uses block motion-compensated Discrete Cosine Transform (“DCT”) structures for encoding, and H.263 encoding is more efficient than H.261 encoding. H.263 is based on H.261 but is significantly optimized for coding at low bit rates. Video coding is performed by partitioning each picture into macroblocks. Each macroblock consists of a 16×16 luminance block and two 8×8 chrominance blocks, one for Cb and one for Cr. Cb and Cr are the color difference signals in ITU-R 601 coding. The two color difference signals are sampled at 6.75 MHz, co-sited with a luminance sample. Cr is the digitized version of the analogue component (R-Y); likewise, Cb is the digitized version of (B-Y). Each macroblock can be coded as intra or as inter. Spatial redundancy is exploited by DCT coding, and temporal redundancy is exploited by motion compensation. H.263 includes motion compensation with half-pixel accuracy and bidirectionally coded macroblocks. Overlapped block motion compensation on 8×8 blocks, an unrestricted motion vector range at the picture boundary, and arithmetic coding are also used in H.263. These features are not included in MPEG-1 and MPEG-2 because they are useful mainly for low bit rate applications. H.263 coding is based on H.261, with enhancements to improve coding efficiency. Four negotiable options are supported to improve performance: unrestricted motion vector mode, syntax-based arithmetic coding mode, advanced prediction mode, and PB-frames mode. Unrestricted motion vector mode allows motion vectors to point outside a picture. Syntax-based arithmetic coding mode allows arithmetic coding to be used instead of Huffman coding. Advanced prediction mode uses overlapped block motion compensation with four 8×8 block vectors instead of a single 16×16 macroblock motion vector. PB-frames mode allows a P-frame and a B-frame to be coded together as a single PB-frame.
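Two of the mechanics above, partitioning a picture into macroblocks and using the DCT to exploit spatial redundancy, can be sketched in a few lines. The sketch assumes chrominance planes subsampled by two in each direction (so each 16×16 luminance area pairs with one 8×8 Cb and one 8×8 Cr block; in the standard the luminance area is itself coded as four 8×8 blocks) and uses a textbook orthonormal DCT-II in direct form. It does not reproduce H.263's exact transform precision, quantization or bitstream syntax, and the function names are hypothetical.

```python
import math


def macroblock_grid(width, height):
    """Number of 16x16 macroblock columns and rows; assumes dimensions are multiples of 16."""
    if width % 16 or height % 16:
        raise ValueError("picture dimensions must be multiples of 16")
    return width // 16, height // 16


def macroblock(y_plane, cb_plane, cr_plane, mb_col, mb_row):
    """Cut out the 16x16 luminance block and the co-located 8x8 Cb and Cr blocks."""
    y = [row[16 * mb_col:16 * mb_col + 16] for row in y_plane[16 * mb_row:16 * mb_row + 16]]
    cb = [row[8 * mb_col:8 * mb_col + 8] for row in cb_plane[8 * mb_row:8 * mb_row + 8]]
    cr = [row[8 * mb_col:8 * mb_col + 8] for row in cr_plane[8 * mb_row:8 * mb_row + 8]]
    return y, cb, cr


def dct_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (direct form, for illustration only)."""
    n = 8

    def scale(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = scale(u) * scale(v) * s
    return coeffs


cols, rows = macroblock_grid(176, 144)   # QCIF picture
print(cols, rows, cols * rows)           # 11 9 99

# A perfectly smooth 8x8 block: all of its energy lands in the single DC coefficient.
flat = [[100] * 8 for _ in range(8)]
c = dct_8x8(flat)
print(round(c[0][0], 6))                 # 800.0
print(max(abs(c[u][v]) for u in range(8) for v in range(8) if (u, v) != (0, 0)) < 1e-9)  # True
```

The smooth block collapses from 64 samples to one significant coefficient, which is the spatial redundancy that DCT coding removes; motion compensation handles the temporal counterpart.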
Model-based video coding schemes define three-dimensional structural models of a scene; the same model is used by a coder to analyze an image and by a decoder to generate the image. Traditionally, research in model-based video coding (“MBVC”) has focused on head modeling, head tracking, local motion tracking, and expression analysis and synthesis. MBVC has been used mainly for videoconferencing and videotelephony, since in those applications the focus is on modeling the human head. MBVC has concentrated its modeling on images of heads and shoulders, because they are commonly occurring shapes in certain video applications (e.g., videotelephony). In model-based approaches, a parameterized model is used for each object (e.g., a head) in the scene. Coding and transmission are done using the parameters associated with the objects. Tools from image analysis and computer vision are used to analyze the images and find specific parameters (e.g., the size, location, and motion of the objects in the scene).
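The coder/decoder symmetry of model-based coding can be shown with a deliberately tiny model. Here a 2-D ellipse stands in for the three-dimensional head models the text describes (a large simplification, chosen only for illustration): the coder estimates location and size from a segmented object, only those parameters would be transmitted, and the decoder regenerates the object from the same model. The functions are hypothetical.

```python
def synthesize(params, width, height):
    """Decoder: regenerate the object from the shared elliptical model and its parameters."""
    cx, cy, rx, ry = params
    return [[1 if ((c - cx) / rx) ** 2 + ((r - cy) / ry) ** 2 <= 1 else 0
             for c in range(width)] for r in range(height)]


def analyze(mask):
    """Coder: estimate location (centroid) and size (half-extents) of the object in a binary mask."""
    pts = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    rows = [r for r, _ in pts]
    cols = [c for _, c in pts]
    cx, cy = sum(cols) / len(cols), sum(rows) / len(rows)
    rx = (max(cols) - min(cols) + 1) / 2
    ry = (max(rows) - min(rows) + 1) / 2
    return cx, cy, rx, ry


frame = synthesize((10, 8, 5, 4), 21, 17)   # stand-in for a segmented head in a 21x17 frame
params = analyze(frame)                      # only these four numbers would be transmitted
print(params)                                # (10.0, 8.0, 5.5, 4.5)
print(sum(map(sum, frame)), "object pixels vs", len(params), "parameters")
```

The reconstruction is only as good as the model, which is why MBVC research has concentrated on the head-and-shoulders shapes that dominate videotelephony.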
Motion vectors are used in a technique that segments video images based on an analysis of the global motion of a video sequence. With this technique, only the pixels representing portions of the image that have changed since the last refresh need to be transmitted over a communications link. Motion vectors can reduce the amount of data needed to transmit an image, thus increasing the effectiveness of low bit rate communication links.
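Motion vectors are commonly estimated by block matching. The sketch below uses full-search matching with a sum-of-absolute-differences (SAD) cost, a standard textbook method; the text does not say which estimator it has in mind, and the 4×4 block and ±2 search range are arbitrary assumptions.

```python
def sad(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    return sum(abs(cur[top + y][left + x] - ref[top + y + dy][left + x + dx])
               for y in range(size) for x in range(size))


def motion_vector(cur, ref, top, left, size=4, search=2):
    """Full-search block matching: the (dy, dx) within +/-search whose reference block matches best."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= top + dy and top + dy + size <= h and 0 <= left + dx and left + dx + size <= w):
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, top, left, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]


# Reference frame with distinct pixel values; in the current frame the content has
# moved down one row and right two columns.
ref = [[y * 12 + x for x in range(12)] for y in range(12)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(12)] for y in range(12)]
print(motion_vector(cur, ref, top=4, left=4))   # (-1, -2): the block came from one row up, two columns left
```

Once the vector is known, the coder sends it together with any residual difference instead of the block's pixels, which is where the data reduction comes from.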
Video conferencing is teleconferencing in which video images are transmitted among geographically separated participants in a meeting. Originally done using analog video and satellite links, video conferencing today uses compressed video images transmitted over wide area networks or the Internet. Typically, a 56 Kbps communications channel can support freeze-frame video, whereas a 1.544 Mbps (T1) channel supports full-motion video.
Providing a full-motion video experience with television-like definition, at least for the most significant regions of a displayed image, at low bit rate communication speeds has proven to be a very difficult problem.
Compression is one approach to solving the bandwidth problem of displaying video images on low bit rate communication channels. Compression is typically implemented at the physical/link layer in the network protocol model. A problem with using compression at the physical/link layer is that it cannot take into account which regions of the displayed image the viewer considers important.
Using asymmetrical communications technologies, such as ADSL or cable, doesn't address the problem of video image communications for conferencing (e.g., video teleconferencing), because conferencing is a symmetrical application. Asymmetrical technologies optimize only one direction of the communications channel. They are very useful in specific types of communications applications (e.g., Web browsing, where the vast majority of the communications activity involves downloading data from the Internet and very little involves uploading to it). Video teleconferencing and videotelephony are symmetrical applications requiring approximately equal bandwidth in both directions, whereas ADSL and cable are asymmetrical technologies that typically provide 10:1 download-to-upload ratios.
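Because each participant in a call must send as much video as it receives, the usable rate for a symmetrical application is capped by the slower direction of the link. The numbers below are a hypothetical 10:1 line chosen only to make the arithmetic concrete.

```python
def symmetric_call_rate(down_kbps, up_kbps):
    """A two-way video call needs the same rate in each direction, so the slower direction limits it."""
    return min(down_kbps, up_kbps)


def unused_downstream(down_kbps, up_kbps):
    """Downstream capacity the symmetric call cannot use."""
    return down_kbps - symmetric_call_rate(down_kbps, up_kbps)


down, up = 1500, 150                   # hypothetical 10:1 asymmetric line, in Kbps
print(symmetric_call_rate(down, up))   # 150
print(unused_downstream(down, up))     # 1350
```

In this example ninety percent of the downstream capacity sits idle during a call, so the asymmetric line performs like a 150 Kbps symmetric link.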
Motion vectors can improve effective bandwidth in a video communications system, but cannot differentiate significant image changes (e.g., a changing facial expression) from insignificant ones (e.g., a cloud passing in the background).
The present invention addresses the content issues at the application layer by identifying which regions of the video image are most significant to the viewer, while supporting the symmetric nature of video conferencing and videotelephony.
Thus, there is a need for an effective technique for enhancing the throughput of low bit rate video image communications. Accordingly, the present invention provides a method, apparatus and article of manufacture for providing throughput-enhanced video communication when receiving a video image over a communications link. The video image is mapped based on a predetermined color range, resulting in a color mapped image. The color range may consist of a black and white, gray-scale or color spectrum. A first template is created around certain regions of the color mapped image. A second template is created around certain shapes within the first template. Bandwidth on the communications link is adjusted based on the regions of the video image that are: 1) outside the boundaries of the first template, 2) inside the boundaries of the first template, and 3) inside the boundaries of the second template. Finally, the throughput-enhanced video image is displayed on a display screen.
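The first step, mapping the image against a predetermined color range, can be sketched as a per-pixel range test that produces a binary color mapped image. For simplicity the sketch places a rectangular box around the mapped pixels, whereas the first template described next approximates a face shape; the RGB thresholds are illustrative, uncalibrated assumptions, and the function names are hypothetical.

```python
def color_map(image, lo, hi):
    """Binary color mapped image: 1 where every RGB channel lies inside the settable [lo, hi] range."""
    return [[1 if all(l <= ch <= h for ch, l, h in zip(px, lo, hi)) else 0 for px in row]
            for row in image]


def bounding_box(mask):
    """(top, left, bottom, right) around the mapped pixels, or None if nothing matched."""
    hits = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return min(rows), min(cols), max(rows), max(cols)


# Illustrative (uncalibrated) flesh-tone range in 8-bit RGB.
FLESH_LO, FLESH_HI = (180, 120, 90), (255, 210, 180)
blue = (0, 0, 255)
image = [[blue, blue, blue],
         [blue, (220, 170, 140), (210, 160, 130)],
         [blue, blue, blue]]
mask = color_map(image, FLESH_LO, FLESH_HI)
print(mask)                # [[0, 0, 0], [0, 1, 1], [0, 0, 0]]
print(bounding_box(mask))  # (1, 1, 1, 2)
```

Making `lo` and `hi` parameters mirrors the settable color range: a gray-scale or black-and-white mapping would use the same test with different bounds.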
One common image in video communication is that of the human face. The present invention uses various techniques to optimize video communications containing the human face. Accordingly, the color range used for mapping the video image is settable, and may include a range of flesh-tone colors. The first template approximates the shape of a human face. The second template approximates a rectangular shape of the eyes region of a human face and a triangular shape of the nose-mouth region of a human face. Based on the mapping and templates, the highest amount of bandwidth is allocated to the image within the boundaries of the second template, the next highest amount of bandwidth is allocated to the image within the boundaries of the first template, and the remaining bandwidth is allocated to the image outside the boundaries of the first template. The templates can be used in conjunction with a tracking system to compensate for movement of the face within a video image. Additionally, a user can manually adjust the boundaries of the templates to override their initial parameters.
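The three-tier allocation can be sketched by classifying each pixel against the templates and splitting the link among the tiers. The sketch assumes an ellipse for the face-shaped first template, a rectangle and a triangle for the second template, and a 50/30/20 percent split; the geometry, coordinates and percentages are all hypothetical, since the text fixes only the ordering of the allocations. A `track` helper shows how a tracked face displacement would move the templates, and because the templates are plain parameters a user override amounts to replacing those tuples.

```python
def in_ellipse(x, y, cx, cy, rx, ry):
    return ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1


def in_rect(x, y, x0, y0, x1, y1):
    return x0 <= x <= x1 and y0 <= y <= y1


def in_triangle(x, y, a, b, c):
    """Inside if the point lies on the same side of all three edges (or on an edge)."""
    def cross(p, q):
        return (q[0] - p[0]) * (y - p[1]) - (q[1] - p[1]) * (x - p[0])
    d = (cross(a, b), cross(b, c), cross(c, a))
    return not (min(d) < 0 < max(d))


def region(x, y, face, eyes, nose_mouth):
    """2 = second template (eyes or nose-mouth), 1 = rest of the first (face) template, 0 = outside."""
    if in_rect(x, y, *eyes) or in_triangle(x, y, *nose_mouth):
        return 2
    if in_ellipse(x, y, *face):
        return 1
    return 0


# Hypothetical split of the link: most to the second template, next to the rest of the face.
SHARE_PERCENT = {2: 50, 1: 30, 0: 20}


def allocate(total_kbps):
    return {tier: total_kbps * pct / 100 for tier, pct in SHARE_PERCENT.items()}


def track(face, eyes, nose_mouth, dx, dy):
    """Shift all templates by a tracked face displacement (dx, dy)."""
    cx, cy, rx, ry = face
    x0, y0, x1, y1 = eyes
    return ((cx + dx, cy + dy, rx, ry),
            (x0 + dx, y0 + dy, x1 + dx, y1 + dy),
            tuple((px + dx, py + dy) for px, py in nose_mouth))


face = (50, 60, 30, 40)                            # ellipse: centre x, centre y, radii
eyes = (30, 40, 70, 55)                            # rectangle: x0, y0, x1, y1
nose_mouth = ((50, 55), (35, 90), (65, 90))        # triangle vertices
print([region(x, y, face, eyes, nose_mouth) for x, y in [(50, 45), (50, 80), (25, 60), (5, 5)]])
# [2, 2, 1, 0]
print(allocate(56))                                # {2: 28.0, 1: 16.8, 0: 11.2}
moved = track(face, eyes, nose_mouth, 5, 0)        # face moved 5 pixels to the right
print(region(75, 45, *moved))                      # 2: the eyes template followed the face
```

Fixed percentage shares guarantee the required ordering regardless of how large each region is; a per-pixel weighting would be an alternative design, but it could give a large background more total bandwidth than a small eyes region.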
The progression of mapping and template creation, when used in conjunction with the setting of communications bandwidth parameters, provides a novel solution to the problem of enhancing the throughput of low bit rate video image communications. These improvements increase a user's ability to read visual messaging and facial cues from facial expressions because of the higher resolution allocated to specific areas of the video image. Specifically, the present invention addresses the problems caused by poor definition of facial expressions in video image communications. As social animals, human beings rely inordinately on visual messaging embedded in facial expressions, as those messages portray emotions and are powerful cues to the nonverbal aspects of face-to-face communications. Providing throughput-enhanced video communications improves the definition of facial expressions and provides for better human understanding in video image communications. This helps avoid problems that can arise when a user exerts so much effort trying to perceive poor quality facial expressions that they miss some of the audio message as well. Efficient allocation of bandwidth in bandwidth-constrained environments (e.g., analog and wireless) is very important to providing high quality video images in videoconferencing and videotelephony systems.