This application claims the benefit under 35 U.S.C. §119(a) of Korean Patent Applications No. 10-2005-0027729, filed on Apr. 1, 2005, and No. 10-2006-0025680, filed on Mar. 21, 2006 in the Korean Intellectual Property Office, the disclosures of which are hereby incorporated by reference.
1. Field of the Invention
The present invention relates to image encoding and decoding methods and apparatuses. More particularly, the present invention relates to scalable multi-view image encoding and decoding methods and apparatuses which filter multi-view images input from a plurality of cameras in spatial-axis and temporal-axis directions using motion compensated temporal filtering (MCTF) or hierarchical B-pictures and scalably code the filtered multi-view images using a scalable video coding (SVC) technique.
2. Description of the Related Art
Digital broadcasting services are expected to evolve from high-definition television (HDTV) and satellite/ground-wave digital multimedia broadcasting (DMB) services to interactive TV and broadcasting services, to three-dimensional (3D) TV and broadcasting services, and then to reality broadcasting services. Reality broadcasting services provide viewers with information regarding images of scenes at various viewpoints, and allow a viewer to select a preferred scene by creatively editing an image of the scene provided by a broadcasting station. To implement such reality broadcasting services, panorama images must be generated. A panorama image may be generated by acquiring images using a plurality of cameras placed at various viewpoints and then connecting the acquired images. Alternatively, a panorama image may be obtained using an omni-directional camera system. A large amount of data must be collected and transmitted to deliver image information obtained using a plurality of cameras to users. Accordingly, various methods of collecting information regarding multi-view images have been studied, for example, a multi-view camera system, a stereoscopic camera system, and an omni-directional camera system. A multi-view camera system simultaneously films or transmits a subject or a scene using a plurality (M) of cameras and provides users with various scenes or a 3D scene provided by the M cameras at different locations.
Multi-view image coding relates to simultaneously coding images input from M cameras that provide multi-view images, and to compressing, storing, and transmitting the coded images. When a multi-view image is stored and transmitted without being compressed, a large transmission bandwidth is required to transmit the data to users in real time through a broadcasting network or the wired/wireless Internet, due to the large volume of data of the multi-view image. For example, when 24-bit color images, each with a resolution of 1310×1030 pixels, are input from 16 cameras at a rate of 30 frames/sec, approximately 14.4 Gb/sec of data must be processed. Therefore, a 3D audio and video subgroup in the Moving Picture Experts Group (MPEG) has organized a group dedicated to devising a multi-view coding method. The group is working to develop a method of coding the huge amount of image data input from multi-view video using an international standard for video compression.
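The data-rate figure in the example above can be checked with a short calculation. The following is a minimal sketch, not part of the described method; the function name raw_bit_rate is hypothetical, and the reading of "Gb" as 2^30 bits is an assumption made here because it reproduces the quoted value (with 10^9 bits per gigabit, the same inputs give roughly 15.5 Gb/sec).

```python
def raw_bit_rate(width, height, bits_per_pixel, fps, cameras):
    """Return the uncompressed bit rate, in bits per second, of a camera array.

    Hypothetical helper for illustration: every camera is assumed to deliver
    frames of the same size, color depth, and frame rate.
    """
    return width * height * bits_per_pixel * fps * cameras


# Values taken from the example in the text: 24-bit color, 1310x1030 pixels,
# 30 frames/sec, 16 cameras.
bits_per_sec = raw_bit_rate(1310, 1030, 24, 30, 16)

# Assumption: "Gb" read as 2**30 bits, which matches the quoted 14.4 Gb/sec.
binary_gb_per_sec = bits_per_sec / 2**30

# Decimal reading (10**9 bits per gigabit), shown for comparison.
decimal_gb_per_sec = bits_per_sec / 10**9

print(f"raw rate: {bits_per_sec} bits/sec")
print(f"binary gigabits/sec:  {binary_gb_per_sec:.2f}")
print(f"decimal gigabits/sec: {decimal_gb_per_sec:.2f}")
```

Under either reading, the uncompressed rate is far beyond what a broadcasting network or Internet connection can carry in real time, which is the point of the example.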
FIGS. 1A through 1C illustrate arrangements of conventional multi-view cameras, in which a plurality of cameras 10 are arranged in a parallel structure, a convergent structure, and a divergent structure, respectively. FIG. 2 illustrates images simultaneously input from 16 multi-view cameras arranged in a 4×4 parallel structure in a free-viewpoint TV (FTV) system.
Referring to FIG. 2, the images respectively input from the 16 cameras are very similar. In other words, a high correlation exists between the images input from the cameras that provide a multi-view image. Therefore, this high spatial correlation between the images can be utilized to achieve high compression efficiency in multi-view video coding. In addition, spatio-temporal scalable coding is required to present 3D or 2D images in various environments and on terminals with diverse computational capabilities.
Accordingly, there is a need for improved apparatuses and methods to filter multi-view images input from multiple cameras in the spatial-axis and temporal-axis directions to support a variety of spatio-temporal scalabilities.