1. Field of the Invention
The present invention relates to methods and apparatuses for encoding and decoding a multi-view image, and more particularly, to methods and apparatuses for encoding and decoding a multi-view image, which add an image region obtained from a picture captured from one viewpoint to a picture captured from another viewpoint by using a difference in viewpoints between the pictures of the multi-view image, thereby generating a reference picture, and which perform prediction encoding by using the generated reference picture, thereby increasing prediction efficiency.
2. Description of the Related Art
In multi-view image encoding, pictures of a multi-view image input from a plurality of cameras are compressively encoded by using temporal correlation and spatial correlation between the plurality of cameras.
Temporal prediction using the temporal correlation and inter-view prediction using the spatial correlation predict and compensate for movement of a current picture in units of blocks by using one or more reference pictures, thereby encoding an image. That is, in multi-view image encoding, pictures obtained from cameras at different viewpoints, or pictures input at different times from among pictures captured from the same viewpoint, can be used as reference pictures. A block having the highest similarity to a current block is searched for within a predetermined search range of a reference picture, and when a similar block is found, only differential data between the current block and the similar block is transmitted, thereby increasing the data compression ratio.
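As an illustration only, and not a description of any particular encoder, the block search described above may be sketched as follows. The block size, search range, and the use of the sum of absolute differences as the similarity measure are illustrative assumptions.

```python
def sad(a, b):
    # Sum of absolute differences between two equally sized blocks:
    # a simple similarity measure (lower is more similar).
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def get_block(frame, x, y, size):
    # Extract a size x size block whose top-left corner is (x, y).
    return [row[x:x + size] for row in frame[y:y + size]]

def block_search(current, reference, bx, by, size=4, search=2):
    """Search a +/-`search` pixel window of `reference` for the block
    most similar to the current block at (bx, by); return the best
    displacement (dx, dy) and its cost."""
    cur = get_block(current, bx, by, size)
    h, w = len(reference), len(reference[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            # Only consider candidate blocks that lie inside the picture.
            if 0 <= x and 0 <= y and x + size <= w and y + size <= h:
                cost = sad(cur, get_block(reference, x, y, size))
                if cost < best_cost:
                    best_cost, best = cost, (dx, dy)
    return best, best_cost
```

Only the displacement and the residual (differential) data of the winning block would then need to be transmitted.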
FIG. 1 is a reference diagram illustrating prediction encoding of a multi-view image.
In FIG. 1, the x-axis indicates the time axis, and the y-axis indicates the view axis. T0 through T8 on the x-axis respectively indicate sampling times of an image, and S0 through S7 on the y-axis respectively indicate different views. In FIG. 1, each horizontal line indicates a group of image pictures input from the same viewpoint (hereinafter referred to as a view), and each vertical line indicates multi-view pictures captured at the same time.
A method of encoding a multi-view image periodically generates an intra picture with respect to a picture at a base view, and based on the generated intra pictures, performs temporal prediction or inter-view prediction, thereby predictably encoding other pictures.
Temporal prediction uses the temporal correlation existing between pictures at the same view, i.e., between pictures in the same horizontal line of FIG. 1. For the temporal prediction, a prediction structure using a hierarchical B-picture may be used. Inter-view prediction uses the spatial correlation existing between pictures of the multi-view image input at the same time, i.e., between pictures in the same vertical line of FIG. 1.
When the prediction structure of a multi-view image using the hierarchical B-picture performs a prediction by using the temporal correlation existing between the pictures at the same view, i.e., in the same horizontal line, the prediction structure predictably encodes a group of image pictures at the same view as bi-directional pictures (hereinafter referred to as “B-pictures”) by using anchor pictures. Here, the anchor pictures are the pictures included in the vertical line 110 at a first time T0 and the vertical line 120 at a last time T8 from among the vertical lines illustrated in FIG. 1, wherein the vertical lines 110 and 120 include an intra picture. The anchor pictures in the vertical lines 110 and 120, except for the intra pictures (hereinafter referred to as “I pictures”), are predictably encoded by using only inter-view prediction. Pictures included in the rest of the vertical lines 130, i.e., the vertical lines other than the vertical lines 110 and 120 including the I pictures, are non-anchor pictures.
An example in which pictures input at a first view S0 during a predetermined time period are encoded by using the hierarchical B-picture will now be described. From among pictures input at the first view S0, a picture 111 input at the first time T0 and a picture 121 input at the last time T8 are encoded as I pictures. Then, a picture 131 input at a time T4 is bidirectionally and predictably encoded as a B-picture by referring to the I pictures 111 and 121, which are anchor pictures. A picture 132 input at a time T2 is bidirectionally and predictably encoded as a B-picture by using the I picture 111 and the B-picture 131. Similarly, a picture 133 input at a time T1 is bidirectionally and predictably encoded by using the I picture 111 and the B-picture 132, and a picture 134 input at a time T3 is bidirectionally and predictably encoded by using the B-pictures 132 and 131. In this manner, since an image sequence at the same view is hierarchically, bidirectionally, and predictably encoded by using the anchor pictures, such a prediction encoding method is referred to as a hierarchical B-picture structure. Meanwhile, in Bn (n=1, 2, 3, and 4) illustrated in FIG. 1, n indicates an nth bidirectionally predicted B-picture. For example, B1 indicates a picture first bidirectionally predicted by using anchor pictures which are either an I picture or a P picture, B2 indicates a picture bidirectionally predicted after the B1 picture, B3 indicates a picture bidirectionally predicted after the B2 picture, and B4 indicates a picture bidirectionally predicted after the B3 picture.
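As an illustration only, the encoding order produced by the recursive bisection described above (anchors first, then each midpoint B-picture) may be sketched as follows; the function name and the GOP length of 8 (times T0 through T8) are illustrative assumptions.

```python
def hierarchical_b_order(first, last):
    """Encoding order for one GOP of a hierarchical B-picture structure:
    the two anchors first, then B-pictures by recursive bisection of the
    interval, each tagged with its hierarchy level (B1, B2, ...)."""
    order = [(first, "anchor"), (last, "anchor")]

    def bisect(lo, hi, level):
        if hi - lo < 2:
            return  # no picture lies strictly between lo and hi
        mid = (lo + hi) // 2
        order.append((mid, "B%d" % level))  # predicted from pictures at lo and hi
        bisect(lo, mid, level + 1)
        bisect(mid, hi, level + 1)

    bisect(first, last, 1)
    return order
```

For times T0 through T8, this yields the order described above: the anchors at T0 and T8, then B1 at T4, B2 at T2 and T6, and B3 at T1, T3, T5, and T7.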
When a multi-view image sequence is encoded, a group of image pictures at the first view S0 that is a base view is encoded by using the aforementioned hierarchical B-picture. In order to encode image sequences at the rest of views, pictures at odd views S2, S4, and S6, and at a last view S7 included in the anchor pictures of the vertical lines 110 and 120, are predictably encoded as P pictures by the inter-view prediction using the I pictures 111 and 121 at the first view S0. Pictures at even views S1, S3, and S5 included in the anchor pictures of the vertical lines 110 and 120 are bidirectionally predicted by using a picture at an adjacent view by the inter-view prediction, and encoded as B-pictures. For example, a B-picture 113 input at a second view S1 at the time T0 is bidirectionally predicted by using the I picture 111 and a P picture 112 respectively at adjacent views S0 and S2.
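The assignment of anchor-picture types across the views described above may be sketched, for illustration only, as a small classification rule; the function name and the 1-indexed reading of "odd" and "even" views are illustrative assumptions based on the text (S0 is the first view, S1 the second, and so on).

```python
def anchor_picture_type(view, num_views=8):
    """Picture type of an anchor picture at a given 0-based view index,
    following the structure described above: the base view is an I picture,
    odd-position views (S2, S4, S6) and the last view (S7) are P pictures,
    and the remaining views (S1, S3, S5) are B-pictures predicted from
    both adjacent views."""
    if view == 0:
        return "I"                      # base view S0
    if view == num_views - 1 or view % 2 == 0:
        return "P"                      # S2, S4, S6 and the last view S7
    return "B"                          # S1, S3, S5
```

For example, the B-picture 113 at view S1 corresponds to `anchor_picture_type(1)` returning "B", since S1 is bidirectionally predicted from its neighbors S0 and S2.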
When all pictures at all views included in the anchor pictures of the vertical lines 110 and 120 are encoded as any one of I, B, and P pictures, the non-anchor pictures in the rest of the vertical lines 130 are bidirectionally and predictably encoded by the temporal and inter-view predictions using the aforementioned hierarchical B-picture.
From among the non-anchor pictures in the rest of the vertical lines 130, pictures at the odd views S2, S4, and S6, and at the last view S7, are bidirectionally and predictably encoded by the temporal prediction using the hierarchical B-picture and the anchor pictures at the same view. From among the non-anchor pictures in the rest of the vertical lines 130, pictures at the even views S1, S3, and S5 are bidirectionally and predictably encoded by not only the temporal prediction using the hierarchical B-picture but also the inter-view prediction using pictures at adjacent views. For example, a picture 136 input at the second view S1 at the time T4 is predicted by using the anchor pictures 113 and 123, and the pictures 131 and 135 at adjacent views. P pictures included in the anchor pictures of the vertical lines 110 and 120 are predictably encoded by using an I picture at a different view input at the same time, or a previous P picture, as described above. For example, a P picture 122 input at the third view S2 at the time T8 is predictably encoded by using, as a reference picture, the I picture 121 input at the first view S0 at the same time.
In general, motion prediction is performed within a predetermined region around the position in a reference picture corresponding to the position of a current block that is to be encoded. In the case where the current block to be encoded is located at an edge of the reference picture, the reference picture has to be extended for the motion prediction. According to the related art, motion prediction is performed by extending the reference picture in such a manner that pixels located at an edge of the reference picture are outwardly extended. This method is known as extrapolation.
FIG. 2 is a diagram illustrating a method of extending a reference picture according to the related art, and FIG. 3 is a reference diagram illustrating an example of a reference picture extended according to the related art.
Referring to FIG. 2, respective pixels located at the edges of an original reference picture 210 are outwardly extended, and thus, the original reference picture 210 is extended. For example, by making all pixels located above a pixel 211, which is located at the upper edge and has a pixel value A, have the pixel value A, the original reference picture 210 is extended upward. Similarly, by extending respective pixels 212, 213, and 214, respectively having pixel values B, C, and D, rightward, downward, and leftward as far as a predetermined range, the original reference picture 210 can be extended. Referring to FIG. 3, a padded image frame 320 generated by extending the pixels at the edges of an original image frame 310 can be seen. In this manner, according to the related art, a reference picture is generated by outwardly extending the pixels at the edges of an original reference picture.
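The extrapolation padding described above may be sketched, for illustration only, as follows; the function name and the representation of a picture as a list of pixel rows are illustrative assumptions.

```python
def pad_reference(picture, pad):
    """Extend a reference picture by `pad` pixels on every side by
    replicating the edge pixels outward (extrapolation padding)."""
    # Extend each row horizontally: repeat the leftmost and rightmost pixel.
    rows = [[row[0]] * pad + row + [row[-1]] * pad for row in picture]
    # Extend vertically: repeat the (already widened) top and bottom rows.
    top = [rows[0][:] for _ in range(pad)]
    bottom = [rows[-1][:] for _ in range(pad)]
    return top + rows + bottom
```

Corner regions are filled by the corner pixel itself, since the corner value propagates both horizontally and vertically.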
However, in order to overcome the limitations of a restricted bandwidth and to increase prediction efficiency, a method capable of more efficiently generating a reference picture, in consideration of the characteristics of a multi-view image, is necessary.