In video coding, two types of macroblocks are used: intra macroblocks, which are coded without information from other pictures but may use neighboring macroblocks in the same picture; and inter macroblocks, which are coded using information from previous or future pictures.
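The distinction between the two macroblock types can be illustrated with a minimal sketch. It assumes a block is a small grid of luma samples, uses a DC-style average of neighboring samples as a stand-in for intra prediction, and uses a co-located block from another picture as a stand-in for inter prediction (no motion search). The function names `intra_dc_residual` and `inter_residual` are hypothetical and are not taken from any codec API.

```python
from typing import List, Sequence

Block = List[List[int]]  # hypothetical representation: rows of luma samples


def intra_dc_residual(block: Block, left_col: Sequence[int], top_row: Sequence[int]) -> Block:
    """Intra-style prediction: predict every sample from the rounded mean of
    neighboring samples in the SAME picture, then return the residual."""
    neighbors = list(left_col) + list(top_row)
    dc = (sum(neighbors) + len(neighbors) // 2) // len(neighbors)  # rounded mean
    return [[sample - dc for sample in row] for row in block]


def inter_residual(block: Block, reference_block: Block) -> Block:
    """Inter-style prediction: predict each sample from the matching sample of a
    block taken from ANOTHER (previous or future) picture, then return the residual."""
    return [
        [sample - ref for sample, ref in zip(row, ref_row)]
        for row, ref_row in zip(block, reference_block)
    ]


# Illustrative sample values only (assumed, not taken from any real stream).
block = [[10, 12], [14, 16]]
intra_res = intra_dc_residual(block, left_col=[10, 14], top_row=[11, 13])
inter_res = inter_residual(block, reference_block=[[10, 12], [14, 15]])
```

In both cases the encoder would go on to code the residual; the two types differ only in where the prediction comes from.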
In stereoscopic video encoding, there are two views, commonly referred to as the left eye view and the right eye view. With three-dimensional (3D) video, two frames are encoded, one for each eye. The encoding of these two views assumes that two reference views are available from which to predict subsequent pictures, since both views describe the same scene.
For example, the right eye view may be the base view (a self-decodable layer) and the left eye view may be the dependent view, which depends on the right eye view and needs the base view for optimal coding efficiency. When the dependent view predicts its own pictures, it references the base view because the similarity between the views allows more redundancy to be removed.
Some existing hardware implementations can support only one reference picture for encoding. Coding the base view from that reference picture does not present any problems. Coding the dependent view, however, presents a choice between using pictures in the dependent layer and coding blindly from pictures in the base layer. For example, when the left eye view is the dependent view, it can be coded either from pictures in the left eye stream or from pictures in the right eye stream (the right eye stream is essentially the same picture as the left eye stream, but shifted or captured from a different angle).
In a Multiview Video Coding (MVC) system, the dependent view should be coded using both the base view and the dependent view reference pictures (both paths are allowed and are needed for optimal encoding). In a system where prediction is constrained to a single reference picture because of hardware throughput or memory bandwidth limitations, there is no mechanism for selecting the optimal reference picture for encoding the dependent views. This is the case for the left eye view or the right eye view (whichever is the dependent view), and it applies generally to multiview coding in which each layer predicts from the base layer above it.
For the dependent view, there are two existing approaches to address the single reference picture prediction constraint. A first, intuitive solution would be to use two encoding passes: one pass to evaluate the cost of prediction from the base view, and a second pass to evaluate the cost of prediction from the same (dependent) view. The two-pass approach requires additional time and is not ideal for real-time encoding. A second solution would be to use blind prediction from only one of the two views (i.e., a one-pass approach), but this solution is sub-optimal for compression performance.
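The trade-off between the two approaches can be sketched as follows. This is a simplified illustration, not an encoder implementation: it assumes the prediction cost is a plain sum of absolute differences (SAD) against a co-located reference block, and the names `sad`, `two_pass_select`, and `blind_select` are hypothetical.

```python
from typing import List, Tuple

Block = List[List[int]]  # hypothetical representation: rows of luma samples


def sad(current: Block, reference: Block) -> int:
    """Sum of absolute differences, used here as a stand-in prediction cost."""
    return sum(
        abs(c - r)
        for cur_row, ref_row in zip(current, reference)
        for c, r in zip(cur_row, ref_row)
    )


def two_pass_select(current: Block, base_view_ref: Block, same_view_ref: Block) -> Tuple[str, int]:
    """Two-pass approach: evaluate each candidate reference in its own pass,
    then keep the cheaper one. Costs roughly twice the evaluation time."""
    cost_base = sad(current, base_view_ref)  # pass 1: prediction from the base view
    cost_same = sad(current, same_view_ref)  # pass 2: prediction from the same view
    if cost_base <= cost_same:
        return "base", cost_base
    return "same", cost_same


def blind_select(current: Block, base_view_ref: Block) -> Tuple[str, int]:
    """One-pass approach: always predict from one fixed view (here the base
    view), without checking whether the other view would have been cheaper."""
    return "base", sad(current, base_view_ref)


# Illustrative sample values only (assumed, not taken from any real stream).
current = [[10, 12], [14, 16]]
base_view_ref = [[20, 22], [24, 26]]
same_view_ref = [[10, 13], [14, 15]]

two_pass_choice = two_pass_select(current, base_view_ref, same_view_ref)
blind_choice = blind_select(current, base_view_ref)
```

With these sample values the two-pass selection picks the same-view reference at a cost of 2, while the blind selection stays with the base view at a cost of 40. The two-pass variant reaches the better choice only by evaluating both references, which is where the additional encoding time comes from.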