Most current advanced video coding standards specify three common frame types for encoding a frame: the I-frame, the P-frame, and the B-frame. B-frame is an abbreviation for bi-directional frame, bi-directional predictive frame, or sometimes bi-predictive frame. B-frames rely on the frames preceding and following them and include only data that has changed from the preceding frame or differs from data in the next frame. P-frame is an abbreviation for predictive frame, or predicted frame. P-frames follow I-frames and include only the data that has changed from the preceding I-frame. P-frames rely on I-frames to fill in most of their data. I-frame is an abbreviation for intra-frame; I-frames are also known as keyframes. An I-frame stores all the data necessary to display a frame and involves no inter-frame motion compensated prediction (MCP). In common usage, I-frames are interspersed with P-frames and B-frames in a compressed video. P-frame coding allows only forward inter-frame MCP, while B-frame coding allows forward, backward, and bi-directional MCP. Selecting the right frame type to code a frame is an important issue that affects not only coding efficiency but also the perceptual quality of the coded video.
I-frame type selection is often straightforward. Apart from the first video frame, a frame is coded as an I-frame whenever there is a scene change or the maximum Group-of-Pictures (GOP) length has been reached. In practice, a GOP structure with a maximum GOP length is often applied to ensure fast random access to the encoded video. Predictive/bi-predictive (P/B) frame type selection, however, is a non-trivial and more difficult problem. Compared to P-frame coding, B-frame coding allows more flexible prediction choices and hence generally yields better coding efficiency for an individual frame. However, the efficiency of coding the frame that immediately follows the B-frame(s) may be compromised: with its immediately preceding frame(s) coded as B-frame(s), its prediction must refer to the frame that immediately precedes the B-frame(s), which is temporally farther away. The P/B frame type should therefore be selected to achieve the best overall coding efficiency. In practice, another disadvantage of B-frame coding is the resulting flickering artifact. Due to backward prediction and bi-directional prediction, the inter-frame difference between a coded P-frame and a coded B-frame, or between two coded B-frames, is usually more significant than that between two coded P-frames. Hence, more flickering may be observed as more frames are coded as B-frames, especially at low or medium coding bit rates.
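The I-frame rule described above can be illustrated with a minimal sketch. The function name `select_i_frames`, the per-frame `scene_changes` flags, and the `max_gop_length` parameter are hypothetical names chosen for illustration; how a scene change is detected is outside the scope of the sketch and is assumed to be given.

```python
def select_i_frames(scene_changes, max_gop_length):
    """Return one boolean per frame: True if the frame is coded as an I-frame.

    scene_changes: list of booleans, True where a scene change was detected
                   (assumed to come from some external detector).
    max_gop_length: maximum number of frames allowed in one GOP (assumed >= 1).
    """
    decisions = []
    gop_len = 0  # number of frames already placed in the current GOP
    for index, scene_change in enumerate(scene_changes):
        # I-frame for the first frame, on a scene change, or when the GOP is full.
        is_i = index == 0 or scene_change or gop_len >= max_gop_length
        if is_i:
            gop_len = 0  # an I-frame starts a new GOP
        gop_len += 1
        decisions.append(is_i)
    return decisions
```

With no scene changes and a maximum GOP length of 4, this marks frames 0 and 4 of a six-frame sequence as I-frames; the remaining frames are left for the P/B decision discussed next.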
Besides frame type selection/decision, the same problem has been addressed in other related contexts, for example, dynamic/adaptive Group of Pictures (GOP) structure and reference frame placement/insertion. Basically, the problem is how to properly decide whether a frame should be coded as a P-frame or a B-frame such that the overall coding performance of both the concerned frame and its neighboring frames is optimized.
All the existing schemes primarily target improving coding efficiency. For that purpose, a widely recognized heuristic, as described with respect to a first and a second prior art approach, is as follows: a P-frame (or a reference frame) should be inserted when inter-frame motion is high, i.e., when the correlation between two neighboring frames is low, while non-reference B-frame coding is more efficient and should be applied to encode low or medium motion frames.
In existing references and literature, the P/B frame type selection problem has usually been addressed for GOP-based coding scenarios. In a third prior art approach, a scheme was proposed to find the rate-distortion optimal P/B coding pattern/structure of a GOP. For each frame inside a GOP, besides the P/B decision, the scheme also searches for the optimal quantization parameter for constant bit rate (CBR) rate control. Despite its optimality in coding efficiency, this scheme requires multiple actual encoding passes of a frame to see the result of a decision, and thus incurs impractical computational complexity, not to mention additional latency that may be prohibitive in real-time encoding scenarios.
In fact, most existing schemes are low-complexity practical solutions. One type of P/B selection scheme is the heuristic-based approach. In the second prior art approach, a P-frame is inserted when the accumulated motion intensity exceeds a certain threshold, where the motion intensity is measured as the sum of the absolute magnitudes of motion vectors (MVs). The scheme in a fourth prior art approach suggests that a frame be coded as a B-frame when the motion speed is almost constant, i.e., when its forward and backward motion intensities are similar or balanced. In principle, the heuristics on accumulated motion and balanced motion are complementary, and hence, if applied together, better performance can be achieved.
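The two heuristics above can be sketched as follows. This is only an illustration under stated assumptions, not the method of any of the cited approaches: the function names, the reset of the accumulator after each P-frame, and the relative `tolerance` used to judge "balanced" motion are all hypothetical choices.

```python
def select_pb_by_accumulated_motion(motion, threshold):
    """Accumulated-motion heuristic: insert a P-frame once the running sum
    of per-frame motion intensity exceeds the threshold.

    motion: per-frame motion intensity values (e.g. summed MV magnitudes).
    threshold: accumulated intensity above which a P-frame is inserted.
    Assumption: the accumulator restarts from zero after each P-frame.
    """
    types = []
    accumulated = 0.0
    for intensity in motion:
        accumulated += intensity
        if accumulated > threshold:
            types.append("P")
            accumulated = 0.0
        else:
            types.append("B")
    return types


def is_motion_balanced(forward, backward, tolerance=0.2):
    """Balanced-motion heuristic: treat a frame as a B-frame candidate when
    its forward and backward motion intensities are similar.

    Assumption: "similar" means the relative difference is at most
    `tolerance`; the default value is arbitrary.
    """
    largest = max(forward, backward)
    if largest == 0:
        return True  # no motion in either direction counts as balanced
    return abs(forward - backward) / largest <= tolerance
```

For example, with intensities `[1, 1, 1, 1, 5]` and a threshold of 2.5, the first function yields B, B, P, B, P. A combined scheme could consult both tests for each frame, but how the two should be weighed is not specified here.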
Another type of P/B selection approach is based on mathematical models. In practice, however, B-frame coding may cause annoying flickering artifacts due to the backward prediction involved, and these artifacts are more easily observed in low motion frames.
In a fifth prior art approach, an analytical function is derived that relates the coding gain of a GOP to its P/B pattern and its inter-frame and intra-frame characteristics, and the optimal GOP structure is the one that maximizes the coding gain. Other schemes directly model the optimal number of consecutive B-frames as a function of the average motion estimation error and the average spatial activity of a GOP. Instead of using an explicit mathematical function/model, a sixth prior art approach regards P/B frame type selection as a classification problem, where the input feature variables are the means and variations of the motion estimation error of the current frame and the next frame, and the output is a P/B decision. Given a large amount of training data, the distribution density function for classification is derived with Gaussian Mixture Models (GMMs) and the Expectation Maximization (EM) method. However, for all these model-based schemes, the modeling accuracy is not as well justified as in the heuristic-based approaches, and efficient coding performance may not always be guaranteed.
It is known that in a P/B frame type selection scheme, how to accurately measure the motion intensity of a frame is often an important issue. Frame motion intensity also represents the coding complexity of a frame, as a higher motion frame is also a more complex frame to encode. Various frame-level histogram-based measures were investigated in the prior art. These measures can be easily calculated. However, they are good at measuring only global motion, not local motion. Motion estimation or compensation helps to derive a more accurate measure of motion intensity. In the second prior art approach, the sum of the absolute motion vector (MV) magnitudes of all the macroblocks (MBs) of a frame is used to measure motion, while in the sixth prior art approach, only the motion estimation error is used for the measure. However, neither approach comprehensively accounts for both the motion vectors and the motion estimation error, although doing so may lead to a more accurate frame complexity measure and hence better P/B selection performance.
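As a concrete illustration of the MV-based measure mentioned above, the following minimal sketch sums the motion vector magnitudes over all macroblocks of a frame. The function name and the representation of motion vectors as `(dx, dy)` pairs are hypothetical, and the text does not say which magnitude is meant; the sketch assumes the sum of the absolute horizontal and vertical components.

```python
def frame_motion_intensity(motion_vectors):
    """Sum of absolute motion vector magnitudes over all macroblocks.

    motion_vectors: iterable of (dx, dy) pairs, one per macroblock.
    Assumption: magnitude is taken as |dx| + |dy|; a Euclidean norm
    would be an equally plausible reading of the description.
    """
    return sum(abs(dx) + abs(dy) for dx, dy in motion_vectors)
```

For instance, the vectors `(1, -2)`, `(0, 0)`, and `(-3, 4)` give an intensity of 10. This measure ignores the motion estimation error entirely, which is the limitation noted above.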