With the increasing popularity of streaming audio and video over networks such as the internet, there is a need to optimize the data transferred from a server to a client so that the client's experience is maximized even if network conditions during playback are inconsistent. Optimizing the client's experience involves choosing a quality level for encoding the audio and video portions of the content such that the video can be transferred and reconstructed without interruption while preserving the quality of the video content.
The quality level is generally dictated by the bit rate specified for the encoded audio or video portions of the input stream. A higher bit rate generally indicates that a larger amount of information about the original audio or video is encoded and retained, and therefore a more accurate reproduction of the original input audio or video will be presented during video playback. Conversely, a lower bit rate indicates that less information about the original input audio or video is encoded and retained, and thus a less accurate reproduction of the original audio or video will be presented during video playback.
Generally, the bit rate for encoding each of the audio and video is specified based on several factors. The first factor is the network condition between the server and the client. A network connection that can transfer a large amount of data per unit of time allows a higher bit rate to be specified for the input video that is subsequently transferred over that connection. The second factor is the desired start-up latency. Start-up latency is the delay that a video playback tool experiences when first starting up, because a large amount of data has to be received, processed, and buffered before playback begins. The third factor is the tolerance to glitching. Glitching occurs when video playback has to stop because data is missing. In most cases any amount of start-up latency or glitching is intolerable, and it is therefore desirable to optimize the specified bit rate such that start-up latency and glitching are minimized or eliminated.
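To make the interaction between bit rate, network throughput, start-up latency, and glitching concrete, the following is a minimal sketch under simplifying assumptions: a fixed encoding bit rate, a hypothetical per-second list of available throughput, and a player that waits until `prebuffer_s` seconds of content are buffered before starting (and again after each stall). The function, type, and parameter names are illustrative only and do not describe any particular playback tool.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PlaybackResult:
    startup_latency_s: Optional[int]  # seconds of downloading before playback first starts
    glitch_count: int                 # times playback stalled because data was missing
    played_s: int                     # seconds of content actually played


def simulate_playback(bit_rate_kbps: float,
                      bandwidth_kbps: Sequence[float],
                      prebuffer_s: float) -> PlaybackResult:
    """Simulate constant-bit-rate playback over a link with varying throughput.

    bandwidth_kbps[i] is the (hypothetical) throughput during second i.
    Playback starts, or resumes after a stall, once prebuffer_s seconds of
    content are buffered; each second of playback consumes bit_rate_kbps.
    """
    needed_kbits = prebuffer_s * bit_rate_kbps
    buffered_kbits = 0.0
    playing = False
    startup_latency_s = None
    glitches = 0
    played_s = 0
    for second, bw in enumerate(bandwidth_kbps):
        buffered_kbits += bw  # one second of download
        if not playing and buffered_kbits >= needed_kbits:
            playing = True
            if startup_latency_s is None:
                startup_latency_s = second + 1
        if playing:
            if buffered_kbits >= bit_rate_kbps:
                buffered_kbits -= bit_rate_kbps  # play one second of content
                played_s += 1
            else:
                playing = False  # buffer ran dry: a glitch, followed by rebuffering
                glitches += 1
    return PlaybackResult(startup_latency_s, glitches, played_s)
```

On a steady 1000 kbps link, `simulate_playback(800, [1000] * 10, 2)` starts after 2 seconds and never stalls, whereas `simulate_playback(1500, [1000] * 6, 1)` also starts after 2 seconds but stalls once, because it consumes data faster than the link delivers it. A larger prebuffer trades more start-up latency for fewer glitches.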
Currently available commercial streaming media systems rely on multi bit rate (MBR) coding to perform coding rate control. In MBR coding, source video content is encoded into alternative bit streams at different coding rates and typically stored in the same media file at the server. This allows the content to be streamed in segments, or chunks, at varying levels of quality corresponding to the different coding rates as network conditions change, typically by switching between bit streams at segment boundaries.
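One simple way to picture bit stream switching in MBR streaming is a per-chunk rule that picks the highest available coding rate that fits within a safety margin of the recently measured throughput. This is a minimal sketch; the 0.8 safety factor, the function names, and the throughput figures are assumptions for illustration, and practical systems typically use more elaborate heuristics.

```python
from typing import List, Sequence


def choose_bit_rate(available_kbps: Sequence[int],
                    throughput_kbps: float,
                    safety: float = 0.8) -> int:
    """Pick the highest encoded rate that fits within a fraction of the measured throughput.

    Falls back to the lowest available rate if none fits.
    """
    budget = throughput_kbps * safety
    fitting = [r for r in available_kbps if r <= budget]
    return max(fitting) if fitting else min(available_kbps)


def plan_chunks(available_kbps: Sequence[int],
                throughput_per_chunk: Sequence[float],
                safety: float = 0.8) -> List[int]:
    """Choose one bit stream per chunk, so streams switch only at chunk boundaries."""
    return [choose_bit_rate(available_kbps, t, safety) for t in throughput_per_chunk]


# Four alternative encodings of the same content, and a throughput that drops over time.
print(plan_chunks([300, 700, 1500, 3000], [4000, 2000, 800, 300]))  # prints [3000, 1500, 300, 300]
```

Because every alternative stream covers the same content, the player can move to a lower rate as soon as throughput drops and return to a higher one when it recovers, without interrupting playback.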
Motion Estimation and Compensation in Video Encoding
Numerous techniques have been developed for compressing video to a desired bit rate. These include intraframe compression techniques (in which a frame is compressed as a still image) and interframe compression techniques (in which a frame is predicted or estimated from one or more other frames). Intraframe compression often involves a frequency transformation of the data followed by lossy and lossless compression. Interframe compression can include motion estimation.
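The intraframe pipeline of a frequency transformation followed by lossy and then lossless compression can be sketched on a single row of pixels. This sketch assumes a 1-D orthonormal DCT-II as the frequency transformation (one common choice, not necessarily the one any given codec uses), uniform quantization as the lossy step, and a simplified zero run-length code standing in for the lossless entropy-coding stage; all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dct_1d(samples: Sequence[float]) -> List[float]:
    """Orthonormal 1-D DCT-II: concentrates the energy of smooth signals in few coefficients."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, x in enumerate(samples)))
    return out


def quantize(coeffs: Sequence[float], step: float) -> List[int]:
    """Lossy step: divide by a step size and round, discarding fine detail."""
    return [round(c / step) for c in coeffs]


def run_length(levels: Sequence[int]) -> List[Tuple[int, int]]:
    """Lossless step (simplified): (zeros_before, value) pairs; trailing zeros are dropped."""
    pairs, run = [], 0
    for v in levels:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs


row = [10] * 8  # a flat row of pixel values
print(run_length(quantize(dct_1d(row), step=4)))  # prints [(0, 7)]
```

A flat row collapses to a single nonzero coefficient, which is why smooth image regions compress well; a larger quantization step discards more detail and lowers the bit rate further.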
Motion estimation is a process for estimating motion between frames. Motion estimation for video is said to be an ill-posed problem, because the motion within the viewed scene occurs in three dimensions, but the successive frames of video are projections of the 3D scene onto a two-dimensional image plane. In one common technique, an encoder using motion estimation attempts to match a block of pixels (picture elements or samples) in a current frame with a similar block of pixels in a search area in another frame, called the reference frame. When the encoder finds an exact or “close enough” match in the search area of the reference frame, it parameterizes the change in position between the blocks as motion data (such as a motion vector).
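The block-matching technique described above can be sketched as an exhaustive (full) search that scores every candidate displacement in the search area with the sum of absolute differences (SAD). This is a minimal sketch assuming grayscale frames stored as lists of rows; SAD and full search are one common choice of matching criterion and search strategy, not the only ones, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def motion_search(cur: Frame, ref: Frame, bx: int, by: int, n: int,
                  search_range: int) -> Tuple[Tuple[int, int], int]:
    """Full search over displacements within +/- search_range, keeping the lowest SAD.

    Returns ((dx, dy), cost), where (dx, dy) points from the current block to its
    best match in the reference frame.
    """
    h, w = len(ref), len(ref[0])
    best: Optional[Tuple[int, Tuple[int, int]]] = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidate blocks that would fall outside the reference frame.
            if not (0 <= by + dy <= h - n and 0 <= bx + dx <= w - n):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    if best is None:
        raise ValueError("no candidate block fits inside the reference frame")
    return best[1], best[0]


# Reference frame with unique pixel values; the scene shifts right by 2 and down by 1.
ref = [[8 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[max(y - 1, 0)][max(x - 2, 0)] for x in range(8)] for y in range(8)]
print(motion_search(cur, ref, bx=3, by=3, n=2, search_range=2))  # prints ((-2, -1), 0)
```

A SAD of 0 is an exact match; in real video an exact match is rare, so the encoder accepts the lowest-cost, “close enough” candidate. Full search is simple but costly, which is why faster search patterns are widely used in practice.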
Motion compensation, in turn, is the process of reconstructing frames from reference frames using motion data. In one common technique, an encoder or decoder reconstructs a current frame by applying motion data for the current frame to a reference frame, creating a predicted frame. The encoder can compress the difference (sometimes called the residual) between the predicted frame and the original version of the current frame using the same techniques as used for intraframe compression (e.g., lossy and lossless compression). The overall bit rate of the compressed video depends heavily on the bit rate of the residuals, which can dominate the overall bit rate compared to the bit rate for motion data. The bit rate of residuals is low if the residuals are simple (i.e., because motion estimation finds exact or good matches according to some criteria), or if lossy compression drastically reduces the complexity of the residuals. On the other hand, the bit rate of complex residuals (i.e., those for which motion estimation fails to find good matches) can be higher, depending on the degree of lossy compression applied to reduce their complexity.
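The following sketch illustrates motion compensation and the residual for a single block, assuming a motion vector has already been found by motion estimation. As a simplifying assumption, rounding each residual value to a multiple of a step size stands in for the lossy compression a real encoder would apply (for example, a frequency transformation followed by quantization); the function names are hypothetical.

```python
from typing import List, Tuple

Block = List[List[int]]


def predict_block(ref: Block, bx: int, by: int, mv: Tuple[int, int], n: int) -> Block:
    """Motion compensation: copy the n x n reference block displaced by mv = (dx, dy)."""
    dx, dy = mv
    return [[ref[by + dy + y][bx + dx + x] for x in range(n)] for y in range(n)]


def residual(cur_block: Block, pred: Block) -> Block:
    """Difference between the original block and its motion-compensated prediction."""
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur_block, pred)]


def quantize(res: Block, step: int) -> Block:
    """Crude lossy step: round each residual value to the nearest multiple of step."""
    return [[step * round(v / step) for v in row] for row in res]


def reconstruct(pred: Block, res: Block) -> Block:
    """Decoder side: prediction plus the (possibly quantized) residual."""
    return [[p + r for p, r in zip(pr, rr)] for pr, rr in zip(pred, res)]


ref = [[10, 12, 14, 16],
       [20, 22, 24, 26],
       [30, 32, 34, 36],
       [40, 42, 44, 46]]
cur_block = [[23, 24],
             [33, 35]]  # the 2x2 block at (0, 0) in the current frame
pred = predict_block(ref, bx=0, by=0, mv=(1, 1), n=2)
res = residual(cur_block, pred)
print(res)                                  # prints [[1, 0], [1, 1]]
print(reconstruct(pred, res) == cur_block)  # prints True (lossless)
print(reconstruct(pred, quantize(res, 4)))  # prints [[22, 24], [32, 34]] (detail lost)
```

Because the match is good, the residual is small and simple and therefore cheap to code; after coarse quantization it vanishes entirely, at the cost of a reconstruction that no longer matches the original exactly.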