1. Field of the Invention
Embodiments of the present invention generally relate to a method and apparatus for motion estimation in enhancement layers in video encoding.
2. Description of the Related Art
The demand for digital video products continues to increase. Some examples of applications for digital video include video communication (e.g., video conferencing and multimedia messaging), security and surveillance, industrial automation, and entertainment (e.g., DV, HDTV, satellite TV, set-top boxes, Internet video streaming, video gaming devices, digital cameras, cellular telephones, video jukeboxes, high-end displays, and personal video recorders). Further, video applications are becoming increasingly mobile as a result of greater computational power in handsets, advances in battery technology, and high-speed wireless connectivity.
Video transmission systems using the Internet and mobile networks serve a wide range of receiving devices, i.e., video endpoints, ranging from, for example, cellular telephones with small screens, through tablet computers and personal computers with high-definition displays, to video conferencing systems with large screens. That is, the devices receiving a video transmission may differ in resolution, frame rate, and bandwidth capabilities. Scalable video coding (SVC) is one technique that may be used to allow a video to be received by a range of receiving devices according to the capabilities of each device. In general, SVC refers to encoding a video as a single scalable video bit stream with one or more subset bit streams that are adapted to varying video endpoint capabilities, network conditions, and/or user preferences.
A video bit stream is scalable when parts of the stream can be removed such that the resulting subset bit stream is a valid bit stream for some target decoder, and the subset bit stream represents the original video content with a reconstruction quality that is lower than that of the complete original bit stream but high relative to the reduced amount of data in the subset bit stream. Typically, three scalability modes are considered: temporal, spatial, and quality. A spatially scaled subset bit stream represents the original video content at a reduced picture size. A temporally scaled subset bit stream represents the original video content at a reduced frame rate. A quality scaled subset bit stream represents the original video content at the same spatial and temporal resolution as the complete bit stream but at a lower quality, i.e., a lower signal-to-noise ratio (SNR).
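The idea of removing parts of a scalable bit stream to obtain a valid subset can be illustrated with a simplified filter. The sketch below assumes each unit of the stream is tagged with three scalability identifiers, modeled on the dependency_id (spatial), temporal_id, and quality_id fields of the H.264/SVC NAL unit header extension. The `NalUnit` class and `extract_subset` function are hypothetical names for illustration only, and the rule "keep a unit when every identifier is at or below its target" is a simplification of the actual sub-bitstream extraction process, which has additional rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    """Hypothetical, simplified stand-in for a scalable-stream unit.

    Only three scalability identifiers are modeled; payload bytes are omitted.
    """
    dependency_id: int  # spatial layer (D)
    temporal_id: int    # temporal layer (T)
    quality_id: int     # quality/SNR refinement layer (Q)
    label: str = ""


def extract_subset(stream, max_d, max_t, max_q):
    """Return the units belonging to the target operating point (max_d, max_t, max_q).

    Simplified rule (an assumption, not the normative process): a unit is
    kept only if each of its identifiers is at or below the target value.
    Lowering max_t reduces frame rate, lowering max_d reduces picture size,
    and lowering max_q reduces quality.
    """
    return [
        nal for nal in stream
        if nal.dependency_id <= max_d
        and nal.temporal_id <= max_t
        and nal.quality_id <= max_q
    ]


if __name__ == "__main__":
    stream = [
        NalUnit(0, 0, 0, "base"),
        NalUnit(0, 1, 0, "temporal enhancement"),
        NalUnit(0, 0, 1, "quality enhancement"),
        NalUnit(1, 0, 0, "spatial enhancement"),
    ]
    # Extracting the base operating point drops every enhancement unit.
    print([n.label for n in extract_subset(stream, 0, 0, 0)])
```

Each choice of (max_d, max_t, max_q) corresponds to one subset bit stream aimed at a particular class of receiving device.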
H.264/SVC is an example of a video coding standard that provides scalable video coding. More specifically, H.264/SVC is the SVC extension of H.264/AVC and supports temporal, spatial, and quality scalability. A summary of H.264/SVC is presented in H. Schwarz, et al., "Overview of the Scalable Video Coding Extension of the H.264/AVC Standard," IEEE Trans. Circuits and Systems for Video Technology, vol. 17, no. 9, September 2007, "Schwarz" herein, which is incorporated by reference herein in its entirety. The full description of the SVC extension can be found in Annex G of "Advanced Video Coding for Generic Audiovisual Services," ITU-T Rec. H.264|ISO/IEC 14496-10, March 2010, "H.264 standard" herein, which is incorporated by reference herein in its entirety. The temporal scalability of H.264/SVC allows decoding of a bit stream at different frame rates by partitioning a set of pictures into a temporal base layer bit stream and temporal enhancement layer bit streams. The spatial scalability and quality scalability of H.264/SVC allow encoding of video at different resolutions and qualities as a base layer bit stream and one or more enhancement layer bit streams.
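The partitioning of pictures into a temporal base layer and temporal enhancement layers can be sketched for one common configuration, a dyadic hierarchy in which each added temporal layer doubles the frame rate. This configuration is an assumption for illustration; H.264/SVC permits other temporal structures. The function names `temporal_id`, `pictures_for_layer`, and `frame_rate` are hypothetical, and pictures are identified here by a non-negative display-order index.

```python
def temporal_id(frame_index: int, num_layers: int) -> int:
    """Temporal layer of a picture in a dyadic hierarchy with num_layers layers.

    The hierarchy period is 2**(num_layers - 1) pictures. Pictures at
    multiples of the period form the base layer (T = 0); each further layer
    holds the pictures halfway between those of the layers below it.
    """
    period = 1 << (num_layers - 1)
    if frame_index % period == 0:
        return 0
    # Count trailing zero bits: more trailing zeros means a lower layer.
    trailing_zeros = (frame_index & -frame_index).bit_length() - 1
    return num_layers - 1 - trailing_zeros


def pictures_for_layer(num_frames: int, num_layers: int, max_t: int) -> list:
    """Indices of pictures kept when decoding temporal layers 0..max_t."""
    return [i for i in range(num_frames)
            if temporal_id(i, num_layers) <= max_t]


def frame_rate(full_rate: float, num_layers: int, max_t: int) -> float:
    """Frame rate obtained by decoding temporal layers 0..max_t."""
    return full_rate / (1 << (num_layers - 1 - max_t))


if __name__ == "__main__":
    # With 4 temporal layers, the layer pattern repeats every 8 pictures.
    print([temporal_id(i, 4) for i in range(8)])
    # Decoding only layers 0 and 1 of a 30 fps stream yields 7.5 fps.
    print(frame_rate(30.0, 4, 1))
```

Dropping the highest remaining temporal layer halves the frame rate while the pictures that remain still form a decodable sequence, which is what makes the temporally scaled subset bit stream valid.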