This invention is in the field of digital video processing, and is more specifically directed to systems for acquiring and digitally encoding video images.
In recent years, much video display, storage, and communication has made the transition from analog to digital formats. For example, digital video disks (DVDs) are now the preferred manner of storing and distributing video content. Many broadcast technologies, such as satellite television broadcasts and cable television, now communicate digital video data, and many local television stations now offer both analog and digital broadcast channels (in anticipation of the conversion to digital-only broadcasting). And, of course, the distribution of streaming and downloadable video content over the Internet is quite popular.
Modern digital video communication and storage rely heavily on compression to keep transmission and storage costs reasonable. Typical compression techniques include standardized approaches such as those promulgated by the Moving Picture Experts Group (“MPEG”), which achieve compression ratios (raw to compressed) on the order of 10 to 100. For example, the MPEG-2 compression standard is the current compression approach for satellite TV broadcast transmissions, with a typical compression ratio of about 40.
Of course, computational work is required in order to compress a sequence of digital video images. This results in a tradeoff between computational complexity and the memory and bandwidth savings obtained by the compression. At one extreme, if raw digital image data (e.g., twenty-four bits per pixel) were stored and communicated, no compression computations would be required, but the memory and transmission data rate requirements would be massive. At the other extreme, the memory requirements and transmission bandwidth can be greatly reduced by aggressive compression techniques, but at a cost of significant computational complexity and, perhaps, latency resulting from the compression process. Despite the limitations raised by this tradeoff, the demand for high-resolution digital video storage and transmission at minimum cost continues unabated.
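By way of illustration only, the scale of this tradeoff can be sketched with simple arithmetic. In the sketch below, the 720 by 480 pixel frame size and the 30 frames per second rate are assumptions chosen for the example; the twenty-four bits per pixel and the compression ratio of about 40 are the figures mentioned above. The function names are hypothetical.

```python
def raw_bit_rate(width, height, bits_per_pixel, fps):
    """Bits per second needed to carry uncompressed video."""
    return width * height * bits_per_pixel * fps


def compressed_bit_rate(raw_bps, compression_ratio):
    """Bits per second after compression at the given raw-to-compressed ratio."""
    return raw_bps / compression_ratio


# Assumed example format: 720 x 480 pixels at 30 frames per second.
raw = raw_bit_rate(720, 480, 24, 30)
compressed = compressed_bit_rate(raw, 40)

print(f"raw:        {raw / 1e6:.1f} Mbit/s")
print(f"compressed: {compressed / 1e6:.1f} Mbit/s")
```

Under these assumptions the raw stream requires roughly 249 Mbit/s, while the compressed stream requires roughly 6 Mbit/s.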
By way of further background, modern compression techniques such as MPEG-2 involve significantly more computational complexity for encoding than for decoding. MPEG-2 encoding typically requires three to ten times as many computations as decoding. When encoding is to be performed in consumer equipment, such as digital cameras, digital camcorders, and video conferencing equipment, the encoding of video image data is often the primary system constraint.
As known in the art, the MPEG-2 standard contemplates many resolution and frame rate options. However, whether under the MPEG-2 standard or another standard, it is desirable to encode video data at high frame rates, such as up to 30 frames per second (fps). The compression required at such frame rates is typically accomplished by grouping frames, or images (“pictures”), in the video sequence. The first image in the “group of pictures” (“GOP”) is referred to as an “I” (“intra”) frame, meaning that its compression and encoding are performed using only its own information (i.e., independently of other frames). The other images in the GOP following the “I” frame are referred to as “P” (“predicted”) frames, and are encoded using the context of the “I” frame in the GOP as well as previously encoded “P” frames. In short, the encoding of “P” frames involves the comparison of pixel information in the current image to the previous images in the group, with only the changes from one image to the next being encoded. Encoders (or “video codecs”) under the MPEG-2 and other standards can also use so-called “B” frames, which are bidirectionally predicted (i.e., predicted from both a preceding frame and a following frame). This compression approach saves significant memory and communication bandwidth, but requires significant computational resources. Depending on how much change is present from frame to frame, a typical GOP includes one “I” frame followed by four or five “P” frames. But regardless of the computational resources available, the target frame rate at the output of the encoder, which is 30 fps under the MPEG-2 standard, must be met, although some systems may operate adequately at lower frame rates (e.g., 24 fps).
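The GOP arrangement described above can be illustrated by the following simplified sketch. It is not an MPEG-2 encoder: actual encoders use motion-compensated prediction and transform coding, whereas this toy example merely records, for each “P” frame, the pixels that differ from the preceding frame. The function names and the representation of a frame as a flat list of pixel values are assumptions made for the example; the default GOP size of six corresponds to one “I” frame followed by five “P” frames.

```python
def gop_frame_types(num_frames, gop_size=6):
    """Label each frame 'I' (first in its GOP) or 'P' (all others)."""
    return ["I" if i % gop_size == 0 else "P" for i in range(num_frames)]


def encode_gop(frames):
    """Encode one GOP: full first frame, then only per-pixel changes."""
    encoded = [("I", list(frames[0]))]
    previous = frames[0]
    for current in frames[1:]:
        # Keep only the (index, value) pairs that changed since the last frame.
        changes = [
            (idx, new)
            for idx, (old, new) in enumerate(zip(previous, current))
            if old != new
        ]
        encoded.append(("P", changes))
        previous = current
    return encoded


def decode_gop(encoded):
    """Rebuild every frame by applying each P frame's changes in order."""
    frames = []
    current = None
    for kind, payload in encoded:
        if kind == "I":
            current = list(payload)
        else:
            current = list(current)
            for idx, value in payload:
                current[idx] = value
        frames.append(current)
    return frames


frames = [[1, 2, 3, 4], [1, 2, 9, 4], [1, 7, 9, 4]]
encoded = encode_gop(frames)
assert decode_gop(encoded) == frames
```

In this sketch, a “P” frame in which little has changed carries very little data, while the encoder must still compare every pixel against the preceding frame, which reflects the bandwidth savings and the computational cost noted above.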
Referring now to FIG. 1, a conventional video encoding system will now be described, to provide further background for this invention. The system of FIG. 1 corresponds to a digital video camera; it will be understood, by those skilled in the art having reference to this specification, that the system of FIG. 1 can also correspond to a conventional digital video recorder, or other device or system for storing compressed digital video. In the example of FIG. 1, incoming video images are received via a lens system (not shown) and converted from light to electrical signals by CCD imager 2. CCD imager 2 produces digital electrical signals corresponding to the color and intensity of the received light and forwards these signals over bus video_in to image processor 10. It is contemplated that the digital video communicated over bus video_in will correspond to a sequence of video images, or “frames”, and as such constitute full motion video acquisition. Microphone M detects sound simultaneously with the received images; electrical signals corresponding to the sound detected by microphone M are converted into digital form by conventional audio coder/decoder (“codec”) 4, and are forwarded to image processor 10 over line audio_in.
Image processor 10, in this conventional system, is a conventional image processor device such as the DM310 image processor available from Texas Instruments Incorporated. Image processor 10 is a digital processing device, and is responsible for carrying out such functions as encoding of the incoming digital video data into a compressed form, managing memory resources in the system, arranging the images for real-time display on LCD display 15, and handling end-user interactions such as start/stop of record/playback, adjusting resolution or frame rate, and the like. In the conventional system of FIG. 1, the memory resources include flash memory 14, which serves as a boot ROM for image processor 10; external memory interface (EMIF) decoder 12 is provided to interface image processor 10 with flash memory 14 in this conventional system. Synchronous dynamic random access memory (SDRAM) 16 is connected to image processor 10 over a conventional memory bus, and is sized sufficiently to record the incoming video images and encoded audio. Memory card 18 provides removable storage for the recorded video sequences, as another memory resource in the system.
In operation, image processor 10 is reset according to code stored in flash memory 14. Incoming digital video imaged by CCD imager 2 is received by image processor 10. Image processor 10 encodes and compresses these digital video data for storage in SDRAM 16 (or in memory card 18, if selected by the user). If real-time previewing during video acquisition is enabled, image processor 10 also forwards displayable image data to LCD display 15. After video acquisition, image processor 10 can also control the communication of stored video from SDRAM 16 or memory card 18 to LCD display 15 for local playback. In addition, image processor 10 can also download the stored video data to a computer or other device by way of a conventional USB or IEEE 1394 interface (not shown).
As mentioned above, the encoding and compression of the incoming video image data by image processor 10 must meet certain constraints. Typically, image processor 10 must have sufficient computational capacity to encode and compress image data corresponding to a specified frame rate. For example, the MPEG-2 standard requires a frame rate of 30 frames per second.
But the computational resources of image processor 10 are of course finite. In order to meet the specified frame rate with these finite resources, other parameters in the video encoding are necessarily limited. These limitations are most often evident in the resolution of the encoded image sequence, resulting in a limited image size in pixels or a grainy image appearance. The encoding may also skip frames of the incoming video sequence in order to reach the specified frame rate, causing rapidly moving features in the video sequence to appear “choppy” when displayed. Therefore, it continues to be desirable to increase the computational resources of image processors, such as image processor 10 in the system of FIG. 1, in order to achieve a desired frame rate with the highest possible image resolution and with realistic frame-to-frame transitions. But the cost of providing these desired capabilities can exceed the cost constraints of the design.
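The manner in which a fixed computational budget limits resolution or frame rate can be sketched as follows. All numerical values in this example (the operations available per second, the operations required per pixel, and the 720 by 480 pixel frame size) are hypothetical and are not characteristics of any particular image processor; the function names are likewise assumptions made for the example.

```python
def max_pixels_per_frame(ops_per_second, ops_per_pixel, fps):
    """Largest frame, in pixels, that can be encoded at the given frame rate."""
    return ops_per_second // (ops_per_pixel * fps)


def achievable_frame_rate(ops_per_second, ops_per_pixel, pixels_per_frame):
    """Frame rate sustainable for a frame of the given size."""
    return ops_per_second / (ops_per_pixel * pixels_per_frame)


# Hypothetical budget: 1e9 operations per second, 200 operations per pixel.
OPS_PER_SECOND = 1_000_000_000
OPS_PER_PIXEL = 200

# Option 1: hold 30 fps and shrink the image.
pixel_limit = max_pixels_per_frame(OPS_PER_SECOND, OPS_PER_PIXEL, 30)

# Option 2: hold a 720 x 480 image and drop frames.
rate_limit = achievable_frame_rate(OPS_PER_SECOND, OPS_PER_PIXEL, 720 * 480)

print(f"at 30 fps, at most {pixel_limit} pixels per frame")
print(f"at 720 x 480, at most {rate_limit:.1f} frames per second")
```

Under these assumed figures, the processor must either reduce the image to fewer than half the pixels of a 720 by 480 frame or skip roughly every other incoming frame, which corresponds to the reduced resolution and the “choppy” appearance described above.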
As mentioned above, for a given frame rate, additional demands for high resolution video transmission and storage continue to accrue. In particular, high-definition television (“HDTV”) is becoming increasingly popular, especially as additional HD programs and content sources appear in the marketplace. Video encoding and compression of HD video images, particularly at the camera and transmission functions, greatly increase the computational requirements for compression beyond those of conventional video acquisition and recording. The processing requirements for HD video acquisition are now sufficiently high that HD systems remain somewhat cost-prohibitive.