The present invention relates to a method and apparatus for digitally encoding video image data, and is particularly suited for encoding Internet Web pages for transmission and display.
With the ever-increasing popularity of the Internet, a number of systems and devices have appeared in the marketplace that substantially reduce the initial equipment expense required for accessing the Internet. For example, inexpensive dedicated processors are available which enable a user to access the Internet using a telephone line, and download Internet Web pages for display on the user's television set.
Recently, an even more attractive Internet access system has been proposed which completely eliminates the need for a user to have a telephone line and a dedicated processor running a browser application locally at their premises. This system employs a modified cable television (CATV) system that uses the downstream cable channels to transmit Internet-based information to the system users for display on their television sets. Each user is provided with a set top converter box that has been modified to enable entry of data or commands via a keyboard, remote controller or other input device. One or more upstream channels are provided which transmit the entered data or commands to a headend server in the CATV system. The headend server is interfaced to the Internet via an Internet Service Provider (ISP), for example, and includes processing equipment which can simultaneously operate a plurality of resident Internet browser applications, one for each system user requesting Internet access. The headend server therefore contains all of the processing equipment necessary to access the Internet through the ISP, while each user's set top box acts as an input/output device for interfacing the user to the Internet.
In the operation of the system, a user requests Internet access by entering an appropriate command into the set top box that transmits the command through an upstream channel to the headend server. In response, the headend server connects the user to one of the resident browser applications via one of the system's downstream channels.
The Internet-based information, e.g., Web pages, can be transmitted through the downstream channel in a number of ways. In an analog implementation, for example, the Internet data can be inserted into the vertical or horizontal blanking intervals of the conventional analog television signals which are simultaneously transmitted on the selected downstream channel. In an all-digital embodiment, however, the Internet data must be encoded in the same format that is employed for digitally encoding video signals. More particularly, the data must be encoded using standardized procedures for encoding, storing, transporting and displaying continuous video frames that have been specified by the Moving Picture Experts Group (MPEG). Thus, the image bit map generated by the browser application is not rendered at the headend, but instead is further compressed by an MPEG image encoder. It is the compressed image data that is transmitted to a user.
MPEG encoding is a video image compression technique that substantially reduces the amount of motion picture image data that must be transmitted. This data reduction is made possible because spatial redundancy exists within an image frame (intra frame compression). In addition, each succeeding frame in a motion picture video usually contains substantial temporal redundancy, i.e., portions which have either not changed from the previous frame, or have only been moved relative to the previous frame (inter frame compression). When spatial redundancy is removed from a frame, the frame is said to be encoded as an intra-coded frame (I-frame). In an inter frame compression scheme, two different compression algorithms may be employed to generate two kinds of encoded frames. A compressed image frame is called a Predictive-coded frame (P-frame) if only a prior frame is compared and the difference is coded. Another inter frame compression results in a Bidirectionally predictive-coded frame (B-frame) if both a prior frame and a post frame are used for encoding. In these cases, it is not necessary to transmit all of the image data for each frame. Instead, only the difference data representing the portions in the current frame that have changed from the neighboring (previous or later) frame(s) is transmitted. For areas in an image which have been moved relative to the previous frame, it is possible to search for these areas, and then generate a motion vector which instructs a receiving decoder to construct a portion of the next image frame by moving a corresponding portion in the previous image frame a specified displacement and direction. To encode a sequence of video frames, the first frame is encoded as an intra or I frame where information for all of the pixels in the frame needs to be transmitted since no previous frame information is available. 
The next frame in the sequence can then be encoded either as a P (predictive-coded) frame or a B (bidirectionally predictive-coded) frame which includes only the difference or motion vector data resulting from the frame comparisons. P or B frames can continue to be used for encoding the succeeding frames in the sequence until a substantial change, such as a scene change, occurs, thus necessitating formation of another I frame. In practice, however, the encoder is programmed to encode I frames at a constant rate, such as once every N frames. The MPEG encoding procedure thus compresses images by suppressing statistical and subjective redundancy both between frames (inter frame) and within frames (intra frame). An MPEG decoder is capable of decompressing the coded image close to its original format so that the decompressed image may be displayed on a display device, such as a television or computer monitor.
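The constant-rate I frame schedule described above can be illustrated with a minimal sketch. The function name `assign_frame_types`, the parameter `n` (the interval N) and the optional `scene_changes` set are hypothetical names chosen for illustration only; the sketch uses P frames alone for inter coding and is not taken from any particular encoder.

```python
def assign_frame_types(num_frames, n, scene_changes=()):
    """Label each frame 'I' or 'P'.

    Assumptions (illustrative only):
      - n is the fixed interval at which an I frame is forced.
      - scene_changes holds frame indices where the content changes
        substantially, which also force an I frame.
      - B frames are not used.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    cuts = set(scene_changes)
    types = []
    for i in range(num_frames):
        # Frame 0 satisfies i % n == 0, so the sequence always starts
        # with an I frame (no previous frame is available to it).
        if i % n == 0 or i in cuts:
            types.append("I")
        else:
            types.append("P")
    return types


if __name__ == "__main__":
    print("".join(assign_frame_types(7, 3)))
```

With an interval of 3 and no scene changes, every third frame is intra coded and the frames in between carry only difference or motion vector data.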
In the Internet Web page display application, only P frames are usually employed for inter frame compression because B frame coding requires comparison with post (later in time) frames which are not available immediately. However, a B frame can be encoded by forward comparison only between the current frame and the prior frame as a special case, and in this instance, can also be employed for Web page inter frame compression.
In the application of MPEG encoding to the previously described CATV system, each user's set top box includes an MPEG decoder for decoding the digital video bit stream received on the downstream channels. This requires that any Internet Web page image data to be transmitted to the set top boxes also be MPEG encoded. An MPEG encoder is thus incorporated in the cable headend to encode the browser generated Web page image data, which usually is a bit map, before it is transmitted on one of the downstream channels to a user's set top box.
In general, however, MPEG encoding of Web page image data is needlessly intensive from a computation standpoint since Web pages do not usually incorporate full motion video, and often appear to be nothing more than a still image. Strictly speaking, though, the Web page is not a still image. Due to the limited viewing size of a display device, the Web page is usually larger than the display device's viewing area. A user may therefore scroll a Web page to move the page horizontally or vertically to view the whole page. Depending on the speed at which the page is scrolling, the images on the display device may thus be considered to be a series of video frames displayed at a variable frame rate. Other Web pages may contain a small animation window in which several localized pictures are alternately displayed at a certain rate. Java applet animations and regional character updates which occur as a user types an e-mail message are other examples of this local animation scenario. In both of these cases, MPEG inter frames may be constructed after the generation of a first, intra frame, to reduce the number of bits needed to represent each frame, thus substantially reducing the required bandwidth in the communication link.
As discussed previously, when an inter frame is generated, motion vectors must be found, coded and transmitted so that the MPEG decoder can reform the frame. A motion vector search is one of the most difficult tasks in designing an MPEG encoder. Since the MPEG committee defined only the syntax and semantics of a compressed frame, but did not define how motion vector searching should be implemented, numerous proprietary motion vector search algorithms were developed by various encoder vendors. For continuous video compression, however, a motion vector search is very complicated and requires a large percentage of the entire encoding computational effort. More particularly, in MPEG encoding, each video frame to be encoded is subdivided into a plurality of 64-pixel (8×8) blocks, and four such blocks covering a 16×16 pixel area are known as a macroblock. During encoding, the MPEG encoder searches for the best match between each macroblock of a present frame to be encoded and a macroblock-sized area in the previous frame. This search for the best match is known as motion estimation.
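The subdivision of a frame into macroblocks and blocks can be sketched as follows. The function names are hypothetical, the frame dimensions are assumed to be exact multiples of 16, and only the four 8×8 blocks covering the 16×16 area mentioned above are modeled.

```python
MACROBLOCK_SIZE = 16  # a macroblock covers a 16x16 pixel area
BLOCK_SIZE = 8        # each block is 8x8 = 64 pixels


def macroblock_origins(width, height):
    """Return the (x, y) top-left corner of every macroblock, row by row.

    Assumes (for illustration) that width and height are multiples of 16.
    """
    if width % MACROBLOCK_SIZE or height % MACROBLOCK_SIZE:
        raise ValueError("frame dimensions must be multiples of 16")
    return [
        (x, y)
        for y in range(0, height, MACROBLOCK_SIZE)
        for x in range(0, width, MACROBLOCK_SIZE)
    ]


def blocks_in_macroblock(mx, my):
    """Return the top-left corners of the four 8x8 blocks in a macroblock."""
    return [
        (mx + dx, my + dy)
        for dy in (0, BLOCK_SIZE)
        for dx in (0, BLOCK_SIZE)
    ]


if __name__ == "__main__":
    origins = macroblock_origins(720, 480)
    print(len(origins), "macroblocks")
    print(blocks_in_macroblock(*origins[1]))
```

For a 720×480 frame this yields 45 × 30 = 1350 macroblocks, each of which needs its own best-match search during motion estimation.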
The existing algorithms for motion estimation fall into two categories: feature/region matching and gradient-based. In the first category, both block matching and hierarchical block matching can be employed for motion estimation. For encoding a continuous video, the encoder has to search the entire screen (exhaustive search) to find the best match because the encoder knows nothing about the motion from frame to frame. In gradient-based motion estimation, the exhaustive search may be avoided at the price of solving linear equations during the search.
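To make the cost of an exhaustive block-matching search concrete, the following is a minimal sketch under stated assumptions: frames are plain lists of rows of luminance values, the match criterion is the sum of absolute differences (SAD), which is one common choice and is not specified by the text above, and the names `sad` and `full_search` are hypothetical.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block of the current frame
    at (cx, cy) and a block of the reference frame at (rx, ry)."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def full_search(cur, ref, cx, cy, size=16):
    """Compare the current block against every block position in ref.

    Returns (best_sad, (dx, dy)), where (dx, dy) is the displacement from
    the current block position to the best-matching reference position.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for ry in range(height - size + 1):
        for rx in range(width - size + 1):
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            # Strict comparison keeps the first position found on a tie.
            if best is None or cost < best[0]:
                best = (cost, (rx - cx, ry - cy))
    return best


if __name__ == "__main__":
    # Reference frame with distinct values; current frame is the same
    # content moved two pixels to the right.
    ref = [[y * 8 + x for x in range(8)] for y in range(8)]
    cur = [[ref[y][x - 2] if x >= 2 else 0 for x in range(8)] for y in range(8)]
    print(full_search(cur, ref, 4, 2, size=4))
```

Every candidate position costs one full block comparison, so the work grows with the frame area multiplied by the block area for each macroblock, which is why the search dominates the encoding effort.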
All of these algorithms require many iterations to complete the motion estimation. After the best match is found, the difference between the matched macroblocks is calculated by comparing the macroblocks. If the difference is small enough, a motion vector is generated which determines the direction and offset of the motion. Both the difference and the motion vector are encoded and transmitted. If the difference is larger than a threshold, the macroblock of the present frame is instead intra compressed, in the same manner as a macroblock encoded in an I frame.
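The decision described above can be sketched as follows. The function name `choose_macroblock_mode`, the dictionary layout of the result, the use of a sum of absolute differences as the difference measure, and the threshold value are all assumptions made for illustration; the transform and entropy coding steps that would follow are omitted.

```python
def choose_macroblock_mode(cur_block, matched_block, motion_vector, threshold):
    """Decide between inter and intra coding for one macroblock.

    cur_block and matched_block are equally sized lists of rows of pixel
    values; matched_block is the best match found by motion estimation.
    """
    # Pixel-by-pixel difference between the current block and its match.
    residual = [
        [c - m for c, m in zip(cur_row, match_row)]
        for cur_row, match_row in zip(cur_block, matched_block)
    ]
    difference = sum(abs(v) for row in residual for v in row)

    if difference > threshold:
        # Match too poor: code the block on its own, as in an I frame.
        return {"mode": "intra", "pixels": cur_block}
    # Match good enough: send the motion vector plus the difference data.
    return {"mode": "inter", "motion_vector": motion_vector, "residual": residual}


if __name__ == "__main__":
    cur = [[10, 12], [14, 16]]
    good = [[10, 11], [14, 16]]
    print(choose_macroblock_mode(cur, good, (-2, 0), threshold=4))
```

A 2×2 block is used in the usage example only to keep the output short; a real macroblock would be 16×16.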
In view of the foregoing, any video image encoding technique that eliminates the need for motion vector search algorithms would be desirable because of the resulting substantial savings in computation time and intensity.