Digital video consumes large amounts of storage and transmission capacity. A typical raw digital video sequence includes 15 or 30 frames per second. Each frame can include tens or hundreds of thousands of pixels (also called pels). Each pixel is a tiny element of the picture. In raw form, a computer commonly represents a pixel with 24 bits. Thus, the number of bits per second, or bitrate, of a raw digital video sequence can be 5 million bits/second or more.
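The bitrate figure above follows from simple multiplication. The sketch below works through it for two assumed frame sizes (160x120 and 320x240 are illustrative choices, not values from the text); the function name `raw_bitrate` is likewise hypothetical.

```python
def raw_bitrate(width, height, bits_per_pixel=24, frames_per_second=15):
    """Return the uncompressed bitrate, in bits per second, of a video sequence."""
    pixels_per_frame = width * height
    return pixels_per_frame * bits_per_pixel * frames_per_second


# Assumed example: a 160x120 frame (19,200 pixels) at 15 frames per second.
small = raw_bitrate(160, 120)

# Assumed example: a 320x240 frame (76,800 pixels) at 30 frames per second.
large = raw_bitrate(320, 240, frames_per_second=30)

print(f"160x120 @ 15 fps: {small:,} bits/second")
print(f"320x240 @ 30 fps: {large:,} bits/second")
```

Even the smaller assumed frame size yields roughly 6.9 million bits/second, consistent with the "5 million bits/second or more" estimate.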
Many computers and computer networks lack the resources to process raw digital video. For this reason, engineers often use compression (also called coding or encoding) to reduce the bitrate of digital video. Compression can be lossless, in which quality of the video does not suffer but decreases in bitrate are limited by the complexity of the video. Or, compression can be lossy, in which quality of the video suffers but decreases in bitrate are more dramatic. Decompression reverses compression.
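The difference between lossless and lossy compression can be illustrated with a minimal sketch. It assumes Python's standard `zlib` module as a stand-in lossless coder and uses bit truncation as a deliberately crude lossy step; neither is the method of any particular video codec, and the sample data and function names are hypothetical.

```python
import zlib


def lossless_roundtrip(samples):
    """Compress and decompress without discarding any information."""
    compressed = zlib.compress(samples)
    return len(compressed), zlib.decompress(compressed)


def lossy_roundtrip(samples, dropped_bits=4):
    """Discard the low-order bits of each sample, then compress losslessly.

    The discarded bits cannot be recovered, so the output only
    approximates the input.
    """
    mask = (0xFF << dropped_bits) & 0xFF
    quantized = bytes(s & mask for s in samples)
    compressed = zlib.compress(quantized)
    return len(compressed), zlib.decompress(compressed)


# Hypothetical 8-bit sample data standing in for pixel values.
samples = bytes((x * 7 + (x * x) % 13) % 256 for x in range(4096))

lossless_size, lossless_out = lossless_roundtrip(samples)
lossy_size, lossy_out = lossy_roundtrip(samples)

print("lossless exact:", lossless_out == samples, "size:", lossless_size)
print("lossy exact:   ", lossy_out == samples, "size:", lossy_size)
print("max lossy error:", max(abs(a - b) for a, b in zip(samples, lossy_out)))
```

The lossless path reproduces the input exactly, while the lossy path introduces a bounded error per sample in exchange for data that is typically easier to compress.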
Several international standards relate to video compression and decompression. These standards include the Moving Picture Experts Group (“MPEG”) 1, 2, and 4 standards, and the H.26x series from the ITU. The compression techniques used in such standards include intraframe compression techniques (in which a frame is compressed as a still image) and interframe compression techniques (in which a frame is predicted or estimated from one or more other frames).
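As a rough illustration of the interframe idea, the sketch below predicts each pixel of a frame from the same position in the previous frame and keeps only the difference (the residual). This is a simplified assumption for illustration: the standards named above use considerably more elaborate prediction (for example, motion estimation), and the function names and sample values here are hypothetical.

```python
def interframe_residual(previous, current):
    """Predict each pixel from the previous frame; return the differences."""
    return [c - p for p, c in zip(previous, current)]


def reconstruct(previous, residual):
    """Rebuild the current frame from the previous frame plus the residual."""
    return [p + r for p, r in zip(previous, residual)]


# Hypothetical one-dimensional "frames": a bright region shifts one position.
previous = [10, 10, 10, 200, 200, 10]
current = [10, 10, 10, 10, 200, 200]

residual = interframe_residual(previous, current)
print(residual)
print(reconstruct(previous, residual) == current)
```

Where consecutive frames are similar, most residual values are zero, which is what makes the predicted frame cheaper to encode than a frame compressed on its own as a still image.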
In general, video is a series of frames of visual content. The content within a frame may be continuous tone content or palettized content. Continuous tone content includes, for example, photographs or other images with gradually varying colors or tones. Continuous tone content typically uses a range of image tones that appears substantially continuous to the human eye. In many cases, the image tones are represented by 24-bit values (2^24 = 16,777,216 different possible colors), but other representations are possible. While it is desirable to encode continuous tone content using only lossless compression if sufficient resources are available, lossy compression is often used to effectively compress continuous tone content at lower bitrate. For example, the MPEG compression standards use lossy compression for encoding camera video consisting of sequences of video image frames. Lossy compression of continuous tone content can introduce errors or other distortions that show up when content is decompressed. Such distortions in continuous tone content are often not detectable or not significant when viewed with human eyes. In other cases, the distortions in continuous tone content are perceptible but acceptable under certain circumstances.
Palettized content appears in a variety of areas including windowed user interfaces or other graphical user interfaces, shared whiteboards or application sharing, or simple animations. Some common examples of palettized content include icons, toolbars, and command or notepad windows consisting of a flat color background and foreground text of a contrasting color. A color palette typically includes a relatively small set of image colors or tones. A simple color palette might include 256 different 24-bit colors, in which case image tones in palettized content could be represented by 8-bit values for the indices of the color palette (2^8 = 256). In practice, the number and organization of color palettes, the relationships of color palettes to content, and the representation of image tones in palettized content depend on implementation and can be very complex.
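The simple 256-entry palette case can be sketched as follows. This is a minimal illustration under the assumption of a single palette built in order of first appearance; the function names `palettize` and `depalettize` and the sample pixel row are hypothetical, and real palette organization can be far more complex, as noted above.

```python
def palettize(pixels):
    """Map 24-bit (R, G, B) pixels to 8-bit indices into a color palette.

    Returns the palette (a list of colors) and one index byte per pixel.
    Raises ValueError if more than 256 distinct colors are present.
    """
    palette = []
    index_of = {}
    indices = bytearray()
    for color in pixels:
        if color not in index_of:
            if len(palette) == 256:
                raise ValueError("more than 256 distinct colors")
            index_of[color] = len(palette)
            palette.append(color)
        indices.append(index_of[color])
    return palette, bytes(indices)


def depalettize(palette, indices):
    """Recover the original 24-bit pixels from palette indices."""
    return [palette[i] for i in indices]


# Hypothetical row from a notepad-style window: flat background, dark text.
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
row = [WHITE, WHITE, BLACK, BLACK, WHITE, BLACK, WHITE, WHITE]

palette, indices = palettize(row)
print(palette)
print(list(indices))
print(depalettize(palette, indices) == row)
```

Each pixel is stored as one byte instead of three, and the mapping is exactly reversible, so the representation itself loses no detail.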
Palettized content often includes areas of perceptually important fine detail—spatially localized, high frequency variations depicting text elements or other image discontinuities. Applying lossy compression to palettized content can result in the loss of perceptually important fine detail. For example, text and sharp edges may be blurred or distorted in the decompressed content. As a result, lossless encoding of palettized content is preferred in many circumstances.
Screen capture is an example of an application in which content can include a mixture of palettized content and continuous tone content. A screen capture tool lets a computer user record an image displayed on a visual display unit such as a computer monitor. The user might use the captured screen area (alternatively called a screen area, screen image, screen shot, screen frame, screen region, capture area, capture image, capture shot, etc.) in a help manual or report to show the results displayed on the display unit at a particular time. For some applications, a user captures a series of screen areas to show how screen content changes. The user might use the series of captured screen areas within an instructional video for job training or remote instruction.
FIG. 1 is a captured screen area (100) of a computer desktop environment according to the prior art. The captured screen area (100) shows the entire desktop, but could instead show only the window (130) or some other portion of the desktop. A cursor graphic (140) overlays the window (130), and several icon graphics (120, 122, 124) overlay the background (110). The window (130), cursor graphic (140) and icon graphics (120, 122, 124) are examples of palettized content. The background (110) is an example of continuous tone content.
Several previous screen capture encoders have used lossless compression to reduce the bitrate of a series of captured screen areas. The lossless compression avoids blurriness and other spatial distortion in the palettized content in the series. However, bitrate is often unacceptably high (usually due to the lossless compression of complex continuous tone content in the series), necessitating reduction of the frame rate for the series. More generally, using the same compression techniques to encode palettized content and continuous tone content in a series of frames fails to take advantage of the differences between palettized content and continuous tone content.