A screen capture tool lets a computer user record an image displayed on a visual display unit such as a computer monitor. The user might use the captured screen area (alternatively called a screen area, screen image, screen shot, screen frame, screen region, capture area, capture image, capture shot, etc.) in a help manual or report to show the results displayed on the display unit at a particular time. For some applications, a user captures a series of screen areas to show how screen content changes. The user might use the series of captured screen areas within an instructional video for job training or remote instruction.
FIG. 1 is a captured screen area (100) of a computer desktop environment according to the prior art. The captured screen area (100) shows the entire desktop, but could instead show only the window (130) or some other portion of the desktop. A cursor graphic (140) overlays the window (130), and several icon graphics (120, 122, 124) overlay the background (110). The captured screen area (100) could be part of a series. Through the series, much of the screen content in the captured screen area (100) would probably remain the same. Screen content such as the background (110) and icon graphics (120, 122, 124) usually does not change from frame to frame. On the other hand, the cursor graphic (140) often changes position and shape as the user manipulates a mouse or other input device, and the contents of the window (130) often change as a user types, adds graphics, etc.
Like other forms of digital video, screen capture video consumes large amounts of storage and transmission capacity. Many computers and computer networks lack the resources to store and transmit raw screen capture video. For this reason, engineers often use compression (also called coding or encoding) to reduce the bitrate of screen capture video. Decompression reverses compression. To understand how compression and decompression relate to quality and bitrate, it helps to understand how a computer represents screen areas.
I. Computer Representation of Captured Screen Areas
A single rectangular captured screen area includes rows of picture elements [“pixels”] with color values. The resolution of the captured screen area depends on the number of pixels and the color depth. The number of pixels is conventionally expressed in terms of the dimensions of the rectangle, for example, 320×240 or 800×600. The color depth is conventionally expressed as a number of bits per pixel, for example, 1, 8, 16, 24 or 32, which affects the number of possible colors for an individual pixel. If the color depth is 8 bits, for example, there are 2^8 = 256 possible colors per pixel, which can be shades of gray or indices to a color palette that stores 256 different 24-bit colors in the captured screen area. A given series of captured screen areas can include pixels with different color depths, as described below.
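The relationship between color depth and the number of possible colors, and the use of 8-bit values as palette indices, can be illustrated with a minimal sketch. The function names and the 4-entry palette below are hypothetical, chosen only for illustration; a real 8-bit palette may hold up to 256 entries.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]  # a 24-bit color as (red, green, blue), 8 bits each


def colors_for_depth(bits_per_pixel: int) -> int:
    """Number of distinct values a pixel of the given color depth can take."""
    return 2 ** bits_per_pixel


def depalettize(indices: List[int], palette: List[RGB]) -> List[RGB]:
    """Map 8-bit palette indices to the 24-bit colors stored in the palette."""
    if len(palette) > 256:
        raise ValueError("an 8-bit index can address at most 256 palette entries")
    return [palette[i] for i in indices]


# Hypothetical palette: white, black, blue, light gray.
palette = [(255, 255, 255), (0, 0, 0), (0, 0, 255), (192, 192, 192)]
row = [0, 0, 1, 2, 3]  # one row of 8-bit pixels holding palette indices

print(colors_for_depth(8))       # 256
print(depalettize(row, palette))
```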
The frame rate of a series of captured screen areas (i.e., the resolution in time) is conventionally expressed in terms of frames per second [“fps”]. Some conventional frame rates are 1, 2, 10, 15, 25, and 30 fps. A higher frame rate generally results in smoother playback of changing screen content.
Quality affects the number of bits needed to represent a series of captured screen areas, which in turn affects the costs of capturing, processing, storing, and transmitting the series. For example, the bitrate for an uncompressed 2 fps series of 320×240 pixel frames with 8-bit pixels is more than 1 million bits per second. The bitrate for an uncompressed 10 fps series of 800×600 pixel frames with 24-bit pixels is more than 115 million bits per second.
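The bitrates quoted above follow directly from multiplying width, height, color depth, and frame rate. A short sketch of that arithmetic (the function name is illustrative only):

```python
def raw_bitrate(width: int, height: int, bits_per_pixel: int, fps: int) -> int:
    """Bits per second for an uncompressed series of captured screen areas."""
    bits_per_frame = width * height * bits_per_pixel
    return bits_per_frame * fps


print(raw_bitrate(320, 240, 8, 2))    # 1228800 bits per second
print(raw_bitrate(800, 600, 24, 10))  # 115200000 bits per second
```

The first case is 76,800 pixels × 8 bits × 2 fps, and the second is 480,000 pixels × 24 bits × 10 fps.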
Screen capture images may contain a mixture of continuous tone content and palettized content. Continuous tone content includes, for example, photographs or other images with gradually varying colors or tones, and typically uses a range of image tones that appears substantially continuous to the human eye. Palettized content includes, for example, icons, toolbars, and command or notepad windows consisting of a flat color background and foreground text of a contrasting color. A color palette typically includes a relatively small set of image colors or tones (e.g., 256 different 24-bit colors). Palettized content often includes areas of perceptually important fine detail—spatially localized, high frequency variations depicting text elements or other image discontinuities.
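One way to see the difference between the two kinds of content is to count the distinct colors in a block of pixels. The sketch below is a hypothetical heuristic, not a technique described in this document. The threshold `max_colors` is an assumed value. The intuition is that a block of text on a flat background uses few colors, while a gradient uses many.

```python
from typing import Sequence, Tuple

RGB = Tuple[int, int, int]


def looks_palettized(block: Sequence[RGB], max_colors: int = 8) -> bool:
    """Hypothetical heuristic: treat a block with few distinct colors as palettized."""
    return len(set(block)) <= max_colors


# Flat white background with a few black text pixels (palettized-like).
text_block = [(255, 255, 255)] * 14 + [(0, 0, 0)] * 2
# Gradually varying colors (continuous-tone-like): 16 distinct colors.
gradient_block = [(i * 16, i * 16, 255 - i * 16) for i in range(16)]

print(looks_palettized(text_block))      # True
print(looks_palettized(gradient_block))  # False
```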
II. Compression and Decompression of Screen Capture Video
Since a series of captured screen areas can have a very high bitrate, there can be performance bottlenecks at the points of storing the series or transmitting the series across a network. Compression of captured screen areas is often used to address these performance bottlenecks by decreasing bitrate. Compression can be lossless, in which quality of the video does not suffer but decreases in bitrate are limited by the complexity of the video. Or, compression can be lossy, in which quality of the video suffers but decreases in bitrate are more dramatic.
Applying lossy compression to palettized content can result in the loss of perceptually important fine detail. For example, text and sharp edges may be blurred or distorted in the decompressed content. As a result, lossless encoding of palettized content is preferred in many circumstances. Continuous tone content, on the other hand, would ideally also be compressed losslessly when sufficient resources are available, but some systems instead use lossy compression in a conventional camera video encoder to compress continuous tone content effectively at a lower bitrate. That lossy compression, however, can introduce unacceptable distortion in palettized content mixed with the continuous tone content.
Some encoding tools allow compression of screen capture video with any of multiple encoders on a system. The multiple encoders can include, for example, conventional video encoders designed for camera video and screen capture encoders designed for screen capture video.
A. Conventional Video Encoders Designed for Camera Video
Conventional video encoders designed for camera video use a variety of different compression techniques. Commonly, these compression techniques involve frequency transforms, quantization, and entropy encoding for individual frames, and motion estimation for series of frames. The compression techniques can include run length encoding and Huffman encoding.
Run length encoding is a simple, well-known compression technique used for camera video, audio, text, and other types of content. In general, run length encoding replaces a sequence (i.e., run) of consecutive symbols having the same value with the value and the length of the sequence. In run length decoding, the sequence of consecutive symbols is reconstructed from the run value and run length. Numerous variations of run length encoding/decoding have been developed. For additional information about run length encoding/decoding and some of its variations, see, e.g., Bell et al., Text Compression, Prentice Hall PTR, pages 105–107, 1990; Gibson et al., Digital Compression for Multimedia, Morgan Kaufmann, pages 17–62, 1998; U.S. Pat. No. 5,467,134 to Laney et al.; U.S. Pat. No. 6,304,928 to Mairs et al.; U.S. Pat. No. 5,883,633 to Gill et al.; and U.S. Pat. No. 6,233,017 to Chen et al.
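Basic run length encoding and decoding, as described above, can be sketched in a few lines. This is a minimal illustration of the general technique, not any of the cited variations. Each run becomes a (value, length) pair, and decoding expands each pair back into a run.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def rle_encode(symbols: Sequence[int]) -> List[Tuple[int, int]]:
    """Replace each run of equal consecutive symbols with a (value, length) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(symbols)]


def rle_decode(pairs: Sequence[Tuple[int, int]]) -> List[int]:
    """Reconstruct the symbol sequence from (value, length) pairs."""
    out: List[int] = []
    for value, length in pairs:
        out.extend([value] * length)
    return out


row = [7, 7, 7, 7, 0, 0, 7, 7, 7]  # e.g., palette indices along one pixel row
pairs = rle_encode(row)
print(pairs)  # [(7, 4), (0, 2), (7, 3)]
assert rle_decode(pairs) == row
```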
The results of run length encoding (e.g., the run values and run lengths) can be Huffman encoded to further reduce bitrate. If so, the Huffman encoded data is Huffman decoded before run length decoding.
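The order of the two stages can be sketched as follows. Run length encoding runs first, and its output is then entropy coded. On decoding, the order is reversed. To keep the example short, a fixed prefix-free code table (`CODE`, hypothetical and not taken from any standard) stands in for a Huffman code table. It covers only the symbols that occur in this example.

```python
from itertools import groupby

# Hypothetical prefix-free code table standing in for a Huffman code table.
# It covers run values 0 and 7 and run lengths 2, 3 and 4 used below.
CODE = {0: "00", 7: "01", 2: "10", 3: "110", 4: "111"}
DECODE = {bits: sym for sym, bits in CODE.items()}


def encode(row):
    # Stage 1: run length encoding, flattened to value, length, value, length, ...
    symbols = []
    for value, run in groupby(row):
        symbols += [value, sum(1 for _ in run)]
    # Stage 2: replace each symbol with its variable-length code.
    return "".join(CODE[s] for s in symbols)


def decode(bits):
    # Stage 1 (reverse order): variable-length decoding back to symbols.
    symbols, current = [], ""
    for b in bits:
        current += b
        if current in DECODE:  # prefix-free, so the first match is correct
            symbols.append(DECODE[current])
            current = ""
    # Stage 2: run length decoding of the (value, length) pairs.
    row = []
    for value, length in zip(symbols[0::2], symbols[1::2]):
        row += [value] * length
    return row


row = [7, 7, 7, 7, 0, 0, 7, 7, 7]
bits = encode(row)
print(bits)  # 01111001001110
assert decode(bits) == row
```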
Huffman encoding is another well-known compression technique used for camera video, audio, text, and other types of content. In general, a Huffman code table associates variable-length Huffman codes with unique symbol values. Shorter codes are assigned to more probable symbol values, and longer codes are assigned to less probable symbol values. The probabilities are computed for typical examples of some kind of content. Or, the probabilities are computed for data just encoded or data to be encoded, in which case the Huffman codes adapt to changing probabilities for the unique symbol values. Compared to static Huffman coding, adaptive Huffman coding usually reduces the bitrate of compressed data by incorporating more accurate probabilities for the data, but extra information specifying the Huffman codes may also need to be transmitted.
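The assignment of shorter codes to more probable symbols can be sketched with the classic Huffman construction, which repeatedly merges the two least frequent subtrees. This is a minimal illustration using the standard library `heapq` module. Here the frequencies are counted from the data to be encoded, which corresponds to the adaptive case above, so the resulting table would need to be conveyed to the decoder.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(freqs):
    """Build a Huffman code table {symbol: bitstring} from symbol frequencies."""
    tiebreak = count()  # unique counter so heap entries never compare dicts
    heap = [(w, next(tiebreak), {s: ""}) for s, w in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: a lone symbol still needs one bit
        return {s: "0" for s in freqs}
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 and 1 to their codes.
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


data = "aaaaaaabbbccd"  # 'a' is most probable, 'd' least probable
table = huffman_code(Counter(data))
print(table)  # {'b': '00', 'd': '010', 'c': '011', 'a': '1'}
print(sum(len(table[s]) for s in data))  # 22 bits, versus 104 at 8 bits per symbol
```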
To encode symbols, the Huffman encoder replaces symbol values with the variable-length Huffman codes associated with the symbol values in the Huffman code table. To decode, the Huffman decoder replaces the Huffman codes with the symbol values associated with the Huffman codes. Numerous variations of Huffman encoding/decoding have been developed. For additional information about Huffman encoding/decoding and some of its variations, see, e.g., Bell et al., Text Compression, Prentice Hall PTR, pages 105–107, 1990; Gibson et al., Digital Compression for Multimedia, Morgan Kaufmann, pages 17–62, 1998; and Deutsch, “RFC 1951: DEFLATE Compressed Data Format Specification,” Internet Engineering Task Force, May 1996.
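Encoding and decoding with a given code table can be sketched as follows. The table here is a hypothetical example. Because Huffman codes are prefix-free, the decoder can read bits one at a time and emit a symbol as soon as the accumulated bits match a code.

```python
def huffman_encode(symbols, table):
    """Replace each symbol value with its code from the table."""
    return "".join(table[s] for s in symbols)


def huffman_decode(bits, table):
    """Replace codes with symbol values, reading the bitstring left to right."""
    reverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in reverse:  # prefix-free: the first match is the only match
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return out


table = {"a": "1", "b": "00", "c": "011", "d": "010"}  # hypothetical table
bits = huffman_encode("abacad", table)
print(bits)  # 10010111010
print("".join(huffman_decode(bits, table)))  # abacad
```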
B. Screen Capture Encoders Designed for Screen Capture Video
Screen capture encoders use a variety of different techniques to compress screen capture video. Because screen content often includes palettized content, many screen capture encoders use lossless compression.
One prior art screen capture encoder segments pixels of palettized screen capture content into rectangles of more or less internally consistent regions, then losslessly compresses the pixels of the regions with an arithmetic coder. Using a technique called context color coding, the arithmetic coder recognizes patterns in the pixels and uses the patterns to predict more probable/less probable color values during arithmetic coding, which can lower the bitrate for the pixels. For more information about this technique, see U.S. patent application Ser. No. 09/577,544, filed May 24, 2000, entitled “Palettized Image Compression,” the disclosure of which is hereby incorporated by reference.
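The benefit of predicting colors from surrounding pixels can be illustrated by estimating the cost an adaptive arithmetic coder would approach, about -log2(p) bits per pixel. The sketch below shows context modeling in general. It is not the context color coding method of the cited application and does not implement an arithmetic coder. The context (left and above neighbors), the count initialization, and the test image are all assumptions.

```python
import math
from collections import defaultdict


def context_cost_bits(image, num_colors, use_context=True):
    """Estimate total bits for an adaptive model, summing -log2(p) per pixel.

    Hypothetical model: the context is the (left, above) neighbor pair, and
    every color starts with a count of 1 so unseen colors keep nonzero odds.
    """
    counts = defaultdict(lambda: [1] * num_colors)
    total_bits = 0.0
    for y, row in enumerate(image):
        for x, color in enumerate(row):
            left = row[x - 1] if x > 0 else None
            above = image[y - 1][x] if y > 0 else None
            ctx = (left, above) if use_context else None
            freq = counts[ctx]
            total_bits += -math.log2(freq[color] / sum(freq))
            freq[color] += 1  # update after coding; a decoder can mirror this
    return total_bits


# Hypothetical 16x16 two-color image of vertical stripes.
image = [[0, 0, 1, 1] * 4 for _ in range(16)]
print(round(context_cost_bits(image, 2, use_context=False)))
print(round(context_cost_bits(image, 2, use_context=True)))
```

With the neighbor context, each pixel in this striped pattern becomes almost fully predictable, so the estimated cost drops well below the roughly one bit per pixel needed without context.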
The main goal of the prior art screen capture encoder is to reduce the bitrate for pixels of palettized content. When applied to other kinds of content, the segmentation can create too many regions and take too long (e.g., if there are not large regions of relatively homogeneous content), and the arithmetic coding can be inefficient due to poor prediction and an increased number of unique patterns in the pixels. Even when encoding only palettized content, the encoder generally succeeds at reducing bitrate, but encoding time can vary widely. For a frame of simple content, the encoding time is consistently short (e.g., 0.1 second). For a frame of complex or gradually varying palettized content, however, the encoding time can be unpredictable and/or long (e.g., 1.5 seconds). In some applications, this is unacceptable. For example, for real time streaming of screen capture video, the unpredictable encoding time can disrupt encoding by suddenly overloading the processor or adding overall delay.