Several techniques exist for compressing digital image files. Compression is done in order to reduce the resource requirements for storing and transmitting image files. Lossless compression techniques exploit statistical characteristics of the image data to code the files more efficiently, and allow exact reconstruction of the original data. “Lossy” compression techniques use similar statistical methods, and also tolerate small changes in the content of the files after compression and reconstruction. Lossy techniques typically produce compressed files considerably smaller than files produced by lossless techniques, and in some applications, the changes in content are negligible.
One commonly used lossy compression technique is the JPEG technique, named for the Joint Photographic Experts Group, the committee that developed the specifications for standard use of the technique and for the standard file format of JPEG image files. The JPEG technique is especially useful for images of natural scenes, and is widely used for compressing digital photographs. Many digital still cameras include circuitry that implements the JPEG standard to create compressed files.
Some digital cameras provide the ability to capture moving pictures as well as still images. Moving pictures may be thought of as sequences of still images. To facilitate the compression of moving pictures, another standard, called MPEG, has been developed by the Moving Picture Experts Group. There are several variants of MPEG compression, but the features described in this specification are common to all, so all the variants will be referred to here generically as MPEG.
In a simple implementation of MPEG, a moving picture sequence comprises a series of individually compressed still images called “I-frames”. An MPEG I-frame is intra-coded, that is, compressed without regard to the content of frames occurring before or after it in the sequence. The MPEG technique allows, but does not require, other kinds of frames, for example “P-frames” and “B-frames”, that do take into account the content of adjoining frames. The present invention addresses the generation of I-frames.
Some of the processing necessary to construct an MPEG I-frame is identical to some of the processing used to construct a JPEG compressed image. However, a finishing step is significantly different between the two techniques.
The circuitry or other engine used in cameras to construct JPEG images is often configurable in order to allow the compression to be optimized for particular data, and some flexibility is allowed within the JPEG specification. However, it is not possible to construct a completed MPEG I-frame using a standard JPEG engine or circuitry.
A brief and simplified example will aid in providing an overview of the steps involved in JPEG and MPEG compression.
An MPEG I-frame has many similarities to a still image compressed using the JPEG technique. The sequence of steps required for generating either a JPEG image or an MPEG I-frame includes:
    0. Color space conversion
    1. Downsampling, also called subsampling or decimation
    2. Constructing macroblocks
    3. Performing a Discrete Cosine Transform (DCT)
    4. Quantization
    5. “Zig zag” ordering of the quantized coefficients
    6. Differential coding of the DC coefficient from the DCT
    7. Run-length coding of the AC coefficients from the DCT
    8. Variable-length coding of the coefficients from the DCT
All of these steps except the last may be performed identically whether the desired result is a JPEG image or an MPEG I-frame. However, the final step of variable-length coding the coefficients is significantly different for constructing an MPEG I-frame than for constructing a JPEG image.
A digital camera produces an ordered array of data representing an original scene. Each location in the scene is represented by a corresponding picture element, or “pixel”. The data describing each pixel indicate the brightness and color of the original scene at the location corresponding to the pixel. The brightness and color are often represented by numerical values indicating the strengths of red, green, and blue light sensed from the scene location. An image of this type is often said to be in “RGB” format. Other representations of brightness and color may be used, and conversions from one system of representation, or “color space”, to another are readily accomplished.
Both JPEG and MPEG require the image to be represented in the color space known as YCrCb. In the YCrCb color space, a pixel is described by its overall brightness, or luminance (Y), and two chrominance values (Cr and Cb) that describe the color of the pixel. The color space conversion step of JPEG or MPEG compression involves converting from another color space such as RGB to YCrCb.
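As a minimal sketch of the color space conversion step, the following Python function converts one 8-bit RGB pixel to YCrCb. The coefficients are the full-range ITU-R BT.601 values used by the JFIF file format; that choice is an assumption, since the text above does not name a particular conversion matrix, and the function name `rgb_to_ycrcb` is illustrative only.

```python
def rgb_to_ycrcb(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cr, Cb).

    Assumes full-range BT.601 coefficients as used by JFIF; other
    systems may apply different scaling or offsets.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b

    def clamp(v):
        # Round to the nearest integer and keep the result in 0..255.
        return max(0, min(255, int(round(v))))

    return clamp(y), clamp(cr), clamp(cb)
```

A neutral gray or white pixel maps to chrominance values of 128, which is the "no color" midpoint in this convention.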
Many cameras use electronic array sensors that have many more pixels than are typically used in moving picture frames. Often, cameras provide the ability to save images at various resolutions. The lower the resolution, the fewer pixels used to represent the image and the less detail will be visible in the image file. The conversion from a higher resolution image to a lower resolution image is often called downsampling, subsampling, or decimation.
Additionally, the MPEG specification requires and the JPEG specification allows the chrominance channels of the image to be further downsampled in the 4:2:0 video format. In this format, the chrominance channels are downsampled to half the linear resolution of the luminance channel in each of the two orthogonal coordinate directions of the image. Thus each chrominance channel represents the image with one fourth as many pixels as does the luminance channel, and at a correspondingly lower resolution. Chrominance downsampling takes advantage of the human visual system's decreased sensitivity to resolution in the chrominance channels in comparison with the luminance channel to reduce the data required to represent a pleasing image.
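The 4:2:0 chrominance downsampling described above might be sketched as follows. Averaging each 2×2 neighborhood is an assumption made for illustration; the text does not prescribe a particular filter, and the name `downsample_420` is hypothetical.

```python
def downsample_420(channel):
    """Halve a chrominance channel in both directions by 2x2 averaging.

    `channel` is a list of equal-length rows with even width and height.
    Averaging is an illustrative choice; other filters could be used.
    """
    height, width = len(channel), len(channel[0])
    return [
        [
            # The +2 rounds the integer average to the nearest value.
            (channel[y][x] + channel[y][x + 1]
             + channel[y + 1][x] + channel[y + 1][x + 1] + 2) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


cb = [
    [100, 102, 50, 50],
    [104, 106, 50, 50],
    [10, 10, 200, 202],
    [10, 10, 204, 206],
]
small = downsample_420(cb)  # a 2x2 result: one fourth as many samples
```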
Once the image is downsampled, it is divided into “macroblocks”. A macroblock consists of a 16-pixel by 16-pixel sample array of luminance samples together with one 8-sample by 8-sample block from each of the chrominance channels. The sample array of luminance samples may be thought of as four subarrays that are each eight pixels square. Images that are not a multiple of 16 pixels wide or tall are padded with blank pixels so that complete macroblocks may be constructed. The next step in the process uses the data in arrays of numbers eight elements square. The division of the image into macroblocks may be entirely conceptual, as the data in the memory of the camera, imaging device, or system need not be rearranged to accomplish the division.
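The padding rule can be illustrated with a short calculation. This is a sketch only; the helper name `macroblock_grid` is hypothetical and the function merely computes the padded luminance dimensions and the number of macroblocks.

```python
def macroblock_grid(width, height, mb_size=16):
    """Return (padded_width, padded_height, macroblock_count)."""
    mbs_across = -(-width // mb_size)   # ceiling division
    mbs_down = -(-height // mb_size)
    return mbs_across * mb_size, mbs_down * mb_size, mbs_across * mbs_down
```

For example, a 100×50 image would be padded to 112×64 and divided into 28 macroblocks, while a 320×240 image needs no padding.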
Identifying the macroblocks partitions all of the image data, both luminance and chrominance, into arrays that are eight elements square. For example, an array of luminance samples may be as follows:
    102  100  101  101  104  104  122  137        (1)
    102  100  100  101  104  108  121  132
    104  102  101  101  105  106  123  135
    107  105  103   99  107  109  123  134
    110  105  104  104  109  110  126  138
    112  109  107   97  111  113  129  139
    114  102  113  112  122  121  136  153
    124  118  124  124  140  151  164  181
This example array of luminance data will be used below to illustrate the following steps, and to describe an embodiment of the invention. One of ordinary skill in the art will recognize that the steps and the embodiment of the invention apply to both luminance and chrominance data, and that no loss of generality is intended or created by using a single example array.
For each 8×8 array in the image, a two-dimensional discrete cosine transform (DCT) is performed. The DCT is described in MPEG Video Compression Standard, edited by Joan L. Mitchell, William B. Pennebaker, Chad E. Fogg, and Didier J. LeGall, and published by Chapman & Hall, ISBN 0-412-08771-5. The DCT of the example array above is:
    928.12  −86.29   53.66  −15.12   13.12   −3.35    1.18   11.27        (2)
    −64.23   18.27   −2.00   −5.23   −1.06    1.39   −5.46   −4.22
     36.50  −18.85   −1.66   −1.36    2.67     .89    3.53    −.37
    −25.06   11.06    1.78   −1.51     .19    −.14   −1.19    2.27
     19.38   −6.59    1.41     .14    −.13    −.72    −.18   −1.64
    −11.01    3.31    −.84   −2.72    2.88     .39     .76    2.63
      6.12   −1.25    4.78     .60   −3.68   −2.55   −1.84     .77
     −1.07   −1.29   −1.92   −3.46    5.36    3.18    −.24    −.65
The upper left DCT coefficient indicates a scaled average value of the input data array. In general, the other coefficients represent the spatial frequency content of the image, with higher frequency components at the lower right of the array.
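The transform can be checked with a direct, unoptimized implementation. The sketch below assumes the common orthonormal 8×8 DCT scaling and applies no level shift to the input, which is consistent with the upper left value of the example result; the function name `dct_2d` is illustrative, and a practical encoder would use a fast algorithm instead.

```python
import math

BLOCK = [
    [102, 100, 101, 101, 104, 104, 122, 137],
    [102, 100, 100, 101, 104, 108, 121, 132],
    [104, 102, 101, 101, 105, 106, 123, 135],
    [107, 105, 103, 99, 107, 109, 123, 134],
    [110, 105, 104, 104, 109, 110, 126, 138],
    [112, 109, 107, 97, 111, 113, 129, 139],
    [114, 102, 113, 112, 122, 121, 136, 153],
    [124, 118, 124, 124, 140, 151, 164, 181],
]


def dct_2d(block):
    """Direct 8x8 forward DCT; out[u][v] has vertical frequency u, horizontal v."""
    n = 8

    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / 16)
                              * math.cos((2 * x + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


coeffs = dct_2d(BLOCK)
```

With this scaling the upper left coefficient equals the sum of the 64 samples divided by 8, that is, eight times their average.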
The next step in both JPEG and MPEG compression is to “quantize” the array. Quantization is performed by an element-by-element division by another array of quantizing values, and rounding the results. An example array of quantizing values is:
     8  16  19  22  26  27  29  34        (3)
    16  16  22  24  27  29  34  37
    19  22  26  27  29  34  34  38
    22  22  26  27  29  34  37  40
    22  26  27  29  32  35  40  48
    26  27  29  32  35  40  48  58
    26  27  29  34  38  46  56  69
    27  29  35  38  46  56  69  83
Using array (3) to quantize the array (2) of DCT coefficients above gives these quantized coefficients:
    116  −5   2   0   0   0   0   0        (4)
     −4   1   0   0   0   0   0   0
      1   0   0   0   0   0   0   0
     −1   0   0   0   0   0   0   0
      0   0   0   0   0   0   0   0
      0   0   0   0   0   0   0   0
      0   0   0   0   0   0   0   0
      0   0   0   0   0   0   0   0
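The quantization of the example can be reproduced with the sketch below. One observation, stated as such rather than as a rule of either standard: the values in array (4) are obtained when each quotient is truncated toward zero. Rounding to the nearest integer would instead give 3 for 53.66/19 and 2 for 36.50/19. The function name `quantize` is illustrative.

```python
DCT = [
    [928.12, -86.29, 53.66, -15.12, 13.12, -3.35, 1.18, 11.27],
    [-64.23, 18.27, -2.00, -5.23, -1.06, 1.39, -5.46, -4.22],
    [36.50, -18.85, -1.66, -1.36, 2.67, 0.89, 3.53, -0.37],
    [-25.06, 11.06, 1.78, -1.51, 0.19, -0.14, -1.19, 2.27],
    [19.38, -6.59, 1.41, 0.14, -0.13, -0.72, -0.18, -1.64],
    [-11.01, 3.31, -0.84, -2.72, 2.88, 0.39, 0.76, 2.63],
    [6.12, -1.25, 4.78, 0.60, -3.68, -2.55, -1.84, 0.77],
    [-1.07, -1.29, -1.92, -3.46, 5.36, 3.18, -0.24, -0.65],
]
QUANT = [
    [8, 16, 19, 22, 26, 27, 29, 34],
    [16, 16, 22, 24, 27, 29, 34, 37],
    [19, 22, 26, 27, 29, 34, 34, 38],
    [22, 22, 26, 27, 29, 34, 37, 40],
    [22, 26, 27, 29, 32, 35, 40, 48],
    [26, 27, 29, 32, 35, 40, 48, 58],
    [26, 27, 29, 34, 38, 46, 56, 69],
    [27, 29, 35, 38, 46, 56, 69, 83],
]


def quantize(coeffs, table):
    """Element-by-element division; int() truncates toward zero."""
    return [[int(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]


quantized = quantize(DCT, QUANT)
```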
After the quantization, the coefficients are placed in a “zig zag” order. The order of reading out the coefficients is illustrated below:
     1   2   6   7  15  16  28  29        (5)
     3   5   8  14  17  27  30  43
     4   9  13  18  26  31  42  44
    10  12  19  25  32  41  45  54
    11  20  24  33  40  46  53  55
    21  23  34  39  47  52  56  61
    22  35  38  48  51  57  60  62
    36  37  49  50  58  59  63  64
Because coefficients in the lower right part of the array are likely to be zero after quantization, the zig zag ordering tends to maximize runs of zeros in the ordered list. The coefficients of the example array in zig zag order are:
    116 −5 −4 1 1 2 0 0 0 −1 0 0 0 0 . . . (50 more zeros)
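The zig zag readout can be generated rather than stored as a table. The sketch below is one possible construction, with the hypothetical name `zigzag_order`: cells on the same anti-diagonal share the sum of their row and column indices, and the traversal direction alternates from one diagonal to the next.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs in zig zag order for an n x n block."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Odd diagonals run downward (increasing row); even diagonals run
    # upward, which is the same as increasing column.
    return sorted(
        cells,
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


QUANTIZED = [
    [116, -5, 2, 0, 0, 0, 0, 0],
    [-4, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [-1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
ordered = [QUANTIZED[r][c] for r, c in zigzag_order()]
```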
The first coefficient in this list represents a scaled average value for the pixels in the 8×8 block. This is often called the “DC” coefficient. The other coefficients are called “AC” coefficients. In both JPEG and MPEG, the DC coefficient for each block is differentially coded. That is, rather than store the coefficient itself, the difference between the coefficient from the previous block and the present coefficient is stored. Because the DC coefficients tend to change slowly, this differential coding tends to allow the storage of smaller numbers, thereby conserving storage space. In this example, it is assumed that the previous pixel block had a DC coefficient after quantization of 120, resulting in a difference of −4. The coefficients can be further arranged as follows:
TABLE 1
    Coefficient    Preceding
    Number         run of zeros    Value
    0 (DC)         N/A             −4
    1              0               −5
    2              0               −4
    3              0               1
    4              0               1
    5              0               2
    9              3               −1
    End of Block
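The arrangement in Table 1 can be derived from the zig zag list with a few lines of code. This is a sketch under the same assumption as the example (a previous quantized DC coefficient of 120); the function name `run_length_pairs` is illustrative.

```python
def run_length_pairs(zigzag, previous_dc):
    """Return (dc_difference, [(run_of_zeros, value), ...]) for one block."""
    dc_difference = zigzag[0] - previous_dc
    pairs = []
    run = 0
    for value in zigzag[1:]:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    # Trailing zeros are not listed; the end-of-block marker implies them.
    return dc_difference, pairs


zigzag = [116, -5, -4, 1, 1, 2, 0, 0, 0, -1] + [0] * 54
dc_diff, ac_pairs = run_length_pairs(zigzag, previous_dc=120)
```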
The final step in compressing a block of pixels is to encode this information using variable length coding, which is often called Huffman coding. In Huffman coding, common patterns in data are assigned short sequences of bits, while less common patterns are assigned longer sequences. The sequences are chosen so that they cannot be confused with each other. In this way, data that have a nonuniform distribution of pattern frequencies can be stored losslessly in a smaller form.
In both JPEG and MPEG, different codes are used for the DC and AC coefficients. In JPEG, different codes may be used for the luminance channel and the chrominance channels.
MPEG specifies the table of Huffman codes for the quantized DCT coefficients. JPEG allows the user to select a set of Huffman codes. It is possible to select the JPEG codes for the DC coefficient to match the MPEG specification. However, the coding schemes are significantly different between JPEG and MPEG for the AC coefficients, and it is not possible to configure a JPEG engine to generate the Huffman code stream of an MPEG file.
The Huffman codes for the DC coefficient of the luminance channel for an MPEG file and a typical JPEG file are selected according to the following table:
TABLE 2
    Y code     size    magnitude range
    100        0       0
    00         1       −1, 1
    01         2       −3 . . . −2, 2 . . . 3
    101        3       −7 . . . −4, 4 . . . 7
    110        4       −15 . . . −8, 8 . . . 15
    1110       5       −31 . . . −16, 16 . . . 31
    11110      6       −63 . . . −32, 32 . . . 63
    111110     7       −127 . . . −64, 64 . . . 127
    1111110    8       −255 . . . −128, 128 . . . 255
In this table, the Y code is a bit pattern that identifies the size range of a particular DC coefficient. The “size” entry indicates the number of bits that follow the Y code to indicate the exact value of the coefficient. The magnitude range indicates the values represented by various bit patterns. In the example from above, the value to be encoded is −4. The fourth line in the table encompasses a value of −4, so the Y code bit pattern to be used is 101. The table indicates that a three-bit value follows this Y code. There are eight possible patterns of three bits, and there are eight values in the table that correspond to the eight patterns— −7, −6, −5, −4, 4, 5, 6, and 7. The bit patterns corresponding to these values are as follows:
TABLE 3
    Bit pattern    value
    000            −7
    001            −6
    010            −5
    011            −4
    100            4
    101            5
    110            6
    111            7
The bit pattern column in this table is simply the possible bit patterns in ascending order, and the corresponding values are the possible values in ascending order. Similar tables can be constructed for other lines in Table 2. From Table 3, the following bit pattern for a value of −4 is 011. The value stored in the file to indicate a DC coefficient of −4 is then 101 011. Thus six bits are required to represent this value.
By way of further example, a coefficient value of −1 would be represented by a bit pattern of 00 0, requiring only three bits. A coefficient value of 129 would be represented by a bit pattern of 1111110 10000001, requiring 15 bits. Because the DC coefficients tend to be small, most require only a small number of bits for representation, resulting in efficient storage of the DC coefficients in an image.
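The DC coding rule of Tables 2 and 3 can be sketched as follows. The names `DC_LUMA_CODES`, `value_bits`, and `encode_dc` are illustrative; the dictionary simply restates Table 2.

```python
# Y codes from Table 2, keyed by size.
DC_LUMA_CODES = {0: "100", 1: "00", 2: "01", 3: "101", 4: "110",
                 5: "1110", 6: "11110", 7: "111110", 8: "1111110"}


def value_bits(value):
    """Return (size, following bits) for a coefficient value."""
    size = abs(value).bit_length()
    if size == 0:
        return 0, ""
    # Negative values occupy the lower half of the ascending bit patterns.
    pattern = value if value > 0 else value + (1 << size) - 1
    return size, format(pattern, "0{}b".format(size))


def encode_dc(value):
    """Concatenate the Y code for the size and the following bits."""
    size, bits = value_bits(value)
    return DC_LUMA_CODES[size] + bits
```

Calling `encode_dc(-4)` yields the six bits 101011, and `encode_dc(129)` yields the fifteen bits given in the text.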
The JPEG specification provides for generating Table 2 algorithmically from a list of the number of codes of each size to be generated and an ordering of the categories represented by the codes. The list of the number of codes of each size is an array of 16 numbers, and the ordering is an array containing as many numbers as the total of the entries in the first list. The arrays needed to generate Table 2 are:
    Code length counts: 0, 2, 3, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0
    Ordering values: 1, 2, 0, 3, 4, 5, 6, 7, 8
Specifying these arrays to the JPEG circuitry or other engine in a camera or system serves to “configure” the circuitry or engine. The arrays may be specified for each image, and are stored in the resulting file with the image data so that the data may be reconstructed. The JPEG technique allows the arrays to be specified for each image so that the Huffman codes may be optimized for maximum compression if the programmer so desires. A computer program for generating the code tables from the arrays is given in Appendix A.
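The generation of a code table from the two arrays can be illustrated with the short sketch below. It is not the program of Appendix A; it is an independent illustration of the usual canonical code assignment, and the function name `build_huffman_table` is hypothetical.

```python
def build_huffman_table(counts, symbols):
    """Assign codes in order: counts[i] codes of length i + 1 bits."""
    table = {}
    code = 0
    index = 0
    for length, count in enumerate(counts, start=1):
        for _ in range(count):
            table[symbols[index]] = format(code, "0{}b".format(length))
            code += 1
            index += 1
        code <<= 1  # moving to the next length appends a zero bit
    return table


COUNTS = [0, 2, 3, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
ORDER = [1, 2, 0, 3, 4, 5, 6, 7, 8]
dc_table = build_huffman_table(COUNTS, ORDER)  # size -> Y code
```

Applied to the arrays listed for Table 2, the sketch assigns the two 2-bit codes to sizes 1 and 2, the three 3-bit codes to sizes 0, 3, and 4, and one longer code to each remaining size.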
A table similar to Table 2 may be constructed for the chrominance channel DC coefficients of the image, using different generating values, and resulting in different Huffman codes for the values.
The AC coefficients are coded differently from the DC coefficients, and JPEG and MPEG also code the AC coefficients differently from each other. JPEG AC coding is discussed first below.
In a JPEG file, Huffman codes are assigned not just to the size range of the AC coefficient, but to a combination of the coefficient size range and the number of zero coefficients preceding the non-zero coefficient. For example, a coefficient may have a value of 9 and follow a run of 3 zeros. This coefficient is said to have a run/size combination of 3/4. A coefficient with a value of −1 and following another non-zero coefficient would have a run/size combination of 0/1.
Each run/size combination is assigned a Huffman code. Each non-zero AC coefficient is represented in the resulting JPEG file by its proper Huffman code (indicating the number of zero coefficients preceding the non-zero coefficient and the relative size of the non-zero coefficient) and a set of following bits that specify the exact value of the coefficient. The following bits for the AC coefficients are as described for the DC coefficient in Tables 2 and 3.
A typical JPEG table for coding AC coefficients (analogous to Table 2 above for coding DC coefficients) is abbreviated below:
TABLE 4
    Run/size    Code
    0/0         1010    (Special end-of-block character)
    0/1         00
    0/2         01
    0/3         100
    0/4         1011
    0/5         11010
    ...
    1/1         1100
    1/2         11011
    1/3         1111001
    ...
    2/1         11100
    2/2         11111001
    ...
    3/1         111010
    3/2         111110111
    ...
Combining tables 1, 2, 3, and 4 above, it is now possible to determine the JPEG bit pattern for the luminance values of the entire example pixel block:
TABLE 5 (JPEG)
    Coefficient
    Number         Run/size    Value    Bit pattern
    0 (DC)         N/A         −4       101011
    1              0/3         −5       100010
    2              0/3         −4       100011
    3              0/1         1        001
    4              0/1         1        001
    5              0/2         2        0110
    9              3/1         −1       1110100
    End of Block                        1010
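The JPEG bit patterns for the example block can be reproduced with the sketch below. It restates only the entries of Tables 2 and 4 that the example needs; the names `encode_block_jpeg` and `value_bits` are illustrative, and a real encoder would use complete tables.

```python
DC_CODES = {0: "100", 1: "00", 2: "01", 3: "101", 4: "110"}
# (run, size) -> Huffman code, a subset of Table 4.
AC_CODES = {(0, 0): "1010", (0, 1): "00", (0, 2): "01", (0, 3): "100",
            (0, 4): "1011", (1, 1): "1100", (2, 1): "11100",
            (3, 1): "111010"}


def value_bits(value):
    """Return (size, following bits) as in Table 3."""
    size = abs(value).bit_length()
    if size == 0:
        return 0, ""
    pattern = value if value > 0 else value + (1 << size) - 1
    return size, format(pattern, "0{}b".format(size))


def encode_block_jpeg(dc_diff, ac_pairs):
    """Return one bit pattern per coded item, ending with end-of-block."""
    size, bits = value_bits(dc_diff)
    out = [DC_CODES[size] + bits]
    for run, value in ac_pairs:
        size, bits = value_bits(value)
        out.append(AC_CODES[(run, size)] + bits)
    out.append(AC_CODES[(0, 0)])
    return out


patterns = encode_block_jpeg(
    -4, [(0, -5), (0, -4), (0, 1), (0, 1), (0, 2), (3, -1)])
```

For this block the patterns total 39 bits.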
The JPEG specification provides for generating Table 4 algorithmically from a list of the number of codes of each size to be generated and an ordering of the categories represented by the codes, in the same way that Table 2 can be generated.
The arrays needed to generate Table 4 are:
    Code length counts: 0, 2, 1, 3, 3, 2, 4, 3, 5, 5, 4, 4, 0, 0, 1, 125
    Ordering values (in hexadecimal notation):
        01, 02, 03, 00, 04, 11, 05, 12, 21, 31, 41, 06, 13, 51, 61, 07,
        22, 71, 14, 32, 81, 91, A1, 08, 23, 42, B1, C1, 15, 52, D1, F0,
        24, 33, 62, 72, 82, 09, 0A, 16, 17, 18, 19, 1A, 25, 26, 27, 28,
        29, 2A, 34, 35, 36, 37, 38, 39, 3A, 43, 44, 45, 46, 47, 48, 49,
        4A, 53, 54, 55, 56, 57, 58, 59, 5A, 63, 64, 65, 66, 67, 68, 69,
        6A, 73, 74, 75, 76, 77, 78, 79, 7A, 83, 84, 85, 86, 87, 88, 89,
        8A, 92, 93, 94, 95, 96, 97, 98, 99, 9A, A2, A3, A4, A5, A6, A7,
        A8, A9, AA, B2, B3, B4, B5, B6, B7, B8, B9, BA, C2, C3, C4, C5,
        C6, C7, C8, C9, CA, D2, D3, D4, D5, D6, D7, D8, D9, DA, E1, E2,
        E3, E4, E5, E6, E7, E8, E9, EA, F1, F2, F3, F4, F5, F6, F7, F8,
        F9, FA
In the above array of ordering values, the first hex digit in each entry indicates the run of zeros encoded by a particular Huffman code, and the second digit indicates the size, that is, the number of bits in the number that follows the Huffman code to specify the actual value of the coefficient. For example, a run of three zeros followed by a coefficient value of 1 (a run/size combination of 3/1 in Table 4) is represented by the hexadecimal value 31 in the above array.
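The run/size packing can be illustrated by generating the shortest AC codes from the arrays. For brevity, the sketch below uses only the first eleven code length counts and the first 32 ordering values, which together cover every code of up to 11 bits; codes are assigned in order, so these entries do not depend on the later ones. The names `build_huffman_table` and `split_symbol` are hypothetical.

```python
def build_huffman_table(counts, symbols):
    """Assign codes in order: counts[i] codes of length i + 1 bits."""
    table = {}
    code = 0
    index = 0
    for length, count in enumerate(counts, start=1):
        for _ in range(count):
            table[symbols[index]] = format(code, "0{}b".format(length))
            code += 1
            index += 1
        code <<= 1
    return table


def split_symbol(symbol):
    """High hex digit is the run of zeros, low hex digit is the size."""
    return symbol >> 4, symbol & 0x0F


COUNTS = [0, 2, 1, 3, 3, 2, 4, 3, 5, 5, 4]  # lengths 1 to 11 only
SYMBOLS = [
    0x01, 0x02, 0x03, 0x00, 0x04, 0x11, 0x05, 0x12,
    0x21, 0x31, 0x41, 0x06, 0x13, 0x51, 0x61, 0x07,
    0x22, 0x71, 0x14, 0x32, 0x81, 0x91, 0xA1, 0x08,
    0x23, 0x42, 0xB1, 0xC1, 0x15, 0x52, 0xD1, 0xF0,
]
ac_table = {split_symbol(s): code
            for s, code in build_huffman_table(COUNTS, SYMBOLS).items()}
```

Looking up `ac_table[(3, 1)]` gives the code for a run of three zeros followed by a coefficient of size 1.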
A table similar to Table 4 may be constructed for the chrominance channel AC coefficients of the image, using different generating values, and resulting in different Huffman codes for the run/size combinations.
MPEG encodes the AC coefficients differently. Rather than assign Huffman codes to run/size combinations, MPEG assigns Huffman codes to common run/value combinations. That is, common combinations of the number of zeros preceding a non-zero coefficient and the actual value of the coefficient (not just its relative size) are assigned Huffman codes. There are a very large number of possible run/value combinations, so only the most common few dozen are assigned Huffman codes. A special escape sequence handles the occasional combination that is not in the default table.
The table of MPEG Huffman codes for various run/value combinations is abbreviated below:
TABLE 6
    Run/value       Code
    0/1             11s
    0/2             0100s
    0/3             00101s
    0/4             0000110s
    0/5             00100110s
    0/6             00100001s
    ...
    1/1             011s
    1/2             000110s
    1/3             00100101s
    ...
    2/1             0101s
    2/2             0000100s
    ...
    3/1             00111s
    3/2             00100100s
    ...
    End of block    10
The last bit of each code, indicated by “s”, is a sign bit, with 0 indicating a positive value and 1 indicating a negative value.
Combining Tables 1, 2, 3, and 6 above, it is now possible to determine the MPEG bit pattern for the luminance values of the entire example pixel block:
TABLE 7 (MPEG)
    Coefficient
    Number         Run    Value    Bit pattern
    0 (DC)         N/A    −4       101 011
    1              0      −5       001001101
    2              0      −4       00001101
    3              0      1        110
    4              0      1        110
    5              0      2        01000
    9              3      −1       001111
    End of Block                   10
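The MPEG coding of the AC coefficients in the example can be sketched in the same way. The dictionary restates the abbreviated Table 6; the escape sequence for combinations outside the table is not shown, and the names `MPEG_AC` and `encode_ac_mpeg` are illustrative.

```python
# (run, magnitude) -> code without the sign bit, from Table 6.
MPEG_AC = {
    (0, 1): "11", (0, 2): "0100", (0, 3): "00101", (0, 4): "0000110",
    (0, 5): "00100110", (0, 6): "00100001",
    (1, 1): "011", (1, 2): "000110", (1, 3): "00100101",
    (2, 1): "0101", (2, 2): "0000100",
    (3, 1): "00111", (3, 2): "00100100",
}
END_OF_BLOCK = "10"


def encode_ac_mpeg(run, value):
    """Look up the run/value code and append the sign bit."""
    code = MPEG_AC[(run, abs(value))]  # a KeyError would call for an escape
    return code + ("1" if value < 0 else "0")


ac_pairs = [(0, -5), (0, -4), (0, 1), (0, 1), (0, 2), (3, -1)]
patterns = [encode_ac_mpeg(r, v) for r, v in ac_pairs] + [END_OF_BLOCK]
```

Unlike the JPEG case, no separate value bits follow each code; the value is part of the table key and only the sign is appended.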
Clearly there is much commonality between making a JPEG image and making an MPEG I-frame. It is possible to create MPEG I-frames by creating JPEG images using dedicated circuitry in a camera, parsing the Huffman stream, and substituting the corresponding MPEG bit patterns. However, because the Huffman codes representing different DCT coefficients typically vary in length, the process of parsing the stream may be time consuming and inefficient when performed by a camera's microprocessor.
The dedicated JPEG circuitry or other engine in a camera typically does not allow the compression process to be interrupted before the Huffman coding of the AC coefficients so that a different coding method could be used for constructing MPEG I-frames.
MPEG compression may be done without the aid of compression circuitry by a program running on a microprocessor that is part of a camera, but this method may be so time consuming that the camera user is dissatisfied. Dedicated circuitry could perform the MPEG compression quickly, but many cameras do not contain circuitry for constructing MPEG sequences, and such circuitry may be expensive.
There is a need for a method of using the JPEG circuitry or other engine in a camera or other imaging device to assist in the construction of an MPEG sequence by performing the processing steps common to both JPEG and MPEG, while allowing the remaining processing to be performed efficiently.