Compressed still images and compressed motion video utilizing novel digital display and processing technology are creating new opportunities in a wide variety of fields, such as computer-based video telephony and video conferencing, computer-based instruction and entertainment, and computer-based multimedia presentations to support advertising and point-of-sale applications. Raw or uncompressed digital video is unsuitable for these applications because of the extensive storage and communication bandwidth required. Estimates show that storage requirements are up to 1 Mbyte per image and communication bandwidth can be up to 100 Mbps. Still image and video compression significantly reduces the number of bits required to represent an image or video sequence by taking advantage of spatial and temporal redundancies in the images and of the limited ability of the human eye to perceive certain forms of distortion. Compression factors ranging from 2:1 to over 1000:1 can be achieved, depending on the application and the image quality required.
In order to obtain a certain degree of uniformity in this field, standards are being adopted for the presentation of compressed digitized video information. These standards present at least three approaches for formatting different types of digital data extracted and encoded from still and moving video images. For example, P*64K (CCITT Standard H.261) covers the presentation of video telephony data, MPEG (ISO) covers the presentation of still and moving image video data for CD-ROM applications, and JPEG (ISO) covers the presentation of video data for still picture storage and transmission. Digitized video data formatted in compliance with any of the standards comprises sequences of raw data interleaved with variable length encoded data. Interleaving occurs at intervals which appear random and are a function of the underlying data. Data block lengths can vary over a wide range. As a result, recovery of fixed length data sequences and reconstruction of the video images is a problem because one cannot locate with certainty the beginning of a raw (non-coded) data word or a coded data word until the prior variable length encoded data word is fully decoded.
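The difficulty of locating word boundaries can be illustrated with a minimal Python sketch. The code book, the ESC symbol, and the 4-bit raw field width below are hypothetical and are not taken from P*64K, MPEG, or JPEG; they serve only to show that the start of a raw word is unknown until the preceding variable length word has been fully decoded.

```python
# Hypothetical prefix code (not from any standard): codeword -> symbol.
CODEBOOK = {"0": "A", "10": "B", "110": "C", "111": "ESC"}
RAW_BITS = 4  # assumed width of the raw (non-coded) field that follows ESC


def parse(bits):
    """Split a bit string into variable length symbols and raw fields."""
    out = []
    pos = 0
    while pos < len(bits):
        # Grow the candidate codeword one bit at a time until it matches.
        end = pos + 1
        while bits[pos:end] not in CODEBOOK:
            end += 1
            if end > len(bits):
                raise ValueError("truncated codeword at bit %d" % pos)
        symbol = CODEBOOK[bits[pos:end]]
        pos = end  # only now is the start of the next word known
        if symbol == "ESC":
            # A raw word follows; its position was unknown until ESC was decoded.
            if pos + RAW_BITS > len(bits):
                raise ValueError("truncated raw field at bit %d" % pos)
            out.append(("RAW", int(bits[pos:pos + RAW_BITS], 2)))
            pos += RAW_BITS
        else:
            out.append(("SYM", symbol))
    return out


if __name__ == "__main__":
    # A, B, ESC, raw value 1010, C
    print(parse("0" + "10" + "111" + "1010" + "110"))
```

In this sketch a single bit error or a misjudged codeword length shifts every later boundary, which is why the raw fields cannot be extracted independently of the coded ones.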
Data recovery and image reconstruction are further complicated by the individual data structures which make up the formatted video image data. According to the P*64K standard for video telephony data, digitized video data are divided into categories for intraframe data, interframe data, and motion compensated data, with each category being classified into subcategories of quantized, unquantized, encoded, and not encoded data. Such an arrangement of data is said to be hierarchical. Each category of data is encoded according to a particular set of rules optimized for that category. For data recovery, the data must not only be decoded correctly, but the context or category within which the data appear in the format must also be accurately determined in order to apply the correct decoding rules. Some data, once received and decoded, provide the key to recovering subsequent data sequences. As a result, decoding of hierarchical data is context dependent.
For hierarchical data, decoding complexity is further increased because it is necessary to keep track of the position in the hierarchy. The decoding rules applied to one set of data in one position in the hierarchy can be very different from the rules applied to a subsequent set of data in the same block simply because the position in the hierarchy and, therefore, the context may have changed.
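As a small illustration of this context dependence, the following Python sketch decodes the same two bits under two different positions in a hierarchy. The context names and both code books are hypothetical and do not reproduce any table from the standards.

```python
# Hypothetical code books, one per position (context) in the hierarchy.
CODEBOOKS = {
    "block_type": {"0": "INTRA", "10": "INTER", "11": "MOTION"},
    "coefficient": {"0": 0, "10": 1, "11": -1},
}


def decode_one(bits, pos, context):
    """Decode one symbol starting at bits[pos] with the rules for `context`.

    Returns the decoded symbol and the position of the next undecoded bit.
    """
    book = CODEBOOKS[context]
    for end in range(pos + 1, len(bits) + 1):
        word = bits[pos:end]
        if word in book:
            return book[word], end
    raise ValueError("no codeword found at bit %d" % pos)


if __name__ == "__main__":
    # The identical bit pattern yields different results in different contexts.
    print(decode_one("10", 0, "block_type"))
    print(decode_one("10", 0, "coefficient"))
```

The decoder therefore has to carry the current context alongside the bit position; the bits alone do not identify which rules apply.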
For recovering a fixed length video data sequence from the hierarchical data, variable length decoding is the most time-consuming operation. It is well known that the bit level decoding decisions can be represented as branches on a tree in which intermediate nodes represent partially decoded data and leaf nodes represent a fully decoded data symbol. See, for example, a technical article by M. Wells in The Computer Journal, Vol. 15, No. 4, pp. 308-313 (1972). A simple Finite State Machine, using for example a ROM and a next state register, can be used to efficiently implement this variable length decoding decision tree. For this implementation, the decoder always returns to the origin or root node of the tree after successfully decoding an individual data symbol. Bit parallel entropy decoding, in which N bits of the input sequence (where N is sufficient to contain the longest variable length code) are applied in parallel to a lookup table, has been used to increase the decoding speed above performance constraints set by hardware technology such as CMOS VLSI circuits. But this bit parallel decoding technique requires additional circuitry, for example, high speed barrel shifters and large PLAs, which makes the implementation complex and somewhat inefficient. See Sun et al., "High-Speed Programmable ICs for Decoding Variable-Length Codes", Proceedings of SPIE Applications of Digital Image Processing Vol. 1153, (1989) pp. 28-39. Word parallel entropy decoding, where two or more substantially identical decoders are employed, has also been proposed to increase decoding speed. In the three examples cited above, the decoders interpret the variable length data according to a single code book or decoding tree network.
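The tree-based Finite State Machine described above can be sketched in Python as follows. The dictionary stands in for the ROM, the variable `state` for the next state register, and the four-symbol code is a hypothetical example rather than a code book from any cited work.

```python
# ROM contents for a hypothetical code: A=0, B=10, C=110, D=111.
# Each entry maps (current state, input bit) -> (next state, symbol or None).
# State 0 is the root; states 1 and 2 are intermediate nodes of the tree.
ROOT = 0
ROM = {
    (0, 0): (ROOT, "A"),
    (0, 1): (1, None),
    (1, 0): (ROOT, "B"),
    (1, 1): (2, None),
    (2, 0): (ROOT, "C"),
    (2, 1): (ROOT, "D"),
}


def fsm_decode(bits):
    """Decode a sequence of 0/1 integers one bit per step."""
    state = ROOT  # plays the role of the next state register
    symbols = []
    for bit in bits:
        state, symbol = ROM[(state, bit)]
        if symbol is not None:
            # A leaf was reached; the ROM entry already returned us to the root.
            symbols.append(symbol)
    if state != ROOT:
        raise ValueError("input ended inside a codeword")
    return symbols


if __name__ == "__main__":
    print(fsm_decode([0, 1, 0, 1, 1, 0, 1, 1, 1]))
```

Each step consumes exactly one input bit, so decoding time grows with codeword length; this is the cost that the bit parallel lookup approach trades for wider tables and shifting hardware.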
In contrast, for decoding hierarchical data, it is necessary to decode one set of symbols in order to determine which code book or decoding tree network is needed to decode the next variable length sequence. When one symbol has been decoded, the decoding process must advance to a new point (context) in the decoding hierarchy according to data obtained during the decoding of the prior individual data symbol. Moreover, the decoding process must have sufficient flexibility to handle embedded non-coded data words. One such multi-function decoder was proposed in a technical article by Yang et al., Proc. of SPIE: Visual Communication and Image Processing Vol. 1360, pp. 1530 et seq. (1990). In the reported decoder, the parallel entropy decoder described and cited above performs variable length decoding on fixed length input words while a plurality of different hardware modules are called upon to decode the hierarchical aspects of the incoming data words. The fixed length is the length of the longest input codeword expected by any of the variable length decoders used for the hierarchical decoding process. Each decoder effectively performs a table look-up in the plurality of hardware modules, using the fixed length input word, to output a decoded word and the total number of bits of the input word used to obtain that decoded word. The barrel shifter in the parallel decoder shifts out N bits of the input word before proceeding with the decoding process, wherein N corresponds to the total number of bits as output from the hardware modules. A de-formatting switch, implemented as a separate finite state machine, is used to maintain context, select the appropriate hardware decoding module, and switch data between the various modules.
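The general scheme of a fixed length window, per-context lookup tables, and a context-selecting switch can be sketched in Python as below. This is not the decoder of Yang et al.; the window width, the two code books, the context names, and the transition rule are all hypothetical, chosen only to show how a decoded symbol selects the table used for the next word.

```python
WINDOW = 3  # assumed window width: the longest codeword in any table


def build_table(codebook):
    """Expand a prefix code into a 2**WINDOW table of (symbol, bits used)."""
    table = [None] * (1 << WINDOW)
    for word, symbol in codebook.items():
        pad = WINDOW - len(word)
        base = int(word, 2) << pad
        for low in range(1 << pad):  # every window that starts with `word`
            table[base + low] = (symbol, len(word))
    return table


# Hypothetical per-context tables, standing in for separate hardware modules.
TABLES = {
    "type": build_table({"0": "SKIP", "10": "CODED", "11": "END"}),
    "value": build_table({"0": 0, "10": 1, "110": -1, "111": 2}),
}


def next_context(context, symbol):
    """Hypothetical de-formatting rule: a CODED type is followed by one value."""
    if context == "type" and symbol == "CODED":
        return "value"
    return "type"


def decode(bits):
    out, context, pos = [], "type", 0
    while pos < len(bits):
        # Present a full window, zero-padded at the tail of the stream.
        window = bits[pos:pos + WINDOW].ljust(WINDOW, "0")
        symbol, used = TABLES[context][int(window, 2)]
        if pos + used > len(bits):
            raise ValueError("truncated codeword at bit %d" % pos)
        out.append((context, symbol))
        pos += used  # shift out only the bits actually consumed
        if symbol == "END":
            break
        context = next_context(context, symbol)
    return out


if __name__ == "__main__":
    print(decode("0" + "10" + "110" + "10" + "0" + "11"))
```

Note that every lookup presents a full window of bits even when the matching codeword is a single bit long, so on average the tables are indexed with more bits than are needed to identify the output word.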
Because it implements the required decoders as separate hardware modules and requires input words having, on average, more bits than are necessary to accurately decode a unique output word, this decoder implementation is complex and somewhat inefficient. Moreover, the decoder lacks the capability to decode incoming data on a bit-by-bit basis.
Existing decoder implementations lack sufficient simplicity, speed, or functionality to perform the processing necessary for decoding hierarchical data according to any of the standard formats cited above. As a result, the realization of a simple decoder with the performance required by the intended applications, according to any one of the standards, has yet to be successfully addressed in the published literature.