The general trend in the computer industry has been towards the development of computer products that integrate a variety of different communications media. An assortment of products and standards now exists which combines various media resources such as color video, digital audio, motion pictures, and computer graphics. Several examples of state-of-the-art integrated multimedia architectures and systems are found in U.S. Pat. Nos. 5,307,456 and 5,226,160. Communication systems and apparatus for efficient transmission of multimedia information are also described in U.S. Pat. Nos. 5,309,434 and 4,864,562.
One of the difficulties of processing and communicating multimedia information is the enormous amount of data that must be manipulated, and on which numerous arithmetic operations must be performed. For example, a typical multimedia system must be capable of capturing, digitizing, compressing, decompressing, enhancing and reproducing moving video images and accompanying audio signals. The formidable problems which must be overcome in designing systems for storage, processing, communication, and retrieval of multimedia data have led engineers and scientists to consider a wide variety of solutions.
To fully appreciate the magnitude of the problem faced by practitioners, consider the fact that digitized video images normally comprise a two-dimensional array of picture elements, or pixels. Standard video displays normally consist of an array of 640 horizontal pixels by 480 vertical pixel lines. Of course, the quality of the image is a function of its resolution, which is a direct function of the number of horizontal and vertical pixels. This means that the resolution of an image on a display area is directly related to the amount of memory required to store the image.
A typical full color video image requires 24 bits per pixel. This allows 8 bits to be assigned to each of the three primary colors (i.e., red, green, and blue), resulting in 2^8 × 2^8 × 2^8, or approximately 16.7 million, possible colors in a single pixel. As a consequence, approximately one million bytes (1 MByte) of memory is usually required to display a full color video image. If video motion is to be added to the display, for example in an NTSC video application, video frames must be displayed at a rate of 30 per second. Therefore, displaying 60 seconds of motion video on a computer monitor screen requires approximately two gigabytes (2 GBytes) of memory.
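The storage figures above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the 640 by 480 display and 24 bits per pixel quoted in the text; the function and constant names are illustrative only and do not come from any cited system.

```python
# Illustrative arithmetic only; the constants restate figures quoted in the text.
WIDTH, HEIGHT = 640, 480      # pixels per line, lines per frame
BITS_PER_PIXEL = 24           # 8 bits each for red, green, and blue
FRAMES_PER_SECOND = 30        # NTSC frame rate quoted in the text


def colors_per_pixel(bits_per_pixel: int = BITS_PER_PIXEL) -> int:
    """Number of distinct colors one pixel can take."""
    return 2 ** bits_per_pixel


def frame_bytes(width: int = WIDTH, height: int = HEIGHT,
                bits_per_pixel: int = BITS_PER_PIXEL) -> int:
    """Bytes needed to hold one uncompressed frame."""
    return width * height * bits_per_pixel // 8


def video_bytes(seconds: int, fps: int = FRAMES_PER_SECOND) -> int:
    """Bytes needed to hold `seconds` of uncompressed motion video."""
    return frame_bytes() * fps * seconds


if __name__ == "__main__":
    print(colors_per_pixel())   # 16777216
    print(frame_bytes())        # 921600
    print(video_bytes(60))      # 1658880000
```

Under these assumptions one uncompressed frame occupies 921,600 bytes, which is on the order of the one-megabyte figure, and 60 seconds of video occupies about 1.66 billion bytes, which the text rounds to two gigabytes.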
The rate at which data must be retrieved in order to display video motion sequences is on the order of 30 MBytes per second. If digital audio is also included in the multimedia communications system, an additional 180 kilobytes (KBytes) of data per second must be transmitted. These data transfer rates vastly exceed the capabilities of existing data storage devices.
For example, contemporary rigid disk drives are capable of transferring approximately 1 MByte of data per second. Because multimedia information is extremely numerically intensive and involves the processing of enormous amounts of data at extremely high data transfer rates, various techniques have been developed aimed at compressing the data. A number of different compression techniques and algorithms are known to reduce the amount of data that needs to be manipulated or transmitted while maintaining a high standard of video/audio fidelity.
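The gap between the required data rate and the disk transfer rate, which motivates compression, can be estimated as follows. This is a rough sketch that assumes the rates quoted in the text and treats 1 MByte as 1,000 KBytes; the names are hypothetical.

```python
# Rates quoted in the text, expressed in KBytes per second.
# Assumption: 1 MByte is treated as 1,000 KBytes for this estimate.
VIDEO_RATE_KBYTES = 30_000   # roughly 30 MBytes/s of uncompressed video
AUDIO_RATE_KBYTES = 180      # additional digital audio
DISK_RATE_KBYTES = 1_000     # roughly 1 MByte/s from a rigid disk drive


def required_reduction(video: float = VIDEO_RATE_KBYTES,
                       audio: float = AUDIO_RATE_KBYTES,
                       disk: float = DISK_RATE_KBYTES) -> float:
    """Factor by which the stream must shrink to fit the disk transfer rate."""
    return (video + audio) / disk


if __name__ == "__main__":
    print(required_reduction())
```

With these figures the combined stream would have to be reduced by a factor of about thirty before a single such drive could sustain it.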
By way of example, an algorithmic technique developed by the Joint Photographic Experts Group (JPEG) has proven successful in reducing the amount of data by a factor of approximately twenty for still picture compression. The JPEG algorithm has a reputation for providing high quality still picture compression; that is, the JPEG algorithmic technique allows a significant reduction in the amount of data that is needed to represent a still video image. The JPEG algorithm achieves this result by eliminating information to which the human eye is relatively insensitive.
Another algorithmic technique, developed by the Moving Picture Experts Group (MPEG), provides a very high level of compression for moving pictures and associated audio for digital storage at about 1.5 megabits per second. This technique is commonly known as the MPEG algorithm. The MPEG algorithm involves the compression of data which exhibits certain spatial and temporal relationships. An MPEG bit stream is made up of three types of pictures: Intra pictures (I), which are coded independently of any other picture; Predicted pictures (P), which are coded using motion compensation from a previous I or P picture; and Bi-directionally predicted pictures (B), which are coded using motion compensation from a previous and a future I or P picture.
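The dependency structure among the three picture types can be illustrated with a short sketch. It assumes, as in MPEG-1, that a P picture is predicted from the nearest preceding I or P picture and that a B picture is predicted from the nearest I or P picture on either side of it in display order. The function name and the string encoding of a picture sequence are hypothetical conveniences, not part of the MPEG bit stream syntax.

```python
from typing import Dict, List


def reference_pictures(display_order: str) -> Dict[int, List[int]]:
    """Map each picture index to the indices of the pictures it is predicted from.

    `display_order` is a string of 'I', 'P', and 'B' characters, one per
    picture, in display order (an illustrative encoding, not MPEG syntax).
    """
    # I and P pictures serve as anchors for motion-compensated prediction.
    anchors = [i for i, kind in enumerate(display_order) if kind in "IP"]
    refs: Dict[int, List[int]] = {}
    for i, kind in enumerate(display_order):
        prev_anchor = max((a for a in anchors if a < i), default=None)
        next_anchor = min((a for a in anchors if a > i), default=None)
        if kind == "I":
            refs[i] = []                      # coded independently
        elif kind == "P":
            refs[i] = [] if prev_anchor is None else [prev_anchor]
        elif kind == "B":
            refs[i] = [a for a in (prev_anchor, next_anchor) if a is not None]
        else:
            raise ValueError(f"unknown picture type: {kind!r}")
    return refs


if __name__ == "__main__":
    print(reference_pictures("IBBPBBP"))
```

For the sequence `IBBPBBP`, the sketch reports that pictures 1 and 2 depend on pictures 0 and 3, picture 3 depends on picture 0, pictures 4 and 5 depend on pictures 3 and 6, and picture 6 depends on picture 3. Because a B picture needs a future anchor, a decoder must receive that anchor before the B pictures that precede it in display order.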
In comparison to the JPEG algorithm, which simply reduces the amount of data by recognizing differences between neighboring pixels, the MPEG algorithm takes into account relations and differences between successive video frames. The MPEG algorithm is therefore useful in compressing data involving the displacement of an object in successive video frames in relation to a pixel coordinate axis.
Yet another technique for video compression and decompression is known by practitioners in the art as H.261 (P×64).
Audio compression and decompression do not require as much bandwidth as video in most applications (e.g., teleconferencing or voice playback), but it is still important to be able to code and decode an audio or voice data stream in real-time. Various algorithms, such as sub-band coding, Linear Predictive Coding (LPC), and Wave Table Synthesis (WTS), have been studied for use in the compression, decompression and synthesis of audio information.
A common characteristic of these standard algorithmic techniques is that they are all "transform-based"; that is, they rely upon the use of the Discrete Cosine Transform (DCT). (MPEG and H.261 also use Motion Estimation/Motion Compensation to compress and decompress video data streams.) Almost all of the algorithms and techniques for achieving high quality audio and video compression/decompression, graphics and image processing, handwriting and speech recognition, and modem communications have one very important common characteristic: they are extremely computationally intensive, involving extensive multiply-accumulate loops.
By way of example, the discrete cosine transform is frequently implemented in a computer as a sequence of multiplications and additions performed on the image data in one or more dimensions. Thus, the fundamental operations required for convolution and correlation of pixel data are matrix multiplication and addition. Other common characteristics of multimedia applications include: small native data elements (e.g., bytes, words, etc.), computationally intensive inner loops (e.g., DCT, Motion Compensation), regular data structures (e.g., one and two-dimensional pixel arrays), and execution of a small number of repetitive operations performed on large amounts of data (e.g., additions, multiplications, shifts). As a result, the long-felt, unsatisfied need in the field of multimedia computing has been for a data processing architecture that is capable of performing computationally-intensive operations, involving huge amounts of data, in real-time.
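To make the multiply-accumulate structure concrete, the following sketch computes a one-dimensional DCT as a matrix-vector product. It assumes the common orthonormal DCT-II definition, which the text does not specify, and the function names are illustrative.

```python
import math
from typing import List


def dct_matrix(n: int) -> List[List[float]]:
    """Build the n-by-n orthonormal DCT-II coefficient matrix (assumed form)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


def dct_1d(samples: List[float]) -> List[float]:
    """Transform `samples` using one multiply-accumulate per matrix entry."""
    n = len(samples)
    coeffs = dct_matrix(n)
    out = []
    for k in range(n):
        acc = 0.0
        for j in range(n):
            acc += coeffs[k][j] * samples[j]   # multiply-accumulate step
        out.append(acc)
    return out


if __name__ == "__main__":
    # A constant row of pixels has energy only in the first (DC) coefficient.
    print(dct_1d([1.0] * 8))
```

In this direct form an 8-point transform takes 64 multiply-accumulate steps, and a two-dimensional transform of an 8 by 8 pixel block repeats the one-dimensional transform over every row and every column, which illustrates why such inner loops dominate the workload.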
Conventional scalar microprocessors, such as those manufactured by Motorola, Inc., and Intel Corporation, perform matrix multiplications and matrix additions sequentially. By way of example, to perform a matrix multiplication operation, traditional general purpose microprocessors normally first compute a pair of addresses, then fetch the individual data elements that are to be multiplied. The product of the matrix and vector elements is then stored in a temporary location on-chip. The multiplication process is repeated for each matrix and vector element until the entire matrix multiplication has been completed.
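The sequential procedure described above can be sketched as follows. The flat `memory` list, the base addresses, and the row-major layout are assumptions made for illustration; they are not details of any particular microprocessor.

```python
from typing import List


def scalar_matvec(memory: List[int], matrix_base: int, vector_base: int,
                  rows: int, cols: int) -> List[int]:
    """Multiply a row-major matrix by a vector, one element pair at a time."""
    result = []
    for r in range(rows):
        acc = 0                                    # temporary on-chip location
        for c in range(cols):
            m_addr = matrix_base + r * cols + c    # compute the pair of addresses
            v_addr = vector_base + c
            m = memory[m_addr]                     # fetch the two data elements
            v = memory[v_addr]
            acc += m * v                           # multiply and accumulate
        result.append(acc)
    return result


if __name__ == "__main__":
    # A 2-by-3 matrix stored at address 0, followed by a 3-element vector at 6.
    memory = [1, 2, 3, 4, 5, 6, 1, 0, 2]
    print(scalar_matvec(memory, 0, 6, 2, 3))   # [7, 16]
```

Every product requires its own address computation, two fetches, and one multiply-accumulate, so the total work grows with the number of matrix elements and none of it overlaps.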
Attempts to reduce the number of multiplications and additions that must be performed on video/audio data during the compression and decompression processes have resulted in a variety of specialized circuits and processing techniques. For instance, U.S. Pat. No. 5,053,985 describes an integrated circuit that attempts to optimize the physical mathematical apparatus required to perform discrete cosine transforms commonly used in the compression of digital image data. A computer system for compression and decompression of video data using DCT algorithmic techniques is also disclosed in U.S. Pat. No. 5,191,548.
Another approach to the problem of achieving real-time video processing speeds has been the concept of single-instruction, multiple-data (SIMD) processors. SIMD machines are generally characterized as consisting of a group of processors that perform the same operation simultaneously on every element of a data array. Multiprocessors that employ additional instruction memories, dedicated as cache memories to particular processors so that the processors can function in a multiple-instruction, multiple-data (MIMD) mode, have also been utilized to process large data arrays. Examples of SIMD and MIMD architectures for use in digital filtering applications are described in U.S. Pat. Nos. 5,210,705, 5,239,654 and 5,212,777. Despite the fact that SIMD and MIMD processors are capable of manipulating large amounts of data rapidly, they generally suffer from the drawback of having relatively slow input/output (I/O) speeds. In addition, many existing SIMD and MIMD architectures are not generalized enough to implement all of the various algorithms and techniques that need to be efficiently executed in a multimedia data processing system.
Computer architectures that include a plurality of pixel slice processors are also known in the prior art. For example, U.S. Pat. No. 4,860,248 describes a pixel slice processor that accomplishes SIMD operation on pixel lines which are equal to or larger than the bit capacity of a computer. Traditional bit-slice architectures, as exemplified by the Advanced Micro Devices 2900 family of processors, allow some data processing to occur on a nibble-granular basis (e.g., on a multiple of 4 bits) but suffer from the inability to work across multiple data words. Moreover, existing bit-slice architectures usually perform mathematical operations sequentially, rather than in parallel. Another serious drawback of existing pixel or bit-slice architectures is the complex interconnection network and control structure that is required.
Convolution memory chips, such as that described in U.S. Pat. No. 5,014,235, have demonstrated their usefulness in performing dot product matrix multiplications. Dot product matrix multiplications are routinely performed in certain types of neural networks or associative memories for the purpose of matching sets of data (e.g., pattern recognition). The problem with the use of convolution memory chips, however, is that they are not generally useful in multimedia, real-time data processing applications that require implementation of a number of different processing algorithms at very high data rates.
What is needed, then, is a computer architecture that takes advantage of the statistical similarities of multimedia data and which can rapidly perform a variety of repetitive mathematical operations on very large data sets for the purpose of finding similarities between different functions. As will be seen, the present invention provides an apparatus and method for efficient processing of multimedia data. The invented data processing system is capable of calculating very complex functions on large data arrays at extremely high data rates. Moreover, the invented architecture is very flexible and adaptive so as to implement a wide variety of different processing algorithms and techniques useful in multimedia applications.