1. Field of the Invention
The present invention relates to an apparatus useable for the real-time processing of video signals and other signals requiring digital processing.
2. The Background Art
With the rise in demand for increasingly complex electronic devices, such as computers performing video signal processing functions, large quantities of data must often be manipulated by common operations, such as lightening or darkening an image or merging two images together.
Signals requiring manipulation often contain a great deal of information, as when an NTSC signal is operated upon to produce various effects. A digital NTSC signal produces approximately 10.4 million pixels per second, and each pixel contains information for three colors: red, green, and blue. Thus, a digital NTSC signal results in more than 31 million pieces of digital data per second.
A typical CPU running at 200 MHz that must process more than 30 million pieces of digital video data per second therefore has fewer than seven clock cycles available for processing each color component of each pixel. To provide a smooth transition during real-time video processing, the processing of each frame must be completed prior to its display. Thus, processing speed is extremely important in real-time signal processing applications.
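The cycle budget described in the two preceding paragraphs can be worked out explicitly. The symbols N (color components per second) and f_CPU (clock rate) are introduced here only for this calculation, and the figures are the approximate ones given above.

```latex
\begin{aligned}
N &\approx 10.4 \times 10^{6}\ \text{pixels/s} \times 3\ \text{components/pixel} \approx 31.2 \times 10^{6}\ \text{components/s} \\
\frac{f_{\text{CPU}}}{N} &\approx \frac{200 \times 10^{6}\ \text{cycles/s}}{31.2 \times 10^{6}\ \text{components/s}} \approx 6.4\ \text{cycles/component} < 7
\end{aligned}
```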
In modern-day computers, memory operations are the slowest operations performed, with a typical read or store operation requiring a full clock cycle to execute. Given the number of operations performed on a real-time video signal, as described above, it is critical that even the simplest operations be performed as efficiently as possible. However, signal processing systems, though useful for their intended purposes, often do not optimize processing functions to minimize memory operations. It would therefore be beneficial to provide a signal processing system and methods for its use which perform specialized processing functions while minimizing the number of transfers into and out of memory.
In addition to limiting the number of memory operations performed, it is often desirable to reduce the number of discrete operations performed in a given period of time. Typically, repetitive operations are performed sequentially rather than in parallel. Performing signal processing operations on many bytes of data simultaneously often reduces the time required to process the entire data set, compared with the time required to perform the same operations sequentially. It would therefore be beneficial to provide an apparatus and method which process several bytes of data simultaneously.
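As a software illustration of operating on several bytes at once, the sketch below packs four 8-bit values into one 32-bit word and subtracts all four lanes with a few ordinary integer instructions, ignoring per-lane overflow. It uses a well-known technique often called SIMD-within-a-register (SWAR); it is offered only as an example of the general idea, not as the apparatus of the present invention, and the names `packed_sub_u8x4` and `HIGH_BITS` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical mask selecting the most significant bit of each byte lane. */
#define HIGH_BITS UINT32_C(0x80808080)

/*
 * Subtract the four packed 8-bit lanes of b from the corresponding lanes
 * of a, modulo 256 per lane (overflow ignored).
 * Forcing each lane's top bit to 1 in a and to 0 in b makes every lane's
 * partial difference positive, so no borrow crosses a lane boundary.
 * The final XOR then restores the correct top bit of each lane.
 */
static uint32_t packed_sub_u8x4(uint32_t a, uint32_t b)
{
    uint32_t low7_diff = (a | HIGH_BITS) - (b & ~HIGH_BITS);
    uint32_t top_fix = (a ^ ~b) & HIGH_BITS;
    return low7_diff ^ top_fix;
}
```

For example, `packed_sub_u8x4(0x05FF1000, 0x0A01200F)` yields `0xFBFEF0F1`: the lane differences 0x05 - 0x0A, 0xFF - 0x01, 0x10 - 0x20 and 0x00 - 0x0F, each taken modulo 256, computed with one subtraction instead of four.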
In video processing operations, such as decoding a video signal using the MPEG-II compression standard, or scaling or rotating images, the incoming signal being processed must be combined with other, internally generated signal data to achieve a given result. The present invention is concerned only with the difference of two signals when any resulting overflow is ignored, as when subtracting the angular component of one vector from another.
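Angle subtraction shows why discarding the overflow can give the correct answer rather than an error. The sketch below assumes a hypothetical 8-bit "binary angle" encoding in which 256 units make one full turn; because angles wrap around at a full turn, an 8-bit subtraction that simply drops the borrow already yields the wrapped angular difference. The type and function names are illustrative only.

```c
#include <stdint.h>

/*
 * Hypothetical 8-bit binary angle: 256 units = 360 degrees,
 * so one unit is 360/256 = 1.40625 degrees.
 */
typedef uint8_t binary_angle;

/* Difference a - b; converting back to 8 bits discards overflow (mod 256). */
static binary_angle angle_diff(binary_angle a, binary_angle b)
{
    return (binary_angle)(a - b);
}

/* Convert an encoded angle to degrees, for readability only. */
static double angle_to_degrees(binary_angle a)
{
    return a * (360.0 / 256.0);
}
```

For example, 22.5 degrees (16 units) minus 337.5 degrees (240 units) gives `angle_diff(16, 240) == 32`, i.e. 45 degrees, which is the correct wrapped difference.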
In the prior art, 8-bit signal information from the incoming signal, such as pixel information, was subtracted from other 8-bit information one 8-bit byte at a time. This method, while useful for its intended purposes, uses the processor inefficiently and fails to optimize memory operations.
In the discussion that follows, memory refers to any memory used by a Central Processing Unit to perform operations. The data being manipulated is typically real-time graphical image data, but may instead be any type of signal data requiring similar processing.
To illustrate the prior art method, a second set of four 8-bit bytes of signal data, such as pixel information, will be subtracted from a first set of four 8-bit bytes, producing a set of four differences.
FIG. 1 is a flow chart of the prior art method of computing the difference of one set of signal data from another set of signal data, each set comprising four 8-bit bytes.
Referring to FIG. 1, at step 10, a data byte from the first group of signal data is loaded into memory. At step 12, a data byte from the second group of data is loaded into memory. At step 14, the data byte from the second group is subtracted from the data byte from the first group, ignoring any overflow condition. At step 16, the processor causes the result of step 14 to be written into memory. At step 18, it is determined whether all four differences have been computed. If not, the method returns to step 10; if so, the method ends.
To compute the difference of the two signals as described above, one memory operation was required to load each data byte, for a total of eight load operations. Four subtraction operations were required to compute the differences, and four store operations were required to store each result into memory. The prior art method therefore requires a minimum of 16 operations, 12 of which are memory operations; since each memory operation takes approximately one clock cycle to complete, the method requires no fewer than twelve clock cycles.
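The FIG. 1 loop and its operation tally can be sketched in C as follows. The function `prior_art_diff` and the `op_counts` structure are hypothetical names introduced for illustration. The counters model the steps of the flow chart rather than measure real memory traffic; an optimizing compiler would not necessarily emit one load or store per step.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tally of operations, mirroring the counts discussed for FIG. 1. */
struct op_counts {
    unsigned loads;
    unsigned subtracts;
    unsigned stores;
};

/*
 * Byte-at-a-time difference in the style of the FIG. 1 prior art method.
 * For each of the n bytes: load one byte from each input (steps 10 and 12),
 * subtract ignoring overflow (step 14), and store the result (step 16);
 * the loop condition plays the role of the step 18 test.
 */
static void prior_art_diff(const uint8_t *first, const uint8_t *second,
                           uint8_t *out, size_t n, struct op_counts *ops)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t a = first[i];          /* step 10: byte from first group  */
        ops->loads++;
        uint8_t b = second[i];         /* step 12: byte from second group */
        ops->loads++;
        uint8_t d = (uint8_t)(a - b);  /* step 14: subtract, mod 256      */
        ops->subtracts++;
        out[i] = d;                    /* step 16: store the result       */
        ops->stores++;
    }
}
```

With n = 4 the counters record 8 loads, 4 subtractions and 4 stores, matching the tally given for the prior art method.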