Generally, parallel processing computers are used in high-speed data processing applications wherein multiple data streams are individually processed by separate processors. Since each processor executes the same instruction upon different data, such a parallel processing computer is known as having a single instruction, multiple data stream (SIMD) architecture.
One illustrative application of a SIMD computer system is a system for displaying video signals in "real-time" upon a video monitor. Specifically, such a system contains a plurality of processors, local memory connected to each processor, and input/output circuitry interconnecting the processors. Also, the data (digital video information) entering the processors is buffered (input buffering) and the data exiting the processors is buffered (output buffering). Such buffering ensures that the processing rate of the processors is compatible with the input and output data rates. The data buffering is typically accomplished by temporarily storing the input and output data in the local memory associated with each processor.
Each processor within a real-time video signal processing system is typically assigned the task of generating a pixel value within each line (scanline) of a video display. Therefore, the number of parallel processors is equal to the number of pixels in a line of video information. Typically, each line of a video display is generated at approximately a 15 kHz rate. Thus, each processor must generate a pixel value at approximately a 15 kHz rate. As such, the plurality of parallel processors simultaneously produces all the pixels in one scanline. For example, if there are 512 pixels contained in a line, then 512 parallel processors are used. Each line of pixel values is temporarily stored (buffered) in a frame buffer until an entire frame of pixels is accumulated, e.g., 525 lines represent one frame. More specifically, each processor stores its computed pixel values in its local memory. Cumulatively, all the memory locations taken together form a distributed frame buffer. Once a frame of data has been accumulated in the distributed buffer, the frame of pixel data is sent to output circuitry that formats the data in a manner that can be utilized by a video display monitor.
To reiterate, the frame buffer is typically distributed across the local memory, i.e., the pixel values produced by a given processor are stored in local memory associated with that processor. Similarly, if the input data needs buffering before the processor can use it, the input data is distributed amongst the processors and also buffered in the local memory. Thus, an input serial data stream is processed to distribute the stream amongst the distributed input data buffers, i.e., within specific, pre-defined memory locations in the local memory associated with each processor. Furthermore, to facilitate data display, the output data that is distributed amongst the processor local memories is reorganized into a serial output data stream.
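By way of illustration only, the distribution of a serial data stream amongst the local memories, and the reorganization of the distributed data into a serial stream, can be sketched as follows. The function names `distribute` and `serialize` and the use of Python lists to stand in for local memories are assumptions made for this sketch; they do not appear in the foregoing description.

```python
def distribute(serial_stream, num_processors):
    """Split a serial pixel stream into per-processor local memories.

    Pixel i of each scanline goes to processor i, so local memory p holds
    column p of the frame, one entry per scanline. (Hypothetical layout.)
    """
    if len(serial_stream) % num_processors != 0:
        raise ValueError("stream must contain whole scanlines")
    local_memories = [[] for _ in range(num_processors)]
    for index, pixel in enumerate(serial_stream):
        local_memories[index % num_processors].append(pixel)
    return local_memories


def serialize(local_memories):
    """Rebuild the serial stream, one scanline at a time."""
    num_lines = len(local_memories[0])
    return [memory[line] for line in range(num_lines) for memory in local_memories]
```

In this sketch, a stream of three scanlines for four processors leaves three values in each local memory, and `serialize(distribute(stream, 4))` returns the original stream.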
In operation, input data (pixel values) is supplied from an input device, e.g., a digital video camera, and that data is distributed amongst the processor local memories (distributed input buffering). In response to SIMD instructions contained in a program executed by each of the processors, each processor performs a specified computation upon the input data stored in each processor's local memory. For example, in an image processing context, such processing may include filtering, image rotation and scaling, image projections on a viewing plane, and the like. The result of the computation (a pixel value to be displayed) is buffered in the processor's associated local memory (output buffering) until an entire frame of pixel values has been computed and stored. Thereafter, the frame of pixel values is serially transferred, via computer system input/output circuitry, to an output circuit. The output circuit appropriately formats the output data for use by the video display device.
To achieve "real-time" processing, the system must compute the pixel values at approximately 15 kHz and produce each video frame at a rate of 30 frames per second. Thus, given a processor clock speed and assuming that the processor executes one instruction per clock cycle, a maximum number of instructions per pixel computation can be calculated. For example, if the processor clock speed is 14.316 MHz and the line rate is 15.7343 kHz, then 14.316 MHz/15.7343 kHz results in an instruction budget of approximately 910 instructions per pixel computation. Thus, for a given clock speed, there is a maximum number of instructions that can be executed to maintain real-time processing. If more instructions than are budgeted are necessary to produce a pixel, then the processors cannot update the display in real-time.
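The budget calculation above can be checked with a few lines of code. This is a minimal sketch that assumes, as the text does, one instruction per clock cycle; the function names are illustrative only.

```python
def instruction_budget(clock_hz, line_rate_hz):
    """Clock cycles (and thus instructions) available per scanline period."""
    return clock_hz / line_rate_hz


def meets_real_time(instructions_per_pixel, clock_hz, line_rate_hz):
    """True if a pixel computation fits within one scanline period."""
    return instructions_per_pixel <= instruction_budget(clock_hz, line_rate_hz)


budget = instruction_budget(14.316e6, 15.7343e3)
print(round(budget))  # approximately 910
```

With these figures, a pixel computation of 900 instructions fits the budget, whereas one of 1000 instructions does not.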
More specifically, the following pseudo-code represents the foregoing process for generating a display as accomplished in the prior art.
______________________________________
Program {
    COMTIME routine {
        Get input data from input circuitry
        Write output data to output circuitry
    }
    Execute MAIN routine to produce output data
    RESTART routine {
        Execute NOPs until a horizontal sync pulse occurs, then restart this program
    }
}
______________________________________
The foregoing pseudo-code is executed by each processor after each horizontal synchronization pulse occurs. As is well-known in the art of video signal display technology, the horizontal synchronization pulse indicates when a new line of pixel information needs to be generated for use by the display. Thus, upon occurrence of the horizontal sync pulse, the program first executes the COMTIME routine. This code retrieves data from the input circuitry and places that data into local memory (distributed input buffer) and also transfers output data from the local memory (distributed output buffer) to the output circuitry. This output data was generated upon the previous pass through the MAIN routine portion of the program. Once the COMTIME routine is complete, each processor executes a series of instructions comprising the MAIN routine. To process and display the data in real-time, the number of instructions in the MAIN routine must be less than the instruction budget. The MAIN routine, executing on each processor, uses the input data in the input buffer to compute output data (a pixel value). The output data is temporarily stored in the output buffer. Cumulatively, all of the data generated by the processors after a single pass through the MAIN routine forms a single scanline of video display information.
Once the MAIN routine computes the output data, the program executes the RESTART routine, which causes the processor to execute no-operations (NOPs) until the next horizontal synchronization pulse occurs. Generally, NOPs are used to fill the processing time that remains unused when a given processor completes its processing within the instruction budget.
When the horizontal sync pulse occurs, the program counter for all the processors is reset to the beginning of the program and the program is executed again to produce the next scanline of pixel values. By repetitively executing the program a frame of pixels is generated and stored, one scanline at a time, in the distributed output buffer. Using the COMTIME routine, the entire frame, once generated, is transferred, one scanline at a time, from the output buffer for display. As such, the display is updated in real-time, i.e., the display is refreshed with new data at a 30 frame per second rate.
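The per-scanline cycle described above (COMTIME, then MAIN, then RESTART) can be simulated in software. The following is a simplified sketch, not the prior art program itself: it assumes a hypothetical MAIN routine that inverts an 8-bit pixel in three instructions, uses eight processors rather than 512, ignores the instructions consumed by COMTIME, and buffers a single scanline rather than a full frame.

```python
NUM_PROCESSORS = 8        # illustrative; the text's example uses 512
INSTRUCTION_BUDGET = 910  # from the 14.316 MHz / 15.7343 kHz example


def main_routine(pixel):
    """Hypothetical MAIN: invert an 8-bit pixel; returns (value, instructions used)."""
    return 255 - pixel, 3


def run_scanlines(input_lines):
    """Simulate one program pass per horizontal sync pulse.

    Returns the scanlines sent to the output circuitry and the NOP count
    of the final pass.
    """
    output_buffer = None                   # nothing computed before the first pass
    sent_to_output = []
    nops = 0
    for line in input_lines:               # each iteration starts at a sync pulse
        # COMTIME: write the previous pass's output, then read new input.
        if output_buffer is not None:
            sent_to_output.append(output_buffer)
        input_buffer = list(line)          # one value per processor's local memory
        # MAIN: every processor applies the same instructions to its own pixel.
        results = [main_routine(p) for p in input_buffer]
        output_buffer = [value for value, _ in results]
        used = max(count for _, count in results)
        if used > INSTRUCTION_BUDGET:
            raise RuntimeError("MAIN exceeds the instruction budget")
        # RESTART: idle with NOPs until the next sync pulse.
        nops = INSTRUCTION_BUDGET - used
    return sent_to_output, nops
```

Note that in this sketch the output lags the input by one pass, since COMTIME transfers the data computed during the previous pass through MAIN.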
For computer systems that cannot generate a pixel value within the instruction budget, the display is refreshed using old data (previously computed data) until a frame of new data is computed. In such systems, frame buffering is used to store one or more frames of output data. The present frame being displayed is repetitively displayed until the computer system produces a frame of new data. Then, the new data is repetitively displayed until the computer system produces the next frame of data, and so on. As such, a new frame of data is not available at the standard frame rate of the display device, e.g., 30 frames per second. To implement this method, a special monitoring routine is added to the pseudo-code above to monitor when the system has generated each complete frame of data. This routine then informs the output circuitry of the availability of the new frame of data. Such a routine consumes processing cycles that could better be used for data processing.
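The effect of repeating old frames can be illustrated with a short sketch. It assumes, purely for illustration, that the system needs a fixed whole number of display refreshes to compute each new frame and that an initial frame (frame 0) is already available; the function name and parameters are hypothetical.

```python
def refresh_sequence(num_refreshes, refreshes_per_new_frame):
    """Return the frame number shown at each display refresh."""
    displayed = []
    current_frame = 0   # frame currently held for display (assumed ready)
    progress = 0        # refresh periods spent computing the next frame
    for _ in range(num_refreshes):
        displayed.append(current_frame)
        progress += 1
        # Stand-in for the monitoring routine: detect a completed frame
        # and make it available to the output circuitry.
        if progress == refreshes_per_new_frame:
            current_frame += 1
            progress = 0
    return displayed


print(refresh_sequence(6, 2))  # [0, 0, 1, 1, 2, 2]
```

Under these assumptions, a system that needs two refresh periods per frame presents new data at half the display's refresh rate, e.g., 15 new frames per second on a 30 frame per second display.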
Detrimentally, the present methods of accessing distributed frame buffers in parallel processing computers must either operate within a strict instruction budget or include a routine for monitoring frame readiness.
Therefore, a need exists in the art for a method and apparatus for accessing a distributed data buffer in a parallel processing computer that does not impact the nature and function of a MAIN routine that accomplishes data processing.