1. Field of the Invention
The invention relates in general to the field of processors having multiple processing elements, and in particular to an improved interface for providing input/output operations in such processors.
2. Related Art
Single Instruction Multiple Data (SIMD) processors have been known for use in applications such as image processing. SIMD processors have a number of individual Processing Elements (PE's) which each execute the same instruction upon different data. Systems which incorporate SIMD processors implement an Input/Output (I/O) architecture which allows them to move data to and from the PE's.
The prior art includes SIMD processors having an array of bit-serial PE's connected in a grid fashion, with working memory on-chip. The design of the PE's is often kept simple in order to maximize the number of them that can be placed on a chip. This simplicity allows systems incorporating such chips to achieve high PE densities, but requires systems to surround the chips with resources dedicated to support of their simple I/O architecture.
Such chips of the prior art often support I/O operations with the outside world through a communications bus. This bus consists of a unique data transmission path for each column of PE's in an array (an array can consist of a single chip, or multiple chips connected together). Thus, an array with X columns and Y rows of PE's would have a communications bus with a total of X paths.
The PE's of such prior art chips each ordinarily have a communications register which they can read and write. Shift operations cause the contents of each PE's communications register to move along the communications bus to the communications register of the PE above it. Each path in the communications bus can transmit a single bit of data at a time. Data is loaded into the array at the bottom edge one row at a time, while data is unloaded from the array at the top edge one row at a time.
For an array with X columns and Y rows, a total of Y shifts of the communications bus is required to load a single bit of data into, or unload a single bit of data from, every PE in the array. If an input or output consists of N bits of data for each PE, a total of Y*N shifts is required. The natural format of data for I/O operations with prior art arrays is in bitplanes (i.e., bit zero of the entire image, followed by bit one of the image, etc.) supplied one row at a time.
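The shift count described above can be checked with a small software model. The following is only an illustrative sketch, not a description of any particular prior art chip: the names `shift_in` and `load_pixels`, the list-based registers, and the least-significant-bit-first ordering are all assumptions made for the example.

```python
def shift_in(comm, new_row):
    """One shift of the communications bus.

    comm[0] is the top row of PE's and comm[-1] is the bottom row.
    Every row moves up by one PE, new_row enters at the bottom edge,
    and the row leaving the top edge is returned (the unloaded row).
    """
    unloaded = comm[0]
    comm[:] = comm[1:] + [list(new_row)]
    return unloaded


def load_pixels(image, n_bits):
    """Load a Y-by-X image of n_bits-wide pixels, one bitplane at a time.

    Returns (memory, shifts), where memory[y][x][b] is bit b of the pixel
    held by the PE at row y, column x (bit 0 first, an assumed ordering).
    """
    Y, X = len(image), len(image[0])
    comm = [[0] * X for _ in range(Y)]
    memory = [[[0] * n_bits for _ in range(X)] for _ in range(Y)]
    shifts = 0
    for b in range(n_bits):
        # Feed bitplane b one row at a time; the top image row goes in first
        # so that it ends up in the top row of PE's after Y shifts.
        for row in image:
            shift_in(comm, [(p >> b) & 1 for p in row])
            shifts += 1
        # Each PE copies its communications register into its own memory.
        for y in range(Y):
            for x in range(X):
                memory[y][x][b] = comm[y][x]
    return memory, shifts


image = [[5, 3], [12, 0], [7, 9]]   # Y = 3 rows, X = 2 columns
memory, shifts = load_pixels(image, n_bits=4)
print(shifts)                        # Y * N shifts in total
```

Note that the model is handed the data already separated into bitplanes; it does not model how a system would produce that ordering from pixel format data.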
Prior art SIMD arrays process and communicate data most naturally in bitplanes, but the systems which incorporate such arrays often process and communicate data most naturally in pixel format (i.e. all bits of a pixel grouped together). This mismatch requires additional hardware and/or software to implement the interface between an array and the system in which it resides.
When data is ready for processing in a bit-serial SIMD array, all bits associated with a pixel are stored in the memory of a single PE. The bits of the pixel can be viewed as being stacked one on top of another in the PE's memory; this format has accordingly been known as "vertical format."
If pixel format data were shifted into an array with no preprocessing, the bits associated with each pixel would be distributed across several PE's in a row-wise direction (because the array shifts data in one row at a time). The bits of the pixel can be viewed as being side-by-side in the memory of several PE's, and therefore this format has been known as "horizontal format."
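As a rough illustration of horizontal format, the sketch below serializes a pixel stream and places consecutive bits on consecutive columns of one row of PE's. The function name `horizontal_layout`, the least-significant-bit-first serialization, and the one-bit-per-column mapping are assumptions chosen for the example rather than details taken from any specific system.

```python
def horizontal_layout(pixels, n_bits, x_columns):
    """Return the bits that one row of PE's would receive if a pixel-format
    stream were shifted in unprocessed.

    Element c of the result is the bit seen by the PE in column c.
    Bits are serialized least significant bit first (an assumption).
    """
    stream = [(p >> b) & 1 for p in pixels for b in range(n_bits)]
    return stream[:x_columns]


# Two 4-bit pixels spread across a row of eight PE's:
# columns 0-3 hold the bits of the first pixel, columns 4-7 the second.
row = horizontal_layout([0b1010, 0b0011], n_bits=4, x_columns=8)
print(row)
```

In this layout no single PE holds a complete pixel, which is why the data cannot be processed until it has been rearranged.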
The process of converting data from horizontal format to vertical format or vice versa is known as "cornerturning." Systems which incorporate a PE array ordinarily cornerturn inputs to the array before they can be processed, and cornerturn outputs from the array before they can be used by the system. Heretofore, approaches to cornerturning have involved the use of hardware external to the array to accomplish a portion of the cornerturning process, and software within the array to accomplish the rest.
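Viewed purely as a data transformation, cornerturning amounts to transposing a matrix of bits. The sketch below shows that transformation in software only; it does not model the division of work between external hardware and array software described above, and the names `to_bitplanes` and `to_pixels` and the bit-0-first ordering are hypothetical.

```python
def to_bitplanes(pixels, n_bits):
    """Pixel format -> bitplanes.

    planes[b][i] is bit b of pixel i, so each plane can be supplied
    to the array one bit per PE.
    """
    return [[(p >> b) & 1 for p in pixels] for b in range(n_bits)]


def to_pixels(planes):
    """Bitplanes -> pixel format (the inverse cornerturn)."""
    n_pixels = len(planes[0])
    return [
        sum(planes[b][i] << b for b in range(len(planes)))
        for i in range(n_pixels)
    ]


pixels = [5, 3, 12, 0]
planes = to_bitplanes(pixels, n_bits=4)
assert to_pixels(planes) == pixels   # the two directions are inverses
```

The round trip mirrors the two places where a system must cornerturn: once on inputs to the array and once on outputs from it.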
There are a number of drawbacks to prior art approaches to SIMD processor I/O. Typical systems which incorporate PE arrays have one or more of the following components which exist solely for the purpose of interfacing with the array: input staging memory, input FIFO, recirculation FIFO, output FIFO, paging buffer, output staging memory, and address generator. These interface components add complexity to the system design and negatively impact such critical system factors as power consumption, heat production, weight, and size. For small arrays, the amount of board space required for interface components can exceed the space required for the SIMD processor chips themselves.
Other approaches to solving the interface problem involve less hardware and more software, or more hardware and less software. Approaches which rely more on hardware add components and complexity at the system design level, while providing minimal gains in performance. Approaches which rely more on software often cannot eliminate the majority of interface components, and suffer from setbacks in system performance.
In addition to the interface problems discussed above, SIMD processors of the prior art face limitations regarding their paging buffers. Paging buffers are memory components external to the array which are used to store temporarily the results of intermediate calculations when PE working memory runs low. Use of this memory is called "off-chip paging." Off-chip paging carries significant performance penalties because the bandwidth between the array and the paging buffers is low when compared with the bandwidth between the array and its working memory. The two bandwidths can easily differ by more than two orders of magnitude, which translates into a tremendous penalty for off-chip paging.