This invention relates to local addressing for a register file memory. More particularly, this invention relates to local addressing for a register file memory in a single instruction multiple data (SIMD) parallel processor.
A basic computer generally includes a central processing unit (CPU) and a main memory. The CPU executes a sequence of operations encoded in a stored program. The program, and the data on which the CPU acts, are typically stored in the main memory. The processing of the program and the allocation of main memory and other resources are controlled by an operating system. In operating systems where multiple applications may share and partition resources, the computer's processing performance can be improved by partitioning main memory and by employing active memory.
Active memory is memory that processes data as well as stores it. It can be instructed to operate on its contents without transferring those contents to the CPU or to any other part of the system. This is typically achieved by distributing parallel processors throughout the memory. Each parallel processor is connected to the memory and operates on its own block of the memory independently of the other blocks. Most of the data processing is performed within the active memory, and the work of the CPU is reduced to the operating system tasks of scheduling processes and allocating system resources and time.
A block of active memory typically consists of the following: a block of memory (e.g., dynamic random access memory (DRAM)), an interconnection block, and a memory processor (processing element array). The interconnection block provides a path that allows data to flow between the block of memory and the processing element array. The processing element array typically includes multiple identical processing elements controlled by a sequencer. Processing elements are generally small in area, have a low degree of hardware complexity, and are quick to implement, which makes them easier to optimize. Processing elements are usually designed to balance performance and cost. A simple, general-purpose processing element can yield higher overall performance than a more complex one because it can be easily replicated to form many identical processing elements. Further, because of its simplicity, such a processing element can be clocked at a higher rate.
A system in which numerous identical processing elements (e.g., in the hundreds or thousands) operate under the control of a single sequencer and are closely connected to memory is known as a single instruction multiple data (SIMD) parallel processor. Memory is generally partitioned so that each processing element has access to its own block of the memory. As a result, all processing elements can execute the same instruction concurrently on different pieces of data.
Each processing element has a certain amount of local autonomy that allows it to make data-dependent decisions. In early SIMD parallel processors, each processing element could determine whether to write a result to its particular block of memory. In an 8-bit SIMD parallel processor, additional locally enabled functions have been supported, including conditional shifting and result selection within each processing element. These additional locally enabled functions are particularly useful for operations such as floating-point arithmetic and multiplication.
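The simplest form of local autonomy described above, a per-element decision on whether to write a result, can be illustrated with a toy software model. This is a minimal sketch, not a hardware description: the `ProcessingElement` class and the `broadcast_add_if_negative` instruction are hypothetical names, each element's memory block is modeled as a small Python list, and the "concurrent" execution is simulated with a sequential loop. Every element executes the same broadcast instruction, but each one computes its own write-enable flag from its own data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessingElement:
    # Hypothetical toy PE: its private block of memory is a list of ints.
    memory: List[int] = field(default_factory=lambda: [0] * 4)


def broadcast_add_if_negative(pes: List[ProcessingElement], addr: int, value: int) -> None:
    """Every PE executes the same instruction (add `value` to memory[addr]),
    but each PE decides locally whether to write the result back."""
    for pe in pes:  # conceptually all PEs run in lockstep; simulated sequentially
        operand = pe.memory[addr]
        result = operand + value
        write_enable = operand < 0  # data-dependent, per-PE decision
        if write_enable:
            pe.memory[addr] = result


pes = [ProcessingElement([-3, 0, 0, 0]), ProcessingElement([5, 0, 0, 0])]
broadcast_add_if_negative(pes, 0, 10)
print([pe.memory[0] for pe in pes])  # [7, 5]
```

Only the first element, whose operand is negative, commits its result; the second element computes a result but suppresses the write.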
Other SIMD parallel processors have also allowed a more complex and powerful form of local autonomy: the ability of each processing element to generate its own local memory or register file address. This form of local autonomy carries penalties. First, a locally addressed access to memory is generally slower than a global, centrally addressed access, although the access time penalty can be small relative to the savings in the overall execution time of a program. Second, local addressing requires additional hardware in each processing element to generate and deliver an address to its own block of memory. The resulting area and cost overheads are typically very high, and thus many SIMD parallel processors do not implement local addressing.
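The difference between a global, centrally addressed access and a locally addressed access can be sketched as follows. This is an illustrative model under stated assumptions, not the circuitry of any particular processor: `global_read` and `local_read` are hypothetical helper names, each processing element's register file is modeled as a Python list, and the example workload (each element holding a copy of the same small lookup table) is chosen only to show why local addressing is useful. With a global address, the sequencer broadcasts one index and every element reads the same entry; with local addressing, each element supplies its own index, which in hardware is what requires the per-element address-generation logic discussed above.

```python
from typing import List


def global_read(register_files: List[List[int]], addr: int) -> List[int]:
    """Centrally addressed access: the sequencer broadcasts one address,
    so every PE reads the same register index."""
    return [rf[addr] for rf in register_files]


def local_read(register_files: List[List[int]], local_addrs: List[int]) -> List[int]:
    """Locally addressed access: each PE supplies its own address.
    In hardware, this is what needs per-PE address-generation logic."""
    if len(local_addrs) != len(register_files):
        raise ValueError("need exactly one address per processing element")
    return [rf[a] for rf, a in zip(register_files, local_addrs)]


# Four PEs, each holding its own copy of the same 4-entry lookup table.
table = [1, 4, 9, 16]
rfs = [list(table) for _ in range(4)]

print(global_read(rfs, 2))            # [9, 9, 9, 9]
print(local_read(rfs, [3, 0, 2, 1]))  # [16, 1, 9, 4]
```

With only global addressing, performing the per-element table lookup would require stepping the broadcast address through every table entry and letting each element keep only the matching value; local addressing completes it in a single access.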
In view of the foregoing, it would be desirable to provide a register file memory with partial local addressing while minimizing the increase in hardware complexity and cost.