1. Field of the Invention
The invention relates to memory arrangements suited for use in multi-processor systems.
2. Description of the Related Technology
Highly parallel architectures achieve high performance by parallelizing computation. This high degree of parallelism has to be supported by high data throughput, because computations need to read operands from and write results to memory. To achieve high data throughput, the memory organization of such architectures has to support multiple simultaneous data accesses; otherwise, the performance of such architectures suffers seriously.
One way to provide high memory bandwidth is to design a true multi-port memory (FIG. 1), which can read or write N data items (N being the number of ports) in every cycle without any constraint. When N is larger than 2, however, the cost of such a memory system is prohibitively high: area, delay and power increase non-linearly with the number of ports. In practice, only memories with up to two ports (dual-port SRAM) are widely used. For embedded applications, where power and cost are key design metrics, a true multi-port memory is therefore not a viable option.
An alternative way of providing multiple data accesses is to assemble several single-port memory banks into a pseudo multi-port system (FIG. 2). This approach is much cheaper and faster than the true multi-port approach. Ideally, if all the data accesses in the same cycle go to different banks, the system behaves like a true multi-port memory and serves multiple memory accesses simultaneously. In reality, however, several memory accesses are likely to go to the same bank in the same cycle, while a memory bank can serve only one memory request per cycle. The system then has to be stalled until all the memory requests are served. This situation is called a memory conflict and can seriously reduce performance.
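The stall behaviour described above can be illustrated with a short Python sketch. It assumes, purely for illustration, a word-interleaved mapping (bank = address mod number of banks) and that every single-port bank serves exactly one request per cycle; the function names are hypothetical. Under these assumptions the most heavily loaded bank determines how many cycles one cycle's worth of requests takes.

```python
from collections import Counter


def cycles_to_serve(addresses, num_banks):
    """Cycles needed to serve all requests issued in one cycle.

    Hypothetical model: bank = address % num_banks, and each
    single-port bank serves one request per cycle.
    """
    if not addresses:
        return 0
    per_bank = Counter(addr % num_banks for addr in addresses)
    # The bank with the most pending requests sets the total latency.
    return max(per_bank.values())


def stall_cycles(addresses, num_banks):
    """Extra cycles beyond the single cycle a true multi-port memory would need."""
    return max(cycles_to_serve(addresses, num_banks) - 1, 0)


# Four ports accessing four banks
print(cycles_to_serve([0, 1, 2, 3], 4))  # 1: every access hits a different bank
print(cycles_to_serve([0, 4, 8, 3], 4))  # 3: addresses 0, 4 and 8 all map to bank 0
print(stall_cycles([0, 4, 8, 3], 4))     # 2
```

In the conflict-free case the banks behave like a true 4-port memory; in the second case the system stalls for two extra cycles while bank 0 serves its queued requests one at a time.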
One issue associated with the multi-bank memory organization is how the address space is partitioned among the banks. The address space can either be divided into a few large contiguous pieces, one per bank, or be interleaved among the banks. The first method works well when there are many independent data structures, since each data structure can then be assigned to a different bank and accessed simultaneously without conflict. The second method performs well when there are only a few data structures but a high bandwidth requirement within a single data structure. This is the case for many multimedia and telecommunication applications, and the interleaved multi-bank memory is therefore more commonly used. For example, TI's 64x series features a Level-1 data cache that includes 8 memory banks to provide two ports of data access [TI Inc., “TMS320C64x Technical Overview”, www.ti.com, 2005]. The data ports and the memory banks are connected through an 8×2 full crossbar. By carefully arranging the data layout, and because only 2 accesses per cycle are statistically spread over 8 banks, the amount of memory conflict can be kept low.
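The two partitioning methods can be contrasted with a minimal Python sketch. The memory size, base address and function names below are hypothetical, and the conflict estimate assumes two independent, uniformly distributed accesses; it is an idealized figure and does not model the actual bank mapping of the TI device cited above.

```python
from fractions import Fraction
from itertools import product


def bank_blocked(address, num_banks, memory_words):
    """Block partitioning: each bank holds one contiguous slice of the address space."""
    bank_size = memory_words // num_banks
    return address // bank_size


def bank_interleaved(address, num_banks):
    """Interleaving: consecutive word addresses map to consecutive banks."""
    return address % num_banks


def uniform_pair_conflict_probability(num_banks):
    """Exact probability that two independent, uniform accesses hit the same bank."""
    pairs = list(product(range(num_banks), repeat=2))
    conflicts = sum(1 for a, b in pairs if a == b)
    return Fraction(conflicts, len(pairs))


NUM_BANKS = 8
MEMORY_WORDS = 1024  # hypothetical memory size in words

# Two adjacent elements of one array, e.g. x[i] and x[i + 1], at addresses 100 and 101
pair = (100, 101)
print([bank_blocked(a, NUM_BANKS, MEMORY_WORDS) for a in pair])  # [0, 0]: conflict
print([bank_interleaved(a, NUM_BANKS) for a in pair])           # [4, 5]: no conflict

# Two 2-port accesses spread over 8 banks
print(uniform_pair_conflict_probability(NUM_BANKS))             # 1/8
```

Block partitioning keeps both elements of the same data structure in one bank, whereas interleaving spreads them over different banks, which is why interleaving suits applications with high bandwidth demand within a single data structure. The 1/8 figure shows why accessing only 2 of 8 banks per cycle keeps conflicts infrequent even without careful data layout.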