The invention relates to a computer memory arrangement comprising a first plurality of input ports that are collectively coupled through a first router facility to selectively feed a second plurality of memory modules. Present-day computing facilities such as Digital Signal Processors (DSPs) require both great processing power and a high volume of communication traffic between memory and processor(s). Furthermore, both of these performance aspects, which are associated with the number of processors and the number of memory modules, respectively, should ideally be scalable; in particular, the number of parallel data moves should be allowed to exceed two.
As long as a scale factor of 2 is sufficient, a possible solution would be to have two fully separate and fully functional memories, but selecting a storage location between the two memories then represents a complex task. Obviously, the problem is aggravated for scale factors higher than 2. Moreover, programs that handle such separate storage facilities often fall short in portability, for example when they have been written in the computer language C. Therefore, a solution with a “unified memory map” is generally preferred, in which each memory access is allowed to refer to any arbitrary address.
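As a minimal illustration of what a unified memory map means for software, the following sketch decodes one flat address into a module index and an offset within that module. The low-order interleaving rule, the function name `decode_address`, and the module count of 4 are assumptions made for this illustration only; the text above does not prescribe how addresses are distributed over the memory modules.

```python
def decode_address(address, num_modules):
    """Split a flat address into (module index, offset inside the module).

    Assumed mapping: low-order interleaving, so consecutive addresses
    land in consecutive modules. Other mappings are equally possible.
    """
    if address < 0 or num_modules < 1:
        raise ValueError("address must be >= 0 and num_modules >= 1")
    return address % num_modules, address // num_modules


# A program only ever names flat addresses; the module choice is implicit.
for addr in (0, 1, 5, 8):
    print(addr, decode_address(addr, 4))
```

Under this assumed mapping the program never selects a memory explicitly, which is the portability benefit the unified memory map is meant to give.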
A realization of the above with two-port memories is quite feasible per se, but extending the number of ports beyond two is generally considered too expensive. Therefore, providing specific hardware configurations at the level of the memory proper is considered inappropriate. Now on the one hand, providing a stall signal for use when accesses conflict represents a useful feature. On the other hand, reducing the stall penalty overhead associated with conflict resolution in a pseudo-multiport memory, such as the arrangement recited above, requires that additional slack time be added to the read-write access latency of the pseudo-multiport memory. Indeed, in many applications the resulting extended access latency has no severe impact on the loop-pipelined parts of the application, where throughput is important but latency is generally insignificant. For certain other applications, however, the impact of the slack on the performance of running control-dominated code can be very significant. Such applications may inter alia relate to compression and decompression of code strings. Therefore, the present inventor has recognized the need for the access latency to be programmable, so that the optimum setup can be selected for the actual application.
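The trade-off between stall cycles and slack can be made concrete with a toy model. The sketch below is an assumption-laden illustration, not the arrangement of the invention: it assumes single-port modules that each serve one request per cycle, low-order address interleaving, and a stall whenever a request would have to wait longer than the programmed slack. The names `count_stall_cycles` and `module_of` are hypothetical.

```python
from collections import Counter


def module_of(address, num_modules):
    """Assumed low-order interleaving of flat addresses over the modules."""
    return address % num_modules


def count_stall_cycles(access_groups, num_modules, slack):
    """Count stall cycles for a toy pseudo-multiport memory.

    access_groups: one list of addresses per issue cycle (parallel accesses).
    slack: extra latency cycles a request may wait in its module's queue.
    """
    backlog = [0] * num_modules  # pending requests per single-port module
    stalls = 0
    for group in access_groups:
        hits = Counter(module_of(a, num_modules) for a in group)
        for module, n in hits.items():
            backlog[module] += n
        # A module serves one request per cycle, so the newest request of a
        # module with backlog b waits b - 1 cycles; waits beyond the slack stall.
        extra = max(0, max(backlog) - 1 - slack)
        stalls += extra
        # One issue cycle plus `extra` stall cycles elapse; each module drains.
        backlog = [max(0, b - 1 - extra) for b in backlog]
    return stalls


# Two of the four access groups hit the same module twice (0 and 4; 3 and 7).
groups = [[0, 4], [1, 2], [3, 7], [5, 6]]
for slack in (0, 1):
    print("slack", slack, "stall cycles", count_stall_cycles(groups, 4, slack))
```

In this model the example access pattern incurs two stall cycles without slack and none with one cycle of slack, at the price of one extra cycle of latency on every access. This is the cost that control-dominated code feels, and it is why a programmable latency is attractive.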