A multiport memory with time-division multiplexed access is known from an article by T. Sazaki, T. Komiya, K. Takano, N. Oba, H. Kobayashi and T. Nakamura, titled “Time Division Pseudo Multi-Port Register File with Wave Pipelining” and published in the Transactions of the Institute of Electronics Information and Communication Engineers, Volume J80 No. 3 (1997) pages 223 to 226. A similar circuit is known from an article titled “Pipelined, Time-sharing Access Technique for an Integrated Multiport Memory”, by Ken-Ichi Endo, Tsuneo Matsumura and Junzo Yamada, published in the Journal of Solid State Circuits Vol. 26 No. 4 (Apr. 1991) pages 549-554.
Sazaki et al. describe the architecture of a pseudo multiport memory module. A multiport memory is a module that allows data to be stored in and retrieved from a single memory core via separate ports. The ports of a real multiport memory are an integral part of the memory design. Typically, these ports have no mutual timing relations (they are in fact independent, except that simultaneous reading and writing at the same address is forbidden).
A pseudo multiport (PMP) memory is different in that the memory function is implemented by a standard single port memory. The ports are simulated by successively accessing this memory in time-slots within a clock cycle. The data, address and control inputs for each of the ports are sampled at the rising edge of the clock input. The clock also triggers the sequence of memory accesses.
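The behaviour described above can be illustrated with a minimal behavioural sketch. It is not taken from the cited articles: the names PortRequest, PseudoMultiportMemory and rising_edge are hypothetical, and the model ignores all timing, showing only that the port inputs are sampled once per clock edge and then served one after another by a single-port core.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PortRequest:
    """Inputs of one port as sampled at the rising clock edge (hypothetical)."""
    address: int
    write: bool = False
    data: Optional[int] = None


class PseudoMultiportMemory:
    """Behavioural model: N ports simulated on one single-port memory core."""

    def __init__(self, depth: int, n_ports: int) -> None:
        self.core: List[int] = [0] * depth  # the single-port memory core
        self.n_ports = n_ports

    def rising_edge(self, requests: List[Optional[PortRequest]]) -> List[Optional[int]]:
        """Sample all port inputs, then access the core once per time-slot."""
        if len(requests) != self.n_ports:
            raise ValueError("one request (or None) is expected per port")
        sampled = list(requests)  # all ports are sampled at the same edge
        results: List[Optional[int]] = []
        for req in sampled:  # one time-slot per port, in port order
            if req is None:
                results.append(None)  # idle port
            elif req.write:
                self.core[req.address] = req.data
                results.append(None)
            else:
                results.append(self.core[req.address])
        return results


mem = PseudoMultiportMemory(depth=16, n_ports=3)
out = mem.rising_edge([PortRequest(5, write=True, data=42), PortRequest(5), None])
print(out)
```

In this sketch a port served in a later time-slot sees a write made in an earlier time-slot of the same cycle. That is a property of the simplified model, not a statement about the cited circuits.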
The circuit disclosed by Sazaki et al. uses successive periods of a 300 MHz clock to define time-slots for access to the memory. There are three ports. A 100 MHz clock is used to define access cycles to the ports. Endo et al. use both polarities of the clock edges to access the memory.
In principle, the data from the memory is needed only at the end of each 100 MHz cycle, when results derived from the memory access should be latched, but it is available before the end of the cycle. The time interval between the time that the data becomes available and the end of the cycle is larger in particular for ports that are given access to the memory in the earlier parts of the 100 MHz clock cycle. During this time interval the data might be passed through combinatorial logic circuitry. This could be used to speed up the circuit, especially in application specific circuits, where the memory is embedded in an integrated circuit designed for some application and such combinatorial logic can readily be designed into the device.
Amongst others, it is an object of the invention to increase the time interval between the time that the data is available from the memory and the end of the memory access cycle in which all ports are enabled to access the memory.
The device according to the invention and its embodiments are described in the Claims. By using handshaking to generate the time-slots for access to the memory, access to the memory is faster than when each time-slot is generated as a cycle of a high frequency clock that operates at N times the frequency at which the ports are accessed.
More time is left between the completion of access and the end of the clock cycle in which data becomes available from the memory because the time-slot is defined asynchronously. In the time that is left, the data from the memory can be used for combinatorial logic operations and the result of such operations can be stored at the end of the cycle. This speeds up the circuit. The delay produced by the combinatorial logic operations may be a considerable part of the clock cycle. If there are N ports, memory access starts at the beginning of a clock cycle and data is available at a port M time slots after the beginning of the clock cycle, then the delay of the combinatorial circuits may be more than (N−M)/N of the clock cycle. This is because each time slot needs to take up less than 1/N of the clock cycle, and does not need to be 1/N of the clock cycle.
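The (N−M)/N bound can be checked with a small numerical sketch. The function name slack_fraction and the self-timed slot length of 0.25 of the clock cycle are assumptions chosen for illustration only; the text gives no actual slot durations.

```python
from typing import Optional


def slack_fraction(n_ports: int, m: int, slot_fraction: Optional[float] = None) -> float:
    """Fraction of the clock cycle left after the data of time-slot m is available.

    slot_fraction is the length of one time-slot as a fraction of the clock
    cycle. If omitted, the slot is exactly 1/N of the cycle, as when slots are
    cycles of a clock running at N times the port frequency.
    """
    if slot_fraction is None:
        slot_fraction = 1.0 / n_ports
    if not 1 <= m <= n_ports:
        raise ValueError("m must lie between 1 and the number of ports")
    if slot_fraction > 1.0 / n_ports:
        raise ValueError("N slots must fit within one clock cycle")
    return 1.0 - m * slot_fraction


N = 3
SELF_TIMED_SLOT = 0.25  # assumed value, shorter than 1/N of the cycle
for m in range(1, N + 1):
    clocked = slack_fraction(N, m)                      # equals (N - m) / N
    self_timed = slack_fraction(N, m, SELF_TIMED_SLOT)  # exceeds (N - m) / N
    print(f"slot {m}: clocked {clocked:.3f}, self-timed {self_timed:.3f}")
```

Under these assumed figures, every slot leaves more of the cycle for combinatorial logic in the self-timed case than in the clock-derived case, and the gain grows with M.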
Internally in the memory, the handshaking may be used to generate an internal clock signal that is more than N times faster than the system clock, with N being the number of ports. This internal clock is used in the port shell as a time reference, in the sense of the synchronous design style. Thus, design and test of the memory may proceed as for synchronous circuits, which considerably simplifies design and test.
With the proposed architecture, a memory with more than ten ports can be realized. In practice, the upper bound of the number of ports is given by the ratio of the (system) clock period and the memory cycle time.
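The upper bound on the number of ports can be sketched as follows. The function name max_ports and the example figures (a 10 ns system clock period and a 0.9 ns memory cycle time) are hypothetical and serve only to illustrate the ratio. The sketch also assumes that handshaking and sampling overhead are negligible.

```python
import math


def max_ports(clock_period_ns: float, memory_cycle_ns: float) -> int:
    """Largest number of sequential memory accesses that fit in one clock period."""
    if clock_period_ns <= 0 or memory_cycle_ns <= 0:
        raise ValueError("periods must be positive")
    return math.floor(clock_period_ns / memory_cycle_ns)


# Assumed example figures, not taken from the text.
print(max_ports(10.0, 0.9))
print(max_ports(10.0, 3.0))
```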
Compared to a conventional multiport memory, this reduces the required silicon area and power consumption, at the expense of a longer read access time for all but the first port. The silicon area is proportional to the number of ports (quadratic in the number of ports in the prior art). Power consumption hardly depends on the number of ports (linear in the prior art).
An asynchronous controller may be used for the local clock generation. Such a controller is very small: it has a size of approximately 20 gate equivalents. So the internal clock is locally generated without using a PLL. Instead, a ready signal from the memory is used to generate the next clock edge. This approach has the following benefits:
it eliminates the PLL, thereby removing a source of IC failures;
it requires less area (approx. 10 times smaller);
it is easier to lay out (no additional irregular blocks);
it saves power, since the internal clock can be completely disabled when the system clock is gated;
it gives the shortest access time, since the successive memory accesses are maximally compressed.
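The local clock generation described above can be sketched as a simple timing model. It is an illustration only, not the controller itself: the function name local_clock_edges and the per-access times are hypothetical, and the model assumes that the ready signal of one access immediately starts the next access.

```python
from typing import List, Tuple


def local_clock_edges(
    access_times_ns: List[float], system_clock_enabled: bool = True
) -> List[Tuple[float, float]]:
    """Return (start, ready) times in ns for each access, relative to the system clock edge.

    Each access starts on an internal clock edge. The memory's ready signal
    at the end of that access generates the next internal clock edge.
    """
    if not system_clock_enabled:
        return []  # gated system clock: no internal clock activity at all
    edges: List[Tuple[float, float]] = []
    t = 0.0  # the system clock edge starts the first access
    for access_time in access_times_ns:
        ready = t + access_time  # memory signals ready after its access time
        edges.append((t, ready))
        t = ready  # the ready signal triggers the next internal clock edge
    return edges


# Assumed access times for three ports; accesses follow one another without gaps.
print(local_clock_edges([2.0, 2.5, 2.0]))
print(local_clock_edges([2.0, 2.5, 2.0], system_clock_enabled=False))
```

Because each access starts as soon as the previous one reports ready, the sequence takes only the sum of the actual access times, and no internal edges occur while the system clock is gated.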