Memory access systems, such as an embedded memory system in a processor core (e.g., an ARM core), allow a processor to read data from memory and write data to memory. The read access time of the core processor includes the time to read the data from memory, a short setup time to latch the read data into a destination register, and the propagation delay associated with transferring the read data out of memory to the destination register. The write access time includes the time for the core processor to write data to the memory and the propagation time to transfer the write data from the processor to the memory. Typically, more time is required to read data from a given memory than to write data to it, so the read access sets the maximum frequency of operation for the system.
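The relationship between these timing components and the maximum operating frequency can be illustrated with a minimal sketch. It assumes that each access must fit within one clock period; the function names and the nanosecond values are hypothetical and are chosen only for illustration.

```python
def read_access_time(t_mem_read_ns, t_setup_ns, t_prop_ns):
    """Memory read time + latch setup time + propagation to the destination register."""
    return t_mem_read_ns + t_setup_ns + t_prop_ns


def write_access_time(t_mem_write_ns, t_prop_ns):
    """Memory write time + propagation from the processor to the memory."""
    return t_mem_write_ns + t_prop_ns


def max_frequency_mhz(t_read_ns, t_write_ns):
    """Assumption: one access per cycle, so the slower access sets the minimum period."""
    min_period_ns = max(t_read_ns, t_write_ns)
    return 1000.0 / min_period_ns


# Hypothetical timing values, in nanoseconds.
t_read = read_access_time(3.0, 0.5, 0.5)
t_write = write_access_time(2.0, 0.5)
f_max = max_frequency_mhz(t_read, t_write)
print(t_read, t_write, f_max)
```

With these illustrative numbers the read access (4.0 ns) is longer than the write access (2.5 ns), so the read alone limits the clock to 250 MHz.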
Conventional memory access systems are typically limited to one operation per cycle, e.g., a read or a write, and require that each operation complete within a single cycle of the system clock. These systems start the read and write operations at the same relative time (i.e., the operations are coupled), e.g., the rising edge of the system clock. In such a design, the read access time defines the minimum clock cycle period. As a result, the read access must fit within a single system clock cycle, which constrains either the size and access time of the memory being used or the speed of the system clock. This often requires the memory to be partitioned into smaller, faster memory blocks (e.g., less dense memory).
Prior memory access systems and methods attempt to solve the problems associated with the longer read access time in several ways. One approach is simply to allow two clock cycles for the read access to complete. This allows the system clock to run faster, but because every read then occupies two cycles, it can seriously impair processor throughput.
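The throughput cost of a two-cycle read can be seen in a rough comparison. The sketch below assumes a fixed mix of reads and writes and uses hypothetical values (a 4.0 ns read access, a 2.5 ns write access, and 75% reads); none of these figures comes from a particular system.

```python
def avg_access_time_ns(period_ns, read_cycles, write_cycles, read_fraction):
    """Average time per memory access for a given clock period and cycle counts."""
    avg_cycles = read_fraction * read_cycles + (1.0 - read_fraction) * write_cycles
    return period_ns * avg_cycles


READ_FRACTION = 0.75  # hypothetical share of accesses that are reads

# Single-cycle design: the period must cover the 4.0 ns read access.
single_cycle = avg_access_time_ns(4.0, 1, 1, READ_FRACTION)

# Two-cycle read design: the period only has to cover the 2.5 ns write
# access, but every read now occupies two cycles.
two_cycle_read = avg_access_time_ns(2.5, 2, 1, READ_FRACTION)

print(single_cycle, two_cycle_read)
```

Under these assumptions the faster clock does not pay off: the average access takes 4.375 ns with two-cycle reads versus 4.0 ns with the slower single-cycle clock.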
Other conventional memory access systems borrow time from the write cycle for the read operation. Often the most critical situation is a back-to-back read followed by a write. In this design, the write operation is delayed until the read operation is complete, which is often referred to as a “delayed write” design. Because the system uses a single system clock and the back-to-back operation must complete within two cycles, the write delay must be derived from that single system clock. Another drawback of such systems is that, because the read access actually extends into the next cycle, special handling of the read data is required downstream from the memory. The result is either latching the data on the opposite edge of the clock from the edge that started the access (potentially impacting throughput), or pipelining the delivery of data, which adds latency and complexity to the memory and processor design.
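The timing budget of a delayed write can be sketched as follows. This is a simplified model that considers only the back-to-back read-then-write pair and ignores other sequences (for example, consecutive reads); the function name and the nanosecond values are hypothetical.

```python
def delayed_write_budget(t_read_ns, t_write_ns):
    """Return (minimum clock period, write delay) for a read followed by a write.

    Assumptions: the read starts at the first clock edge, the write starts as
    soon as the read finishes, and both must complete within two clock cycles.
    """
    # The pair must fit in two cycles, and a write alone must fit in one.
    period_ns = max(t_write_ns, (t_read_ns + t_write_ns) / 2.0)
    # The write is held off by the amount the read overruns its own cycle.
    write_delay_ns = max(0.0, t_read_ns - period_ns)
    return period_ns, write_delay_ns


# Hypothetical values: 4.0 ns read access, 2.5 ns write access.
period, write_delay = delayed_write_budget(4.0, 2.5)
print(period, write_delay)
```

With these numbers the clock period can shrink from 4.0 ns to 3.25 ns, at the cost of delaying the write by 0.75 ns into the second cycle; the read data is likewise not valid until 0.75 ns after the second clock edge, which is why it needs special handling downstream.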
Conventional memory access systems generally have in common a single system clock, a control signal that indicates whether data is to be read or written, and valid address values that specify the exact addresses to be read or written during the respective read or write operations. The memory read and write operations usually start on, or with timing derived from, the same system clock edge associated with delivery of the control and address information. This precludes starting the read operation before that system clock edge, since not all of the necessary information is available at that time. However, some processors (e.g., ARM) have that information available prior to the arrival of the initiating system clock edge.