A single-chip large-scale integrated circuit (LSI) containing a 32-bit microcomputer for controlling devices has been developed for embedded applications in the fields of digital and network appliances.
In the following description, the microprocessor part of the LSI will be called “microcomputer core.”
In the field of network appliances, memory protection is becoming more important because the programs that implement computer processing services are growing larger and the programming environment is changing: closed program modules are being installed, and program modules are being installed by downloading.
Therefore, the microcomputer core includes a memory management unit (MMU) that uses a Translation Look-aside Buffer (TLB), as will be described below, in order to support memory protection. In the MMU implementation, the circuitry is optimized so that a cache access and a TLB search operation are executed in parallel within one machine cycle.
The basic operation of a cache will be described below.
FIG. 5 shows busses of the cache of a microcomputer core. While the cache is divided into an instruction cache 1 and a data cache 2 for processing an instruction access and a data access in parallel, the operations of these caches are the same. The operation of the data cache 2 will be described herein as an example.
The flow of an address signal for a memory access is as follows.
A central processing unit (CPU) core 3 accesses the data cache 2 through a bus interface (hereinafter called “BCIF”) 4.
During a read operation, a virtual address output from the CPU core 3 is input into a data TLB 5 through the BCIF 4.
If a physical address corresponding to the virtual address is in the data TLB 5, the data TLB 5 outputs the physical address 6 and a hit signal as a hit/miss signal 7. Otherwise, it outputs a miss signal.
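The TLB lookup described above can be sketched in software as follows. This is only an illustrative model, not the circuit itself: the 4 KiB page size (`PAGE_BITS`), the `TlbEntry` type, and the `tlb_translate` function are assumptions introduced for the example and do not appear in the original description.

```python
from typing import List, NamedTuple, Optional, Tuple

PAGE_BITS = 12  # assumption: 4 KiB pages; the text does not state a page size


class TlbEntry(NamedTuple):
    """Hypothetical TLB entry pairing a virtual page with a physical page."""
    virtual_page: int
    physical_page: int


def tlb_translate(entries: List[TlbEntry],
                  virtual_address: int) -> Tuple[bool, Optional[int]]:
    """Return (hit, physical_address); physical_address is None on a miss."""
    virtual_page = virtual_address >> PAGE_BITS
    offset = virtual_address & ((1 << PAGE_BITS) - 1)
    # Hardware would compare every tag at once; a loop stands in for that here.
    for entry in entries:
        if entry.virtual_page == virtual_page:
            return True, (entry.physical_page << PAGE_BITS) | offset
    return False, None
```

For example, with a single entry mapping virtual page `0x12345` to physical page `0x00ABC`, the virtual address `0x12345678` yields a hit and the physical address `0x00ABC678`, while any address in another page yields a miss.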
If the hit signal is output from the data TLB 5, the physical address output from the data TLB 5 is compared with the tags (cache memory indices) in the data cache 2. If there is a match, the data corresponding to the physical address is output onto a data bus, and the data and the hit signal are input into the CPU core 3 through the BCIF 4. The output from the data cache 2 is 64 bits wide if it is data, or 32 bits wide if it is an instruction.
The steps of a write operation are the same up to the output of the hit signal from the data cache 2. After that, instead of data being output onto the bus, the data that the CPU core 3 has previously output onto the bus is written into the data cache 2.
The cache operation will be detailed below.
FIG. 6 shows a configuration of the data TLB 5 and the data cache 2.
A virtual address output from an address generator 8 in the CPU core 3 is input into the data TLB 5 through the BCIF 4.
The virtual address is compared with tags at TAG 5a. If there is a physical address corresponding to the virtual address, the high-order address of the physical address and a hit signal are output. Otherwise, a miss signal is output. If the physical address corresponds to protected memory, an exception signal is output and no data is output from the data cache 2.
On the other hand, because the low-order address of the virtual address is the same as that of the physical address, the low-order address is also input into the data cache 2 at the same time.
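The reason the low-order address can be sent to the cache before translation finishes can be illustrated with a small bit-field sketch. The field widths here (a 12-bit low-order address and 32-byte cache lines) and the function names are assumptions chosen for the example; the original text does not give these sizes.

```python
PAGE_BITS = 12  # assumption: the low-order address is 12 bits wide
LINE_BITS = 5   # assumption: 32-byte cache lines


def split_address(address: int) -> tuple:
    """Split an address into (high-order address, low-order address)."""
    high = address >> PAGE_BITS
    low = address & ((1 << PAGE_BITS) - 1)
    return high, low


def cache_index(address: int) -> int:
    """Derive the cache index from the low-order address only."""
    _, low = split_address(address)
    return low >> LINE_BITS


virtual = 0x12345678
# Translation replaces only the high-order address (0x00ABC is a made-up value).
physical = (0x00ABC << PAGE_BITS) | split_address(virtual)[1]

# Both addresses select the same cache entry, so the cache lookup
# does not have to wait for the TLB result.
assert cache_index(virtual) == cache_index(physical)
```

Under these assumed widths, the index bits lie entirely inside the low-order address, which is why the untranslated bits are enough to start the cache access.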
The data cache 2 has a TAG memory module 9 and a cache data memory module 10. If there is an address corresponding to the low-order address in the TAG memory module 9 of the data cache 2, the high-order address of the physical address corresponding to that low-order address is output.
If a hit signal is output from the data TLB 5, the high-order address of the physical address output from the data TLB 5 is compared with the TAG memory module 9 of the data cache 2 at 2a. 
If there is a match, data corresponding to the address is output from the cache data memory module 10 onto the data bus and a hit signal is provided to the CPU core 3.
If no hit signal is output from the data TLB 5, or no hit signal is output from the data cache 2, a miss signal is output to the CPU core 3.
If an exception signal is output from the data TLB 5, no data is output from the data cache 2. Instead, exception management is performed by the CPU core 3.
The steps for a write operation are the same as the steps described above up to the output of the hit signal from the data cache 2. After the hit signal is output, instead of data being output onto the bus, the data that the CPU core 3 has previously output onto the bus is written into the data cache 2. If an exception signal is output from the data TLB 5, data is not written into the data cache 2. Instead, exception management is performed by the CPU core 3.
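The read and write sequences above, including the miss and exception cases, can be summarized in a behavioral sketch. It is a simplified model under stated assumptions and not the actual circuit: a direct-mapped organization, 4 KiB pages, 32-byte lines, one data word per line, and a `protected` flag on each TLB entry are all hypothetical choices, as are the class and method names.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Tuple

PAGE_BITS = 12                      # assumption: 12-bit low-order address
LINE_BITS = 5                       # assumption: 32-byte cache lines
INDEX_BITS = PAGE_BITS - LINE_BITS  # index comes only from the low-order address


class Signal(Enum):
    HIT = "hit"
    MISS = "miss"
    EXCEPTION = "exception"


@dataclass
class TlbEntry:
    physical_high: int       # high-order address of the physical address
    protected: bool = False  # hypothetical flag marking protected memory


@dataclass
class CacheLine:
    tag: int                 # high-order physical address held in the TAG memory
    data: int                # simplified: one data word per line


@dataclass
class DataCacheModel:
    tlb: Dict[int, TlbEntry] = field(default_factory=dict)
    lines: Dict[int, CacheLine] = field(default_factory=dict)

    def access(self, virtual_address: int,
               write_data: Optional[int] = None) -> Tuple[Signal, Optional[int]]:
        high = virtual_address >> PAGE_BITS
        index = (virtual_address >> LINE_BITS) & ((1 << INDEX_BITS) - 1)

        # In hardware these two lookups run concurrently; here they are
        # simply evaluated one after the other.
        entry = self.tlb.get(high)     # TLB TAG comparison
        line = self.lines.get(index)   # TAG memory read by low-order address

        if entry is None:
            return Signal.MISS, None           # TLB miss
        if entry.protected:
            return Signal.EXCEPTION, None      # no data is read or written
        if line is None or line.tag != entry.physical_high:
            return Signal.MISS, None           # cache miss
        if write_data is not None:
            line.data = write_data             # write operation
            return Signal.HIT, None
        return Signal.HIT, line.data           # read operation
```

In this sketch a write shares every step with a read up to the cache hit decision, and an exception from the TLB prevents both the data output and the write, mirroring the description above.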
In this way, part of the virtual address-physical address translation at the data TLB 5 and part of the match finding in the cache control are performed concurrently in order to increase the speed of cache operations.
Thus, the cache operations can be performed within one cycle. Because the cache memory eliminates accesses to main memory, access latency is reduced, especially when an arithmetic operation that requires memory read/write operations is performed over a number of cycles.
FIG. 7 shows access timing during a cache read operation. If a miss signal is output, operation in the cycle halts at that point. FIG. 8 shows access timing during a cache write operation when exception management finds no problem (OK). FIG. 9 shows access timing during a cache write operation when exception management finds a problem (NG).
The operation time is the sum of the times required for “TLB TAG comparison”, “TLB data read”, “cache TAG comparison”, “cache hit signal output”, and “cache data output.”
In order to achieve faster operation (reduce the amount of time by one machine cycle, or one clock cycle), the amount of time required for each of these steps should be reduced.
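As a worked illustration of this sum, the sketch below adds up the five step times and checks whether the total fits in one clock cycle. All delay values are invented purely for the arithmetic and are not measurements or figures from the original description; the function names are likewise hypothetical.

```python
# Hypothetical step delays in picoseconds (illustrative values only).
step_delays_ps = {
    "TLB TAG comparison": 1200,
    "TLB data read": 800,
    "cache TAG comparison": 1000,
    "cache hit signal output": 500,
    "cache data output": 1500,
}

# The operation time is the sum of the individual step times.
operation_time_ps = sum(step_delays_ps.values())


def cycle_time_ps(clock_mhz: int) -> int:
    """Length of one clock cycle in picoseconds for a clock given in MHz."""
    return 1_000_000 // clock_mhz


def fits_in_one_cycle(clock_mhz: int) -> bool:
    """True if the whole cache operation completes within one clock cycle."""
    return operation_time_ps <= cycle_time_ps(clock_mhz)
```

With these made-up numbers the total is 5000 ps, which just fits a 200 MHz clock (5000 ps per cycle) but not a 250 MHz clock (4000 ps per cycle), so raising the clock frequency would require shortening the individual steps.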