The present invention relates to a semiconductor integrated circuit and in particular to a one-chip large-scale integrated circuit having cache capability.
A one-chip large-scale integrated circuit (LSI) containing a 32-bit microcomputer for controlling devices has been developed for built-in applications in the fields of digital and network appliances.
In the following description, the microprocessor part of the LSI will be called "microcomputer core."
In the field of network appliances, memory protection is becoming more important as the size of programs for implementing computer processing services becomes larger and the programming environment changes due to the installation of closed program modules or the installation of program modules by downloading.
Therefore a microcomputer core includes a memory management unit (MMU) using a Translation Look-aside Buffer (TLB) as will be described below in order to support the implementation of memory protection capability. In the MMU implementation, the parallel execution of a cache access and a TLB search operation is accomplished in one machine-cycle by optimizing circuitry.
The basic operation of a cache will be described below.
FIG. 5 shows busses of the cache of a microcomputer core. While the cache is divided into an instruction cache 1 and a data cache 2 for processing an instruction access and a data access in parallel, the operations of these caches are the same. The operation of the data cache 2 will be described herein as an example.
The flow of an address signal for a memory access is as follows.
A central processing unit (CPU) core 3 accesses the data cache 2 through a bus interface (hereinafter called "BCIF") 4.
During a read operation, a virtual address output from the CPU core 3 is input into a data TLB 5 through the BCIF 4.
If a physical address corresponding to the virtual address is in the data TLB 5, the data TLB 5 outputs the physical address 6 and a hit signal as a hit/miss signal 7. Otherwise, it outputs a miss signal.
If the hit signal is output from the TLB 5, the physical address output from the TLB 5 is compared with tags (cache memory indices) in the data cache 2. If there is a match, the data corresponding to the physical address is output onto a data bus and the data and the hit signal are input into the CPU core 3 through the BCIF 4. The size of the output from the data cache 2 is 64 bits if it is data, or 32 bits if it is an instruction.
The steps of a write operation are the same until the output of a hit signal from the data cache 2. After that, instead of outputting data onto the bus, data which has previously been output from the CPU core 3 onto the bus is written into the data cache 2.
The cache operation will be detailed below.
FIG. 6 shows a configuration of the data TLB 5 and the data cache 2.
A virtual address output from an address generator 8 in the CPU core 3 is input into the data TLB 5 through the BCIF 4.
The virtual address is compared with tags at TAG 5a. If there is a physical address corresponding to the virtual address, the high-order address of the physical address and a hit signal are output. Otherwise, a miss signal is output. If the physical address corresponds to protected memory, an exception signal is output and no data is output from the data cache 2.
On the other hand, because the low-order address of the virtual address is the same as that of the physical address, the low-order address is also input into the data cache 2 at the same time.
The data cache 2 has a TAG memory module 9 and a cache data memory module 10. If there is an address corresponding to the low-order address in the TAG memory module 9 of the data cache 2, the high-order address of the physical address corresponding to the low-order address is output.
If a hit signal is output from the data TLB 5, the high-order address of the physical address output from the data TLB 5 is compared with the TAG memory module 9 of the data cache 2 at 2a. 
If there is a match, data corresponding to the address is output from the cache data memory module 10 onto the data bus and a hit signal is provided to the CPU core 3.
If no hit signal is output from the data TLB 5, or no hit signal is output from the data cache 2, a miss signal is output to the CPU core 3.
If an exception signal is output from the data TLB 5, no data is output from the data cache 2; instead, exception management is performed by the CPU core 3.
The steps for a write operation are the same as the steps described above until the output of the hit signal from the data cache 2. After the hit signal is output, instead of outputting data onto the bus, data which has previously been output from the CPU core 3 onto the bus is written into the data cache 2. If an exception signal is output from the data TLB 5, data is not written into the data cache 2. Instead, exception management is performed by the CPU core 3.
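For illustration only, the read and write sequences described above may be modelled in software as in the following minimal sketch. The names PAGE_BITS, TlbEntry and access are hypothetical, as are the 4 KiB page size and the use of the whole low-order address as the cache index; none of these is taken from the circuit itself.

```python
from dataclasses import dataclass

PAGE_BITS = 12  # assumption: 4 KiB pages; the text does not state a page size


@dataclass
class TlbEntry:
    physical_high: int       # high-order address of the physical address
    protected: bool = False  # True if the page is protected memory


def access(virtual_address, tlb, cache_tags, cache_data, write_value=None):
    """Return (status, data); status is 'hit', 'miss' or 'exception'."""
    high = virtual_address >> PAGE_BITS
    low = virtual_address & ((1 << PAGE_BITS) - 1)

    # The two lookups below do not depend on each other, which is why the
    # hardware can perform them concurrently.
    entry = tlb.get(high)              # TLB TAG comparison and TLB data read
    stored_high = cache_tags.get(low)  # cache TAG read by low-order address

    if entry is None:
        return ("miss", None)          # TLB miss
    if entry.protected:
        return ("exception", None)     # no cache read or write takes place
    if stored_high != entry.physical_high:
        return ("miss", None)          # cache TAG mismatch
    if write_value is not None:
        cache_data[low] = write_value  # write path: store data already on the bus
        return ("hit", None)
    return ("hit", cache_data[low])    # read path: output cached data
```

In this model a TLB miss, a cache TAG mismatch and a protection exception all return before the cache data is touched, which mirrors the statement that no data is output or written in those cases.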
In this way, part of the virtual address-physical address translation at the data TLB 5 and part of the match finding in the cache control are performed concurrently in order to increase the speed of cache operations.
Thus the cache operations can be performed within one cycle. Because the cache memory eliminates accesses to main memory, access latency can be reduced, especially when an arithmetic operation which requires memory read/write operations is performed over a number of cycles.
FIG. 7 shows access timing during a cache read operation. If a miss signal is output, operation in the cycle halts at that point. FIG. 8 shows access timing during a cache write operation (when exception management is OK). FIG. 9 shows access timing during a cache write operation (when exception management is NG).
The operation time is the sum of time required for "TLB TAG comparison", "TLB data read", "cache TAG comparison", "cache hit signal output", and "cache data output."
In order to achieve faster operation (reduce the amount of time by one machine cycle, or one clock cycle), the amount of time required for each of these steps should be reduced.
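The timing budget described above may be expressed as a simple sum, as in the following sketch. The function names and any delay values passed to them are hypothetical; the text gives no figures for the individual steps.

```python
# The five steps named in the text, in the order in which they occur.
STAGES = (
    "TLB TAG comparison",
    "TLB data read",
    "cache TAG comparison",
    "cache hit signal output",
    "cache data output",
)


def operation_time(stage_delays):
    """Sum the delays of the five steps; every step must be supplied."""
    missing = [stage for stage in STAGES if stage not in stage_delays]
    if missing:
        raise ValueError(f"missing stage delays: {missing}")
    return sum(stage_delays[stage] for stage in STAGES)


def fits_in_cycle(stage_delays, cycle_time):
    """True if the whole access completes within one machine cycle."""
    return operation_time(stage_delays) <= cycle_time
```

Because the steps occur in series, shortening any one of them, for example the two that include a wiring delay between modules, reduces the total by the same amount.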
FIG. 10 shows a chip layout of a prior art.
While only a data cache will be illustrated and described below as an example, the same applies to an instruction cache as mentioned earlier.
A TLB bus input 11 connects a TLB TAG module 12 comprising a TLB TAG 12a and its I/O 12b in a data TLB 5 with the BCIF 4 mentioned earlier. The TLB TAG 12 is memory containing address translation data.
The TLB data memory module 14 of the data TLB 5 comprises a TLB buffer 14a and its I/O 14b. The TAG memory module 9 of the data cache 2 comprises a cache TAG 9a and its I/O 9b. The cache TAG 9a is memory containing cache indices.
The I/O 14b of the TLB module 14 and the I/O 9b of the TAG memory module 9 are connected by a TLB bus output line 13.
A cache data memory module 10, which is memory containing cache data, comprises cache data memory 10a and an I/O 10b. A hit signal of the TAG memory module 9 is input in the I/O 10b of the cache data memory module 10 from the I/O 9b of the TAG memory module 9.
A cache bus 15 connects to a CPU core 3 through the BCIF 4 and connects to an external bus 17 through a bus control unit (BCU) 16 shown in FIG. 5.
In the prior-art chip layout, the modules 12, 14 of the data TLB 5 and the modules 9, 10 of the data cache 2 are designed as separate modules and the wiring between the modules is provided subsequently, entailing a long line length.
Generally, a wiring delay is expressed by 0.4 * R * C (where, R is wire resistance and C is wire capacitance) and a longer line length will provide larger R and C.
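The dependence of the wiring delay on line length may be illustrated by the following sketch. It assumes, for illustration only, that R and C each grow in proportion to line length; the function name and the per-unit-length parameters are hypothetical.

```python
def wiring_delay(length, r_per_unit, c_per_unit):
    """Delay 0.4 * R * C of a uniform line.

    Assumption: R and C are each proportional to the line length, so
    R = r_per_unit * length and C = c_per_unit * length.
    """
    resistance = r_per_unit * length
    capacitance = c_per_unit * length
    return 0.4 * resistance * capacitance
```

Under this assumption the delay grows with the square of the line length, so doubling the line length roughly quadruples the delay.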
The propagation delay time between the TLB and the cache TAG, or the propagation delay time through the "data read" line length to the cache TAG to the cache data memory module, that is, the "cache hit signal output" line length, prevents faster operation.
In particular, assuming that the width of the TLB data memory module 14 is L, the width of the TAG memory module 9 is 2L, the width of the cache data memory module 10 is 4L, and the width of the bus area 18 is A, the line length of the TLB bus output 13, that is, the maximum length of the TLB bus data read line will substantially be:
T1=A+L+A+A+L+2L+A=4A+4L.
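The sum above can be checked mechanically, as in the following sketch. The segment list reproduces the terms of T1 in the order given; the helper names and the numeric widths used in the usage note are hypothetical.

```python
from collections import Counter

# Terms of T1 = A + L + A + A + L + 2L + A as (coefficient, symbol) pairs.
T1_SEGMENTS = [
    (1, "A"), (1, "L"), (1, "A"), (1, "A"), (1, "L"), (2, "L"), (1, "A"),
]


def collect_terms(segments):
    """Add up the coefficients of each symbol, e.g. {'A': 4, 'L': 4}."""
    totals = Counter()
    for coefficient, symbol in segments:
        totals[symbol] += coefficient
    return dict(totals)


def evaluate(totals, **widths):
    """Substitute numeric widths for the symbols and return the length."""
    return sum(coefficient * widths[symbol]
               for symbol, coefficient in totals.items())
```

For example, with a hypothetical bus area width A of 10 units and module width L of 100 units, evaluate(collect_terms(T1_SEGMENTS), A=10, L=100) gives 440 units.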
The object of the present invention is to provide a semiconductor integrated circuit with a chip layout that allows faster processing by reducing the line length compared with prior-art chip layouts.
Because virtual addresses are used as the addresses of recent central processing units, address translation is required between a virtual address and the real address of cache memory or main memory. Therefore a translation table becomes larger as address space expands. Typically, the table is organized hierarchically. Because it takes much time to retrieve a real table by referencing the hierarchically organized table, a table called a TLB (hereinafter called "TLB") which has an association capability is provided parallel with the hierarchical table to retrieve the real address faster. Thus the TLB should allow address translation to be performed at high speed as well as with high accuracy by means of small-sized circuitry.
Even though line width, line spacing, line length and line thickness have decreased as semiconductor integrated circuits have become miniaturized, data cannot be output faster because, assuming that the same line material is used and the scaling factor is "S", the wire resistance R will increase by a factor of S and the wire capacitance will decrease by a factor of S, resulting in the same delay product of R*C.
There is another problem. Letting the capacitance of the layer under the wiring be Cb, the capacitance between lines be Cs, and the wire resistance be R, the wiring delay of the output of the data memory module is expressed by R (Cb+2×Cs). However, when the main output and an adjacent signal output change in opposite directions, the adjacent capacitance Cs seems to be two times larger. Therefore the maximum delay will be R (Cb+2×2×Cs), which is slower than normal states. Furthermore, if delay time is reduced by providing a larger driver to increase the instantaneous current of a transistor, a supply voltage drop may occur when all data busses undergo changes. The voltage drop increases the delay. In addition, if signals on lines in layers below and above a data bus make a transition while the bus is in hold state, its output value may be inverted, which cannot be controlled by the resistance in the hold state of the bus.
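The scaling argument above may be illustrated with exact arithmetic, as in the following sketch. The function name and the numeric values in the usage note are hypothetical; only the factor-of-S relationships come from the text.

```python
from fractions import Fraction


def scale_wire(resistance, capacitance, s):
    """Apply the scaling stated in the text: R increases by a factor of S
    and C decreases by a factor of S. Fractions keep the arithmetic exact."""
    s = Fraction(s)
    return Fraction(resistance) * s, Fraction(capacitance) / s
```

For example, scale_wire(100, 6, 2) returns a resistance of 200 and a capacitance of 3, and the product R*C is 600 both before and after scaling.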
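The two delay expressions above differ only in the factor applied to the capacitance between lines, as the following sketch shows. The function name and the parameter coupling_factor are hypothetical; a factor of 1 corresponds to the normal state and a factor of 2 to the case where adjacent outputs change in opposite directions.

```python
def output_line_delay(r, cb, cs, coupling_factor=1):
    """Wiring delay R * (Cb + 2 * k * Cs) of a data memory module output.

    r  : wire resistance R
    cb : capacitance Cb of the layer under the wiring
    cs : capacitance Cs between adjacent lines (one neighbour on each side)
    coupling_factor k : 1 in the normal state, 2 when the adjacent
        outputs switch in the opposite direction.
    """
    return r * (cb + 2 * coupling_factor * cs)
```

The difference between the worst case and the normal state is R×2×Cs, so it grows with the capacitance between lines and not with Cb.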
It is an object of the present invention to provide a semiconductor integrated circuit that allows substrate noise and noise from other signal lines which affect a data bus to be reduced and faster data output operation to be achieved.
A semiconductor integrated circuit according to claim 1 of the present invention includes a cache capability provided by a Translation Look-aside Buffer (TLB) and a cache, wherein the cache comprises a TAG memory module and a cache data memory module, the cache data memory module is divided into first and second cache data memory modules which are disposed on both sides of the TAG memory module; and input/output circuits of the TLB are opposed to an input/output circuit of the TAG memory module and input/output circuits of the first and second cache data memory modules across a bus area.
A semiconductor integrated circuit set forth in claim 2 of the present invention is the one according to claim 1, wherein each of the TAG memory module and the first and second cache data memory modules is further divided into two and the divided modules are disposed on both sides of the TLB.
A semiconductor integrated circuit set forth in claim 3 of the present invention includes a cache capability provided by a translation look-aside buffer (TLB) and a cache, wherein the TLB comprises a TLB tag module storing address change data and a TLB data memory module storing Translation Look-aside data; the cache comprises a TAG memory module storing cache memory index data and a cache data memory module storing cache data; the TAG memory module is divided into a plurality of modules and the cache data memory module is divided into a plurality of modules; the divided TAG memory modules are disposed on both sides of a longitudinal arrangement direction of the TLB tag module and the TLB data memory module with the TLB tag module and TLB data memory module being sandwiched therebetween; and the divided cache data memory modules are grouped into two and disposed on both sides of said divided TAG memory modules.
A semiconductor integrated circuit set forth in claim 4 of the present invention is the one according to claim 3, wherein the input/output circuits of one of the two groups of caches disposed on both sides of the longitudinal arrangement direction of the TLB tag modules and the TLB data memory modules with said TLB tag modules (12-1, 12-2) and TLB data memory modules (14) being sandwiched therebetween are opposed to the input/output circuits of the TLB.
A semiconductor integrated circuit as set forth in claim 5 of the present invention includes cache memory comprising a plurality of data memory modules, wherein first and second power supply lines are provided in a layer under output signal lines connected to the output of the data memory module section, and the first and second power supply lines intersect said output signal lines at a right angle and are alternately and repeatedly provided.
A semiconductor integrated circuit set forth in claim 6 of the present invention is the one according to claim 5, wherein the first and second power supply lines are provided in a layer above the output signal lines, and the first and second power supply lines intersect said output signal line at a right angle and are alternately and repeatedly provided.
A semiconductor integrated circuit set forth in claim 7 of the present invention is the one according to claim 5 or 6, wherein a power-supply potential is provided to the first power supply line, a ground potential is provided to the second power supply line, a P-channel MOS transistor is provided in a layer under the first power supply line, a gate of the P-channel MOS transistor is connected to the second power supply line, a drain and a source of the P-channel MOS transistor are connected to the first power supply line, an N-channel MOS transistor is provided in a layer under said second power supply line, a gate of the N-channel MOS transistor is connected to the first power supply line, and a drain and a source of the N-channel MOS transistor are connected to the second power supply line.
A semiconductor integrated circuit set forth in claim 8 of the present invention is the one according to any of claims 5 to 7, wherein no other signal line in a layer above or below or adjacent to the output signal lines is provided in parallel to all or any part of the output signal lines.
A semiconductor integrated circuit set forth in claim 9 is the one according to claim 8, wherein the first or second power supply line is provided in parallel to all or part of the output signal line between the output signal line and another signal line.
A semiconductor integrated circuit set forth in claim 10 is the one according to any of claims 5 to 9 comprising a plurality of outputs as the output of the data memory module, wherein the outputs comprise pairs of non-inverse and inverse signals, and each set of signal lines connected to each pair of the pairs of non-inverse and inverse outputs is provided between the first and second power supply lines.
A semiconductor integrated circuit set forth in claim 11 is the one according to any of claims 5 to 10, wherein the output from the data module is output to the output signal lines through a first sense amplifier.
A semiconductor integrated circuit set forth in claim 12 is the one according to claim 11, wherein a given signal is selected from the plurality of data module outputs and output through a second sense amplifier.