Block degeneracy and WAY degeneracy have been used as rescue processes for permanent faults in a conventional cache memory. In block degeneracy and WAY degeneracy, the number of faults in the cache memory is observed in units of WAYs, and when the number of faults occurring in a unit time exceeds a predetermined threshold value, the block or the WAY is cut off. The function that cuts off the cache line and WAY that were faulty when the threshold value was exceeded for the first time is called block degeneracy. In block degeneracy, data corresponding to one line are cut off. When a fault also occurs in data other than those in the cache line subjected to block degeneracy, the subject of the block degeneracy is switched to the cache line and WAY in which the last fault occurred. In other words, the objective of fault rescue by block degeneracy is to protect data corresponding to one line in the cache memory. When faults occur in a plurality of cache lines, the number of faults in the cache memory keeps increasing after block degeneracy has occurred, and when it reaches a predetermined value, the WAY having the fault is cut off. When WAY degeneracy occurs, the processor can continue operating, but its performance declines significantly. Therefore, the extent of the performance decline is reduced by inducing, before WAY degeneracy, block degeneracy, in which only one cache line is cut off.
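The threshold logic described above can be sketched as a per-WAY fault monitor. This is a minimal illustration only: the class name, the two threshold parameters and the string results are hypothetical, and the unit-time observation window is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WayFaultMonitor:
    """Hypothetical per-WAY fault monitor for block and WAY degeneracy.

    block_threshold and way_threshold are assumed parameters; the
    unit-time window over which faults are counted is not modelled.
    """
    block_threshold: int
    way_threshold: int
    fault_count: int = 0
    degenerated_line: Optional[int] = None  # line cut off by block degeneracy
    way_cut_off: bool = False

    def record_fault(self, line: int) -> str:
        if self.way_cut_off:
            return "way-degenerated"          # the whole WAY is already cut off
        self.fault_count += 1
        if self.fault_count >= self.way_threshold:
            self.way_cut_off = True           # WAY degeneracy
            return "way-degenerated"
        if self.fault_count > self.block_threshold:
            # First time over the threshold, or a fault in another line:
            # the block-degeneracy target moves to the last faulty line.
            self.degenerated_line = line
            return "block-degenerated"
        return "counted"
```

With `block_threshold=2` and `way_threshold=5`, the third fault cuts off one line, a fourth fault in a different line moves the cut-off to that line, and the fifth fault cuts off the WAY.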
However, when the cache memory has a small storage capacity, the performance decline due to the loss of one cache line cannot be disregarded, and the faulty chip needs to be replaced even when only block degeneracy has occurred. In a CPU having a plurality of cores, it is impossible to replace only the core subjected to block degeneracy due to a one-bit permanent fault, so the replacement must be performed in units of chips, including the other cores that are operating normally. Since recent CPU development has tended to increase the number of cores mounted on a chip while reducing the capacity of the first level cache memory, the conventional fault-protection mechanism that relies on block degeneracy increases the possibility that cores that have not failed needlessly become targets of replacement. As a result, the conventional rescue process by means of block degeneracy provides insufficient fault resistance in units of CPU chips.
To address this, a method has been invented for rescuing a one-bit permanent fault without losing a cache line, by providing a cache line alternation register in which, when a one-bit permanent fault occurs, the cache data of the faulty line are registered to replace the data in the line having the one-bit permanent fault (Patent document 1).
In the system conventionally adopted to implement the cache line alternation register, the physical address is used to compare alternation-target addresses, and when they match, the data in the cache line alternation register are read out. Since the physical address has a large number of bits, the address processing takes time. In addition, obtaining the physical address to be compared requires referring to the TLB (Translation Look-aside Buffer) and the cache tag. For this reason, although the cache line alternation register could be used when reading out from the cache memory, it could not be used when writing into the cache memory, because the flow of referring to and comparing the TLB and cache tag and then writing deviates significantly from the normal cache-control flow. In other words, every time the cache line alternation register became the store target, for example, the cache line alternation register was invalidated and the operation was restarted from the reference to the main memory, causing some performance decline in store operations.
In addition, the conventional cache line alternation register is configured to be able to replace all the bits in one cache line. In practice, however, one-bit errors rarely occur at a plurality of places in one cache line, so this use of circuit resources is not very efficient.
Therefore, the present invention proposes a cache line alternation register with a new configuration that solves the problems of the conventional cache line alternation register.
First, the configuration of a conventional cache without the cache line alternation register of the present invention is explained. Since the present invention is realized by adding, to a conventional cache that is not equipped with the cache line alternation register, a function that partly replaces its operation, the configuration of the conventional cache needs to be clarified first.
FIG. 1 illustrates the configuration of a conventional CPU.
A CPU 101 illustrated in FIG. 1 has four cores, CORE-0 (102-0), CORE-1 (102-1), CORE-2 (102-2) and CORE-3 (102-3) (hereinafter, each core is referred to as CORE 102).
Each core CORE 102 has IUs (Instruction Unit) (104-0, 104-1, 104-2, 104-3: hereinafter, IU is referred to as 104), EUs (Execution Unit) (105-0, 105-1, 105-2, 105-3: hereinafter, EU is referred to as 105), and SUs (Storage Unit) (103-0, 103-1, 103-2, 103-3: hereinafter, SU is referred to as 103).
Further, the SUs (103) respectively have IF-LBSs (Instruction Fetch Local Buffer Storage) (106-0, 106-1, 106-2, 106-3: hereinafter, IF-LBS is referred to as 106) that are instruction caches, and OP-LBSs (Operand Local Buffer Storage) (107-0, 107-1, 107-2, 107-3: hereinafter, OP-LBS is referred to as 107) that are operand caches.
In addition, the CPU 101 has an SX (Second Cache) 108 that is a second level cache, which performs data communication with the respective cores CORE 102, and the SX 108 further performs data communication with a Memory 110 that is the main memory via a SYSTEM BUS 109.
Next, the configuration of the IF-LBS 106 and the OP-LBS 107 that are the first level cache installed in the CPU 101 is illustrated in FIG. 2.
The cache consists of two WAYs, WAY0 (201) and WAY1 (202). When an address signal is given to each WAY, the data at that address are read out and output to the data signal line: the cache RAM output data of WAY0 are output to a data line 205, and those of WAY1 to a data line 206. When WAY information is given to a WAY selection circuit 203, either the data line 205 or the data line 206 is selected, and the selected data are output (207) to the IU 104 (or the EU 105).
Meanwhile, the subject of the present invention is the first level cache that constitutes the instruction cache IF-LBS 106 and the operand cache OP-LBS 107.
With the configuration described above, the operations of the cache are explained in detail below using flow diagrams.
First, the flow of the cache reading-out operation is illustrated in FIG. 3.
In the reading out from the cache memory, an access is performed to the cache data unit, cache tag unit and TLB unit.
In the cache data unit, bits 14-5 of the virtual address are extracted and sent out (S301), the data are read out within the RAM in the cycle following the reference (S302), and a total of 64 bytes of data are taken out from all the cache RAMs in the next cycle (S303). The data taken out consist of 32 bytes for each of the two WAYs.
In the cache tag unit, bits 14-6 of the virtual address are extracted and sent out (S304), the tag address is read out within the RAM in the cycle following the reference (S305), and the physical addresses corresponding to the two WAYs are taken out in the next cycle (S306).
In the TLB unit, bits 63-13 of the virtual address are extracted and sent out, together with information representing the access space in which the cache memory is referenced, such as the access-space number or context ID and the segment-table starting point (S307). In the cycle following the reference, the virtual addresses and access-space information registered in the TLB are compared with those sent out, and the physical address corresponding to the matching entry is taken out (S308); one physical address is read out in the next cycle, completing the conversion from the virtual address to the physical address (S309).
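The bit ranges given above for the data, tag and TLB units can be illustrated with a small field-extraction helper. This is a sketch only; the function and key names are hypothetical, and bit 0 is taken to be the least significant bit of a 64-bit virtual address.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of value; bit 0 is the least significant."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def split_virtual_address(va: int) -> dict:
    """Split a virtual address into the fields used by the three units."""
    return {
        "data_index": bits(va, 14, 5),  # 10 bits: one 32-byte unit per WAY
        "tag_index": bits(va, 14, 6),   # 9 bits: one 64-byte line per WAY
        "tlb_vpn": bits(va, 63, 13),    # compared against TLB entries
    }
```

Under these bit ranges, bits 14-6 are the upper nine of bits 14-5, so the two 32-byte halves of a 64-byte line share one tag entry, and each WAY holds 2^9 × 64 bytes = 32 KB. Likewise, sending bits 63-13 to the TLB implies an 8 KB page offset; both figures are inferences from the bit ranges rather than statements in the text.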
In S310, the physical addresses of the two WAYs read out from the tag unit are compared with the physical address read out from the TLB unit; when one matches, it is determined that the fetch-target data exist in the cache memory, and the data in the matching cache WAY are used. In S311, information indicating the matching WAY in the cache tag is sent to the cache data unit, and in S312, the data of one of the two WAYs read out from the cache data unit are selected, completing the read-out from the cache memory.
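The hit check in S310-S312 amounts to a per-WAY comparison followed by a 2:1 selection, analogous to the WAY selection circuit 203 in FIG. 2. The sketch below is a software analogy only; the function name is hypothetical, and an invalid tag entry is modelled as `None`.

```python
from typing import Optional, Sequence, Tuple


def hit_and_select(tag_pas: Sequence[Optional[int]], tlb_pa: int,
                   way_data: Sequence[bytes]) -> Tuple[Optional[int], Optional[bytes]]:
    """Compare each WAY's tag with the TLB output and select the hit WAY's data.

    A tag of None stands for an invalid entry (an assumption of this sketch).
    Returns (way, data) on a hit, or (None, None) on a cache miss.
    """
    for way, pa in enumerate(tag_pas):      # S310: compare per WAY
        if pa is not None and pa == tlb_pa:
            return way, way_data[way]       # S311/S312: select that WAY's data
    return None, None
```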
After this, in the instruction cache IF-LBS 106, the 32 bytes of data read out are sent to the instruction unit without change, and the instruction unit receives them as eight instructions, each 4 bytes long. In the operand cache OP-LBS 107, the data read out are aligned in accordance with the data width of the read-out target (1, 2, 4 or 8 bytes), and at the same time, sign extension is performed as needed, that is, 1-, 2- or 4-byte data are converted into a format in which their sign part is extended. The aligned and sign-extended data are sent out to a computing unit, which writes them into the register designated to receive them and starts computing with the received data.
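The alignment and sign extension performed by the operand cache can be sketched as extracting an operand from the 32 bytes read out and widening it to a 64-bit register image. Big-endian byte order and natural alignment are assumptions of this sketch, and the function name is hypothetical.

```python
def align_and_extend(data32: bytes, offset: int, width: int, signed: bool) -> int:
    """Extract a 1/2/4/8-byte operand and return it as a 64-bit register image.

    Big-endian byte order and natural alignment are assumptions of this sketch.
    """
    if width not in (1, 2, 4, 8):
        raise ValueError("width must be 1, 2, 4 or 8 bytes")
    if offset % width or offset + width > len(data32):
        raise ValueError("misaligned or out-of-range operand")
    value = int.from_bytes(data32[offset:offset + width], "big", signed=signed)
    # Masking a negative Python int to 64 bits yields its sign-extended form.
    return value & 0xFFFF_FFFF_FFFF_FFFF
```

For example, the 2-byte value 0xFFFE read as signed becomes 0xFFFF_FFFF_FFFF_FFFE, while the same bytes read as unsigned become 0xFFFE.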
In the read-out control of the cache memory, when the cache data, cache tag and TLB have been read out two cycles after the start of the access, the presence or absence of a fault is checked by a fault detection circuit provided for each of them. When there is no fault, the process is completed following the procedure illustrated in FIG. 3. When a fault is detected in the TLB, the TLB is completely deleted and the TLB registration process is performed again; a detailed explanation of this is omitted here. The conventional processing method for the case in which the cache data or cache tag has a fault is disclosed in Patent document 2 and is described below.
FIG. 4 and FIG. 5 illustrate the flow of the process operation in the case in which the cache data or cache tag has a fault. FIG. 4 illustrates the fault processing in the case in which the data or tag in the operand cache OP-LBS 107 has a fault, and FIG. 5 illustrates the fault processing in the case in which the data or tag in the instruction cache IF-LBS 106 has a fault.
First, in FIG. 4, when a fault is detected in the cache data or cache tag, the operand cache OP-LBS 107 in the SU 103 temporarily stops data processing and switches an ERROR-STATE flag in a cache control unit to ON (S401). The cache line address and cache WAY having the error are registered in an ERAR (Error Address Register) (S402). There are two types of ERAR: the OP-ERAR for the operand cache and the IF-ERAR for the instruction cache. Both hold bits 14-5 of the cache line address and the WAY information. During the ERROR-STATE, all subsequent processes are suspended, while processes already in progress at that time are allowed to finish; for example, for a process that has requested that the cache line be brought from the main memory or from another cache memory, the arrival of the cache line and the completion of its registration in the cache memory are awaited.
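The ERROR-STATE flag and the ERAR contents can be modelled as a small record per cache, one instance standing for the OP-ERAR and one for the IF-ERAR. This is a hypothetical sketch: the class and method names are illustrative, and only the fields named in the text (line address bits 14-5 and the WAY) are kept.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CacheErrorControl:
    """Hypothetical model of the ERROR-STATE flag and one ERAR (OP- or IF-ERAR)."""
    error_state: bool = False
    erar: Optional[Tuple[int, int]] = None  # (line address bits 14-5, WAY)

    def on_fault(self, va: int, way: int) -> Tuple[int, int]:
        self.error_state = True                       # S401
        self.erar = ((va >> 5) & 0x3FF, way)          # S402
        return self.erar                              # reported in S403

    def can_issue_new_request(self) -> bool:
        # New processing is suspended while ERROR-STATE is ON.
        return not self.error_state

    def on_recovered(self) -> None:
        self.error_state = False                      # set OFF in the B cycle
```

A fault in the operand cache sets only the operand-side instance, so the instruction-side instance can keep issuing requests in this model.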
After this, a request to perform a fault rescue process is sent to the lower-level cache (SX 108) together with notification of the fault in the operand cache OP-LBS 107, and the cache line address and cache WAY of the faulty cache memory are notified from the information in the OP-ERAR (S403).
The lower-level cache that received the fault rescue process request holds a copy of the notification-source cache tag and refers to it for the fault rescue process (S404). The cache tag copy stores a Valid bit indicating whether the cache line is valid or invalid, and the physical address corresponding to the cache line. It is determined whether the cache line for which fault processing has been requested is valid (S405). When the cache line is in an invalid state (No), invalidation due to the fault is instructed for the notified cache line; at the same time, it is explicitly indicated which cache memory (the operand cache or the instruction cache) the process is for, and the information of the faulty cache line and faulty WAY is sent back to the fault notification source without change (S406). Upon receiving the instruction for invalidation due to the fault, the faulty cache performs a processing flow in the operand cache pipeline that rewrites the valid bit of the operand cache tag into an invalid state for the notified cache line and cache WAY (S407). In the B cycle (Buffer Read Cycle) of the operand cache pipeline flow in which this rewriting is performed, the ERROR-STATE flag is set to OFF, completing the error processing and resuming execution of subsequent instructions; at the same time, notification of the completion of the cache tag invalidation is sent to the lower-level cache, together with the invalidated cache line and cache WAY (S408). Upon receiving the notification, the lower-level cache rewrites the corresponding line in the cache tag copy into an invalid state (S409).
If the cache line targeted by the fault processing is found to be in a valid state by the search of the cache tag copy (Yes in S405), an instruction to discharge the cache line due to the fault is issued to the faulty cache, explicitly indicating that the process is for the operand cache, and the information of the faulty cache line and faulty WAY is sent back to the fault notification source without change (S410). When the faulty cache receives the instruction to discharge the cache line due to the fault, it performs, for the notified cache line and cache WAY, a process flow that reads out the operand cache data while referring to the operand cache tag, as well as a process flow that reads out the operand cache data while rewriting the valid bit in the operand cache tag into an invalid state (S411). In the B cycle of the operand pipeline flow in which the cache tag is rewritten, the ERROR-STATE flag is set to OFF, completing the error processing and resuming execution of subsequent instructions; at the same time, notification of the completion of the cache tag invalidation is sent to the lower-level cache, together with the invalidated cache line and cache WAY (S414). For the operand cache OP-LBS 107, a data transfer may occur at this point. The operand cache data are read out twice, 32 bytes at a time, so in the flows described above the 64 bytes of cache data corresponding to one cache line are read out and stored in a data buffer called the MOB (Move-out buffer), which is used for processing involving data transfer when handling a request instructed from a lower level. The cache tag is referenced both for reading out and for writing in.
When the cache tag is searched in the read-out reference (S412, S413), it is checked whether the cache line is of the change type or of another type. In the case of the change type, the cache line is transferred to the lower level (S415, S416); in the case of the invalid or shared type, it is not transferred (S408, S409). When the cache line is not transferred (S408, S409), the same processing as the fault rescue process on the instruction cache side is performed, completing the invalidation of the cache line. When the cache line is transferred, a notification that the cache fault processing is to be completed with a data transfer is sent, together with the cache line and cache WAY that were the processing target (S415). Upon receiving this notification, the lower-level cache rewrites the corresponding line in the cache tag copy into an invalid state and writes the received cache data into the data unit of its own cache memory (S416).
FIG. 5 illustrates the fault processing in the case in which the data or tag in the instruction cache IF-LBS 106 has a fault. When a fault is detected in the cache data or cache tag in the IF-LBS 106, the processes in S501-S503, which are similar to S401-S403 in FIG. 4, are performed. After that, regardless of the result of referring to the cache tag copy, invalidation due to the fault is instructed for the notified cache line, explicitly indicating that the process is for the instruction cache memory, and the information of the faulty cache line and faulty WAY is sent back to the fault notification source without change (S504). Upon receiving the instruction for invalidation due to the fault, the faulty cache performs a processing flow in the instruction cache pipeline that rewrites the valid bit of the instruction cache tag into an invalid state for the notified cache line and cache WAY (S505). In the B cycle of the instruction cache pipeline flow in which this rewriting is performed, the ERROR-STATE flag is set to OFF, completing the error processing and resuming execution of subsequent instructions; at the same time, notification of the completion of the cache tag invalidation is sent to the lower-level cache, together with the invalidated cache line and cache WAY (S506). Upon receiving the notification, the lower-level cache rewrites the corresponding line in the cache tag copy into an invalid state (S507).
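The decisions in FIG. 4 and FIG. 5 reduce to two small choices: what the lower-level cache instructs, and, for an operand-cache discharge, whether the line leaves with a data transfer. The following is a hypothetical sketch of that decision logic only; the names and string results are illustrative, and the pipeline flows themselves are not modelled.

```python
from enum import Enum


class TagState(Enum):
    INVALID = "invalid"
    SHARED = "shared"
    CHANGE = "change"


def lower_level_instruction(cache: str, copy_valid: bool) -> str:
    """Instruction the lower-level cache returns for a fault rescue request."""
    if cache == "instruction":
        return "invalidate"                      # S504: regardless of the copy
    if cache == "operand":
        return "discharge" if copy_valid else "invalidate"  # S410 / S406
    raise ValueError(f"unknown cache kind: {cache}")


def discharge_completion(state: TagState) -> str:
    """How the faulty operand cache completes a discharge (S412-S416)."""
    if state is TagState.CHANGE:
        return "complete-with-data-transfer"    # S415, S416
    return "complete-invalidation"              # S408, S409
```

Only a change-type line carries data back to the lower level, because in this sketch only that state is treated as holding data not yet reflected elsewhere.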
Meanwhile, FIG. 6 is a diagram illustrating the flow of a store operation in a conventional cache.
When writing into the cache memory in response to a store instruction, the cache tag unit and the TLB unit are referenced once and the cache data unit twice, so the processing flow is performed twice in the operand cache pipeline.
In the first process flow of the store instruction, the cache data unit, cache tag unit and TLB unit are accessed. In the cache data unit, bits 14-5 of the virtual address are extracted and sent out, the data are read out within the RAM in the cycle following the reference, and a total of 64 bytes of data, 32 bytes for each of the two WAYs, are taken out from all the cache RAMs in the next cycle. In the cache tag unit, bits 14-6 of the virtual address are extracted and sent out, the tag address is read out within the RAM in the cycle following the reference, and the physical addresses corresponding to the two WAYs are taken out in the next cycle. In the TLB unit, bits 63-13 of the virtual address are extracted and sent out together with information representing the access space in which the cache memory is referenced, such as the access-space number or context ID and the segment-table starting point, and the reference is made (S601).
In the cycle following the reference, the virtual addresses and access-space information registered in the TLB are compared, and the physical address corresponding to the matching entry is taken out; one physical address is read out in the next cycle, completing the conversion from the virtual address to the physical address. The physical addresses of the two WAYs read out from the tag unit are compared with the physical address read out from the TLB unit, and when one matches, it is determined that the store-target data exist in the cache memory (S602).
In addition, when the change-type bit read out from the cache tag indicates the change type, it is determined that the cache line is not shared and that the store can be performed. The matching cache information is recorded in a store instruction processing unit for use in the subsequent write into the cache memory. At the same time, information indicating the matching WAY in the cache tag is sent to the cache data unit, and the data of one of the two WAYs read out from the cache data unit are selected. So that the ECC (Error Correction Code) information can be rewritten into an ECC corresponding to the updated data when the store is performed on the cache memory, the selected data other than the store target are stored, at 8-bit data boundaries, in a data holding unit or a partial ECC holding unit (S603). Details of the ECC processing for data other than the store target are described in Patent document 3 and Patent document 4.
Then, after the process in S603, independently of the processing flow on the cache side, the operand cache receives from the computing unit the store data that are the target of the store instruction and stores them in a store data register (S604).
Then, once the process flow on the cache side is complete and the transfer of the store data to the cache has been completed, the instruction unit determines whether the store instruction can be executed (S605, S606). That is, it checks that no branch has changed the instruction processing sequence for instructions preceding the store instruction and that there is no need to shift to another process, such as a trap process. This is realized by confirming, in a commit stack entry in the instruction unit, that all processing of the instructions preceding the store instruction has been completed. The instruction unit then instructs execution of the store for the instruction that has become executable by turning a commit signal ON (S607).
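The commit decision in S605-S607 can be sketched as a check over the commit stack entries older than the store. This is a hypothetical model: `CommitEntry`, its `redirects` flag (standing for a branch or trap that changes the instruction flow) and `store_can_commit` are illustrative names, not the actual instruction-unit interface.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CommitEntry:
    """Hypothetical commit stack entry for one in-flight instruction."""
    completed: bool
    redirects: bool = False  # branch or trap that changes the instruction flow


def store_can_commit(entries: Sequence[CommitEntry], store_index: int,
                     store_data_ready: bool) -> bool:
    """Return True when the commit signal for the store may be turned ON (S607)."""
    if not store_data_ready:                 # store data not yet in the register
        return False
    preceding = entries[:store_index]        # instructions older than the store
    return all(e.completed and not e.redirects for e in preceding)
```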
On the cache side, to write the data of the store instruction for which the commit signal has been received into the cache memory, the second flow of the store process is performed (S608). In the second flow, only the cache data unit is accessed. First, in the P cycle (Priority cycle: the cycle in which the request to be processed is selected from the pending requests in accordance with a fixed order of priority), bits 14-5 of the virtual address are sent out to the cache data unit, together with the store-target WAY and store-target byte position (S609). Based on this information, the cache data unit identifies the store-target cache RAM and the store-target byte position in that RAM, and turns a WE (Write Enable) signal ON for the target byte position in the target RAM. In parallel, the store data held in the store data register are taken out and sent to the cache data unit (S610).
In the next T cycle (TLB/TAG cycle: a cycle to access TAG/LBS/TLB), the store data are stored in the store-target byte position in the store-target cache RAM (S611).
Then, in the R cycle (Result cycle: the cycle in which the pipeline process is completed), three cycles after the T cycle, an ECC is generated from the store data and the data other than the store target, and the processing of the store instruction is completed by writing the ECC information corresponding to the stored data into an ECC-ARRAY-RAM (S612).
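The byte-enabled write and the check-bit regeneration of the second store flow can be sketched on one 8-byte word. This is an illustration under stated assumptions: the old bytes stand in for the non-store-target data retained in S603, bit i of the write-enable mask selects byte i, and `check_bits` is a hypothetical per-byte parity stand-in, not the SEC-DED ECC a real cache would use.

```python
def merge_store(old: bytes, store: bytes, write_enable: int) -> bytes:
    """Merge store bytes into an 8-byte word; bit i of write_enable selects byte i."""
    if len(old) != 8 or len(store) != 8:
        raise ValueError("both operands must be 8 bytes")
    return bytes(store[i] if (write_enable >> i) & 1 else old[i] for i in range(8))


def check_bits(word: bytes) -> int:
    """Stand-in for ECC generation: one even-parity bit per byte (not SEC-DED)."""
    result = 0
    for i, b in enumerate(word):
        result |= (bin(b).count("1") & 1) << i
    return result


def store_flow(old: bytes, store: bytes, write_enable: int) -> tuple:
    merged = merge_store(old, store, write_enable)   # T cycle: bytes written
    return merged, check_bits(merged)                # R cycle: new check bits
```

The point the sketch preserves is that the check bits must be computed over the whole word, which is why the data other than the store target are kept from the first flow.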
The write operation into the cache in response to a store instruction has been described above. When a fault is detected in the first process flow of the store process, the fault is handled by the same process as when a fault is detected in the read-out operation from the cache illustrated in FIG. 3.
Next, with reference to FIG. 7, the operation by which a conventional cache, after invalidating a faulty cache line, resumes processing of an instruction that uses that cache line is explained.
When the invalidation of the faulty cache line is completed and instruction execution is resumed, the cache line used by the resumed instruction has been invalidated, so instruction processing resumes from a cache miss (S701). For the resumed instruction, a cache miss is detected as a result of the cache tag search (S702); a move-in request is sent to the lower-level cache to bring the missing cache line into the cache, and the physical address of the move-in request, together with the cache line address and cache WAY information at which the move-in (MI) cache line is to be registered, is held in an MIB (Move In Buffer) (S703). As the move-in target cache WAY, an invalid WAY is selected if there is one; otherwise, the WAY is chosen by LRU (Least Recently Used). Since the cache line has been invalidated when instruction execution resumes after error processing, an invalid cache WAY is selected; that is, the WAY invalidated in the preceding error processing is chosen.
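The move-in WAY choice described above can be sketched as follows. The function name is hypothetical, and picking the lowest-numbered invalid WAY when several are invalid is an assumption of this sketch.

```python
from typing import Sequence


def choose_move_in_way(valid: Sequence[bool], lru_way: int) -> int:
    """Pick the WAY that will receive a move-in line.

    Prefer an invalid WAY (lowest index first, an assumption of this sketch);
    fall back to the LRU WAY only when every WAY is valid.
    """
    for way, is_valid in enumerate(valid):
        if not is_valid:
            return way
    return lru_way
```

After error processing, the faulty WAY is the invalid one, so in a two-WAY cache with the other WAY valid it is returned regardless of the LRU state.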
The lower-level cache that received the move-in request searches the cache tags at its own cache level (S704). On a cache hit, the cache data taken out from the cache data unit at that level are transferred to the request-source cache (S705). In parallel, the request source's cache tag copy is rewritten with the move-in request address, and a replace-block instruction is sent to the request-source cache (S706). When instructions resume after error processing, the replace block, that is, the cache WAY registered at the target cache line of the move-in request, has already been invalidated before the move-in request, so the request-source cache is notified that no replace process is needed.
Upon receiving the cache line, the move-in request-source cache continues the resumed operation using the received cache line and writes the cache line into the cache memory. Since 32 bytes are written per write, the process flow is performed twice to register one 64-byte cache line.
First, in the first process flow, the cache line and cache WAY information and 32 bytes of the received cache line stored in the MIB are sent to the cache data unit and written into its RAM (S707). Next, in the second flow, the remaining 32 bytes are written into the RAM of the cache data unit; at the same time, the physical address held in the MIB is written into the cache tag unit (S708), and the Valid bit of the cache tag is turned ON (S709). For instructions other than the one whose cache miss caused the move-in request, data are taken out from the MIB until the line has been registered in the cache memory; after registration, the cache tag unit and cache data unit are accessed to take out the contents of the cache memory. If a fault is detected again at this time, the series of fault processing flows is performed again to resolve the fault.
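The two-flow registration and the MIB bypass can be modelled for a single cache line slot. This is a hypothetical sketch: `WayEntry`, `register_move_in` and `read_line` are illustrative names, and writing the lower 32 bytes in the first flow is an assumption, since the text does not say which half comes first.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WayEntry:
    """Hypothetical model of one cache line slot: data, tag and Valid bit."""
    data: bytearray = field(default_factory=lambda: bytearray(64))
    tag_pa: Optional[int] = None
    valid: bool = False


def register_move_in(entry: WayEntry, pa: int, line: bytes) -> None:
    """Register a 64-byte move-in line in two 32-byte flows."""
    if len(line) != 64:
        raise ValueError("a cache line is 64 bytes")
    entry.data[0:32] = line[0:32]     # first flow (S707)
    entry.data[32:64] = line[32:64]   # second flow: remaining 32 bytes (S708)
    entry.tag_pa = pa                 # second flow: physical address from the MIB
    entry.valid = True                # Valid bit turned ON last (S709)


def read_line(entry: WayEntry, mib_line: bytes) -> bytes:
    # Before registration completes, other instructions are served from the MIB.
    return bytes(entry.data) if entry.valid else mib_line
```

Turning the Valid bit ON only after both halves and the tag are written keeps a half-written line from ever being read out of the cache in this model.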
If the fault is not resolved after the fault-resolving process flow has been repeated a predetermined number of times, the faulty cache line is cut off by a means called block delete. Details of the block delete process are described in Patent document 5. However, when the block delete process is carried out, a performance decline due to the loss of one cache line is inevitable. Therefore, a method of rescuing a one-bit permanent fault in a cache memory without losing a cache line has been sought.
A conventional cache without a cache line alternation register has been explained as above with reference to FIG. 1-FIG. 7.
To summarize, in the conventional implementation of the cache line alternation register, the physical address is used to compare alternation-target addresses, and when they match, the data in the alternation register are read out. Because the physical address has a large number of bits to compare, the address processing takes time. In addition, obtaining the physical address to be compared requires referring to the TLB and cache tag, so the cache line alternation register can be used when reading out from the cache but not when writing into it, because the flow of referring to and comparing the TLB and cache tag and then writing deviates significantly from the normal cache control flow. Consequently, every time the cache line alternation register became the store target, for example, it was invalidated and the operation was restarted from the reference to the main memory, causing some performance decline in store operations.
In addition, the conventional alternation register is configured to be able to replace all the bits in one cache line. In practice, however, one-bit errors rarely occur at a plurality of places in one cache line, so a way to realize the alternation register function with fewer registers has been sought in order to save circuit resources.
Therefore, a new cache line alternation register is installed in a conventional cache, to solve the problem.
Patent document 1: Japanese Laid-open Patent Application No. 52-15236
Patent document 2: Japanese Patent No. 3483296
Patent document 3: Japanese Patent Application No. 2006-999902 (WO 2007/094045)
Patent document 4: Japanese Patent Application No. 2006-353505 (Japanese Laid-open Patent Application No. 2008-165449)
Patent document 5: Japanese Patent Application No. 2006-999821 (WO 2007/097026 A1)