1. Field of the Invention
The present invention relates to a technique for accessing cache memory that comprises, for each entry, a data unit for storing data and a tag unit for storing a tag address serving as an index of the data.
2. Description of the Related Art
Cache memory, which can operate at a higher speed than main memory, is now indispensable for high-speed data processing. Cache memory apparatuses, which access cache memory according to a designated address, include ones built into a central processing unit (CPU) and ones provided outside it.
FIG. 1A is a diagram showing a configuration of a conventional cache memory apparatus, and FIG. 1B is a diagram showing a layout, within main memory, of data to be stored in the cache memory. FIG. 1B exemplifies the layout of each piece of pixel data in the case of storing, in the main memory, image data having one byte per pixel. The image data is divided into 1024 pixels in the horizontal direction and 1024 lines in the vertical direction, with each piece of pixel data laid out at the position corresponding to its coordinates.
As shown in FIG. 1A, the cache memory constituting a conventional cache memory apparatus comprises, for each entry, a tag part 12 for storing a tag address and a data part 13 for storing data. An address 11 specified by a CPU or the like has a structure in which fields 11a through 11c are laid out to store, from the uppermost bit side, a tag address, an index address and a line address, respectively. The tag address is used for indexing data which is stored or to be stored. The index address specifies an entry (number). The line address specifies data which is stored or to be stored within the entry specified by the index address. In this specification, it is assumed that one address holds one byte of data, and that one entry (i.e., block) is capable of storing sixteen bytes of data (i.e., data in the amount of sixteen pixels). The number of entries is assumed to be 256. For the address 11, it is assumed that, from the uppermost bit, 20 bits are allocated to the tag address, 8 bits to the index address and 4 bits to the line address. The address 11 is called a “cache address” hereinafter to avoid confusion.
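The field layout described above can be illustrated with a minimal sketch. The 20/8/4 bit widths are those assumed in this specification; the function name `split_cache_address` and the sample address are hypothetical and chosen only for illustration.

```python
# Field widths of the cache address 11, uppermost bit first (from the text).
TAG_BITS, INDEX_BITS, LINE_BITS = 20, 8, 4


def split_cache_address(addr):
    """Split a 32-bit cache address into (tag, index, line) fields."""
    line = addr & ((1 << LINE_BITS) - 1)                    # field 11c
    index = (addr >> LINE_BITS) & ((1 << INDEX_BITS) - 1)   # field 11b
    tag = (addr >> (LINE_BITS + INDEX_BITS)) & ((1 << TAG_BITS) - 1)  # field 11a
    return tag, index, line


# Hypothetical sample address, used only to show the split.
tag, index, line = split_cache_address(0x12345678)
print(hex(tag), hex(index), hex(line))
```

For the sample address 0x12345678, the upper 20 bits form the tag address, the next 8 bits select one of the 256 entries, and the lowest 4 bits select one of the sixteen bytes within that entry.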
The operation is described next.
When a read request for reading data from the data part 13 is made, the tag address of the entry specified by the index address is read from the tag part 12, an address comparator 14 compares it with the tag address included in the cache address 11, and the comparison result is output as hit information. The hit information indicates that the target piece of data exists (i.e., a hit) if the two tag addresses are identical, and indicates that the target piece of data does not exist (i.e., a miss-hit) if they are not identical. Accordingly, when the address comparator 14 outputs hit information indicating a hit, the data stored in the entry specified by the index address is read from the data part 13 and processed.
Similarly, when a write request for storing data in the data part 13 is made, the tag address of the cache address 11 is stored in the entry of the tag part 12 specified by the index address, and the data is stored in the data part 13 according to the line address of the cache address 11.
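The read and write operations described above can be sketched as a small software model. This is an illustrative simplification, not the apparatus itself: the class name `DirectMappedCache` is hypothetical, block fill from the main memory and valid bits are omitted, and clearing an entry when its tag address changes is an assumption made only so that stale bytes are never returned.

```python
LINE_BITS, INDEX_BITS = 4, 8          # 16 bytes per entry, 256 entries (from the text)
ENTRY_SIZE = 1 << LINE_BITS
NUM_ENTRIES = 1 << INDEX_BITS


class DirectMappedCache:
    """Toy model of the tag part 12, data part 13 and address comparator 14."""

    def __init__(self):
        self.tags = [None] * NUM_ENTRIES                                  # tag part 12
        self.data = [bytearray(ENTRY_SIZE) for _ in range(NUM_ENTRIES)]   # data part 13

    @staticmethod
    def _split(addr):
        line = addr & (ENTRY_SIZE - 1)
        index = (addr >> LINE_BITS) & (NUM_ENTRIES - 1)
        tag = addr >> (LINE_BITS + INDEX_BITS)
        return tag, index, line

    def read(self, addr):
        """Return (hit, value); value is None on a miss-hit."""
        tag, index, line = self._split(addr)
        hit = self.tags[index] == tag          # role of the address comparator 14
        return hit, (self.data[index][line] if hit else None)

    def write(self, addr, value):
        """Store the tag address and one byte of data in the indexed entry."""
        tag, index, line = self._split(addr)
        if self.tags[index] != tag:
            # Assumption: a replacement discards the previous contents of the entry.
            self.data[index] = bytearray(ENTRY_SIZE)
            self.tags[index] = tag
        self.data[index][line] = value
```

In this model, two addresses that share an index address but differ in tag address compete for the same entry, so writing one of them causes a subsequent read of the other to result in a miss-hit.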
In the cache address 11, the index address stored in the field 11b indicates a position on a line. One entry can store data in the amount of sixteen pixels. Accordingly, one line corresponds to data in the amount of 64 (i.e., 1024/16) blocks (i.e., entries), as shown in FIG. 1B. In FIG. 1B, each frame marked “0”, “1” or “255” within the range of four lines indicates data in the amount of one entry. Therefore, if 1024 pixels are lined up in the horizontal direction (i.e., on one line), 256 entries are required to store data in the amount of four lines.
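The arithmetic above can be checked with a short sketch. The image width, entry size and number of entries are those stated in this specification; the row-major layout starting at address 0 and the helper name `index_of_pixel` are assumptions made for illustration.

```python
WIDTH = 1024          # pixels per line (from the text)
ENTRY_PIXELS = 16     # one-byte pixels per entry (from the text)
NUM_ENTRIES = 256     # number of entries (from the text)

blocks_per_line = WIDTH // ENTRY_PIXELS           # entries needed for one line
lines_covered = NUM_ENTRIES // blocks_per_line    # lines that fill all entries


def index_of_pixel(x, y):
    """Index address of pixel (x, y), assuming a row-major layout from address 0."""
    addr = y * WIDTH + x
    return (addr // ENTRY_PIXELS) % NUM_ENTRIES


print(blocks_per_line, lines_covered)
print(index_of_pixel(0, 0), index_of_pixel(16, 0), index_of_pixel(1023, 3))
```

Under these assumptions, the first block of the first line maps to index address 0 and the last block of the fourth line maps to index address 255, which corresponds to the frames marked “0” and “255” in FIG. 1B.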
The conventional cache memory apparatus shown in FIG. 1A adopts a direct map system. In the direct map system, there is only one entry (i.e., space) capable of storing the data corresponding to a given address. Among the data of blocks having the same index address and different tag addresses, only one piece of data can be stored at a time. The index address stored in the field 11b indicates a position on a line. Therefore, even in the case of processing data of a 16×16 block (i.e., 16 pixels horizontally and 16 lines vertically), for example, as shown in FIG. 2, data in the amount of only one line can be stored in the cache memory. Because of this, the hit ratio is extremely low. That is, replacements for rewriting data occur frequently due to miss-hits, greatly degrading processing performance. The “index i” shown in FIG. 2 indicates the index address corresponding to data within the 16×16 block that is the target of processing.
Most image processing is carried out in units of rectangular blocks such as 16×16. The conventional cache memory apparatus shown in FIG. 1A, however, is not capable of storing data of a plurality of lines lined up in the vertical direction. Accordingly, conventional cache memory apparatuses include ones capable of storing data of a plurality of lines lined up in the vertical direction, as described respectively in Laid-Open Japanese Patent Application Publications No. 09-53909 (noted as “patent document 1” hereinafter), No. 09-101915 (noted as “patent document 2” hereinafter), and No. 10-154230 (noted as “patent document 3” hereinafter).
The cache address 11 shown in FIG. 1A has, as described above, a structure in which the fields 11a through 11c store, from the uppermost bit side, a tag address, an index address and a line address. The conventional cache memory apparatus noted in the patent document 1 instead adopts, for the cache address 11, a structure in which fields are laid out to store, from the uppermost bit side, a tag address, a first index address, a first line address, a second index address and a second line address, respectively. With this structure, an entry is specified by the first and second index addresses, and an address within the entry is specified by the first and second line addresses. One entry can thereby store a block whose width in the horizontal direction is the number of pixels determined by the number of bits allocated to the field storing the second line address, and whose height in the vertical direction is the number of lines determined by the number of bits allocated to the field storing the first line address. For example, data in the amount of a 4×4 block can be stored if both fields are 2 bits wide.
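The five-field structure of the patent document 1 can be illustrated with the following sketch. Only the two 2-bit line address fields come from the 4×4 example above; the remaining field widths, the assumption that the address is formed from a 10-bit coordinate y followed by a 10-bit coordinate x, and the function name `split_doc1` are hypothetical.

```python
# Field widths, uppermost bit first. Only the 2-bit line fields come from the
# text; the other widths are assumptions for a 1024x1024 one-byte-per-pixel image.
TAG_BITS, IDX1_BITS, LINE1_BITS = 4, 4, 2    # assumed to be taken from coordinate y
IDX2_BITS, LINE2_BITS = 8, 2                 # assumed to be taken from coordinate x


def split_doc1(x, y):
    """Return (tag, entry, offset) for pixel (x, y) under the assumed widths."""
    line2 = x & ((1 << LINE2_BITS) - 1)                    # second line address
    idx2 = (x >> LINE2_BITS) & ((1 << IDX2_BITS) - 1)      # second index address
    line1 = y & ((1 << LINE1_BITS) - 1)                    # first line address
    idx1 = (y >> LINE1_BITS) & ((1 << IDX1_BITS) - 1)      # first index address
    tag = (y >> (LINE1_BITS + IDX1_BITS)) & ((1 << TAG_BITS) - 1)
    entry = (idx1 << IDX2_BITS) | idx2       # entry chosen by both index addresses
    offset = (line1 << LINE2_BITS) | line2   # position inside the 4x4 block
    return tag, entry, offset


# All sixteen pixels of the 4x4 block whose top-left corner is (8, 12).
block = [split_doc1(x, y) for y in range(12, 16) for x in range(8, 12)]
print(len({(t, e) for t, e, _ in block}), len({o for _, _, o in block}))
```

Under these assumptions, the sixteen pixels of an aligned 4×4 block share one tag address and one entry and occupy sixteen distinct positions within that entry.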
In image processing, adjacent blocks are also referred to frequently. Data of a block adjacent to the present block in the vertical direction can be stored in another entry by changing the tag address and the first index address. Therefore, pixel data lined up in the vertical direction can be accessed without a replacement. For a block adjacent in the horizontal direction, however, the data cannot be stored in another entry because the tag address is the same. That is, the data of a block adjacent in the horizontal direction must be stored in the cache memory by a replacement. Because of this, this conventional apparatus cannot be expected to improve the hit ratio in image processing.
The conventional cache memory apparatus noted in the patent document 2 adopts, for the cache address 11, a structure in which fields are laid out to store, from the uppermost bit side, a first tag address, an index address, a second tag address and a line address, respectively. The lower bits indicating the position of a pixel on a line are used as the line address, and the bits above them are used as the second tag address. With this structure, data of a block adjacent in the vertical direction (a block here being a plurality of pixels lined up on one line) can be stored in a different entry. However, because the lower bits indicating the position of a pixel on a line are used as the line address and the bits above them as the second tag address, the value of the index address is the same for all blocks on the same line, as shown in FIG. 3. Therefore, this apparatus likewise cannot be expected to improve the hit ratio in image processing.
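The behaviour of this four-field structure can be illustrated with the following sketch. The 4-bit line address follows from the sixteen-pixel entry assumed in this specification; the other widths, the assumption that the address is formed from a 10-bit coordinate y followed by a 10-bit coordinate x, and the function name `split_doc2` are hypothetical.

```python
# Assumed widths, uppermost bit first: first tag 2, index 8 (both from y),
# second tag 6, line 4 (both from x). Only the 16-pixel entry is from the text.
LINE_BITS, INDEX_BITS = 4, 8


def split_doc2(x, y):
    """Return ((first_tag, second_tag), index, line) for pixel (x, y)."""
    line = x & ((1 << LINE_BITS) - 1)       # lower bits of the position on a line
    tag2 = x >> LINE_BITS                   # bits above the line address
    index = y & ((1 << INDEX_BITS) - 1)     # depends on the line number only
    tag1 = y >> INDEX_BITS
    return (tag1, tag2), index, line


# Index addresses of every block on line 5, and of one block on lines 5 and 6.
indices_on_line = {split_doc2(x, 5)[1] for x in range(0, 1024, 16)}
print(indices_on_line, split_doc2(0, 5)[1], split_doc2(0, 6)[1])
```

Under these assumptions, vertically adjacent blocks receive different index addresses, while all 64 blocks on one line share a single index address and are distinguished only by the second tag address.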
The above-noted patent document 2 additionally describes another conventional cache memory apparatus adopting, for the cache address 11, a structure in which fields are laid out to store, from the uppermost bit side, a first tag address, a first index address, a second tag address, a first line address, a second index address, a third tag address and a second line address, respectively.
Data of blocks adjacent in both the horizontal and vertical directions can be stored in different entries by dividing the cache address 11 and laying out two tag addresses in correlation with the two line addresses, respectively, thereby making it possible to greatly improve the hit ratio.
Both the former and the latter assume the image data to be an image divided into 1024 pixels in the horizontal direction and 1024 lines in the vertical direction. One piece of pixel data is one byte. The number of entries is 256. Therefore, ten bits are required to indicate the position of a pixel in the horizontal direction, and ten bits are likewise required to indicate the position of a line in the vertical direction. The latter (the other conventional cache memory apparatus noted in the patent document 2) adds four bits to the required ten bits for each of the horizontal and vertical directions and uses the added four bits as the fields for storing the first and second index addresses.
The number of bits required for the cache address 11 is increased by adding these two index addresses (i.e., fields). Due to this, the number of bits may exceed the width of a data bus or of a register. In the former and in the conventional cache memory apparatus noted in the patent document 1, which require only the address 11, the entry for storing data is determined automatically from the storage position of the data in the main memory. When two index addresses are added, their values must be determined separately in order to select the entry in which the data is to be stored. Because such a determination must be carried out, access control becomes that much more complex. Therefore, the addition of two index addresses is not considered preferable.
A conventional cache memory apparatus noted in the patent document 3 adopts, for the cache address 11, a structure in which fields are laid out to store a coordinate y indicating the position of a line in the vertical direction and a coordinate x indicating the position of a pixel in the horizontal direction, respectively. Each field is divided into two subfields. The index address is formed by laying out, from the uppermost bit side, the lower bits of the coordinate y and the lower bits of the coordinate x. The tag address is formed by laying out, from the uppermost bit side, the upper bits of the coordinate y and the upper bits of the coordinate x. By dividing the bit strings indicating the coordinates x and y into two parts each, and using one part as the tag address and the other part as the index address, data of an adjacent block can be stored in a different entry in both the horizontal and vertical directions. Because of this, the hit ratio can be greatly improved. The range (i.e., the form) of the image data stored in the cache memory is changed by changing the way in which the bit strings indicating the coordinates x and y are divided, as shown in FIG. 4. The frames marked “0” and “1” in FIG. 4 each indicate data in the amount of one entry.
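The division of the coordinates x and y into tag and index parts can be illustrated with the following sketch. The 10-bit coordinates and the 8-bit index address follow the figures assumed in this specification; treating each coordinate value as addressing one entry-sized unit (that is, ignoring any address within an entry), the parameter `x_index_bits` and the function names are assumptions made for illustration.

```python
COORD_BITS = 10    # bits per coordinate for a 1024x1024 image (from the text)
INDEX_BITS = 8     # 256 entries (from the text)


def split_doc3(x, y, x_index_bits):
    """Return (tag, index); the lower bits of y and x form the index address."""
    y_index_bits = INDEX_BITS - x_index_bits
    x_lo = x & ((1 << x_index_bits) - 1)
    y_lo = y & ((1 << y_index_bits) - 1)
    index = (y_lo << x_index_bits) | x_lo                  # lower y, then lower x
    x_hi = x >> x_index_bits
    y_hi = y >> y_index_bits
    tag = (y_hi << (COORD_BITS - x_index_bits)) | x_hi     # upper y, then upper x
    return tag, index


def cached_region(x_index_bits):
    """Width and height, in entry-sized units, of the region held without conflict."""
    return 1 << x_index_bits, 1 << (INDEX_BITS - x_index_bits)


print(cached_region(4), cached_region(6))
```

Under these assumptions, allocating four index bits to each coordinate covers a region of 16 by 16 units, whereas allocating six bits to x and two bits to y covers a region of 64 by 4 units, which corresponds to the change of form described with reference to FIG. 4.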
The conventional cache memory apparatus noted in the patent document 3 is configured to store data in, and read data from, the cache memory in units of entries. To allow operation in units of entries, the data stored in one entry is the data of a block of a predetermined fixed form (i.e., data of a plurality of pixels lined up in the horizontal direction).
To improve processing performance, it is also important to reduce the number of accesses to the cache memory. When data of a fixed-form block is stored in each entry, the number of accesses may increase depending on the relationship between that form and the data to be processed. For example, if the fixed-form block is sixteen pixels lined up in the horizontal direction, four readouts are required to read all the pixel data of a 4×4 block. Therefore, it is preferable that the form of the block can be changed as required. Providing such a capability is expected to enable effective utilization of the cache memory.
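The relationship between the form of the block stored in an entry and the number of readouts can be illustrated with the following sketch. It assumes that one readout retrieves exactly one entry; the function name `readouts_needed` and its parameters are hypothetical.

```python
def readouts_needed(block_w, block_h, entry_w, entry_h, x0=0, y0=0):
    """Count the entries touched by a block_w x block_h pixel block at (x0, y0).

    Each entry is assumed to hold an entry_w x entry_h block of pixels, and
    one readout is assumed to return exactly one entry.
    """
    touched = {(x // entry_w, y // entry_h)
               for y in range(y0, y0 + block_h)
               for x in range(x0, x0 + block_w)}
    return len(touched)


# A 4x4 block read from entries of 16x1 pixels and from entries of 4x4 pixels.
print(readouts_needed(4, 4, 16, 1), readouts_needed(4, 4, 4, 4))
```

With entries of sixteen pixels lined up in the horizontal direction, an aligned 4×4 block requires four readouts, whereas the same block requires a single readout if the entry holds a 4×4 block.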