1. Field of the Invention
The present invention relates to storage or memory in a processor. More specifically, the present invention relates to a multiple-port storage array.
2. Description of the Related Art
A processor includes storage or memory to store program data and instructions. The memory storage includes cells for storing information and lines for accessing the cells according to defined address locations. Typically the information is arranged in words that contain a plurality of cells. The cells in a word are connected by word lines. The cells in a plurality of words that are located at corresponding positions in the words are connected by bit lines.
A particular address in the memory is accessed by applying address signals to decoding circuitry called an address port. The address port sends an address select signal to a word line at the selected location in the memory array. When the address select signal matches the address of a word in the memory, data is transferred to or from the individual memory cells at the specified address. The data of each cell is transferred on the associated bit line.
For arrays having more than one address port, called multi-port arrays, more than one address may be decoded and more than one data transfer made during a single read/write cycle. A multi-port memory array has several common bit lines for each memory cell in the array. A register file is one type of memory array.
A word line is associated with each address in a memory array or each register in a register file. A separate word line is used at each address to control each of the separate read bit lines and each of the separate write bit lines. Each of the separate word lines is connected to an address port. For every cell in an array, the number of bit lines may be equal to the number of word lines or an integer multiple of the number of word lines, and the number of word lines for each address is equal to the number of ports in the array. The size of the multi-port memory array therefore increases as the square of the number of ports to the array.
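The square-law relationship stated above can be written out as a short worked relation. This is only a sketch under an added assumption: that the cell dimensions are set by wire count, so that cell height scales with the number of word lines and cell width with the number of bit lines. The symbols P (number of ports), k (the integer multiple relating bit lines to word lines), N_WL, N_BL, and A_cell are introduced here for illustration and do not appear in the original text.

```latex
\[
N_{\text{WL}} = P, \qquad N_{\text{BL}} = kP \quad (k \in \{1, 2, \dots\})
\]
\[
A_{\text{cell}} \;\propto\; N_{\text{WL}} \cdot N_{\text{BL}} \;=\; P \cdot kP \;=\; kP^{2}
\]
```

For a fixed k, doubling the number of ports therefore roughly quadruples the cell area under this assumption.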
During operation of the storage, an address is applied to a port and decoded, forming an address signal that is sent via the word line associated with the port to the decoded address location. The address signal on the word line causes the contents of the memory cells at the selected address to be written if the address is applied to a write port or read if the address is applied to a read port. Data is transferred to or from the memory cell via write bit lines and read bit lines, respectively. Each of the read bit lines and write bit lines is associated with a separate word line (port). During a single read/write cycle, the processor performs a plurality of read operations up to the total number of read ports and a plurality of write operations up to the total number of write ports. In a single read/write cycle, the read addresses and write addresses may be different or the same.
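The read/write cycle described above can be illustrated with a small behavioral model. The sketch below is hypothetical: the class name `MultiPortArray`, its method `cycle`, and the choice that reads observe the contents held before the same cycle's writes take effect are all assumptions made for illustration, since the text does not specify the ordering of reads and writes to the same address.

```python
class MultiPortArray:
    """Behavioral model of a storage array with separate read and write ports.

    Hypothetical illustration only; not a circuit-level description.
    """

    def __init__(self, num_words, read_ports, write_ports):
        self.words = [0] * num_words
        self.read_ports = read_ports
        self.write_ports = write_ports

    def cycle(self, read_addrs, writes):
        """Perform one read/write cycle.

        read_addrs: list of addresses, one per read port in use.
        writes: list of (address, value) pairs, one per write port in use.
        Returns the values read, in the order of read_addrs.
        """
        # The number of operations per cycle is bounded by the port counts.
        if len(read_addrs) > self.read_ports:
            raise ValueError("more reads than read ports")
        if len(writes) > self.write_ports:
            raise ValueError("more writes than write ports")

        # Assumption: reads sample the contents before this cycle's writes.
        results = [self.words[addr] for addr in read_addrs]

        # Read addresses may repeat; each read port drives its own bit line.
        for addr, value in writes:
            self.words[addr] = value
        return results
```

For example, an array built as `MultiPortArray(8, 3, 2)` accepts up to three reads and two writes per call to `cycle`, and the same address may appear on several read ports in one cycle.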
Because more than one read operation may be made from a particular memory address during one read/write cycle, the maximum amount of current applied to the memory cell is determined by the number of read ports in the array.
Each memory cell is associated with a word line, a bit line, and pass transistors, resulting in a size or pitch of the memory array that is relatively large. The pitch size of the individual cells corresponds to a large overall size of the memory array and usage of a large percentage of the area on an integrated circuit die. The large area of the circuit results in a reduced manufacturing yield and increased fabrication cost of the circuit. The relatively large size of the memory array lengthens the average access time of data in the memory array in several aspects. First, a larger overall size in a memory array results in longer word lines and bit lines, lengthening the time for a signal to pass along the line. Second, the pass transistors, word line, and bit line associated with a cell increase the capacitive loading on the cell, reducing the capability of the finite charge stored in each cell to drive a selected differential bit line pair. These difficulties are magnified with an increase in scalarity of the integrated circuit.
The evolution of microprocessor and processor integrated circuits trends toward greater scalarity, reduced cycle times, larger register files, and larger word widths. Storages and memories such as a register file or a static random access memory (SRAM) array, with complex read/write circuitry, a relatively large size, and relatively slow access speeds, are a substantial barrier to performance improvements relating to each of these aspects. Many consider the SRAM memory array to be a design impediment that in the next generation of processors may discourage or even prevent further advancements in scalarity, an increase in word size, an increase in the size of the register file, and/or a reduction of cycle time. Accordingly, an improved register file structure and operating method is needed.
A multi-ported register file is typically metal-limited, with the area consumed by the circuit proportional to the square of the number of ports. It has been discovered that a processor having a register file structure divided into a plurality of separate and independent register files forms a layout structure with an improved layout efficiency. The read ports of the total register file structure are allocated among the separate and individual register files. Each of the separate and individual register files has write ports that correspond to the total number of write ports in the total register file structure. Writes are fully broadcast so that all of the separate and individual register files are coherent.
For example, a 16-port register file structure with twelve read ports and four write ports is split into four separate and individual 7-port register files, each with three read ports and four write ports. The area of a single 16-port register file would have a size proportional to 16 times 16 or 256. Each of the separate and individual register files has a size proportional to 7 times 7 or 49 for a total of 4 times 49 or 196. The capacity of a single 16-port register file and of the four 7-port register files is identical, with the split register file structure advantageously having a significantly reduced area. The reduced area advantageously corresponds to an improvement in access time of a register file, and thus in speed performance, due to a reduction in the length of word lines and bit lines connecting the array cells that reduces the time for a signal to pass on the lines. The improvement in speed performance is highly advantageous both for meeting the strict time budgets imposed by the specification of high-performance processors and for attaining a large-capacity register file that is operational at high speed.
As another example, a 17-port register file structure with twelve read ports and five write ports is split into four separate and individual 8-port register files, each with three read ports and five write ports. The area of a single 17-port register file would have a size proportional to 17 times 17 or 289. Each of the separate and individual register files has a size proportional to 8 times 8 or 64 for a total of 4 times 64 or 256.
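The arithmetic in the two examples above can be checked with a few lines of code. This is a minimal sketch that takes the stated proportionality (area proportional to the square of the port count) at face value; the function names `relative_area` and `split_area` are hypothetical, and the even division of read ports among the separate register files is an assumption that matches both examples.

```python
def relative_area(ports):
    """Relative area of one register file, proportional to ports squared."""
    return ports * ports


def split_area(read_ports, write_ports, num_files):
    """Total relative area of a split register file structure.

    Assumes the read ports divide evenly among the separate register
    files; every separate file keeps all of the write ports.
    """
    if read_ports % num_files != 0:
        raise ValueError("read ports must divide evenly among the files")
    ports_per_file = read_ports // num_files + write_ports
    return num_files * relative_area(ports_per_file)


# 12 read + 4 write ports: one 16-port file versus four 7-port files.
print(relative_area(12 + 4), split_area(12, 4, 4))  # 256 196

# 12 read + 5 write ports: one 17-port file versus four 8-port files.
print(relative_area(12 + 5), split_area(12, 5, 4))  # 289 256
```

Under this model the relative saving shrinks as the number of write ports grows, because every separate register file carries the full set of write ports.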
In accordance with an embodiment of the present invention, a storage array structure for a processor having R read ports and W write ports includes a plurality of storage array storages. The storage array storages have a reduced number of read ports allocated from the R read ports so that the total number of read ports for the plurality of storage array storages is R. The storage array storages each have W write ports.
In accordance with an embodiment of the present invention, a register file structure for a processor having R read ports and W write ports includes a plurality of register file storages. The register file storages have a reduced number of read ports allocated from the R read ports so that the total number of read ports for the plurality of register file storages is R. The register file storages each have W write ports.
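The relationship between the R read ports, the W write ports, and the plurality of register file storages can be sketched as a behavioral model. The code below is illustrative only: the class `SplitRegisterFile`, its methods, and the even mapping of global read ports onto the separate storages are hypothetical choices, not details taken from the embodiment. It shows each storage serving a subset of the read ports while every write is broadcast to all storages so that they remain coherent.

```python
class SplitRegisterFile:
    """Behavioral sketch of a register file split into several storages.

    Hypothetical model: read ports are divided evenly among the storages,
    and each storage receives every write.
    """

    def __init__(self, num_registers, read_ports, write_ports, num_files):
        if read_ports % num_files != 0:
            raise ValueError("read ports must divide evenly among the files")
        self.read_ports = read_ports          # R, total across all storages
        self.write_ports = write_ports        # W, present on every storage
        self.reads_per_file = read_ports // num_files
        # Each storage holds a full copy of all registers.
        self.files = [[0] * num_registers for _ in range(num_files)]

    def write(self, writes):
        """Broadcast up to W (register, value) writes to every storage."""
        if len(writes) > self.write_ports:
            raise ValueError("more writes than write ports")
        for storage in self.files:
            for reg, value in writes:
                storage[reg] = value

    def read(self, global_port, reg):
        """Read a register through one of the R read ports."""
        if not 0 <= global_port < self.read_ports:
            raise ValueError("no such read port")
        # Each global read port is served by exactly one storage.
        file_index = global_port // self.reads_per_file
        return self.files[file_index][reg]

    def is_coherent(self):
        """True when every storage holds identical contents."""
        return all(storage == self.files[0] for storage in self.files)
```

With `SplitRegisterFile(32, 12, 4, 4)`, for instance, each of the four storages serves three of the twelve read ports, and a value written once is visible through all twelve.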
In accordance with another embodiment of the present invention, a processor includes an instruction supplying circuit and a plurality of functional units. The processor includes a register file structure coupled to the instruction supplying circuit and coupled to the plurality of functional units. The register file structure has R read ports and W write ports and includes a plurality of register file storages. The register file storages have a reduced number of read ports allocated from the R read ports so that the total number of read ports for the plurality of register file storages is R. The register file storages each have W write ports.