1. Field of the Invention
The present invention relates to a data processor and more particularly to an improvement that allows a shortening of access time to misaligned data (i.e., data crossing a word boundary) in data memory.
2. Description of the Background Art
Among data processors dedicated to high-speed digital signal processing, digital signal processors (DSPs), which use architectures suited to such processing, are widely known. DSPs perform data processing such as product-sum operations at high speed. One example is Motorola's DSP56000 (cf. "DSP56000/DSP56001 Digital Signal Processor User's Manual", 1990).
The DSP56000, which comprises two address pointers, two data memories, and a product-sum operation unit, concurrently performs loading of 2-word data from the two data memories specified by the two address pointers (e.g., loading of data and a coefficient), updating of the two address pointers, and product-sum operations, thereby improving its throughput in performing the product-sum operations.
With the recent growing need for ever higher performance in applications, a large number of high-performance DSPs have been developed that have a VLIW or SIMD architecture to achieve a high degree of parallelism in arithmetic or logic operations. To enhance their throughput in performing a plurality of product-sum operations, such DSPs are configured with increased bandwidth between the data memory and the data path so that a plurality of operand data necessary for an arithmetic or logic operation can be loaded in a single cycle.
FIG. 65 is a block diagram of one such DSP, which is considered background art of the present invention. This processor comprises two 64-bit-wide data memories 80, 81 and a data path unit 86 which is configured to perform four 16- by 16-bit product-sum operations in parallel on two 64-bit data DB read from the data memories 80, 81.
Parallel processing of the product-sum operations is accomplished by SIMD (Single Instruction Multiple Data stream) architecture, in which four 16- by 16-bit product-sum operations are performed in accordance with a single product-sum arithmetic instruction holding two 64-bit operand data. A control unit 83 reads out an instruction ID from an instruction memory 82 using a specified instruction address IA and issues a control signal CS to each component of the processor so that the respective components operate in accordance with the instruction ID.
The data memories 80, 81 are each configured to store four words per line (one word is 16 bits long). From each memory, the four words of data DB on the line specified by an operand address OA from an operand-address generation unit 84, 85 can be read out in a single cycle. In the memory space of each of the data memories 80, 81, the boundary between one line of four aligned words and the next is called a word boundary.
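By way of illustration only, the relationship between a word address and the line that holds it may be sketched as follows. The word-address numbering (word 0 at the start of line 0) and all function names are assumptions made for this sketch and are not taken from the processor of FIG. 65.

```python
WORDS_PER_LINE = 4  # four 16-bit words per 64-bit line, as in the text


def split_word_address(word_addr):
    """Return (line_address, word_offset) for a word address (assumed numbering)."""
    return divmod(word_addr, WORDS_PER_LINE)


def lines_touched(word_addr):
    """Line addresses covered by 4 consecutive words starting at word_addr."""
    first_line = word_addr // WORDS_PER_LINE
    last_line = (word_addr + WORDS_PER_LINE - 1) // WORDS_PER_LINE
    return list(range(first_line, last_line + 1))


def crosses_word_boundary(word_addr):
    """True when the 4-word group starting at word_addr spans two lines."""
    return len(lines_touched(word_addr)) == 2
```

Under these assumptions, a 4-word group starting at word address 4 lies entirely on line 1, whereas a group starting at word address 1 spans lines 0 and 1 and therefore crosses a word boundary.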
The hardware of conventional data processors generally does not support access to data that is not aligned on, but instead crosses, a word boundary (such data is hereinafter referred to as "misaligned data"; conversely, data aligned on, i.e., not crossing, a word boundary is referred to as "word-aligned data"). Even if supported, access to misaligned data requires the execution of two or more instructions and thus cannot be accomplished with a throughput of a single cycle. The processor in FIG. 65 corresponds to the latter case.
Consider a case where the above DSP is used as an FIR filter (finite impulse response filter, a kind of digital filter) to exploit its ability to perform four product-sum operations in parallel. Since FIR filters require misaligned data, the product-sum operations cannot be accomplished with a throughput of a single cycle. Thus, it is difficult to speed up FIR processing.
To form an FIR filter, for example, the data memories 80, 81 store strings of data X and strings of coefficients C, respectively, as shown in FIG. 66 and product-sum operations as shown in FIG. 67 are performed by reading out such data X and coefficients C. The data X is input data to the FIR filter and the data Y is output data therefrom. Parallel processing of the four 16- by 16-bit product-sum operations in FIG. 67 results in high operation speed.
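FIG. 67 is not reproduced here. As a hedged sketch only, assuming the common FIR form in which each output Y[j] is the sum over k of C[k] times X[j + k] (the index convention and the function name are assumptions of this sketch), four outputs may be accumulated in four parallel lanes as follows.

```python
def fir_four_outputs(x, c, j):
    """Compute [Y[j], Y[j+1], Y[j+2], Y[j+3]] with Y[i] = sum_k c[k] * x[i + k].

    Each tap k reads a window of 4 consecutive words starting at x[j + k],
    so successive windows start one word apart and most of them are
    misaligned with respect to 4-word lines.
    """
    acc = [0, 0, 0, 0]
    for k, coeff in enumerate(c):
        window = x[j + k: j + k + 4]        # 4 consecutive data words
        for lane in range(4):               # four product-sum lanes
            acc[lane] += coeff * window[lane]
    return acc
```

For example, with x = 0, 1, ..., 9 and c = 1, 2, 3, the call `fir_four_outputs(x, c, 0)` reads the windows starting at X0, X1 and X2 in turn; only the first of these is word-aligned.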
The execution of the operations in FIG. 67 requires the reading of misaligned data X, such as 4-word data from X1 to X4, from the data memory 80. When such misaligned data is operand data, the processor in FIG. 65 has to load read data from the data memory 80 alternately into two 64-bit registers, and then to fetch four out of the eight 16-bit data stored in the two registers, placing them in another register for sorting. Therefore, two or more cycles are necessary for the performance of the product-sum operations.
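The conventional sequence described above (two line loads followed by a sort) may be sketched as follows. This is a simplified model under assumed names; the memory is represented as a list of 4-word lines, and the returned count stands for line loads, not for an exact cycle count of the processor in FIG. 65.

```python
def conventional_misaligned_load(memory, word_addr, n=4):
    """Return (n consecutive words, number of line loads) without any line buffer.

    memory is a list of lines, each a list of n words (assumed layout).
    """
    line, offset = divmod(word_addr, n)
    reg_a = memory[line]                    # first load into one 64-bit register
    if offset == 0:
        return list(reg_a), 1               # word-aligned: one load suffices
    reg_b = memory[line + 1]                # second load into the other register
    combined = list(reg_a) + list(reg_b)    # eight 16-bit words in two registers
    return combined[offset:offset + n], 2   # pick four of the eight and sort
```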
Even data processors that use the MIPS architecture, which supports access to misaligned data, still require two or more cycles to load misaligned data. Accordingly, product-sum operations can be performed only once every two cycles, which increases processing time.
A first aspect of the present invention is directed to a data processor receiving data from a memory being capable of storing N (≥2) words of data at each address and processing the data. The data processor comprises: M (≥1) registers each being capable of holding one of the addresses and N words of data; a selector for selecting and outputting N consecutive words of data specified on a word-by-word basis from among data held in the M registers and data read from the memory; and a controller for, when the N consecutive words of data have a portion which is not held in any of the M registers, reading out N words of data containing the portion from the memory and, when the M registers include a no-data-holding register which does not hold any of the N consecutive words of data, updating values of the no-data-holding register with N words of data read from the memory and its address.
According to a second aspect of the present invention, in the data processor of the first aspect, the M registers include two or more registers.
According to a third aspect of the present invention, in the data processor of the first or second aspect, the controller writes N words of data into the memory at a specified address in response to a write instruction and disables all of the M registers so that the M registers are equivalent to those which do not hold any address and data in the memory.
According to a fourth aspect of the present invention, in the data processor of the first or second aspect, the controller writes N words of data into the memory at a specified address in response to a write instruction and, when the M registers include a register holding the specified address, updates N words of data held in that register with the N words of data written into the memory.
According to a fifth aspect of the present invention, in the data processor of any of the first to fourth aspects, the controller reads out N words of data which is stored at a specified address in the memory, in response to an aligned-data read instruction; and the selector outputs the N words of data read from the memory in response to the aligned-data read instruction.
According to a sixth aspect of the present invention, in the data processor of any of the first to fifth aspects, the controller reads out N words of data containing a specified word from the memory, in response to a single-word parallel read instruction; and the selector outputs N words in parallel, each being the specified word included in the N words of data read from the memory, in response to the single-word parallel read instruction.
According to a seventh aspect of the present invention, in the data processor of any of the first to sixth aspects, the controller includes another register and, when updating a value of any of the M registers, computes an address contiguous to the one to be held in the updated register and loads the computed address into the other register.
According to an eighth aspect of the present invention, the data processor of the first aspect further receives data from another memory capable of storing N (≥2) words of data at each address. The data processor further comprises: other M (≥1) registers each being capable of holding one of the addresses of the other memory and N words of data; another selector for selecting and outputting other N consecutive words of data specified on a word-by-word basis from among data held in the other M registers and data read from the other memory; another controller for, when the other N consecutive words of data have another portion which is not held in any of the other M registers, reading out N words of data containing the other portion from the other memory and, when the other M registers include a no-data-holding register which does not hold any of the other N consecutive words of data, updating values of the no-data-holding register with the N words of data read from the other memory and its address; and an operation unit for performing an arithmetic or logic operation using both data output from the selector and the other selector.
A ninth aspect of the present invention is directed to a data processor receiving data from a memory being capable of storing N (≥2) words of data at each address and processing the data. The data processor comprises: a controller for reading out N words of data which is stored at an address containing a specified word from the memory; and a selector for outputting N words in parallel, each word being the specified word included in the N words of data read from the memory.
In the data processor of the first aspect, when the N consecutive words of data specified have a portion which is not held in any of the M registers, N words of data at an address containing this portion are read out from the memory. At this time, if the M registers include a no-data-holding register, the value of the no-data-holding register is updated with the N words of data read from the memory and its address.
The N consecutive words of data to be specified may be either word-aligned or misaligned data in the memory, and the word addressing may proceed either in the direction of increasing word address (postincrement) or in the direction of decreasing word address (postdecrement). In either case, when the width of update (increment or decrement size) between specified words is within predetermined limits that depend on the number of registers (=M), the data stored at the one address containing the N consecutive words of data (for word-aligned data), or the data stored at at least one of the two addresses containing the N consecutive words of data (for misaligned data), is held in one of the M registers, except in the case of initial word addressing. Accordingly, only a single read operation from the memory should be enough for each word addressing except for initial word addressing.
In some cases, the read data may be held in any of the M registers; but the selector, which can directly select the read data, does not have to select data from such a register after the read data is held. Thus, one clock cycle should be enough for the selector to output N consecutive words for each word addressing except for initial word addressing. The technique disclosed in Japanese Patent Application Laid-open No. 10-161927 (1998) is intended only for access to word-aligned data; therefore, even though bringing efficiency to word-aligned data access, it fails to achieve the aforementioned effect of the present invention, i.e., improvement in data access including misaligned data.
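The behavior of the first aspect described above may be modeled, by way of a hedged sketch only, as follows. The class name, the use of a Python list of lines as the memory, the order in which missing lines are fetched, and the choice of which no-data-holding register to refill are all assumptions of this sketch and are not taken from the embodiments. The `reads` counter stands for read operations from the memory.

```python
class LineBufferLoader:
    """Toy model: M registers, each holding a line address and its N words."""

    def __init__(self, memory, n=4, m=2):
        self.memory = memory        # list of lines, each a list of n words
        self.n = n
        self.regs = [None] * m      # (line_address, words) or None when invalid
        self.reads = 0              # number of read operations from the memory

    def _held(self, line_addr):
        for entry in self.regs:
            if entry is not None and entry[0] == line_addr:
                return entry[1]
        return None

    def load(self, word_addr):
        """Return n consecutive words starting at word_addr (aligned or not)."""
        line, offset = divmod(word_addr, self.n)
        needed = [line] if offset == 0 else [line, line + 1]
        data = {}
        for addr in needed:
            words = self._held(addr)
            if words is None:                       # portion not in any register
                words = list(self.memory[addr])     # read it from the memory
                self.reads += 1
                for i, entry in enumerate(self.regs):
                    # refill a register holding none of the needed lines
                    if entry is None or entry[0] not in needed:
                        self.regs[i] = (addr, words)
                        break
            data[addr] = words                      # selector may use it directly
        combined = [w for addr in needed for w in data[addr]]
        return combined[offset:offset + self.n]
```

In this model with M = 2, loading the misaligned groups starting at word addresses 1, 5 and 9 in turn costs two memory reads for the initial access and one read for each access thereafter, because one of the two lines needed is already held in a register.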
The data processor of the second aspect, which comprises a plurality of registers, is widely adaptable to various widths of update within plus or minus four words. In addition, when the width of update is within plus or minus one word, only a single read operation from the memory should be enough for every N word-addressing operations. This reduces power consumption in the memory.
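The effect of the width of update on the number of memory reads may be estimated with the following hedged sketch. It assumes N = 4, M = 2 and a simple refill policy (a register holding none of the needed lines is overwritten); the function name and the policy are assumptions of this sketch, not the controller of the embodiments.

```python
def count_memory_reads(start, stride, accesses, n=4, m=2):
    """Count line reads for a run of n-word loads with a fixed width of update."""
    held = []                       # line addresses currently held in registers
    reads = 0
    for i in range(accesses):
        line, offset = divmod(start + i * stride, n)
        needed = [line] if offset == 0 else [line, line + 1]
        for addr in needed:
            if addr in held:
                continue            # already in a register: no memory read
            reads += 1
            if len(held) < m:
                held.append(addr)   # an invalid register is available
            else:
                for k, h in enumerate(held):
                    if h not in needed:
                        held[k] = addr   # overwrite a no-data-holding register
                        break
    return reads
```

Under these assumptions, eight successive loads with an update width of plus one word starting at word address 1 need three reads in total (two for the initial access and one more after four word addressings), whereas four loads with an update width of plus four words need five reads.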
The data processor of the third aspect disables all the registers when data is written into the memory. This maintains coherency between the values of the memory and the registers.
The data processor of the fourth aspect, when writing data into the memory, updates data in the register with the write data, thereby maintaining coherency between the values of the memory and the register. In addition, since no register is disabled, only one access to the memory should be enough for the first access immediately after the restart of the load operation.
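The two ways of maintaining coherency on a write (disabling all registers as in the third aspect, or updating a matching register as in the fourth aspect) may be contrasted in the following hedged sketch. The function names and the representation of each register as a (line address, words) pair or None are assumptions of this sketch.

```python
def write_with_invalidate(memory, regs, line_addr, data):
    """Third-aspect style: write the line, then disable every register."""
    memory[line_addr] = list(data)
    for i in range(len(regs)):
        regs[i] = None                          # register holds no address or data


def write_with_update(memory, regs, line_addr, data):
    """Fourth-aspect style: write the line and refresh a register holding it."""
    memory[line_addr] = list(data)
    for i, entry in enumerate(regs):
        if entry is not None and entry[0] == line_addr:
            regs[i] = (line_addr, list(data))   # keep the register valid and current
```

In the second function the registers stay valid after the write, so a load operation resumed afterwards can still find one of its lines in a register.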
The data processor of the fifth aspect can selectively perform the load operation and the aligned-data load operation. Since the values of the registers are not updated during the aligned-data load operation, only one access to the memory should be enough for the first access immediately after the restart of the load operation which was interrupted by the aligned-data load operation.
The data processor of the sixth aspect can perform the single-word parallel load operation. When used as an FIR filter, this processor can reduce the memory capacity required for multiplier coefficients to 1/N.
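The single-word parallel load may be sketched, under assumed names and an assumed list-of-lines memory layout, as follows. The line containing the specified word is read once and that word is output on all N lanes, so a coefficient stored once can feed N parallel product-sum operations.

```python
def single_word_parallel_read(memory, word_addr, n=4):
    """Read the line containing word_addr and output that word on all n lanes."""
    line, offset = divmod(word_addr, n)
    words = memory[line]            # one read of n words from the memory
    return [words[offset]] * n      # the specified word, replicated n times
```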
The data processor of the seventh aspect, when updating any of the values of the M registers, computes in advance an address contiguous to the one to be held in the updated register and stores it in another register. The address held in that other register is used in the next read operation from the memory. This eliminates the need to calculate a new address at that time, thus shortening the processing time for reading.
The data processor of the eighth aspect can perform an arithmetic or logic operation using several kinds of numeric values and is thus suitable as an FIR filter.
The data processor of the ninth aspect can perform the single-word parallel load operation. When used as an FIR filter, this processor can reduce the memory capacity required for multiplier coefficients to 1/N.
An object of the present invention is to provide a data processor capable of reading misaligned data as operand data in a single cycle, thereby speeding up data processing.
A technique related to that of the present invention is disclosed for example in Japanese Patent Application Laid-open No. 10-161927 (1998).
These and other objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description of the present invention when taken in conjunction with the accompanying drawings.