In a SIMD digital signal processing apparatus, multiple processing elements are connected in parallel on a single chip, enabling all the processing elements to perform the same operation in response to a single program. It is used in image signal processing, data processing, etc.
For example, FIG. 16 shows the main portion of an SVP (scan-line video processor) for real-time processing of NTSC signals. This SVP has a three-layer structure made of data input register (DIR) 200, SIMD digital signal processing unit 202, and data output register (DOR) 204.
In DIR 200, the image data d1-dn corresponding to one horizontal scan line (e.g., 40 bits × 960 words) are input repeatedly. In SIMD digital signal processing unit 202, n processing elements pe1-pen, where n equals the number of pixels on a horizontal scan line (say, 960), are arranged and connected in parallel. In response to the successive common instructions I from an instruction generating portion (not shown), these processing elements pe1, pe2, . . . pen execute the prescribed image processing operation on the corresponding pixel data d1, d2, . . . dn during one horizontal scanning period. In this way, image data d1-dn of one scan line are processed at once. DOR 204 collects the operation results from processing elements pe1-pen to form image data d1'-dn' (such as 24 bits × 960 words) for one horizontal scan line. The data transfer from DIR 200 to processing unit 202 and the data transfer from processing unit 202 to DOR 204 are carried out during the horizontal blanking period.
In this case, data input, parallel processing, and data output for each horizontal scan line are executed by DIR 200, processing unit 202, and DOR 204 in a pipeline scheme.
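The pipeline scheme described above can be illustrated with a minimal behavioral sketch. It is not taken from the SVP itself: the function name `run_pipeline` and the per-pixel callback `process` are hypothetical, and each loop iteration is assumed to stand for one horizontal blanking period followed by one horizontal scanning period.

```python
def run_pipeline(lines, process):
    """Model the three stages: DIR (input) -> PE array (SIMD compute) -> DOR (output)."""
    dir_reg = None   # scan line being loaded into DIR 200
    pe_reg = None    # scan line held in processing unit 202
    dor_reg = None   # scan line being output from DOR 204
    out = []
    # Two extra empty periods let the last scan lines drain out of the pipeline.
    for incoming in list(lines) + [None, None]:
        # Horizontal blanking period: unit 202 -> DOR 204, then DIR 200 -> unit 202.
        dor_reg = pe_reg
        pe_reg = dir_reg
        # Horizontal scanning period: the three stages work on different lines.
        dir_reg = incoming
        if pe_reg is not None:
            # Every processing element applies the same operation to its own pixel.
            pe_reg = [process(pixel) for pixel in pe_reg]
        if dor_reg is not None:
            out.append(dor_reg)
    return out
```

In this sketch a scan line entering DIR in one period is processed in the next period and appears at the output in the period after that, so N scan lines take N + 2 periods in total.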
Each processing element PEk of processing unit 202 comprises a pair of register files, a 1-bit ALU (arithmetic logic unit), several working registers, and an L/R communication unit for exchanging data with multiple left- and right-side neighboring processing elements (for example, 2 on each side). The register file on one side is connected to DIR 200, and it holds the data before and during operation. The register file on the other side is connected to DOR 204, and it holds the data during the operation and the data of the final operation result.
FIG. 17 shows the timing of the processing operation within each processing element PEk in the conventional SVP.
(1) First, 1-bit data is read from one and/or the other memory address of a pair of register files with address assigned by the corresponding instruction Ii (DATA READ).
(2) Then, the L/R communication unit performs conditional exchange of data with the left-side and right-side neighboring processing elements (e.g., two on each side) PEk-2, PEk-1, PEk+1, PEk+2, assigned by the corresponding instruction Ii (LRCOM).
(3) Then, ALU executes the operation as assigned by the corresponding instruction Ii for the data read in the steps (1), (2) and/or the received data (ALU).
(4) Finally, one set of the data obtained in the steps (1), (2), (3) is written into one and/or the other memory address of the pair of register files assigned by the corresponding instruction Ii (WRITE BACK).
In this way, during each clock cycle, four steps (1)-(4) are executed for performing 1-bit operation processing. For example, in the operation of 8-bit addition to obtain 9-bit output, 9 clock cycles are needed. For each of these clock cycles, the four steps (1)-(4) are executed.
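The 9-cycle figure for an 8-bit addition follows from the 1-bit ALU: one sum bit is produced per cycle, plus one cycle for the carry-out. The following is a minimal sketch of that bit-serial scheme, assuming LSB-first processing and a carry held in a working register; the function name `bit_serial_add` and the list-based data layout are illustrative and not part of the SVP.

```python
def bit_serial_add(a_words, b_words, width=8):
    """Add per-element operands one bit per cycle, LSB first; returns (sums, cycles)."""
    n = len(a_words)
    carry = [0] * n      # assumed: one working register per processing element
    result = [0] * n
    cycles = 0
    for bit in range(width + 1):     # width sum bits plus one carry-out bit
        for k in range(n):           # in hardware, all elements do this at the same time
            a = (a_words[k] >> bit) & 1 if bit < width else 0
            b = (b_words[k] >> bit) & 1 if bit < width else 0
            s = a ^ b ^ carry[k]                          # 1-bit full-adder sum
            carry[k] = (a & b) | (carry[k] & (a ^ b))     # carry into the next cycle
            result[k] |= s << bit
        cycles += 1
    return result, cycles
```

The inner loop over `k` only emulates the parallel array; the cycle count depends on the word width alone and not on the number of processing elements.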
As shown in FIG. 18, the register files in DIR 200, DOR 204, and processing element PEk are made of current-read-type DRAM (dynamic random access memory).
In the current-read-type DRAM of FIG. 18, memory node N of capacitor 206 that forms the memory cell is connected via write transistor 208 to write bit line WBL, and it is connected through memory cell transistor 210 and access transistor 212 to read bit line RBL. The gate terminal of write transistor 208 is connected to write word line WWL, and the gate terminal of access transistor 212 is connected to read word line RWL. Read bit line RBL is connected via precharge transistor 214 to the terminal of power source voltage VDD, and it is connected to input terminal 216a of single-ended sense amplifier 216 comprised of inverters.
In the write operation, write word line WWL is enabled, and write transistor 208 conducts, and 1-bit information of "1" (H-level) or "0" (L-level) is written from write bit line WBL to capacitor 206. When "1" (H-level) is stored in capacitor 206, NMOS type memory cell transistor 210 is on; when "0" (L-level) is stored, it is off.
FIG. 19 shows the waveforms of the various portions in the read operation and timing. First, as precharge control signal XPCHG becomes active (L-level), PMOS precharge transistor 214 conducts, and read bit line RBL is precharged to a voltage of the H-level near power source voltage VDD (e.g., 3 V). After the end of the precharge, read word line RWL becomes active (H-level), and NMOS type access transistor 212 conducts.
When the information "1" is stored in capacitor 206, memory cell transistor 210 turns on. Then access transistor 212 conducts, and current flows from read bit line RBL through two transistors 212, 210. The voltage of RBL drops exponentially with time.
As the voltage of read bit line RBL falls below a prescribed threshold (such as 1.5 V), PMOS output transistor 218 of sense amplifier 216 turns on, and NMOS output transistor 222 turns off. In this case, read control signal READ is enabled (H-level), and NMOS read transistor 220 conducts. The "1" (H-level) read information (DATA) is read from output terminal 216b of sense amplifier 216.
When information "0" is stored in capacitor 206, memory cell transistor 210 turns off. Even when access transistor 212 conducts, the voltage of read bit line RBL is still maintained at the H-level. In sense amplifier 216, NMOS output transistor 222 is on, while PMOS output transistor 218 is maintained in the off state. In the prescribed timing, read control signal READ becomes active (H-level), and the NMOS type read transistor 220 conducts, so that "0" (L-level) read information (DATA) is obtained from output terminal 216b of sense amplifier 216.
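The read behavior for stored "1" and "0" can be summarized in a small behavioral model. This is a sketch under stated assumptions: the discharge is treated as a single-time-constant exponential decay, the time constant `tau` is a hypothetical parameter (the text gives no value), and only the 3 V precharge level and 1.5 V threshold come from the examples above.

```python
import math

VDD = 3.0        # precharge level from the text (e.g., 3 V)
V_SENSE = 1.5    # sense amplifier threshold from the text (such as 1.5 V)

def rbl_voltage(stored_bit, t, tau):
    """RBL voltage at time t after RWL becomes active (tau: assumed time constant)."""
    if stored_bit == 1:
        # Transistors 212 and 210 both conduct, so RBL discharges.
        return VDD * math.exp(-t / tau)
    # Transistor 210 is off, so RBL stays at the precharged H-level.
    return VDD

def read_cell(stored_bit, t_sense, tau):
    """DATA at output terminal 216b when READ is asserted at time t_sense."""
    return 1 if rbl_voltage(stored_bit, t_sense, tau) < V_SENSE else 0

def min_sense_time(tau):
    """Earliest time at which a stored '1' pulls RBL below the threshold."""
    return tau * math.log(VDD / V_SENSE)
```

With these example voltages the threshold is crossed after about 0.69 time constants; asserting READ earlier than that would return "0" even for a stored "1" in this model.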
As explained above, with the conventional SVP, in order to perform image signal processing with respect to 1-bit data, for each processing element PEk, it is necessary to perform four steps in each clock cycle, namely, (1) step in which data is read from the register file (DATA READ); (2) step in which data is conditionally exchanged with multiple left-side and right-side neighboring processing elements (LRCOM); (3) step in which ALU performs operation for the data obtained in the steps (1) and (2); and (4) step in which one set of the data obtained in the steps (1), (2), (3) is written in the register file (WRITE BACK). These steps are executed in order.
However, as shown in FIG. 17, the actual processing time is short in steps (1)-(4) in each cycle, while most of the time is not used for processing.
For step (1) (DATA READ), immediately after the beginning of a cycle, the read operation of the register file is started; reading of the data is completed in the first part of the cycle, while the remaining time a (the intermediate part and latter part) becomes the time merely for holding the data.
In step (2) (LRCOM), time b1, covering nearly the first part of each cycle, is spent waiting for the data sent from the left- and right-side neighboring processing elements, and the remaining time b2 after the data is received in the intermediate part of the cycle (the latter part of the cycle) is spent merely holding the received data.
In step (3) (ALU), time c1 corresponding to the first part and the intermediate part of the cycle is the time for waiting for the data in steps (1) and (2), while the remaining time c2 after execution of the operation in the latter part of the cycle is the time merely for holding the data of the operation results.
In step (4) (WRITE BACK), time d from the start of the cycle to end of step (3) is the time merely for waiting for the data of the operation results.
In this way, in each cycle, data waiting time, data holding time, and other nonprocessing times are attached to the actual processing time of various steps (1)-(4). So it is difficult to increase the throughput, which is a disadvantage.
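The effect of the waiting and holding times can be made concrete with a rough estimate. The sketch below assumes the four steps run strictly back to back within one cycle and that each step is busy only for its own processing time; the function name `utilization` and the step durations passed to it are hypothetical, not values read from FIG. 17.

```python
def utilization(step_times):
    """Fraction of one cycle during which each step is actually processing."""
    cycle = sum(step_times.values())   # steps execute in order within one cycle
    return {name: t / cycle for name, t in step_times.items()}
```

For example, with four steps of equal duration each step is busy for only a quarter of the cycle, and the rest of its time goes to waiting for or holding data.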
Also, in the case of reading of current-read-type DRAM cell, in the conventional scheme, as shown in FIG. 19, the voltage level of read word line RWL varies with precharge control signal XPCHG. That is, during the period of precharge, XPCHG is in the active state (L-level), and RWL is in the nonactive state (L-level). At the end of precharge, XPCHG becomes the nonactive state (H-level), and, at the same time, RWL becomes the active state (H-level). In this way, the voltage of read bit line RBL falls conditionally (when the memory information is "1").
However, a certain time td is needed for read word line RWL to rise to the H-level. The start of the discharge of RBL is delayed by this rise time td, and the detection timing of the sense amplifier is delayed accordingly.
Also, when the memory information is "1," the voltage of read bit line RBL swings between the lower and upper limits of the logic voltage as L-level (VSS: ground potential) → H-level (VDD: 3 V) → L-level (VSS: ground potential). Thus the discharge time is prolonged in any case.
In this way, in the reading method of the conventional current-read-type DRAM cell, it is difficult to shorten the time period from the start of the read operation to the time point when the voltage of the logic level corresponding to the memory information of the cell is established on the read bit line RBL, and the read access rate is limited.
The first purpose of this invention is to solve the problems of the conventional methods by providing a SIMD digital signal processing method and apparatus that can increase the number of operation cycles that can be executed per unit time and thereby increase the throughput.
The second purpose of this invention is to provide a read method for the current-read-type memory cell that shortens the discharge time needed for the bit line and increases the read rate.