1. Field of the Invention
The present invention relates to signal processing apparatuses, and more particularly to an improvement of a DSP-LSI (Digital Signal Processor - Large Scale Integrated circuit) incorporated, for example, in electronic equipment such as television receivers, video tape recorders, set top boxes, multimedia computers, and broadcasting equipment.
2. Description of the Related Art
Conventionally, a processor for realizing programmable digital signal processing of video signals such as television signals is implemented as a linear array type processor employing SIMD control (Single Instruction stream, Multiple Data stream: a parallel processing control for operating all processor elements in association with each other by a single program). For example, U.S. Pat. No. 4,939,575, which was issued on Jul. 3, 1990, discloses a configuration of this type of processor.
The above-mentioned processor has an operator array formed of one-bit ALUs (Arithmetic Logic Unit) incorporated in VRAM (Video RAM), as illustrated in FIG. 1. This linear array type processor is described below with reference to FIG. 1. A linear array type processor 1 is generally divided into an input SAM (serial access memory) unit 2, a data memory unit 3, an ALU array unit 4, a data memory unit 5, an output SAM unit 6, and a program control unit 7.
The input SAM unit 2, data memory unit 3, ALU array unit 4, data memory unit 5, and output SAM unit 6 constitute a group of processor elements which are arranged, as a whole, in a linear array having a large number of parallel elements. These multiple processor elements are SIMD controlled in association with each other by the common program control unit 7.
The program control unit 7 includes a program memory and a sequence control circuit for stepping a program, and controls the respective other units through a variety of control signals connected thereto in accordance with the program previously stored in the program memory. It should be noted that the input SAM unit 2, data memory unit 3, data memory unit 5, and output SAM unit 6 are basically implemented by memories, and a ROW address decoder for these memories is assumed to be included in the program control unit 7 in FIG. 1, though detailed explanation is not given herein.
A single element portion in a large number of processor elements arranged in parallel includes a vertically extended area as indicated by hatching in FIG. 1, and such single element portions are linearly arranged in the lateral direction in the figure. In other words, the configuration of a processor as illustrated in FIG. 2, generally required to implement a single processor element, is realized by the vertically extended processor element indicated by hatching in FIG. 1.
The input SAM unit 2 corresponds to an input buffer memory (IQ) 10 in FIG. 2. The output SAM unit 6 corresponds to an output buffer memory (OQ) 11. The data memory unit 5 corresponds to a first data memory (RFB) 12. The data memory unit 3 corresponds to a second data memory (RFA) 13. The ALU array unit 4 corresponds to selectors (SEL) 14A, 14B and ALU 15 for selecting data in the first data memory 12 and the second data memory 13 and performing operations on the selected data as required.
The processor element differs from ordinary processors in that ordinary processors include a hardware configuration for word-unit processing, whereas this processor element includes a hardware configuration for bit-unit processing. In the terms commonly used for CPUs, such as 8-bit machine and 16-bit machine, the processor element may be referred to as a one-bit machine. Since the bit processing processor has a small hardware scale, an extremely large number of elements can be arranged in parallel, which cannot be realized by an ordinary processor configuration. Thus, a linear array type processor for processing images is provided with a number of parallel elements in a linear array equal to the number H of pixels in one horizontal scanning period of a video signal to be processed.
The configuration of the processor element is schematically illustrated in FIG. 3. One processor element portion in the input SAM unit 2 includes a plurality of input SAM cells 2B arranged in a column for receiving a control signal from an input pointer 2A. Actually, a number of input SAM cells 2B equal to the number of bits (ISB) of the input signal D.sub.IN in FIG. 1 are arranged in a column. However, FIG. 3 omits such a large number of cells and illustrates only one representative cell.
One processor element portion in the data memory unit 3 includes a number of memory cells 3A in FIG. 3 equal to the number of bits MAB in FIG. 1. Actually, the MAB memory cells 3A are arranged in a column. However, FIG. 3 omits such a large number of cells and illustrates only one representative cell. These MAB memory cells are provided as work memories required for operational processing.
One processor element portion in the ALU array unit 4 includes an ALU cell 4A in FIG. 3. A net ALU portion in the ALU cell 4A is a one-bit ALU which only requires a circuit scale of a full adder. The ALU cell 4A additionally includes selector circuits SELn each for selecting an input to the ALU 4B, and so on. Each of the selectors SELn selects data from one of buses which intersect therewith at cross points indicated by a plurality of "x" marks in FIG. 3. Each data selected by each selector SELn is supplied to the ALU 4B through a one-bit register FF implemented by a flip-flop.
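By way of illustration only, the one-bit operation of the ALU 4B, which requires only the circuit scale of a full adder, may be sketched in Python as follows. The function name full_adder is an assumption introduced for explanation and does not appear in the figures; the selectors SELn and the registers FF are not modeled here.

```python
def full_adder(a, b, carry_in):
    """Model a one-bit full adder.

    All arguments are single bits (0 or 1).
    Returns the pair (sum_bit, carry_out).
    """
    partial = a ^ b                          # sum of the two operand bits
    sum_bit = partial ^ carry_in             # include the incoming carry
    carry_out = (a & b) | (carry_in & partial)
    return sum_bit, carry_out
```

For any combination of input bits, the returned pair satisfies sum_bit + 2 * carry_out = a + b + carry_in, which is the defining property of a full adder.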
One processor element portion in the data memory unit 5 includes a number of memory cells 5A in FIG. 3 equal to the number of bits MBB in FIG. 1. Actually, the MBB memory cells 5A are arranged in a column. However, FIG. 3 omits such a large number of cells and illustrates only one representative cell. These MBB memory cells are provided as work memories required for operational processing. The memory cell 5A may be common to the memory cell 3A.
One processor element portion in the output SAM unit 6 includes a plurality of output SAM cells 6B arranged in a column which receive a control signal from an output pointer 6A. Actually, a number of output SAM cells 6B equal to the number of bits (OSB) of an output signal in FIG. 1 are arranged in a column. However, FIG. 3 omits such a large number of cells and illustrates only one representative cell.
An input SAM read signal S.sub.IR, memory access signals S.sub.AA and S.sub.AB, and an output SAM write signal S.sub.OW are conducted on word lines of the memory cells. The word lines laterally pass through the cells and connect identical circuit elements arranged in the lateral direction in a similar manner. It is assumed that the addresses have been decoded for these word lines of memory cells. Also, for a read modify write operation, a read signal is generated in the former half of a cycle and a write signal in the latter half of the cycle.
Also in FIG. 3, connection lines vertically passing through the cells, i.e., bit lines and pointer signal lines pass therethrough as they connect circuit elements arranged in the vertical direction in a similar manner. An input data bus passes through identical circuit elements arranged in the lateral direction, i.e., the input SAM cells 2B as it connects them in a similar manner. An output data bus passes through identical circuit elements arranged in the lateral direction, i.e., the output SAM cells 6B as it connects them in a similar manner.
Next, the operation of the processor will be described with reference to FIGS. 1 and 3. Input data D.sub.IN composed of pixel data of a video signal is led to the input SAM unit 2 through the input data bus. The input pointer 2A generates a one-bit signal, i.e., an input pointer signal S.sub.IP, at logical "H" for only a single processor element, such that the input data D.sub.IN is written into the input SAM cell 2B of the processor element specified by the input pointer signal at logical "H".
In the input SAM cell 2B specified by the pointer 2A, a transistor Tr1 turns on to charge a capacitor C1 to a potential corresponding to the input signal D.sub.IN. It should be noted that while the actual processor includes a number of the input data buses and input SAM cells 2B equal to the number of bits (ISB) of the input signal D.sub.IN, FIG. 3 only illustrates a one-bit portion of these circuit components.
The input pointer signal S.sub.IP at logical "H" is sequentially shifted from the leftmost processor element to the rightmost processor element during each horizontal scanning period of the video signal, such that the input data D.sub.IN is stored first in the input SAM cell 2B in the leftmost processor element and then sequentially in the input SAM cells 2B in the processor elements on the right side of the respective previous processor elements. Since the number of the laterally arranged processor elements is equal to the number H of pixels of the video signal in one horizontal scanning period, the video signal is continuously written into the SAM cells 2B in the right direction during one horizontal scanning period with a clock commensurate with the data rate of the input video signal. Thereby, the input data D.sub.IN of one horizontal scanning period portion can be accumulated in the input SAM unit 2. This input operation is repeated every horizontal scanning period.
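The input operation described above may be sketched, purely for explanation, as follows. The function name load_line_into_input_sam, the list-based representation of the cells, and the least-significant-bit-first ordering are assumptions and are not taken from the figures.

```python
def load_line_into_input_sam(pixels, isb):
    """Write one scanning line into a model of the input SAM unit.

    pixels: pixel values of one horizontal scanning period, left to right.
    isb:    number of bits of the input signal (ISB).
    Returns sam, where sam[pe][bit] models the input SAM cell holding
    bit number `bit` (least significant first) of processor element `pe`.
    """
    num_pe = len(pixels)
    sam = [[0] * isb for _ in range(num_pe)]
    # One-hot input pointer: logical "H" (1) at the leftmost element only.
    pointer = [1] + [0] * (num_pe - 1)
    for pixel in pixels:                    # one pixel per input clock
        pe = pointer.index(1)               # the single element at "H"
        for bit in range(isb):
            sam[pe][bit] = (pixel >> bit) & 1
        pointer = [0] + pointer[:-1]        # shift "H" one element right
    return sam
```

In this sketch only the element selected by the one-hot pointer is written in a given clock, so a line of H pixels fills H processor elements from left to right.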
Every time the data including one horizontal scanning period portion of the video signal is accumulated in the input SAM unit 2 as described above, the program control unit 7 SIMD controls the input SAM unit 2, data memory unit 3, ALU array unit 4, data memory unit 5, and output SAM unit 6 in the following manner to execute the processing. This program control is repeated every horizontal scanning period. In other words, the program control unit 7 can provide a program having a number of steps calculated by dividing the horizontal scanning period by the instruction cycle period of the processor. Since the SIMD control is performed, the following operations are all executed in all the processor elements at the same time.
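The relation between the horizontal scanning period and the available number of program steps may be illustrated by the following sketch. The numerical values used in the example (a horizontal scanning period of 63,556 ns and an instruction cycle period of 25 ns) are hypothetical figures chosen only for illustration and are not stated in this description.

```python
def steps_per_line(h_period_ns, cycle_ns):
    """Return the number of whole instruction cycles that fit into one
    horizontal scanning period. Both arguments are in nanoseconds."""
    if cycle_ns <= 0:
        raise ValueError("instruction cycle period must be positive")
    return h_period_ns // cycle_ns


# Hypothetical example values (assumptions, not from the description).
available_steps = steps_per_line(63_556, 25)
print(available_steps)  # prints 2542
```

Because the processing advances one bit per cycle, this step budget is shared by all bit-serial transfers and operations that the program performs for one scanning line.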
The one horizontal scanning period portion of input data D.sub.IN accumulated in the input SAM unit 2 is transferred from the input SAM unit 2 to the data memory unit 5 under the control of the program control unit 7, as required, during the next horizontal scanning period and then used for operational processing. The transfer of the input data D.sub.IN from the input SAM unit 2 to the data memory unit 5 is realized by a program which selects required bits stored in the input SAM unit 2 by the input SAM read signal S.sub.IR and generates the memory access signal S.sub.AB to predetermined memory cells 5A in the destination data memory unit 5 to write the selected bits thereinto.
While the input SAM read signal S.sub.IR and the memory access signal S.sub.AB are included in word lines and there are pluralities of the input SAM read signal S.sub.IR and the memory access signals S.sub.AB, they have been decoded by address decoders. Also, for a read modify write operation, a read signal is generated in the former half of a cycle and a write signal in the latter half of the cycle.
In an input SAM cell 2B selected by the input SAM read signal S.sub.IR, a transistor Tr2 turns on in the former half of a cycle, so that a transfer data signal corresponding to a potential on the capacitor C1 occurs on an upper bit line which vertically passes through the input SAM cell 2B. This data transfer is performed one bit per cycle through the vertical bit line. During the data transfer, the transferred data is passed through the ALU cell 4A, although the ALU 4B performs no processing on the transferred data. In other words, each selector SEL selects a path such that the transferred data passes through the ALU 4B, while a no-operation instruction is issued to the ALU 4B. Then, the ALU output control signal S.sub.BB is generated at predetermined timing to turn on a transistor Tr5 to output an ALU output onto a lower bit line in the latter half of the cycle.
A transistor Tr6 in a predetermined memory cell 5A in the data memory unit 5 selected by the memory access signal S.sub.AB is turned on in the latter half of the cycle to charge a capacitor C3 to a potential corresponding to the transferred data so that the transferred data having passed through the ALU 4B is stored in the memory cell 5A.
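The one-bit-per-cycle transfer from the input SAM unit 2 to the data memory unit 5 may be sketched as follows. The function names transfer_bit and transfer_word, and the row indices used as arguments, are assumptions introduced for explanation; the timing of the two halves of the cycle is represented only by the order of the statements.

```python
def transfer_bit(input_sam, data_mem_5, sam_row, mem_row):
    """Model one instruction cycle, executed by all processor elements
    at once: copy row `sam_row` of the input SAM into row `mem_row`
    of data memory unit 5."""
    # Former half: the read signal selects the SAM row, and each cell
    # drives its stored bit onto the upper bit line of its element.
    upper_bit_lines = [cells[sam_row] for cells in input_sam]
    # The ALU receives a no-operation instruction; data passes through.
    alu_outputs = list(upper_bit_lines)
    # Latter half: the ALU output is driven onto the lower bit line and
    # written into the selected memory cell of each element.
    for pe, bit in enumerate(alu_outputs):
        data_mem_5[pe][mem_row] = bit


def transfer_word(input_sam, data_mem_5, num_bits, mem_base):
    """Transfer `num_bits` rows, one row per cycle, into memory rows
    starting at `mem_base`. Returns the number of cycles consumed."""
    cycles = 0
    for bit in range(num_bits):
        transfer_bit(input_sam, data_mem_5, bit, mem_base + bit)
        cycles += 1
    return cycles
```

Under these assumptions, transferring a word of ISB bits consumes ISB instruction cycles regardless of the number of processor elements, since every element performs the same transfer in the same cycle.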
The read signal S.sub.IR to each input SAM cell 2B in the input SAM unit 2 and the memory access signal S.sub.AA to each memory cell 3A in the data memory unit 3 are located in the same address space, decoded by the same ROW decoder for the memories, and delivered onto word lines.
For processing the data, the memory access signals S.sub.AA, S.sub.AB are first generated to predetermined memory cells in the data memory unit 3 and the data memory unit 5, as required, to read and write data in order to move the data therebetween, as a preoperative operation for the data processing.
For example, if data is to be transferred from the data memory unit 5 to the data memory unit 3, the read memory access signal S.sub.AB is outputted to a predetermined memory cell 5A in the data memory unit 5 to turn the transistor Tr6 on during the former half of a cycle. Thus, data to be transferred, corresponding to a potential on the capacitor C3, is outputted onto a lower bit line. Then, the ALU array unit 4 is controlled to pass the transferred data through the ALU cell 4A, although the ALU 4B does not perform any processing on the transferred data, in a manner similar to the data transfer from the input SAM unit 2 to the data memory unit 5. Next, the ALU output control signal S.sub.BA is generated at predetermined timing to turn a transistor Tr4 on to output the transferred data onto an upper bit line during the latter half of the cycle. Then, a write memory access signal S.sub.AA is outputted to a predetermined memory cell 3A in the data memory unit 3 to turn the transistor Tr3 on during the latter half of the cycle, thereby charging the capacitor C2 to a potential corresponding to the transferred data.
In this way, the input data D.sub.IN which has been written in the past as described above and data in the middle of operation are stored in the data memory unit 3 and the data memory unit 5. These data or data stored in the one-bit registers FF in the ALU cell 4A are used to sequentially advance required bit-by-bit operational processing in the ALU 4B.
For example, for adding data in a memory cell 3A of a bit in the data memory unit 3 and data in a memory cell 5A of a bit in the data memory unit 5, and writing the addition result into the memory cell 5A from which the bit of the data memory unit 5 has been read, the following processing is performed.
First, in the former half of a cycle, the read signal S.sub.AA is outputted to a memory cell 3A associated with a predetermined bit of the data memory unit 3, and the read signal S.sub.AB is outputted to a memory cell 5A associated with a predetermined bit of the data memory unit 5. As a result, transistors Tr3 and Tr6 in both the memory cells are turned on and data stored therein are outputted onto respective bit lines.
The data read from the data memory unit 3 and the data read from the data memory unit 5 pass through predetermined paths selected by the selectors SEL in the ALU array unit 4, and an addition is performed in the ALU 4B. The ALU output control signal S.sub.BB is generated at predetermined timing to turn the transistor Tr5 on, so that the output from the ALU 4B is outputted onto a lower bit line in the latter half of the cycle as the resultant data. Then, the write memory access signal S.sub.AB is outputted to a predetermined memory cell 5A in the data memory unit 5 to turn the transistor Tr6 on in the latter half of the cycle, charging the capacitor C3 to a potential corresponding to the data outputted from the ALU 4B.
The processing operation in the ALU cell 4A is specified from a program by an ALU control signal S.sub.ALU-COUNT. The result of the operation performed in the ALU cell 4A may be again written into the data memory unit 3 or into the data memory unit 5, or stored in a one-bit register FF in the ALU cell 4A as required. In the case of addition, generally, a carry is stored in the one-bit register FF, while the sum is stored in the data memory unit 5.
In this way, necessary arithmetic operations or logical operations are performed in the ALU array unit 4 each time data are read from the data memory unit 3 and the data memory unit 5, which are located above and below the ALU cell 4A, in accordance with the program. The operation result may be written again into a predetermined address in the data memory unit 3 or the data memory unit 5. The operational processing is performed entirely on a bit-by-bit basis, and the processing is advanced one bit per cycle.
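The bit-by-bit addition described above, in which the carry is held in the one-bit register FF and the sum is written back into the data memory unit 5, may be sketched as follows. The names simd_bit_serial_add, to_bits, and from_bits, and the least-significant-bit-first storage order, are assumptions made for explanation only.

```python
def to_bits(value, width):
    """Split an integer into `width` bits, least significant first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Reassemble an integer from bits stored least significant first."""
    return sum(bit << i for i, bit in enumerate(bits))


def simd_bit_serial_add(mem_3, mem_5, width):
    """Add the words in mem_3[pe] and mem_5[pe] for every processor
    element, overwriting mem_5[pe] with the sum.

    One bit position is processed per instruction cycle, and the same
    operation is applied to all processor elements in each cycle.
    Returns the final carry held in each element's register FF.
    """
    num_pe = len(mem_3)
    carry_ff = [0] * num_pe                  # one-bit register FF per element
    for bit in range(width):                 # one instruction cycle per bit
        for pe in range(num_pe):             # conceptually simultaneous
            a = mem_3[pe][bit]               # read in the former half
            b = mem_5[pe][bit]
            c = carry_ff[pe]
            mem_5[pe][bit] = a ^ b ^ c       # sum written in the latter half
            carry_ff[pe] = (a & b) | (c & (a ^ b))
    return carry_ff
```

In this sketch an addition of `width`-bit words consumes `width` cycles, whatever the number of processor elements, which reflects the one-bit-per-cycle progress of the operational processing.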
When the operational processing to be conducted during one horizontal scanning period has been completed, the processed output data in the horizontal scanning period must be transferred to the output SAM unit 6 in a last portion of the program within the same horizontal scanning period.
If data to be outputted is located in a predetermined memory cell 3A of the data memory unit 3, the memory access signal S.sub.AA is outputted to the memory cell 3A in the former half of a cycle to read the data. Then, the write signal S.sub.OW is generated to an output SAM cell 6B in the latter half of the cycle such that the read data is passed through the ALU array unit 4 and transferred to the output SAM cell 6B of a predetermined bit in the output SAM unit 6. The data is transferred one bit at a time through a vertical bit line. Also in this event, the data is passed through the ALU cell 4A, although the ALU 4B does not perform any processing on the data during the transfer. For this operation, the ALU output control signal S.sub.BB is generated at predetermined timing. Since details of the operation are the same as described above, description thereof is omitted.
The write signal S.sub.OW to each output SAM cell 6B in the output SAM unit 6 and the memory access signal S.sub.AB to each memory cell 5A in the data memory unit 5 are located in the same address space, so that they are decoded by the same ROW decoder and provided on associated word lines.
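The sharing of one address space between SAM rows and memory rows, decoded by a single ROW decoder, may be sketched as follows. The function name decode_row and, in particular, the placement of the SAM rows at the lower addresses are assumptions; the actual address assignment is not specified in this description.

```python
def decode_row(address, sam_rows, mem_rows):
    """Decode one row address in a combined address space.

    Addresses 0 .. sam_rows-1 are assumed to select SAM word lines, and
    the following mem_rows addresses are assumed to select data memory
    word lines. Returns (sam_word_lines, mem_word_lines), in which
    exactly one line in total is at 1.
    """
    total = sam_rows + mem_rows
    if not 0 <= address < total:
        raise ValueError("address outside the shared address space")
    word_lines = [0] * total
    word_lines[address] = 1                  # one-hot word line selection
    return word_lines[:sam_rows], word_lines[sam_rows:]
```

Since a single decoder drives both groups of word lines, only one row of either the SAM unit or the associated data memory unit is selected by a given address in this model.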
As described above, the transfer of input data D.sub.IN accumulated in the input SAM unit 2 to the data memory units 3, 5, required data transfer between the data memory units 3, 5, required operational processing, and transfer of output data to the output SAM unit 6 are controlled during one horizontal scanning period by the bit-based SIMD control program. The processing performed by the program is repeated every horizontal scanning period. Since the processing is SIMD controlled, all the processor elements operate in association with each other to perform the same processing on the number H of pixels in one horizontal scanning period.
The output data transferred to the output SAM unit 6 when the program has completed the processing, is again outputted from the output SAM unit 6 during the next horizontal scanning period in the following manner.
The output data is led to an output data bus from the output SAM unit 6 and outputted external to the processor. The output pointer 6A generates a one-bit signal at logical "H", i.e., an output pointer signal S.sub.OP, for only a single processor element. The output data is read onto the output data bus from an output SAM cell 6B of the processor element specified by the output pointer signal S.sub.OP at logical "H" and serves as output data D.sub.OUT. While there are a number of the output data buses and the output SAM cells 6B equal to the number of bits OSB, FIG. 3 only illustrates a one-bit portion of these components.
In the output SAM cell 6B specified by the output pointer 6A, a transistor Tr8 turns on to generate, on the output data bus, an output signal corresponding to a potential on a capacitor C4. The output pointer signal S.sub.OP at logical "H" is sequentially shifted from the leftmost processor element to the rightmost processor element during each horizontal scanning period of the video signal, such that the reading of the output data, beginning with the output SAM cell 6B in the leftmost processor element, is shifted sequentially to the processor elements on the right side of the respective previous processor elements. Since the number of the laterally arranged processor elements is equal to the number H of pixels of the video signal in one horizontal scanning period, the output data D.sub.OUT of one horizontal scanning period portion can be outputted from the output SAM unit 6 at a clock commensurate with the data rate of an output video signal. This output operation is repeated every horizontal scanning period. It should be noted that the configuration of each cell in FIG. 3 is highly generalized to facilitate understanding.
In a program control processor generally referred to as a CPU or DSP, as described above with reference to FIG. 2, input data D.sub.IN is first written into a data memory unit 12 through a buffer memory 10. Then, data just inputted and written into the data memory unit 12, previously inputted data stored in the data memory unit 12 and in the data memory unit 13, previously processed data, data in the middle of processing, and so on are selected by memory addresses and the selectors 14A, 14B, and these data are led to the ALU 15 for processing. The processed data are then stored in the data memory unit 12 and in the data memory unit 13. Finally, the processing result is outputted from the data memory unit 12 through the output buffer memory 11.
In the linear array type processor 1 illustrated in FIG. 1, the input SAM unit 2 corresponds to the input buffer memory 10; the output SAM unit 6 to the output buffer memory 11; the data memory unit 5 to the data memory unit 12; the data memory unit 3 to the data memory unit 13; and the ALU array unit 4 to the selectors 14A, 14B and the ALU 15, respectively.
In the linear array type processor 1, an input operation by writing input data D.sub.IN into the input SAM unit 2 is designated a first operation; transfer of the input data D.sub.IN accumulated in the input SAM unit 2 into the data memory units 3, 5, required data transfer between the data memory units 3, 5, required operational processing, and transfer of output data D.sub.OUT to the output SAM unit 6 are collectively designated a second operation; and an output operation by reading the output data D.sub.OUT from the output SAM unit 6 is designated a third operation. These three operations form a pipeline operation performed every horizontal scanning period of a video signal. While the respective operations are executed one after another for the input data D.sub.IN of a particular horizontal scanning period, with a time shift equal to one horizontal scanning period between them, the three operations are advanced simultaneously in parallel for different horizontal scanning periods.
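The three-stage pipeline operation described above may be illustrated by the following sketch, which lists the scanning line handled by each operation in each horizontal scanning period. The function name pipeline_schedule and the numbering of lines from zero are assumptions made for explanation.

```python
def pipeline_schedule(num_lines):
    """Return one tuple per horizontal scanning period, of the form
    (line being inputted, line being processed, line being outputted).
    None marks an operation that has no line to handle in that period."""
    schedule = []
    # Two additional periods are needed to drain the last lines.
    for period in range(num_lines + 2):
        stages = []
        for delay in range(3):   # 0: first, 1: second, 2: third operation
            line = period - delay
            stages.append(line if 0 <= line < num_lines else None)
        schedule.append(tuple(stages))
    return schedule


for period, stages in enumerate(pipeline_schedule(3)):
    print(period, stages)
```

In this model a given line is inputted in one period, processed in the next, and outputted in the one after that, while in a steady state three different lines occupy the three operations at the same time.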
The conventional processor is composed of the input SAM unit, first data memory unit, second data memory unit, ALU array unit, and output SAM unit as described above. In consideration of the performance of the processor thus configured, even if the vertical length in FIG. 1 is extended, i.e., the memory sizes of the input SAM unit 2, data memory unit 3, ALU array unit 4, data memory unit 5, and output SAM unit 6 are increased, the address spaces of the respective data memories are merely extended to provide larger working memories, and the processing performance itself is not improved.
On the other hand, even if the lateral length in FIG. 1 is extended, i.e., the number of parallel processor elements is increased, an increased number of parallel processor elements does not contribute to the performance of the processor since the parallel processor elements are used in correspondence to the number of pixels in one horizontal scanning period of a video signal to be applied thereto.
Thus, to improve the performance of the processor thus configured, methods such as faster instruction cycles, parallel configuration of ALUs, or parallel configuration of the whole processor may be considered. However, these methods have problems in terms of hardware.
Also, since the parallel processor elements in the processor having the above-described architecture are used in correspondence to the number of pixels in one horizontal scanning period of a video signal to be applied thereto, the processor has a problem in view of versatility in that the number of the parallel processor elements cannot be matched to an arbitrary number of pixels in one horizontal scanning period. Specifically, if the number of parallel processor elements is matched with the number of pixels in one horizontal scanning period of a particular image format, the parallel processor elements may be excessive or lacking for different image formats. Thus, the parallel processor elements cannot always be utilized satisfactorily.
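The excess or shortage of processor elements for different image formats may be illustrated by the following sketch. The function name element_usage and the numerical values in the example (1,024 processor elements, and lines of 720 and 1,920 pixels) are hypothetical and are chosen only for illustration.

```python
def element_usage(num_elements, pixels_per_line):
    """Return (used, idle, unserved) for one scanning line.

    used:     processor elements that receive a pixel
    idle:     processor elements left without a pixel (excess)
    unserved: pixels for which no processor element exists (shortage)
    """
    used = min(num_elements, pixels_per_line)
    idle = num_elements - used
    unserved = pixels_per_line - used
    return used, idle, unserved


# Hypothetical example values (assumptions, not from the description).
print(element_usage(1024, 720))    # prints (720, 304, 0)
print(element_usage(1024, 1920))   # prints (1024, 0, 896)
```

With a fixed number of elements, a shorter line leaves elements idle and a longer line cannot be accommodated in full, which corresponds to the versatility problem noted above.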