1. Field of the Invention
This invention relates to a video signal processor and is applicable to the DSP-LSI (digital signal processor--large scale integrated circuit) chips for television devices, video tape recorders, set top boxes, multimedia computers, and broadcasting equipment.
2. Description of the Related Art
Heretofore, as a processor construction which programmably realizes digital signal processing of a video signal such as a television signal, there has been a linear array type processor which effects SIMD control (single instruction stream, multiple data stream: parallel processing control in which all processor elements operate in an interlocking manner under one program). For example, the construction of this type of processor is disclosed in U.S. Pat. No. 4,939,575.
As shown in FIGS. 1-3, this processor 1 has a form in which an arithmetic array of 1-bit ALUs (arithmetic logic units) is built into a VRAM (video RAM). The linear array type processor 1 can be roughly divided into an input SAM (serial access memory) unit 2, a data memory unit 3, an ALU array unit 4, a data memory unit 5, an output SAM unit 6 and a program control unit 7.
The input SAM unit 2, the data memory unit 3, the ALU array unit 4, the data memory unit 5 and the output SAM unit 6 form altogether a group of linear array type multi-parallelized processor elements, and a number of processor elements are SIMD controlled in an interlocking manner by one common control program in the program control unit 7.
The program control unit 7 comprises a sequence control circuit for the program memory and an incrementable counter value for address generation and controls each part by various control signals connected to other parts according to the program written in the program memory in advance. The input SAM unit 2, the data memory units 3 and 5, and the output SAM unit 6 are basically memories (VRAMs).
A single element portion of the multi-parallelized processor is the vertically elongated area shown by the oblique lines in FIG. 1, and these elements are aligned horizontally to form the linear array. More specifically, each of the vertically elongated processor elements shown by the oblique lines in FIG. 1 represents the general processor construction shown in FIG. 2, which is what is necessary to construct one processor element.
The input SAM unit 2 corresponds to an input buffer memory (IQ) 10 of FIG. 2. The output SAM unit 6 corresponds to an output buffer memory (OQ) 11. The data memory unit 5 corresponds to the first data memory (RFB) 12. The data memory unit 3 corresponds to the second data memory (RFA) 13. The ALU array unit 4 corresponds to selectors (SEL) 14A, 14B and ALU 15 for operating upon the selected data of the first data memory 12 and the second data memory 13 as required.
The processor element shown in FIG. 1 differs from a normal processor in that the hardware of a normal processor is a word-wise processor which processes on a per word basis, whereas the hardware of this processor is a bit-wise processor which processes on a per bit basis. In the terminology applied to normal CPUs, such as 8-bit machine or 16-bit machine, this processor can be called a 1-bit machine. Since the hardware of a bit-wise processor is small, a large degree of parallelism can be accommodated; in the case of video, the number of processor elements in the linear array is equal to the number of pixels H in one horizontal scanning period of the video signal.
The general construction of this processor element is shown in FIG. 3. In this connection, the construction of each cell of FIG. 3 is shown as a general one in order to facilitate understanding. One processor element portion of the input SAM unit 2 of FIG. 1 is defined in FIG. 3 by multiple input SAM cells 2B aligned vertically and controlled by the input pointer 2A. The input SAM cells 2B are provided, vertically aligned, for the number of bits of the input signal DIN (ISB: the number of frontage bits of the input SAM unit) in FIG. 1; however, FIG. 3 omits these and shows one representative cell.
Regarding one processor element portion of the data memory unit 3, the memory cells 3A in FIG. 3 are provided, vertically aligned, for the number of MAB bits in FIG. 1 (MAB is the number of bits of a memory A in the column (vertical) direction), but FIG. 3 omits these and shows one representative cell. As many MAB bits as are required for working memory are provided.
One processor element portion of the ALU array unit 4 is the ALU cell 4A in FIG. 3. Here, the ALU part in the ALU cell 4A is a 1-bit ALU 4B, and its circuit scale is about the same as that of a full adder. In addition, selector circuits SEL2-SEL4 for selecting the inputs of the ALU 4B are provided in the ALU cell 4A. Selectors SEL1-SEL5 each select the data from one of the buses they intersect, the intersections being shown by X marks in FIG. 3. The data selected by the prescribed selectors SEL2-SEL4 is given to the ALU 4B through 1-bit registers FF1-FF3 constructed of flip-flops.
Regarding one processor element portion of the data memory unit 5, the memory cells 5A of FIG. 3 are provided, vertically arrayed, for the number of MBB bits of FIG. 1 (MBB is the number of bits of a memory B in the column direction). However, FIG. 3 shows one cell as representative. As many MBB bits as are required for working memory are provided. Also, the memory cells 5A and 3A may be the same.
One processor element portion of the output SAM unit 6 is represented by vertically aligned multiple output SAM cells 6B controlled by the output pointer 6A. The output SAM cells 6B are provided, vertically aligned, for the number of bits of the output signal (OSB: the number of frontage bits of the output SAM unit) in FIG. 1; however, FIG. 3 shows one representative cell and omits the others.
Input SAM read-out signal SIR, memory access signals SAA and SAB and output SAM write-in signal S.sub.OW are word lines of the memory cells; they pass through the cells horizontally and likewise connect the same circuit elements arrayed horizontally. The word lines of these memory cells are address decoded. Also, for read-modify-write, the read-out signal is generated in the first half of a cycle and the write-in signal is generated in the latter half of a cycle.
Furthermore, in FIG. 3, connection lines which pass through the cells vertically, i.e., bit lines and pointer signal lines, pass through connecting the circuit elements arrayed vertically in the same manner. The input data bus passes through connecting the same circuit elements arrayed horizontally, i.e., input SAM cell 2B, respectively in the same manner. The output data bus passes through connecting the same circuit elements arrayed horizontally, i.e. output SAM cell 6B, respectively as well.
Then, the operation of this processor will be explained referring to FIGS. 1 and 3 as follows. Input signal DIN is led to the input SAM unit 2 through the input data bus. The input pointer 2A generates a 1-bit signal which is logical "H" to only one processor element, that is input pointer signal S.sub.IP, and input data DIN is written in the input SAM cell 2B of the processor element assigned by the logic "H".
In the input SAM cell 2B assigned by the pointer, transistor Tr1 turns on and capacitor C1 assumes the electric potential corresponding to input signal DIN. The input data bus and the input SAM cells 2B each exist for ISB bits, but FIG. 3 shows only 1 bit.
Since the logic "H" signal of the input pointer signal S.sub.IP is moved successively from the left end processor element to the right end processor element in every horizontal scanning period of the video signal, the input data DIN can be sequentially stored in the respective capacitors C1, from the input SAM cell 2B of the left end processor element to the input SAM cell 2B of the right end processor element. Since the number of processor elements arrayed horizontally is the same as the number of pixels H of the video signal in one horizontal scanning period, by continuing the SAM write-in in the rightward direction for one horizontal scanning period with the clock corresponding to the data rate of the input video signal, the input data DIN for one horizontal scanning period can be stored in the input SAM unit 2. These input operations are repeated in every horizontal scanning period.
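The pointer-driven write-in described above can be sketched in Python as follows. This is a minimal behavioral sketch and not the circuit itself: the function name `write_line_to_input_sam` is hypothetical, and each list slot lumps the ISB bit cells of one processor element into a single integer for brevity.

```python
def write_line_to_input_sam(line_pixels):
    """Store one horizontal line in the input SAM, one pixel per clock.

    Hypothetical model: each list slot stands for the input SAM cells 2B
    of one processor element (all ISB bits lumped into one integer).
    """
    num_pe = len(line_pixels)            # one processor element per pixel (H)
    input_sam = [None] * num_pe
    for clock, din in enumerate(line_pixels):
        # Input pointer signal: logical "H" (True) at exactly one element,
        # moving from the left end to the right end as the clock advances.
        pointer = [pe == clock for pe in range(num_pe)]
        for pe in range(num_pe):
            if pointer[pe]:              # only the selected cell samples DIN
                input_sam[pe] = din
    return input_sam


print(write_line_to_input_sam([10, 20, 30, 40]))  # [10, 20, 30, 40]
```

Because only the element holding the "H" samples the shared input data bus, one line of H pixels needs exactly H clocks to load.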
Thus, each time the data of the video signal for one horizontal scanning period is stored in the input SAM unit 2, the program control unit 7 SIMD controls the input SAM unit 2, data memory unit 3, ALU array unit 4, data memory unit 5 and output SAM unit 6 and executes the program-controlled processing. This program control is repeated in every horizontal scanning period. More specifically, a program can contain as many steps as the number obtained by dividing the horizontal scanning period by the command cycle interval of this processor. Since the processor is SIMD controlled, the following operations are executed simultaneously in all processor elements.
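The step budget mentioned above is a simple division, illustrated by the following Python sketch. The line period and command cycle values are assumed figures chosen only for the arithmetic; they are not taken from the processor described here, and the function name is hypothetical.

```python
def steps_per_line(line_period_ns, command_cycle_ns):
    """Number of whole command cycles that fit in one horizontal period."""
    return line_period_ns // command_cycle_ns


# Hypothetical figures, not taken from the processor described here:
LINE_PERIOD_NS = 63_500    # roughly one NTSC horizontal period (assumed)
COMMAND_CYCLE_NS = 25      # assumed command cycle interval

steps = steps_per_line(LINE_PERIOD_NS, COMMAND_CYCLE_NS)
print(steps)         # 2540 program steps per line
print(steps // 8)    # 317 eight-bit bit-serial operations at 1 bit per cycle
```

Since every operation handles one bit per cycle, the usable number of word-level operations per line is the step budget divided by the word width.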
The input data DIN stored in the input SAM unit 2 for one horizontal scanning period is transferred to the data memory unit 5 from the input SAM unit 2 under the control of the program control unit 7 as necessary in the following one horizontal scanning interval and will be used for the following arithmetic processing. This transfer operation from the input SAM unit 2 to the data memory unit 5 will be executed by selecting the memory content of the necessary bits of the input SAM unit 2 using the input SAM read-out signal SIR and accessing and writing in by outputting the memory access signal SAB to the prescribed memory cell 5A of the destination data memory unit 5.
Here, the input SAM read-out signal SIR and memory access signal SAB are word lines, of which there are multiple lines each, and these are decoded by the address decoder. Because these memory accesses are read-modify-write mode operations, the read-out signal is generated in the first half of a cycle and the write-in signal is generated in the latter half of a cycle.
In the input SAM cell 2B selected by the input SAM read-out signal SIR, transistor Tr2 turns on in the first half of the cycle and a transfer data signal corresponding to the electric potential of capacitor C1 is produced on the upper vertical bit line. This data transfer is conducted one bit per cycle through the vertical bit lines. During the transfer, the ALU 4B has nothing to process, and the data merely passes through the ALU cell 4A. More specifically, in this cycle, each of the selectors SEL1-SEL5 selects its path so that the transfer data passes through the ALU 4B, and a command of no arithmetic operation is sent to the ALU 4B. Then the ALU output control signal SBB is generated at the fixed timing, the transistor Tr5 turns on, and the ALU output is outputted to the lower bit line in the latter half of the cycle.
For the transfer data passed through the ALU 4B, transistor Tr6 of the prescribed memory cell 5A of the data memory unit 5 selected by the memory access signal SAB turns on in the latter half of the cycle, and capacitor C3 assumes the electric potential corresponding to the transfer data.
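The transfer from the input SAM unit to the data memory unit described above can be modeled as follows. This is a behavioral sketch under stated assumptions: the list-of-lists layout (`memory[pe][row]` holding one bit), the function name and the row lists are all hypothetical, and the read and write halves of a cycle are simply modeled as two consecutive statements.

```python
def transfer_sam_to_memory(input_sam, data_memory, src_rows, dst_rows):
    """Copy bits from the input SAM to a data memory, one bit row per cycle.

    input_sam[pe][row] and data_memory[pe][row] hold single bits; the
    list-of-lists layout and the function name are illustrative assumptions.
    Returns the number of cycles used.
    """
    cycles = 0
    for src, dst in zip(src_rows, dst_rows):
        # First half of the cycle: the read-out word line selects row `src`
        # in every processor element at once (SIMD).
        bit_line = [pe_cells[src] for pe_cells in input_sam]
        # The ALU performs no operation; the data simply passes through.
        alu_out = list(bit_line)
        # Latter half of the cycle: the write word line selects row `dst`.
        for pe, bit in enumerate(alu_out):
            data_memory[pe][dst] = bit
        cycles += 1
    return cycles


sam = [[1, 0, 1], [0, 1, 1]]              # 2 processor elements, 3 bits each
mem = [[0, 0, 0, 0] for _ in sam]
print(transfer_sam_to_memory(sam, mem, [0, 1, 2], [1, 2, 3]))  # 3
print(mem)                                 # [[0, 1, 0, 1], [0, 0, 1, 1]]
```

The cycle count depends only on the number of bit rows moved, not on the number of processor elements, since all elements transfer in lockstep.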
The read-out signal SIR to each input SAM cell 2B of the input SAM unit 2 and the memory access signal SAA to each memory cell 3A of the data memory unit 3 are in the same address space and decoded at the ROW decoder with the same memory and given as word lines.
When arithmetically processing data, if both of the two data items necessary for the operation reside in only one of the data memory unit 3 and the data memory unit 5, the arithmetic operation cannot be started at once. Therefore, in preparation, the memory access signals SAA and SAB are outputted to the prescribed memory cells of the data memory unit 3 and the data memory unit 5, and write-in and read-out are conducted as required so that the data are transferred between them.
For example, in the case of transferring from the data memory unit 5 to the data memory unit 3, the read-out memory access signal SAB is outputted to the prescribed memory cell 5A of the data memory unit 5, the transistor Tr6 is turned on in the first half of the cycle, and the transfer data corresponding to the electric potential of the capacitor C3 is outputted to the lower bit line. As in the case of data transfer from the input SAM unit 2 to the data memory unit 5, there is nothing to process at the ALU 4B. However, the ALU array unit 4 is controlled so as to pass the data through the ALU cell 4A: the ALU output control signal SBA is generated at the fixed timing, the transistor Tr4 is turned on, and the transfer data is outputted to the upper bit line in the latter half of the cycle. Then, the write-in memory access signal SAA is outputted to the prescribed memory cell 3A of the data memory unit 3, the transistor Tr3 is turned on in the latter half of the cycle, and the capacitor C2 assumes the electric potential corresponding to the transfer data.
With this arrangement, the input data DIN written in the past as described above, and the data which is being operated upon are recorded on the data memory unit 3 and the data memory unit 5. In utilizing these data and the data stored in the 1-bit register FF in the ALU cell 4A, the necessary arithmetic processing per bit can be successively conducted in the ALU 4B.
For example, the following explains the case in which the data of a given bit in memory cell 3A of the data memory unit 3 and the data of a given bit in memory cell 5A of the data memory unit 5 are added, and the addition result is written into the memory cell 5A of the bit just read out in the data memory unit 5.
More specifically, in the first half of the cycle, the read-out signal SAA is outputted to the memory cell 3A of the prescribed bit in the data memory unit 3 and the read-out signal SAB is outputted to the memory cell 5A of the prescribed bit in the data memory unit 5, the transistors Tr3 and Tr6 of both memory cells are turned on, and the data are outputted to the respective bit lines.
As to the data read out from the data memory unit 3 and the data read out from the data memory unit 5, the program control unit 7 makes the selectors SEL of the ALU array unit 4 select the prescribed paths and makes the ALU 4B conduct an addition operation. The resultant data of the arithmetic operation of the ALU 4B is outputted to the lower bit line in the latter half of the cycle by generating the ALU output control signal SBB at the fixed timing and turning the transistor Tr5 on. Then, the write-in memory access signal SAB is outputted to the prescribed memory cell 5A of the data memory unit 5, the transistor Tr6 turns on in the latter half of the cycle, and the capacitor C3 assumes the electric potential corresponding to the ALU output data.
The arithmetic operation in this ALU cell 4A will be assigned by ALU control signal SALU-CONT from the program. The result of arithmetic operation at the ALU cell 4A can be written in again either in the data memory unit 3 or the data memory unit 5, or can be stored in the 1 bit register FF in the ALU cell 4A as necessary. In the case of addition, most commonly, the carry is led to the 1 bit register FF and the sum is led to the data memory unit 5.
With this arrangement, the data can be read out from the data memory unit 3 and the data memory unit 5 arranged on the upper side and lower side of the ALU cell 4A corresponding to the programs, and upon conducting the necessary arithmetic operation or logical operation at the ALU array unit 4, the data can be written again in the prescribed address of the data memory unit 3 or the data memory unit 5. These arithmetic processings are all bit processing and can be processed 1 bit per cycle.
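The bit-wise addition described above, with the carry held in the 1-bit register and the sum written back to the data memory unit 5, can be sketched in Python as follows. The helper names, the least-significant-bit-first storage order and the 8-bit word width are assumptions made for illustration; the full-adder equations are the standard ones.

```python
def to_bits(value, width):
    """Integer -> list of bits, least significant bit first (assumed order)."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (LSB first) -> integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def simd_bit_serial_add(mem_a, mem_b, width):
    """Add mem_a into mem_b in every processor element, 1 bit per cycle.

    The sum bit overwrites the mem_b bit just read; the carry is kept in a
    1-bit register per element. Returns the final carry registers.
    """
    carry_ff = [0] * len(mem_a)
    for bit in range(width):                 # one command cycle per bit
        for pe in range(len(mem_a)):         # same operation in all elements
            a, b, c = mem_a[pe][bit], mem_b[pe][bit], carry_ff[pe]
            mem_b[pe][bit] = a ^ b ^ c                      # full-adder sum
            carry_ff[pe] = (a & b) | (a & c) | (b & c)      # full-adder carry
    return carry_ff


mem_a = [to_bits(v, 8) for v in (3, 200, 255)]
mem_b = [to_bits(v, 8) for v in (5, 100, 1)]
carries = simd_bit_serial_add(mem_a, mem_b, 8)
print([from_bits(bits) for bits in mem_b])   # [8, 44, 0]
print(carries)                               # [0, 1, 1]
```

An 8-bit addition therefore costs eight cycles regardless of how many processor elements take part, which is why the performance of this architecture is tied to the command cycle rather than to the array width.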
At this point, when the arithmetic processing supposed to be processed in the one horizontal scanning period is complete, it is necessary to transfer the output data which is already through the arithmetic processing to the output SAM unit 6 within its horizontal scanning period at the final part of the program.
In the case where the data supposed to be outputted at present exists in the data memory unit 3, memory access signal SAA is led to the prescribed memory cell 3A at the first half of a cycle and read out and passed through the ALU array unit 4, and the write-in signal S.sub.OW is outputted to the output SAM cell 6B at the latter half of a cycle in order that the data will be transferred to the output SAM cell 6B with the fixed bits of the output SAM unit 6. The data will be transmitted one bit by one bit through the bit lines in the vertical direction. At this point, there is nothing to process in the ALU 4B in case of transferring the data, but since the data is to pass through the ALU cell 4A, the ALU output control signal SBB will be generated at the fixed timing. Detailed operational descriptions will be omitted since the operation is identical to that of the above.
The write-in signal S.sub.OW to each output SAM cell 6B in the output SAM unit 6 and the memory access signal SAB to each memory cell 5A in the data memory unit 5 are in the same address space and are decoded by ROW decoders having the same memory and given as word lines.
As described above, within one horizontal scanning period, the transfer of the input data DIN stored in the input SAM unit 2 to the data memory units 3 and 5, the data transfer between the data memory units 3 and 5 as required, the necessary arithmetic processing and the output data transfer to the output SAM unit 6 are controlled by the SIMD control program, which effects processing on a bit-wise basis. This program processing is repeated with the horizontal scanning period as a unit. Since all processor elements operate in an interlocking manner, the same processing is executed for all of the H pixels of the horizontal scanning period.
The output data transferred to the output SAM unit 6 after the above program processing is complete will be further outputted from the output SAM unit 6 in the following horizontal scanning period as follows.
The output data is led from the output SAM unit 6 to the output data bus and outputted externally of the processor 1. The output pointer 6A generates a 1-bit signal which is logical "H" to only one processor element, i.e., output pointer signal SOP, and the output data is read out to the output data bus from the output SAM cell 6B of the processor element assigned by the logic "H" and becomes output data DOUT. The output data bus and the output SAM cells 6B each exist for OSB bits; however, FIG. 3 shows only 1 bit.
In the output SAM cell 6B assigned by the output pointer 6A, the transistor Tr8 turns on and the output signal corresponding to the electric potential of the capacitor C4 is obtained on the output data bus. Since the "H" signal of the output pointer signal SOP moves from the left end processor element to the right end processor element in every horizontal scanning period of the video signal, the read-out of output data moves successively from the output SAM cell 6B of the left end processor element toward the output SAM cell 6B of the right end processor element. Here, because the number of processor elements aligned horizontally equals the number of pixels H of one horizontal scanning period of the video signal, the output data for one horizontal scanning period can be outputted from the output SAM unit 6 with the clock corresponding to the data rate of the output video signal. These output operations are repeated in every horizontal scanning period.
As described above, in the program-controlled processor of FIG. 2, generally called a CPU or DSP, the input data DIN is first written into the data memory unit 12 via the input buffer memory 10. The data just written into the data memory unit 12, the data inputted earlier, and the data in the data memory unit 12 or the data memory unit 13 which has been or is being arithmetically processed are selected by the memory address and the selectors 14A and 14B, led to the ALU 15 and operated upon, and the result is stored again in the data memory unit 12 and/or the data memory unit 13. Then, the arithmetic processing result is outputted from the data memory unit 12 through the output buffer memory 11.
In the linear array type processor 1 of FIG. 1, an input SAM unit 2 which corresponds to the input buffer memory 10, an output SAM unit 6 which corresponds to the output buffer memory 11, a data memory unit 5 corresponding to the data memory 12, a data memory unit 3 corresponding to the data memory 13, and an ALU array unit 4 corresponding to selectors 14A and 14B and ALU 15 are provided.
Furthermore, in the linear array type processor 1, let the input operation of writing the input data DIN into the input SAM unit 2 be the first operation; let the transfer of the input data DIN stored in the input SAM unit 2 to the data memory unit 5, the transfer of data between the data memory units 3 and 5, the necessary arithmetic operations and the transfer of the output data DOUT to the output SAM unit 6 be the second operation; and let the output operation of reading out the output data DOUT from the output SAM unit 6 be the third operation. These three operations are interconnected in a so-called pipelining manner with one horizontal scanning period of the video signal as a unit: with respect to the input data DIN of one horizontal scanning period, each operation is executed with a delay of one horizontal scanning period relative to the preceding one, so that the three operations can be processed continuously and simultaneously.
In the conventional processor 1 formed by the architecture shown in FIG. 1, if, for example, the length in the vertical direction in FIG. 1 were extended, the memory sizes of the input SAM unit 2, data memory unit 3, data memory unit 5 and output SAM unit 6 would increase and the address space of each data memory would be enlarged, increasing the working memory, but the operational performance would not change at all. Moreover, if the length in the horizontal direction were extended, i.e., the number of parallel processor elements were increased, there would be no effect on performance, since the number of processor elements used corresponds to the number of pixels in one horizontal scanning period of the video signal to be applied.
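The line-by-line pipelining of the three operations can be illustrated with the following Python sketch. It is a simplified model under stated assumptions: the function name `run_line_pipeline` and the `process` callback, which stands in for the whole per-line program, are hypothetical, and the three operations that run simultaneously in hardware are written here as consecutive statements within one period.

```python
def run_line_pipeline(lines, process):
    """Model the three overlapped operations, one horizontal period per step.

    Returns (period, output_line) pairs. `process` stands in for the whole
    per-line program; both names are hypothetical.
    """
    input_sam = None      # line captured during the previous period
    output_sam = None     # result produced during the previous period
    emitted = []
    # Two extra periods let the last lines drain out of the pipeline.
    for period, incoming in enumerate(list(lines) + [None, None]):
        # Third operation: read out the result of the line from 2 periods ago.
        if output_sam is not None:
            emitted.append((period, output_sam))
        # Second operation: process the line captured 1 period ago.
        output_sam = process(input_sam) if input_sam is not None else None
        # First operation: capture the line arriving in this period.
        input_sam = incoming
    return emitted


double = lambda line: [2 * x for x in line]
print(run_line_pipeline([[1, 2], [3, 4]], double))
# [(2, [2, 4]), (3, [6, 8])]
```

The line that arrives in period 0 leaves in period 2, which shows the two-period latency, while one line still enters and one line still leaves in every period.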
The only ways to improve the operational performance of a processor having this kind of architecture are to speed up its command cycle, to parallelize the ALU, or to parallelize the whole processor system.
When each memory cell, namely the input SAM cell 2B of the input SAM unit 2, the memory cell 3A of the data memory unit 3, the memory cell 5A of the data memory unit 5 and the output SAM cell 6B of the output SAM unit 6, is formed with a DRAM (dynamic random access memory) construction, the access time is long, which is a disadvantage in increasing the operating speed of the memory. However, if the cells are not formed as DRAM, the memory size becomes large. Moreover, since read-modify-write conducts both reading and writing operations within one cycle, the processing speed is slow.
Furthermore, since the command cycle in the processor 1 is a period from the time when data is read out from two data sources and arithmetically processed at the ALU 4B till it is written in the data destination and since the data passes through the ALU 4B in the course of its operations, the data processing path is long and speeding up the operation is difficult.
In the construction of FIGS. 1 and 3, each element of the processor 1 is physically laid out in a very narrow shape, approximately the same width as a memory cell. Therefore, if the ALU 4B were parallelized in order to improve the operational performance of the ALU cell 4A of the ALU array unit 4, the width would be extended or the length would become too long, the dimensions of the processor element would become unbalanced, and space would be wasted. Moreover, if the whole processor 1 were parallelized N times, the processing capacity would increase N times, but the hardware scale would also increase N times.
Furthermore, the processor 1 comprises input SAM unit 2, data memory unit 3, ALU array unit 4, data memory unit 5 and output SAM unit 6 which are arranged in that order and each input SAM cell 2B of the input SAM unit 2 and each memory cell 3A of the data memory unit 3 are in the same address space and decoded by the ROW decoder with the same memory. Also, each output SAM cell 6B of the output SAM unit 6 and each memory cell 5A of the data memory unit 5 are in the same address space and decoded by the ROW decoder different from the ROW decoder described above.
Accordingly, restrictions have been imposed in the case of transferring the data to the data memory from the input SAM unit 2, such as the data destination must be the data memory unit 5 of the other side with the ALU array unit 4 between, or in the case of transferring the data from the data memory to the output SAM unit 6, its data source must be the data memory unit 3 of the other side with the ALU array unit 4 between.
Furthermore, in the case of arithmetically processing two pieces of data in the data memory unit 3, there was a restriction that the arithmetic processing must be started after one of the data in the data memory unit 3 had been transferred to the data memory unit 5.
Moreover, there are MAB and MBB bit numbers of memory cells in the data memory unit 3 and data memory unit 5 respectively. However, since the data memory is divided between the two sides of the ALU array unit 4, in the case of processing some application under a certain condition, there are cases where the data memory unit 3 is short of memory capacity because almost all of its memory cells have been used up, whereas a number of memory cells remain unused in the data memory unit 5, and the memory address space is not utilized effectively.
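The cost of this restriction can be estimated with a short Python sketch. It assumes, as stated earlier, that both transfers and arithmetic proceed at one bit per cycle, and it ignores any other overhead; the function name and the "A"/"B" labels for the data memory units 3 and 5 are hypothetical.

```python
def cycles_for_bitwise_op(width, side_x, side_y):
    """Estimate cycles for a two-operand bit-serial operation.

    side_x / side_y are "A" (data memory unit 3) or "B" (data memory unit 5).
    Assumes 1 bit per cycle for both transfers and arithmetic, and ignores
    any other overhead.
    """
    transfer = width if side_x == side_y else 0   # move one operand across
    return transfer + width


print(cycles_for_bitwise_op(8, "A", "B"))  # 8
print(cycles_for_bitwise_op(8, "A", "A"))  # 16
```

Under these assumptions, an operation whose two operands lie on the same side of the ALU array takes about twice as many cycles as one whose operands lie on opposite sides.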