The present invention relates to a digital signal processing apparatus and a digital signal processing method applicable to a picture processing apparatus for performing a picture enlarging process and a picture reducing process.
In the field of conventional picture signal processing, the same calculation is performed for all pixels of one picture. To perform the same calculation for many pieces of data at high speed, the SIMD (Single Instruction Multiple Data Stream) architecture has been proposed. The SIMD architecture has been used in various fields in addition to picture signal processing. In the SIMD architecture, a required number of calculating devices are disposed and operated according to the same instruction. Thus, when different data is input to the individual calculating devices, they output respective results corresponding to the input data.
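The SIMD principle described above can be sketched in a few lines. This is only an illustration: the function name `simd_apply` and the brightening operation are hypothetical and do not come from the described device, and a Python list comprehension stands in for calculating devices that would operate in lockstep in hardware.

```python
from typing import Callable, List


def simd_apply(instruction: Callable[[int], int], lanes: List[int]) -> List[int]:
    """Apply one shared instruction to the data held by every lane.

    Each list entry models the data given to one calculating device.
    Hardware would execute all lanes at once; this loop is a stand-in.
    """
    return [instruction(value) for value in lanes]


# Same instruction, different data per lane, so different results per lane.
pixels = [10, 20, 30, 250]
brightened = simd_apply(lambda p: min(p + 10, 255), pixels)
```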
An SIMD processing device applied to picture processing has been disclosed by, for example, Kurokawa et al. in “5.4 GOPS Linear Array Architecture DSP for Video Format Conversion”, IEEE ISSCC, February 1996, FP15.7. The device is a parallel processor as shown in FIG. 18.
The device shown in FIG. 18 is composed of an input picture data 1, an input frame memory 2, SIMD picture processors (parallel processors) 3a and 3b, an output frame memory 14, and an output picture data 15. Each of the parallel processors 3a and 3b is composed of an input pointer 4, an input SAM (Serial Access Memory) portion 5, a data memory portion 7, an ALU array portion 8, an output SAM portion 9, an output pointer 11, a program controlling portion 12, and so forth.
The input SAM portion 5, the data memory portion 7, the ALU array portion 8, and the output SAM portion 9 compose a linear array element processor group. The many element processors are controlled (SIMD-controlled) by the common program controlling portion 12 as a program control function. The program controlling portion 12 contains a program memory and a sequence controlling circuit that controls a program stored in the memory. The program controlling portion 12 generates various control signals for individual portions corresponding to the program stored in the program memory.
The program controlling portion 12, the data memory portion 7, and the ALU array portion 8 compose a processor block. When processor blocks are disposed in many stages, the process performance improves corresponding to the number of stages. In FIG. 18, each processor block is an SIMD processing device. However, the entire apparatus composed of individual processor blocks is an MIMD (Multiple Instruction Multiple Data Stream) processing device that can process a plurality of programs in parallel.
A conventional processor processes data word by word. However, in one element processor represented by a rectangular area (hatched area) of FIG. 18, the input SAM portion 5, the data memory portion 7, and the output SAM portion 9 correspond to a “column” of a memory. In addition, the ALU array portion 8 is a one-bit ALU, so the element processor is actually a circuit based on a full-adder. Thus, unlike a conventional processor, the element processor is a bit-processor. It is a one-bit machine in the same sense that a CPU is an eight-bit machine or a 16-bit machine. Because the hardware scale of a bit-processor is small, a degree of parallelism that is not conventionally available can be accomplished, and the number of linearly arrayed element processors can match the number of pixels (H) in one horizontal period of a picture signal.
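To make the notion of a one-bit machine concrete, the following sketch shows how an element processor built around a full-adder could add two multi-bit words by handling one bit per step. The function names and the eight-bit default width are assumptions made for illustration; the text does not specify the word length or the instruction sequence.

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple:
    """One-bit full adder: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def bit_serial_add(x: int, y: int, width: int = 8) -> int:
    """Add two unsigned words one bit per step, least significant bit first.

    The result wraps modulo 2**width, as a fixed-width datapath would.
    """
    carry = 0
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        sum_bit, carry = full_adder(a, b, carry)
        result |= sum_bit << i
    return result
```

A word-wide addition therefore costs `width` steps in each element processor, which is the trade that lets each processor stay small enough to provide one per pixel.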
The processor 3a shown in FIG. 18 performs a picture process in the following manner. In a horizontal scanning active period, input data for one horizontal scanning line is stored to the input SAM portion 5. In a horizontal scanning blanking period, data is transferred from the input SAM portion 5 to the data memory portion 7. The data memory portion 7 and the ALU array portion 8 perform calculating processes corresponding to the program. After the calculating processes have been completed, the processed results of the data memory portion 7 and the ALU array portion 8 are transferred to the output SAM portion 9. In the horizontal scanning active period, the data for one horizontal scanning line is output from the output SAM portion 9. In the above-described processes, each portion operates in parallel.
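The line-by-line flow above can be modeled roughly as follows. This is a simplified sketch, not the device's actual timing: it assumes the whole calculation for a line finishes before the next active period, and the names `run_line_pipeline` and `program` are hypothetical.

```python
def run_line_pipeline(lines, program):
    """Model the input SAM, data memory / ALU array, and output SAM stages.

    Each loop iteration stands for one horizontal period:
      active period   - the previous result is shifted out of the output SAM
                        while a new line is shifted into the input SAM;
      blanking period - the input SAM is copied to the data memory, the
                        shared program runs on every pixel, and the result
                        is copied to the output SAM.
    """
    output_sam = None
    emitted = []
    # One extra empty period flushes the last processed line.
    for incoming in list(lines) + [None]:
        # Active period: output and input happen in parallel.
        if output_sam is not None:
            emitted.append(output_sam)
        input_sam = incoming
        # Blanking period: transfer and calculate.
        if input_sam is not None:
            data_memory = list(input_sam)
            output_sam = [program(pixel) for pixel in data_memory]
        else:
            output_sam = None
    return emitted
```

In this model each processed line appears at the output one horizontal period after it was input.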
When the processor 3a performs a picture process, the size of the picture that can be processed depends on the number of element processors of the processor 3a. To process a picture whose size exceeds the number of element processors, two or more identical processors, such as the processors 3a and 3b shown in FIG. 18, are disposed, and the data that is input to the input SAM portion 5 of each processor is controlled through the frame memory 2. Thus, a complicated hardware structure is required.
When a processor performs a picture process (in particular, a pixel number converting process), the sizes of input/output pictures should be considered. When the sizes of the input/output pictures are smaller than the number of element processors, one processor is used. In contrast, when the sizes of the input/output pictures are larger than the number of element processors, a plurality of processors are used.
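The sizing rule above can be written as a small calculation. The ceiling formula is an assumption that generalizes the one-versus-several distinction in the text; the function name and the example widths are hypothetical.

```python
import math


def processors_needed(input_width: int, output_width: int,
                      elements_per_processor: int) -> int:
    """Estimate how many conventional parallel processors a line requires.

    Assumes one element processor per pixel, so the wider of the input
    and output lines determines the count.
    """
    widest = max(input_width, output_width)
    return max(1, math.ceil(widest / elements_per_processor))
```

For example, under these assumptions a conversion from 720 to 1440 pixels per line on processors with 1024 element processors each would need two processors, while a 640-pixel line would need only one.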
In recent years, picture processing at a high resolution (namely, with a number of pixels larger than the number of element processors) has been required. In FIG. 18, two processors 3a and 3b are used to accomplish such a picture process. However, since a plurality of processors are used, the hardware becomes complicated and large, and the cost of the apparatus becomes high. Moreover, when a pixel number converting process is performed, a circuit that connects the plurality of processors is essential, and this circuit also becomes complicated.
Therefore, an object of the present invention is to provide a digital signal processing apparatus and a digital signal processing method for use with a picture processing apparatus that allows such a problem to be solved and a simple and inexpensive processor to be used.
The present invention is a digital signal processing apparatus for parallel executing a plurality of data processes with a single common command, comprising a plurality of input storing means, each of which is composed of a plurality of storing elements, an input controlling means for controlling the input storing means, a calculating means, having a plurality of element calculating means corresponding to the plurality of the storing elements of the input storing means, for parallel calculating data stored in each storing element of the input storing means, a data storing means, having a plurality of storing elements corresponding to the plurality of element calculating means of the calculating means, for storing calculated result data of the element calculating means corresponding to the storing elements, a plurality of output storing means, each of which is composed of a plurality of storing elements corresponding to the plurality of element calculating means of the calculating means, for storing the calculated result data, an output controlling means for controlling the output storing means, and a controlling means for controlling the input storing means, the calculating means, the data storing means, and the output storing means corresponding to a control program.
The present invention is a digital signal processing method, comprising the steps of (a) separating a sequence of data into at least two sets, (b) parallel calculating each separated data in common, (c) storing the calculated results, and (d) selecting and outputting the stored data corresponding to each separated data.
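Steps (a) to (d) can be sketched as follows. This is a minimal illustration under stated assumptions: the line is split into a first and a second half (the text does not say how the sets are formed), the two sets pass one after the other through the same group of element processors, and the names `process_wide_line`, `num_elements`, and `program` are hypothetical.

```python
def process_wide_line(line, num_elements, program):
    """Process a line up to twice as wide as the element processor group."""
    if len(line) > 2 * num_elements:
        raise ValueError("line exceeds twice the number of element processors")

    # (a) Separate the sequence of data into two sets.
    data_sets = [line[:num_elements], line[num_elements:]]

    # (b) Calculate each set with the same shared program, and
    # (c) store each calculated result separately.
    stored_results = []
    for data_set in data_sets:
        stored_results.append([program(pixel) for pixel in data_set])

    # (d) Select each stored result in turn and output it.
    output = []
    for result in stored_results:
        output.extend(result)
    return output
```

With `num_elements = 4`, a six-pixel line is handled by the same four element processors in two passes rather than by two separate processors.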
According to the apparatus and method of the present invention, a picture processing apparatus that can process a picture whose size is twice as large as the number of element processors can be accomplished with one parallel processor. Thus, since the picture processing apparatus can be composed of a small number of parallel processors, the structure of the entire picture processing apparatus, including peripheral circuits, becomes simple.
In addition, since the data memory portion and the ALU array portion are shared, the area of the parallel processor is smaller than in a structure that uses a plurality of parallel processors. It is also unnecessary to connect processors to each other. Thus, the apparatus can be structured on a small circuit scale. Moreover, since the number of circuit parts is small, the cost of the apparatus can be reduced.