1. Field of the Invention
The present invention relates to digital filter processing devices, and more particularly, to a digital filter processing device that carries out an operation between data read out from a data memory and a filter coefficient.
2. Description of the Background Art
Parallel processing is particularly effective when high speed processing of a great amount of data, such as in video signal processing, is required. Among parallel processing oriented architectures, attention is focused on data driven architecture in particular.
In a data driven information processing device, processing is carried out in parallel according to a rule that "a process is effected when all input data required for a certain process are provided and when resources such as an operation device required for that process are assigned".
An operation process associated with digital filter processing is frequently carried out in processing digital video signals in time series in a data driven information processing device.
FIG. 18 is a block diagram showing a structure of a video process oriented data driven information processing device adapted in conventional art and in an embodiment of the present invention.
Such a block structure of FIG. 18 is disclosed in "Study on parallel processing system by dynamic data driven processor" (Microcomputer Architecture Symposium, Nov. 12, 1991, Information Processing Society of Japan).
FIG. 19 shows a structure of fields in a data packet applied in conventional art and in an embodiment of the present application.
Referring to FIG. 19, a data packet includes an instruction field F1 storing an operation code C, a generation field F2 for storing a generation number GN#, a first data field F3 for storing first data D1, a second data field F4 for storing second data D2, and a processor field F5 for storing a processor number Pe#.
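The field layout of FIG. 19 can be modeled, for illustration only, as a small record type. The identifier names below (`op_code`, `generation`, and so on) are assumptions chosen for readability, not identifiers taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class DataPacket:
    # Field F1: operation code C (e.g. reference/update of image memory 3)
    op_code: str
    # Field F2: generation number GN#, assigned in time-series input order
    generation: int
    # Field F3: first data D1
    data1: int
    # Field F4: second data D2
    data2: int
    # Field F5: processor number Pe# addressing a target processor
    pe_number: int
```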
The video processing oriented data driven information processing device of FIG. 18 includes a data driven processor 1 for video processing, and an image memory unit 15. Image memory unit 15 includes a memory interface 2 and an image memory 3.
Data driven processor 1 includes input ports IV, IA and IB connected to data transmission lines 5, 7 and 8, respectively, and output ports OV, OA and OB connected to data transmission lines 4, 9 and 10, respectively.
An input data packet, with a generation number GN# assigned in its input order, is entered in time series to data driven processor 1 via input port IA or IB from data transmission line 7 or 8, respectively. A preset process is stored as a program in processor 1, and processing is carried out according to the contents of that program.
Memory interface 2 receives from output port OV of processor 1 an access request to image memory 3 (reference/update of the contents of image memory 3) via data transmission line 4. Memory interface 2 effects access with respect to image memory 3 via memory access control line 6 according to the received access request. The result is provided to data driven processor 1 via data transmission line 5 and input port IV.
Upon completion of processing of a data packet, data driven processor 1 provides a data packet for signal output via output port OA and transmission line 9 or output port OB and transmission line 10.
FIG. 20 is a block diagram showing a structure of video process oriented data driven processor 1 applied in conventional art and in the embodiment of the present invention.
Data driven processor 1 includes an input processing unit 11, a junction unit 12, a main processing unit 13, a branch unit 14, a PE# register 16, and an output processing unit 17.
An identification number PE# for identifying each processor in a network system is stored in PE# register 16 when the network system includes a plurality of data driven processors 1.
Input processing unit 11 compares the processor number Pe# in a data packet applied via input port IA or IB with identification number PE# in register 16. Determination is made that the input data packet is addressed to the relevant processor when these numbers match each other, whereby that input packet is dispatched to junction unit 12. Determination is made that the input packet is addressed to another processor if the numbers do not match, whereby that input packet is dispatched to output processing unit 17.
Junction unit 12 sequentially receives a data packet sent from input processing unit 11 and a data packet sent from branch unit 14 to dispatch the same to main processing unit 13. Junction unit 12 also detects paired data to dispatch a data packet storing the paired data to main processing unit 13.
Main processing unit 13 carries out a process according to a data flow program stored therein. If access to image memory 3 is required, a data packet is dispatched to image memory unit 15 via output port OV. A data packet after image memory 3 is accessed is received via input port IV.
Similar to input processing unit 11, branch unit 14 compares processor number Pe# of the input packet with identification number PE# in register 16. When these two numbers match each other, the input packet is dispatched to junction unit 12, otherwise to output processing unit 17.
Output processing unit 17 refers to processor number Pe# in the input packet to dispatch that input packet to either output port OA or OB according to a preset branching condition that will be described hereinafter.
The branching condition is set in output processing unit 17 of each processor using an initialization packet prior to dispatch of a data packet to each processor.
A value for masking (referred to as "mask value" hereinafter) and a value for matching (referred to as "match value" hereinafter) are stored in output processing unit 17 in each processor by means of the initialization packet. In a normal process, output processing unit 17 carries out an AND operation between processor number Pe# of the input packet and the pre-stored mask value to compare the result with the prestored match value. When the resultant of the AND operation is identical to the match value, the input packet is dispatched to output port OA, otherwise to output port OB.
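The mask/match branching condition can be sketched in a few lines. This is an illustrative model only; the function name and the port strings are assumptions for the example, not part of the disclosed device:

```python
def route_packet(pe_number: int, mask_value: int, match_value: int) -> str:
    """Select the output port for a packet per the preset branching condition.

    The packet's processor number Pe# is ANDed with the pre-stored mask
    value; if the result equals the pre-stored match value, the packet
    goes to output port OA, otherwise to output port OB.
    """
    if (pe_number & mask_value) == match_value:
        return "OA"
    return "OB"
```

For example, with mask `0b1100` and match `0b1000`, a packet with Pe# `0b1010` masks to `0b1000` and is routed to port OA.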
Operation code C in a data packet shown in FIG. 19 is an execution instruction regarding the contents of the process on image memory 3, for example a program including reference or update of the contents of image memory 3.
Generation number GN# is an identifier assigned thereto according to the input order in time series at the time of input to data driven processor 1 via data transmission line 7 or 8.
Generation number GN# is used in matching data for detecting paired data in junction unit 12 in data driven processor 1. In memory interface 2 of image memory unit 15, the address to be accessed in image memory 3 is determined according to generation number GN#.
First and second data D1 and D2 are data interpreted according to the contents of a corresponding operation code C. When operation code C indicates update, for example, of the contents in image memory 3, data D1 is the data to be written into image memory 3, and data D2 is disregarded. When operation code C indicates reference to the contents in image memory 3, data D1 and D2 are insignificant.
When a data packet is applied to image memory unit 15 via data transmission path 4, and image memory 3 is accessed according to the contents of the input packet, the accessed result is stored as first data D1 in first data field F3 of that data packet. Then, the data packet is dispatched via data transmission line 5.
The operation of carrying out a two-dimensional digital filter process on an input packet applied to data driven processor 1 in time series via input port IA or IB according to scanning operation of the frame in a video process oriented data driven information processing device will be described hereafter.
FIG. 21 is a flow chart schematically showing a two-dimensional digital filter process using a conventional video process oriented data driven information processing device.
FIG. 22 shows the contents of the region in a memory cell subjected to a two-dimensional digital filter process using a conventional video process oriented data driven information processing device.
FIGS. 23A-23D schematically show the procedure of a two-dimensional digital filter process using a conventional video process oriented data driven information processing device.
FIG. 24 shows an example of a frame of m×n pixels.
FIG. 25 shows an example of a 2×2 two-dimensional filter coefficient.
FIG. 26 shows the stored state in image memory 3 resulting from a two-dimensional digital filter process using the filter coefficients of FIG. 25 on the m×n frame data of FIG. 24.
Consider the case of carrying out the 2×2 two-dimensional filter process shown in FIG. 25 on a frame of m×n pixels shown in FIG. 24. α00, α01, α10 and α11 shown in FIG. 25 are arbitrary filter coefficients.
In a video process oriented data driven information processing device, one frame of image data, i.e. data (1, 1), data (1, 2), . . . , data (1, n), data (2, 1), data (2, 2), . . . , data (2, n), . . . , data (m, n) input in a time series manner is stored in image memory 3 of image memory unit 15. A two-dimensional filter process is carried out thereon to output the result.
FIG. 22 shows a region of image memory 3 for storing data associated with this two-dimensional filter process.
Image memory 3 includes a region Ea for temporarily storing data of a packet applied to the video process oriented data driven information processing device, a region Eb for temporarily storing the respective multiplied results as intermediate results, and a region Ec for accumulating the multiplied results.
At step S100 in FIG. 21, data (all data of one frame) stored in a data packet applied via input port IA or IB is stored in region Ea of image memory 3 via main processing unit 13 and output port OV.
At step S101, main processing unit 13 determines whether all the multiplications between the data values within the range of interest in region Ea and the corresponding filter coefficients are completed or not.
Each multiplication is carried out at steps S102 and S103 between a relevant data value (data (1, 1), data (1, 2), data (2, 1) and data (2, 2) in FIG. 23A) and the corresponding filter coefficient (α00, α01, α10 and α11 of FIG. 23A).
Main processing unit 13 provides the data packet to image memory unit 15 via output port OV. The data value of interest is read out from memory 3 to be stored into that data packet and applied to main processing unit 13. Main processing unit 13 receives the data packet and reads out the filter coefficient to carry out the above-described multiplication.
At step S104, the data packet storing the multiplied result is provided to image memory unit 15 via output port OV. The multiplied result is written into region Eb shown in FIG. 22. After this writing operation, that data packet is returned to main processing unit 13.
The above steps S102-S104 are repeated (four times in this case) until determination is made at step S101 that the multiplication operation is completed.
At step S105, determination is made whether the adding operation within the range of interest of the two-dimensional filter is completed or not in main processing unit 13.
Each adding operation is carried out in steps S106 and S107. More specifically, the multiplied result stored in region Eb as an intermediate result is read out from image memory 3 and sequentially accumulated into region Ec that stores the final result corresponding to the range of interest.
Therefore, main processing unit 13 provides a data packet to image memory unit 15 via output port OV, whereby data of interest (multiplied result) is read out from image memory 3. The readout data is accumulated and written into the contents of a corresponding address in region Ec of image memory 3 in image memory unit 15. Then, the data packet is returned to main processing unit 13.
The process of steps S106 and S107 is carried out until determination is made that the adding operation (accumulation) is completed at step S105. In other words, the process of steps S106 and S107 is repeated four times.
At the end of the adding operation, the data packet at the final (fourth) accumulation is provided to image memory unit 15 via output port OV at step S108. Thus, the result of the two-dimensional filter after accumulation is read out.
At step S109, the data packet storing the result of the two-dimensional filter is returned to main processing unit 13 and provided from output port OA or OB. At main processing unit 13, determination is made whether the two-dimensional filter process for one frame is completed or not.
If the two-dimensional filter process for one frame is not yet completed, a similar two-dimensional filter process is initiated at step S110 for the next range of the two-dimension filter process.
When a two-dimensional filter process for one frame is completed, a similar two-dimensional filter process is repeated for the next frame at step S111. Since data applied to the processor is of one frame, the two-dimensional filter process for one frame is completed when determination is made of completion of computation at step S109.
When the two-dimensional filter process on data (1, 1)-data (2, 2) shown in FIG. 23A ends, the 2×2 range is shifted rightwards by one column, and a two-dimensional filter process is repeated for the range of data (1, 2), data (1, 3), data (2, 2) and data (2, 3), i.e. data (1, 2)-data (2, 3), as shown in FIG. 23B.
When the 2×2 range is shifted rightwards and arrives at the right end of one frame data as shown in FIG. 23C, the 2×2 range is then shifted downward by one row and returned to the left end, and the computation is repeated column by column toward the right end as shown in FIG. 23D.
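The conventional procedure of FIG. 21 and FIGS. 23A-23D can be sketched as a behavioral model. Regions Ea, Eb and Ec are represented below by ordinary Python objects rather than image memory, and all names are illustrative assumptions, not identifiers from the patent:

```python
def conventional_2x2_filter(frame, coeff):
    """Sketch of the conventional flow: for each 2x2 range of interest,
    stage the four products in a scratch region before accumulating
    them, mirroring the use of regions Ea, Eb and Ec of image memory 3."""
    m, n = len(frame), len(frame[0])
    region_ea = frame                   # step S100: whole frame held in Ea
    region_ec = {}                      # accumulated final results (Ec)
    for i in range(m - 1):              # 2x2 range shifted down row by row
        for j in range(n - 1):          # ...and rightwards column by column
            region_eb = []              # steps S102-S104: four products to Eb
            for di in range(2):
                for dj in range(2):
                    region_eb.append(coeff[di][dj] * region_ea[i + di][j + dj])
            acc = 0                     # steps S106-S107: accumulate into Ec
            for product in region_eb:
                acc += product
            # 1-based result index: window at data(i+1, j+1) yields 2DF(i+2, j+2)
            region_ec[(i + 2, j + 2)] = acc
    return region_ec
```

The model makes the inefficiency visible: every product is written to and read back from the staging region before accumulation, just as the conventional device round-trips each intermediate result through image memory 3.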
The stored results of the two-dimensional filter process on the m×n frame data are shown in FIG. 26 as the shaded portion. This shaded portion is included in region Ec of FIG. 22.
Since the result of the two-dimensional filter process on data (1, 1)-data (2, 2) of the range of interest shown in FIG. 23A is 2DF(2, 2) = [α00 × data (1, 1)] + [α01 × data (1, 2)] + [α10 × data (2, 1)] + [α11 × data (2, 2)], a memory region Eb for temporarily storing each bracketed multiplied result is required in image memory 3, as well as a memory region Ec for storing the accumulation of the multiplied results.
The art of the above-described two-dimensional filter process is disclosed in "Two-dimensional Digital Filtering" (Proceedings of the IEEE, Vol. 63, No. 4, April 1975, pp. 610-623).
The technique disclosed in this document employs the distributive law in a two-dimensional filter process using a filter coefficient of symmetry.
The symmetry of filter coefficients will be described hereinafter.
FIGS. 27A-27D are diagrams for describing the symmetry of a one-dimensional filter coefficient.
FIG. 28 is a diagram for describing the symmetry of two-dimensional filter coefficients.
The filter coefficients of FIG. 27A are symmetrical, implying that α0=α1, and similarly, α0=α2 in FIG. 27B, α1=α2 and α0=α3 in FIG. 27C, and α1=α3 and α0=α4 in FIG. 27D.
In the two-dimensional filter coefficients shown in FIG. 28, the filter coefficients in the pixel direction are α00, α01, . . . , α0v, and the filter coefficients in the line direction are α00, α10, . . . , αu0.
Symmetrical filter coefficients in the pixel direction establish α00=α0v, α01=α0(v-1), . . . . Similarly, symmetrical filter coefficients in the line direction establish α00=αu0, α10=α(u-1)0, . . . .
Therefore, filter coefficients symmetrical in both the pixel direction and the line direction yield α00=α0v=αu0=αuv, α01=α0(v-1)=αu1=αu(v-1), . . . .
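The saving obtained from the distributive law with symmetric coefficients can be illustrated for the one-dimensional case of FIGS. 27A-27D. The sketch below is an illustration under stated assumptions (an odd-length symmetric filter, with names chosen for the example), not code from either cited document:

```python
def symmetric_fir(samples, half_coeffs):
    """Compute one output of a symmetric odd-length FIR filter using the
    distributive law: the two samples sharing each coefficient are added
    first, so each distinct coefficient is multiplied only once,
    roughly halving the number of multiplications."""
    n = len(samples)                 # filter length, assumed odd here
    acc = 0
    for k in range(n // 2):
        # samples[k] and samples[n-1-k] share coefficient half_coeffs[k]
        acc += half_coeffs[k] * (samples[k] + samples[n - 1 - k])
    acc += half_coeffs[n // 2] * samples[n // 2]   # centre tap
    return acc
```

For a length-3 filter with coefficients (α0, α1, α0), this computes α0×(d0+d2) + α1×d1 with two multiplications instead of the three needed by direct evaluation.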
The technique disclosed in the above-mentioned document has an advantage that the operation time is reduced since the distributive law is employed. However, this technique is implemented on the basis that all the frame data required for a digital filter process are provided. Therefore, there is no disclosure as to the real time process of input data.
Japanese Patent Publication No. 6-46412 discloses a data processor that improves operation performance.
The process carried out by this processor is limited to a convolution process that multiplies a plurality of data in an external memory by respective coefficients and takes the sum of the products. There is no disclosure of processing data applied in real time from the outside world.
Furthermore, since the process disclosed in the publication requires that all the data to be processed be prestored in a memory, this processing system is not applicable to processing data applied in real time from the outside world.
The above-described two-dimensional digital filter process according to FIG. 21 is disadvantageous in that usage of image memory 3 is not efficient, and the number of unnecessary accesses to image memory 3 is great.
More specifically, it is necessary to provide a memory region for temporarily storing all input image data (region Ea of FIG. 22), a memory region for temporarily storing the multiplication result between each image data and a filter coefficient (region Eb of FIG. 22), and a memory region for storing the accumulated value of the multiplied results (region Ec of FIG. 22) in image memory 3.
Furthermore, a greater number of data in one frame and a greater number of coefficients in a two-dimensional filter causes increase in the size of the memory for storing temporarily input image data and the processed intermediate results.
In carrying out one instruction in video process oriented data driven processor 1, a data packet circulates through the internal pipeline of data driven processor 1. More specifically, during execution of one instruction the data packet travels from main processing unit 13 → branch unit 14 → junction unit 12 → main processing unit 13. Similarly, when image memory 3 is accessed by data driven processor 1, a data packet returns to data driven processor 1 via memory interface 2 to continue a similar process. Here, off-chip data transfer (external to processor 1) associated with the memory access operation should be minimized in high speed signal processing, since it is slow in comparison to on-chip (internal to processor 1) data transfer.
Conventionally, off-chip data transfer is required to access image memory 3 from data driven processor 1. This becomes a bottleneck in improving the transfer rate of a data packet. Therefore, the processing speed cannot be increased.
The access from processor 1 to image memory 3 becomes more frequent in proportion to increases in the number of data in one frame and the number of filter coefficients for a two-dimensional digital filter process. It was therefore difficult to improve the processing capability.
All the input data to be processed must first be stored in the process disclosed in the aforementioned "Two-dimensional Digital Filtering" and in the convolution process disclosed in Japanese Patent Publication No. 6-46412. There was a disadvantage that the number of data that can be input depends upon the memory size of the processor. Furthermore, these processes are not suitable for real time processing of input data.