The present invention relates to a data shuffler, that is, a circuit receiving a train of data and supplying at the output the same data arranged in a different and predetermined order.
Numerous applications, especially in the field of picture processing, require the use of operators or circuits capable of changing the order of a series of data according to a predetermined sequence.
An example of such an application is the calculation of a two-dimensional transform, such as a discrete cosine transform, of an image block in a TV image, where this shuffling problem occurs several times:
data shuffling for calculating the monodimensional transform,
matrix transposition,
conversion of the coefficient matrix scanning into a zigzag scanning.
Another application is the ciphering of a video signal through pixel mixing.
Hereinafter, the state of the art and the invention will be described in the particular case where zigzag scanning of a data block is to be carried out. However, it will be noted that this is only a specific case of the invention, described in detail to make the invention better understood.
FIGS. 1A and 1B illustrate a zigzag scanning process. Considering a series of data, for example 16 data such as shown in FIG. 1A in the form of a 4×4 image block, zigzag scanning consists in reading those data according to successive diagonals, that is, as illustrated in FIG. 1B, in the following order:
1, 2, 5, 9, 6, 3, 4, 7, 10, 13, 14, 11, 8, 12, 15, 16.
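By way of illustration only, the zigzag order described above can be generated by walking the anti-diagonals of the block and reversing the direction on every other diagonal. The following Python sketch assumes that the cells are numbered row by row starting from 1, as in FIG. 1A; the function name `zigzag_order` is hypothetical and is not part of the described circuits.

```python
def zigzag_order(n):
    """Return the 1-based cell numbers of an n x n block in zigzag order.

    Assumption: cells are numbered row by row (natural order), as in FIG. 1A.
    """
    order = []
    for d in range(2 * n - 1):               # d = row + col selects one anti-diagonal
        rows = range(max(0, d - n + 1), min(d, n - 1) + 1)
        if d % 2 == 0:
            rows = reversed(rows)            # even diagonals run bottom-left to top-right
        for r in rows:
            c = d - r
            order.append(r * n + c + 1)      # convert (row, col) to a 1-based cell number
    return order


if __name__ == "__main__":
    print(zigzag_order(4))
```

For a 4×4 block, this sketch reproduces the sequence listed above.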
The most conventional method for carrying out such a data shuffling consists in using a circuit of the type illustrated in FIG. 2. This circuit comprises two random access memories (RAM) M1 and M2, each of which has the size of the train or block of data that it is desired to shuffle. Each of those memories is addressable either in the natural order of FIG. 1A through a counter C or in the order corresponding to a sequence stored in a ROM 10, the content of which corresponds to the drawing of FIG. 1B for a zigzag scanning.
Each train or block of data arriving on an input 11 is written in one of the memories according to natural order while the other memory is read according to zigzag order towards an output 12. During the next step, the data are entered in the memory which has just been read while the memory in which the data have just been written is read.
This method requires a memory capacity of two data trains or blocks in the RAMs M1 and M2 and a sequence of address words in ROM 10.
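The alternating operation of the two memories can be sketched in software as follows. This is a behavioral model under simplifying assumptions, not a description of the actual circuit of FIG. 2: each loop iteration stands for one block period, the read and the write of a period are modeled one after the other although they occur concurrently in hardware, and the name `two_ram_shuffle` is hypothetical.

```python
# Read order stored in the ROM (1-based cell numbers, from the zigzag example above).
ZIGZAG_4X4 = [1, 2, 5, 9, 6, 3, 4, 7, 10, 13, 14, 11, 8, 12, 15, 16]


def two_ram_shuffle(blocks, rom=ZIGZAG_4X4):
    """Model of the two-RAM scheme: one RAM is written in natural order
    while the other one is read in the order stored in the ROM."""
    size = len(rom)
    rams = [[None] * size, [None] * size]    # M1 and M2
    filled = [False, False]
    outputs = []
    w = 0                                    # index of the RAM being written
    for block in list(blocks) + [None]:      # the trailing None drains the last block
        r = 1 - w                            # the other RAM is the one being read
        if filled[r]:
            outputs.append([rams[r][a - 1] for a in rom])   # ROM-addressed read
        if block is not None:
            for addr, datum in enumerate(block):            # counter-addressed write
                rams[w][addr] = datum
        filled[w] = block is not None
        w = r                                # the memories exchange roles
    return outputs


if __name__ == "__main__":
    natural = list(range(1, 17))
    print(two_ram_shuffle([natural]))
```

In this model, the first shuffled block only becomes available after a complete block has been written, which corresponds to the memory requirement of two full blocks stated above.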
By way of example, considering 4×4 blocks and 12-bit words, the memory capacity of the RAMs will have to be 32 words of 12 bits (384 bits) and the memory capacity of ROM 10 will be 16 words of 4 bits (64 bits), that is, a total memory capacity of 448 bits.
In the case of 64-word blocks (8×8), it is necessary to provide for a RAM of 128 (2×64) words of 12 bits (1536 bits) and a ROM having a capacity of 64 words of 6 bits (384 bits), that is, a total memory capacity of 1920 bits.
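The two capacity figures above follow from the same arithmetic, which may be sketched as below. The function name `two_ram_capacity_bits` is hypothetical, and the sketch assumes that each ROM word has just enough bits to address one cell of a block.

```python
def two_ram_capacity_bits(block_words, word_bits):
    """Return (ram_bits, rom_bits, total_bits) for the two-RAM scheme.

    Assumption: one ROM address word has just enough bits to address a block.
    """
    address_bits = (block_words - 1).bit_length()   # 4 bits for 16 words, 6 bits for 64
    ram_bits = 2 * block_words * word_bits          # two RAMs of one block each
    rom_bits = block_words * address_bits           # one stored address sequence
    return ram_bits, rom_bits, ram_bits + rom_bits


if __name__ == "__main__":
    print(two_ram_capacity_bits(16, 12))   # 4x4 blocks of 12-bit words
    print(two_ram_capacity_bits(64, 12))   # 8x8 blocks of 12-bit words
```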
In order to reduce the required memory capacity, it has been proposed to use the simplified diagram illustrated in FIG. 3, comprising a single RAM MO, the memory capacity of which is equal to the size of the incoming data block that it is desired to transform. This memory receives the data trains on an input 11 and supplies them in a determined order on an output 12. Memory MO is addressed by a ROM 13 controlled by a counter C.
Initially, memory MO is filled with the first block of data, normally arranged in the order from 1 to 16. For the next data block, the ROM supplies the addresses in a determined order. Each time a datum is read at a specified address, a datum from the incoming train is simultaneously written at the location that has just been read. This architecture halves the RAM capacity and simplifies the address circuits, since the read and write operations are carried out in succession on the same cell without readdressing. However, it is necessary to provide for an increased capacity of ROM 13 for supplying the sequential addresses required for processing the successive data blocks. This structure has been used when the data shuffling corresponds to a matrix transposition (symmetry with respect to a diagonal). Indeed, this operation is involutive, that is, after two transpositions the initial order occurs again. Therefore, it is only necessary to store one address sequence in ROM 13.
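The read-then-write operation on a single memory can be modeled as follows for the transposition case. This is a behavioral sketch only: the names `transpose_order` and `one_ram_shuffle` are hypothetical, addresses are 0-based for convenience, and the rule used to derive each address sequence from the previous one (composing it with the read order) is inferred from the description above rather than taken from an actual ROM content.

```python
def transpose_order(n):
    """0-based read order that transposes an n x n block stored row by row."""
    return [(k % n) * n + k // n for k in range(n * n)]


def one_ram_shuffle(blocks, order):
    """Model of the single-RAM scheme: every cell is read, then immediately
    overwritten with the incoming datum. Returns (outputs, address sequences)."""
    size = len(order)
    ram = [None] * size
    seq = list(range(size))                  # first pass: natural order (plain fill)
    outputs, sequences = [], []
    for n, block in enumerate(list(blocks) + [None]):   # the last pass only drains
        sequences.append(seq)
        incoming = block if block is not None else [None] * size
        out = []
        for addr, datum in zip(seq, incoming):
            out.append(ram[addr])            # read the cell ...
            ram[addr] = datum                # ... then write the new datum in its place
        if n > 0:                            # pass 0 reads an empty memory
            outputs.append(out)
        seq = [seq[j] for j in order]        # next sequence = current one composed with order
    return outputs, sequences


if __name__ == "__main__":
    a = list(range(16))
    outs, seqs = one_ram_shuffle([a, a, a], transpose_order(4))
    print(seqs[0] == seqs[2], seqs[1] == seqs[3])
```

With a transposition, the model alternates between only two address sequences, the natural one and the transposed one, in agreement with the involutive property mentioned above.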
However, the operation is more complex in the case of zigzag scanning, as illustrated in FIGS. 4A-4F for 4×4 data blocks. The first address sequence, corresponding to natural order, is illustrated in FIG. 4A. The next address sequence is illustrated in FIG. 4B, where it can be appreciated that the third address corresponds to cell 5 and the fourth one to cell 9. The table of FIG. 4B is obtained by following the arrows represented in FIG. 4A. For the next data block, following the same zigzag sequence as the one leading from FIG. 4A to FIG. 4B and applying this sequence to FIG. 4B, it can be seen that the successive order will be 1, 2, 6, 10 . . . (FIG. 4C). One similarly passes from each figure to the next one. Considering FIG. 4F, it will be seen that by applying a zigzag scanning to the data that are stored therein, the natural sequence of FIG. 4A is obtained again. Therefore, it is necessary to provide for six address cycles in the ROM to be able to convert again, according to a zigzag order, the successive data trains.
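The succession of address sequences described above can be reproduced by repeatedly applying the zigzag order to the previous sequence until the natural order recurs. The sketch below is illustrative; the function name `address_sequences` is hypothetical, and cells are numbered from 1 as in the figures.

```python
ZIGZAG_4X4 = [1, 2, 5, 9, 6, 3, 4, 7, 10, 13, 14, 11, 8, 12, 15, 16]


def address_sequences(order):
    """Return the distinct successive address sequences (1-based), starting
    with the natural order, until the natural order would occur again."""
    natural = list(range(1, len(order) + 1))
    if sorted(order) != natural:
        raise ValueError("order must be a permutation of 1..N")
    sequences = [natural]
    while True:
        prev = sequences[-1]
        nxt = [prev[j - 1] for j in order]   # apply the scanning order to the previous sequence
        if nxt == natural:                   # back to the starting sequence: cycle complete
            return sequences
        sequences.append(nxt)


if __name__ == "__main__":
    seqs = address_sequences(ZIGZAG_4X4)
    print(len(seqs))
    for s in seqs:
        print(s)
```

For the 4×4 zigzag order, the second sequence is the zigzag order itself, the third one begins with 1, 2, 6, 10, and six sequences are obtained before the natural order recurs.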
Considering the same numerical data as above, for 4×4 blocks of 12-bit data, it will therefore be necessary to provide for a RAM capacity of 16 words of 12 bits (192 bits) and a ROM capacity of 6×16 address words of 4 bits (384 bits), that is, a total memory capacity of 576 bits.
In the case of zigzag scanning of 8×8 blocks, 136 different sequences have to be carried out one after the other before the initial sequence is obtained again. This requires a storage ROM of 8704 (136×64) 6-bit words, that is, 51 kbits, to which are to be added the 64 words of 12 bits of the RAM. It is therefore clear that in this case the initial two-RAM approach illustrated in FIG. 2 is far more economical than the one-RAM approach illustrated in FIG. 3.
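The capacity of the one-RAM approach can be checked with the following arithmetic sketch. The function name `one_ram_capacity_bits` is hypothetical, and the numbers of sequences (6 and 136) are those stated in the text, not values computed by the sketch.

```python
def one_ram_capacity_bits(block_words, word_bits, n_sequences):
    """Return (ram_bits, rom_bits, total_bits) for the one-RAM scheme.

    Assumption: one ROM address word has just enough bits to address a block.
    """
    address_bits = (block_words - 1).bit_length()        # 4 bits for 16 words, 6 bits for 64
    ram_bits = block_words * word_bits                   # a single RAM of one block
    rom_bits = n_sequences * block_words * address_bits  # one stored sequence per cycle step
    return ram_bits, rom_bits, ram_bits + rom_bits


if __name__ == "__main__":
    print(one_ram_capacity_bits(16, 12, 6))     # 4x4 blocks, 6 sequences
    print(one_ram_capacity_bits(64, 12, 136))   # 8x8 blocks, 136 sequences
```

For 8×8 blocks, the ROM alone amounts to 52224 bits (51 kbits counting 1024 bits per kbit), to be compared with the total of 1920 bits of the two-RAM approach.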
Moreover, in the very schematic FIGS. 2 and 3, the various decoders that are necessarily associated with the storage ROM are not represented; they also require a non-negligible silicon surface area.
Another drawback of the prior art data shufflers is that the manufacture of memory structures is unavoidably delicate and requires sorting and testing steps to check that none of the memory cells is faulty.
A further drawback of the prior art structures is that the process for using the described shufflers requires initial filling of a full memory block. Therefore, the latency time is equal to the duration of the initial filling of a complete memory, that is, to the time needed to introduce all the words of a data train.
An object of the invention is to provide a data shuffler overcoming the drawbacks of the two above prior art circuits, and more specifically:
decreasing the necessary silicon surface,
increasing the reliability of the system,
reducing latency time.