The present invention relates generally to a high-speed sorter, and more particularly to an optimized high-speed sorter that has a process element (PE) based structure and can complete the deletion or insertion operation for a single input sample within one cycle.
At present, sorting plays an important role in many applications, such as data sorting, word processing, computer system design, and signal processing. Previously, the pertinent technologies concentrated mainly on software algorithms, such as bubble sort and quick sort. Where processing speed and data quantity are not important concerns, sorting software may meet the user's requirements. However, as requirements for high-speed processing of large quantities of data increase, a software solution can no longer meet them.
To solve this problem, several high-speed sorters implemented in hardware have been proposed and developed. These sorters have been realized mainly through systolic array architectures, and the circuit designs well known in the art include the bubble sorter and the ROS sorter (for these designs, see J. Offen and R. Raymond, "VLSI Image Processing", McGraw-Hill, 1985; A. L. Fisher, "Systolic Algorithms for Running Order Statistics", in Signal and Image Processing, Dept. of Computer Science, Carnegie Mellon University, Pittsburgh, July 1981; and H. T. Kung, "Why Systolic Architectures", IEEE Computer, vol. 15, no. 1, Jan. 1982). Although the bubble sorter is fast and can process overlapping data, the number of its process elements, i.e. its hardware complexity, is proportional to the square of the number of input samples. In addition, the required values can be obtained only after a latency of N cycles, where N is the number of input samples. The hardware complexity of the ROS sorter depends only linearly on the number of input samples (N), but its latency remains the same as that of the bubble sorter. A latency of N cycles may be unacceptable where real-time performance is required.
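The scaling relationships described above can be illustrated with a short sketch. It is not taken from the cited references; the function name `sorter_costs` and the parameter `pe_per_unit` are hypothetical, and the proportionality constant is assumed to be 1 purely for illustration. Only the growth rates (quadratic versus linear process element count, and a latency of N cycles in both cases) come from the description.

```python
def sorter_costs(n, pe_per_unit=1):
    """Rough process element (PE) counts and latency for n input samples.

    Assumption: the proportionality constant (pe_per_unit) is 1 by default.
    The text above states only that the bubble sorter's PE count grows with
    the square of n, that the ROS sorter's grows linearly with n, and that
    both have a latency of n cycles.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return {
        "bubble": {"pes": pe_per_unit * n * n, "latency_cycles": n},
        "ros": {"pes": pe_per_unit * n, "latency_cycles": n},
    }


if __name__ == "__main__":
    # Compare how the two designs scale as the number of samples grows.
    for n in (8, 64, 256):
        costs = sorter_costs(n)
        print(
            f"N={n}: bubble PEs={costs['bubble']['pes']}, "
            f"ROS PEs={costs['ros']['pes']}, "
            f"latency={costs['ros']['latency_cycles']} cycles"
        )
```

Under these assumptions, increasing N from 8 to 256 multiplies the ROS sorter's process element count by 32 and the bubble sorter's by 1024, while the latency of both grows from 8 to 256 cycles.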