There are many situations where hardware is required to sort two or more input binary numbers, i.e. to arrange them in order of size. Such sorters are typically constructed from a number of identical logic blocks as shown in FIG. 1. FIG. 1 shows a schematic diagram of an example hardware arrangement 100 for sorting 4 inputs, x1, x2, x3, x4 into size order, i.e. such that output1≥output2≥output3≥output4. It can be seen that this sorter 100 comprises 5 identical logic blocks 102, each of which outputs the larger and the smaller (i.e. max and min) of its two inputs (which may be denoted a and b).
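By way of illustration only, the behaviour of a 4-input sorter built from 5 identical max/min blocks may be modelled in software as follows. The exact wiring of FIG. 1 is not reproduced here; the sketch assumes the commonly used 5-comparator network for 4 inputs, and the names compare_exchange and sort4 are hypothetical.

```python
def compare_exchange(a, b):
    """Model of one logic block: return (max, min) of the two inputs."""
    return (a, b) if a > b else (b, a)


def sort4(x1, x2, x3, x4):
    """Sort four values into descending order using five max/min blocks."""
    v = [x1, x2, x3, x4]
    # Assumed wiring (not taken from FIG. 1): each pair (i, j) is one block
    # that places the larger value at position i and the smaller at j.
    for i, j in [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]:
        v[i], v[j] = compare_exchange(v[i], v[j])
    return tuple(v)


print(sort4(3, 1, 4, 2))  # (4, 3, 2, 1)
```

In this assumed wiring the first two blocks are independent of each other, as are the third and fourth, so the blocks form three sequential stages and the delay of a single block is incurred three times along the longest path.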
Each of the logic blocks 102 receives two n-bit integer inputs (a, b) and comprises a comparator that returns a Boolean that indicates whether a>b. The output of the comparator, which may be referred to as the ‘select’ signal, is then used to control a plurality of n-bit wide multiplexers that each choose between n-bits from a or n-bits from b. If the logic block 102 outputs both the maximum and minimum values (from a and b, as shown in the examples in FIG. 1), the select signal is used to control the multiplexing of 2n-bits (e.g. in the form of 2n 1-bit wide multiplexers or two n-bit wide multiplexers). Alternatively, if the logic block has only one output (which is either the maximum or minimum of a and b), the select signal is used to control the multiplexing of n-bits (e.g. in the form of n 1-bit wide multiplexers or one n-bit wide multiplexer).
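By way of illustration only, a single logic block that outputs both the maximum and the minimum may be modelled bit by bit as follows, which makes explicit that one select signal controls 2n 1-bit multiplexers. This is a behavioural sketch rather than a hardware description, and the names mux1 and max_min_block are hypothetical.

```python
def mux1(select, x_bit, y_bit):
    """1-bit wide multiplexer: pass x_bit when select is true, else y_bit."""
    return x_bit if select else y_bit


def max_min_block(a, b, n):
    """Behavioural model of a logic block for two n-bit unsigned inputs."""
    select = a > b  # comparator output, i.e. the 'select' signal
    max_out = 0
    min_out = 0
    for i in range(n):
        a_bit = (a >> i) & 1
        b_bit = (b >> i) & 1
        # The same select value feeds two multiplexers per bit position,
        # i.e. 2n multiplexers in total for the max and min outputs.
        max_out |= mux1(select, a_bit, b_bit) << i
        min_out |= mux1(select, b_bit, a_bit) << i
    return max_out, min_out


print(max_min_block(5, 9, 4))  # (9, 5)
```

Where only one output is required, one of the two multiplexer calls in the loop would be omitted, leaving n 1-bit multiplexers controlled by the select signal.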
In the arrangement described above, the select signal is used to drive a plurality of logic elements (e.g. logic gates) within a logic block 102 and this results in a large propagation delay. This effect, in which a delay is caused by a single gate output wire having to charge the transistors in a large number of gates (before these latter gates can propagate their outputs), is called ‘fanout’. Whilst this delay may be acceptable when only sorting two input numbers, where these logic blocks 102 are concatenated (e.g. as in the sorter 100 shown in FIG. 1 or larger sorters for more than 4 inputs) the resulting delay of the sorting circuit increases, which may seriously impact performance (e.g. it may result in the sorting process taking more than a single clock cycle).
A solution to this delay is to include a large number of buffers (e.g. at least n buffers, which may be arranged in a tree structure) with each of the buffers being driven by the select signal; however, this results in a hardware arrangement that is significantly larger (e.g. in terms of area of logic).
The embodiments described below are provided by way of example only and are not limiting of implementations which solve any or all of the disadvantages of known sorters.