1. Field of the Invention
This invention relates generally to a data processing circuit and method for determining locations of a predetermined value in a sequence of data bits.
2. Description of the Prior Art
In data processing there are many circumstances in which it is helpful to know the locations of a predetermined value in a sequence of data bits. For example, the number of leading zeros in a sequence of data bits needs to be determined in many floating point implementations (see for example U.S. Pat. No. 5,040,138). Further, knowing the number of leading zeros, or in other words the position of the first logic one value, in an operand can increase the speed of arithmetic operations performed on the operand. U.S. Pat. No. 5,111,415 discloses an asynchronous leading zero counter for calculating the number of leading zeros of an operand in order to increase arithmetic processing speed. It comprises a plurality of leading zero detector cells of like kind arranged in an array that provides a digital output word having a magnitude indicative of the leading zero count (and thus the position of the first logic one value) on the inputs to the plurality of cells.
Similarly, in an ARM processor (as designed by ARM Limited), find first one and find next one operations are a fundamental part of implementing LDM instructions (block data loads from memory) and STM instructions (block data stores to memory). These vector instructions operate with a register list, which is a 16 bit field in the instruction in which each bit corresponds to a register. A logic 1 value in bit 0 of the register field indicates that register R0 is to be transferred, while a logic 0 value indicates that it is not to be transferred; similarly bit 1 controls the transfer of register R1 and so on. Thus, to implement these instructions it is necessary to perform a very fast find first 1, and sometimes find second 1, on up to 16 bits.
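By way of illustration only, the register list encoding described above can be modelled in software as follows. This is a minimal sketch in Python, not part of any ARM implementation; the function name registers_in_list is hypothetical.

```python
def registers_in_list(register_list: int) -> list[int]:
    """Return the register numbers selected by a 16 bit register list.

    Bit n set to logic 1 means register Rn is to be transferred.
    (Hypothetical helper, for illustration only.)
    """
    if not 0 <= register_list < (1 << 16):
        raise ValueError("register list must fit in 16 bits")
    return [n for n in range(16) if (register_list >> n) & 1]
```

For example, a register list of 0b0000000000110101 selects registers R0, R2, R4 and R5.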
A conventional way of determining find first one followed by find next one for block loads and stores is illustrated in FIGS. 1 and 2. The sequence of data bits in which the first one is to be detected is referred to herein as a “vector”, and arrives in the instruction pipeline 10 at the start of the decode cycle. The location of the first one in the vector is needed by the time register reads occur. FIG. 1 shows an instruction pipeline 10 with a multiplexer 20 connected to the instruction pipeline 10 and a find first 1 circuit 30 connected to the output of the multiplexer 20. The vector in which the first one is to be detected travels through the latches 12 of the instruction pipeline 10 and during one pipelined stage (in preferred embodiments the decode stage) is taken via the multiplexer 20 to the find-first-one circuit 30. This circuit finds the first one in the vector and outputs the location as a 4 bit binary number to register bank 40 to identify a particular register in the register bank. Further, the find first one circuit 30 is arranged to mask the first one in order to generate a revised vector, and to return this revised vector to the multiplexer 20. The instruction containing the vector will also include a base address, which is passed to the address adder 24, from where it is output to the memory 50. Hence, this address will identify the memory location in the memory 50 whose contents are to be loaded into the particular register identified by the find first one circuit 30 in the case of an LDM instruction, or will identify the address to which the contents of the register specified by the find first one circuit 30 are to be stored in the case of an STM instruction.
The address adder 24 is also arranged to receive as an input the output of a circuit 22 provided to count the number of logic one values in the original vector. This enables the adder to calculate the memory addresses from which data is to be loaded or to which data is to be stored. During the next iteration of the process, the multiplexer 20 is arranged to pass the revised vector to the find first one circuit 30, such that the location of the next one is output to register bank 40 to identify a corresponding register in the register bank. Further, on this iteration, the address adder 24 is arranged to increment the base address and to provide the incremented address to the memory 50. Accordingly, in this next iteration, the next register is identified, and the next memory location is also identified, thereby enabling the load or store process to be repeated based on the new register and new memory address. This sequence is repeated several times until all logic one values have been found in the vector, and accordingly all load or store operations have been performed.
The find first one circuit, giving an example vector, is shown in greater detail in FIG. 2, in which like parts have like reference numerals. In the example shown a vector 11101100 is input to the find first one circuit 30. The location of the first one (bit 2 in this case) which specifies the first register to be transferred is then output to register bank 40. The find first one circuit 30 also acts to mask the first one that has been found in the original vector and outputs to the latch 34 a revised vector 11101000. This revised vector with the first one masked is then re-input in the next cycle via the multiplexer 20 into the find-first-one circuit 30 and the first one in this vector is then found, this being in effect the second one of the original vector. This step is then repeated until all ones are found. Thus, in each cycle the find-first-one circuit 30 operates on the output vector from the previous cycle.
An example using an 8 bit vector for clarity is shown below, where bit 0 is the first bit and bit 7 the last.
Bit position   7 6 5 4 3 2 1 0
Vector         0 0 1 1 0 1 0 1
As is shown above, vector 00110101 is input to the pipeline at the start of decode. The location of the first logic one value at bit position 0 is determined and this location along with a revised vector, being the original vector with the first one (in bit 0) masked, i.e. vector 00110100, is output in a single clock cycle. In the next clock cycle the vector output from the previous calculation i.e. 00110100 is input to the find first one circuit and the position of the first logic one value in this vector is found, i.e. 2. This result is then output along with a further revised vector, being the revised vector with the first one masked, this vector being used as the input vector in the next clock cycle. Thus, in this example, at the end of four clock cycles all of the logic one values have been found and their positions output.
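The one-location-per-cycle behaviour described above may be sketched in software as follows. This is an illustrative Python model only, assuming bit 0 is the first bit; the names find_first_one, mask_first_one and one_per_cycle are hypothetical, and each loop iteration stands in for one clock cycle.

```python
def find_first_one(vector: int) -> int:
    """Bit position of the least significant logic one in a non-zero vector."""
    if vector == 0:
        raise ValueError("vector contains no logic one value")
    # vector & -vector isolates the lowest set bit.
    return (vector & -vector).bit_length() - 1


def mask_first_one(vector: int) -> int:
    """Return the revised vector with its first logic one cleared."""
    return vector & (vector - 1)


def one_per_cycle(vector: int):
    """Yield (cycle, location, revised_vector), one location per modelled cycle."""
    cycle = 0
    while vector:
        cycle += 1
        location = find_first_one(vector)
        vector = mask_first_one(vector)  # fed back as the next cycle's input
        yield cycle, location, vector
```

For the vector 00110101 this model yields locations 0, 2, 4 and 5 over four cycles, with revised vectors 00110100, 00110000, 00100000 and 00000000 respectively.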
This works well provided that only one find first one operation is needed per cycle. However, to increase processing speed, it may be desirable to execute instructions which require the locations of two logic one values to be determined in a single cycle. For example, it would be desirable to execute LDM instructions which can load two registers in a single cycle. Thus, find first one and find second one would need to be performed in one cycle. FIG. 3 illustrates a circuit for finding first and next one, while FIG. 4 shows a flow diagram of such a circuit.
FIG. 3 is very similar to FIG. 2, except that there are two identical find first one circuits 30, 32 arranged in series. The original input vector passes from the instruction pipeline 10 through multiplexer 20 to the first find one circuit 30. The first one in the input vector is found and its location output, this logic one value is then masked and a revised vector with the logic one value masked is sent to the next find first one circuit 32. This circuit finds the first logic one value in the revised vector (which corresponds to the second logic one value in the original vector), outputs its location and then masks this logic one value, and outputs a further revised vector via a register 34 back to the multiplexer 20. Thus, provided the circuitry is able to operate quickly enough, it might be possible to use this circuit to find two logic one values in a single clock cycle.
A flow diagram showing the conventional find first and second one implementation of FIG. 3 is given in FIG. 4. In block 100 the first one is found, and block 110 then determines if this is the last bit in the data sequence or not. If it is then the process finishes, if not it proceeds to block 120, wherein the one that has been found is masked or cleared. In block 130 the first one in the masked vector is found, which in effect is the second one in the original vector. Block 140 determines if this is the last bit or not and if it is the process finishes. If not then this bit is masked or cleared at block 150 and a revised vector is output. This vector is then returned to step 100, where the process is repeated.
An example showing how the circuit and flow diagram work for a given vector, 00110101 is shown below.
As can be seen, 00110101 is input to the find first one circuit 30 in clock cycle one (block 100 of FIG. 4). The first logic one value in this vector is located and its position (bit zero) is output along with a revised vector 00110100, which is the original vector with the logic one value at bit zero masked (block 120). This is input to the second find first one circuit 32, which locates the first logic one value in this vector (block 130) and outputs its location, bit position 2, along with a revised vector (block 150) having this logic one value masked. Thus, in the first cycle the first two logic one values in the original vector are located and output, along with a revised vector having these two logic one values masked. This revised vector is then input into the circuit again and the positions of the next two logic one values are located in the next cycle, and so on until no further logic one values are found (block 110 or 140).
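The serial arrangement of FIGS. 3 and 4 may be modelled as in the following Python sketch. It is an approximation for illustration only: the "last bit" tests of blocks 110 and 140 are modelled, as an assumption, by checking whether the revised vector is zero, and the names find_and_mask and two_per_cycle are hypothetical.

```python
def find_and_mask(vector: int) -> tuple[int, int]:
    """Model one find first one stage: return (location, revised vector)."""
    location = (vector & -vector).bit_length() - 1
    return location, vector & (vector - 1)


def two_per_cycle(vector: int) -> list[list[int]]:
    """Return the locations found in each modelled cycle, at most two per cycle."""
    cycles = []
    while vector:
        first, vector = find_and_mask(vector)       # models circuit 30
        found = [first]
        if vector:                                  # more ones remain (assumed test)
            # Models circuit 32: it cannot start until circuit 30 has
            # produced its revised vector, so the two stages are in series.
            second, vector = find_and_mask(vector)
            found.append(second)
        cycles.append(found)
    return cycles
```

For the vector 00110101 the model finds locations 0 and 2 in the first cycle and locations 4 and 5 in the second. The data dependency between the two calls to find_and_mask is the software analogue of the series connection of the two circuits.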
The drawback of this circuit is that the two find first one circuits must operate in series and hence two logic one values are found one after the other, the output of the first find one circuit 30 being required before the second find first one circuit 32 can operate. Thus, in order to complete these two operations in a single cycle these circuits need to be made to run very fast or the cycle length needs to be lengthened.
One possible way of addressing the problem of finding more than one location of a predetermined value in a sequence of data bits in one clock cycle, would be to build a table identifying locations in the sequence of data bits of that predetermined value. This table could then be indexed and the positions of the predetermined values read out very quickly. FIG. 5 shows a flow diagram illustrating this idea where the predetermined value is a logic one value. The table is first constructed in block 200, and is then indexed in the following blocks. Starting, for example, at i=0 the first entry in the table which gives the location of the first one is read at block 210. Block 220 checks that i does not correspond to the end of the table. If it does the routine finishes, if not i is incremented at block 230 and the i+1th position of the table is read at block 240. Block 250 checks that incremented i does not correspond to the end of the table. If it does the routine finishes, if not, i is incremented again at block 260 and the process continues from the start of the flow diagram in the next cycle. FIGS. 6A-6C show examples of what the tables would look like for three different vectors, FIG. 6A being the table for the vector 00110101, FIG. 6B being the table for the vector 10000000, and FIG. 6C being the table for the vector 11111111. As can be seen, the table has an entry for each possible location of the logic 1 value and is filled from position zero up as ones are found.
As is clear from the above, once the table is built it is indeed very quick and easy to index and read from the table. Unfortunately, the building of the table takes a lot of time, the building of the table simply being an extension of the find first one followed by find next one for each entry in the table one after the other. Thus, the provision of a table that can be indexed would not seem to overcome the problem of the prior art, since with the above apparatus it takes too long to generate the table.
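The table approach of FIG. 5 may be sketched as follows, again purely as an illustrative Python model. How unused table entries are represented in FIGS. 6A-6C is not stated here, so the use of None for unused entries is an assumption, as are the names build_table_serially and read_two_per_cycle.

```python
from typing import Optional


def build_table_serially(vector: int, width: int = 8) -> list[Optional[int]]:
    """Fill a table from position zero up with the location of each logic one.

    Each entry requires the previous logic one to have been found and masked,
    so the table is built one entry after another.
    """
    table: list[Optional[int]] = [None] * width  # None marks an unused entry (assumption)
    i = 0
    while vector:
        table[i] = (vector & -vector).bit_length() - 1
        vector &= vector - 1
        i += 1
    return table


def read_two_per_cycle(table: list[Optional[int]]) -> list[list[int]]:
    """Index the finished table, reading up to two entries per modelled cycle."""
    used = [entry for entry in table if entry is not None]
    return [used[i:i + 2] for i in range(0, len(used), 2)]
```

Reading the table is a simple indexing operation, whereas the loop in build_table_serially performs one find first one and mask step per entry, which reflects the drawback noted above.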
Viewed from one aspect, the present invention provides a data processing circuit for determining locations of a predetermined value in a sequence of data bits comprising: a first store for receiving said sequence of data bits; and an analyser operable to determine a first location of said predetermined value nearest a first end of said sequence of data bits and to store in a second store a location indicator identifying said first location; for each of a number of potential locations of said predetermined value in said sequence of data bits, said analyser further being operable to: (i) identify a next location of said predetermined value further from said first end of said sequence than said potential location; and (ii) store in a third store in association with said potential location a location indicator for said next location.
Because each step of identifying a location of the predetermined value does not depend on the outcome of any other step, each step can be performed independently of the others. This provides potential for decreasing the time required for completing the steps.
Preferably, the analyser is operable to perform steps (i) and (ii) in parallel for each potential location. Performing the steps in parallel means that in the time required to locate a single predetermined value, a plurality of predetermined values can be determined. This decreases the time required to locate a plurality of predetermined values.
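The independence of steps (i) and (ii) may be illustrated by the following Python sketch, in which every entry is computed from the input sequence and its own potential location only. This is a software model under stated assumptions, not the claimed circuit: the first end is taken to be bit 0, None stands for "no further location", and the names next_location and analyse are hypothetical. The list comprehension evaluates entries one after another, whereas hardware could evaluate them simultaneously.

```python
from typing import Optional


def next_location(vector: int, potential: int, width: int) -> Optional[int]:
    """Location of the first logic one strictly above `potential`, or None.

    The result depends only on the input vector and the potential location,
    never on the result for any other potential location.
    """
    low_bits = (1 << (potential + 1)) - 1            # bits 0..potential
    above = vector & ~low_bits & ((1 << width) - 1)  # keep bits above potential
    return (above & -above).bit_length() - 1 if above else None


def analyse(vector: int, width: int = 8):
    """Return (first location, table of next locations per potential location)."""
    first = next_location(vector, -1, width)         # nearest the first end
    table = [next_location(vector, p, width) for p in range(width)]
    return first, table
```

For the sequence 10010000 this model gives a first location of 4 and next locations of 4, 4, 4, 4, 7, 7, 7 for potential locations 0 to 6, with no further location after potential location 7.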
Advantageously, the data processing circuit further comprises a reader operable to: (a) read said first location indicator in order to identify said first location of said predetermined value; (b) determine the potential location corresponding to said first location; (c) read the location indicator associated with said potential location in order to determine a next location of said predetermined value.
The storing of a next location indicator for each potential location may lead to some redundancy, for example in the case of 10010000, where the first location of a one is at location 4 and this same information is also recorded in association with locations 0, 1, 2 and 3. This is the cost of finding the locations in parallel. However, by stepping through the data in the way described above, the redundant data is skipped in the read out stage. In the above example, the first location indicator would be 4, thus the location indicator associated with location 4 would be read next and the location indicators stored at locations 0, 1, 2 and 3 would be skipped. This is a very efficient way to access the stored data.
Preferably, the reader is further arranged to: (d) determine the potential location corresponding to said next location; (e) read the location indicator associated with said potential location determined at step (d) in order to determine a next location of said predetermined value; (f) repeat steps (d) and (e).
In preferred embodiments the reader is operable to repeat steps (d) and (e) until it detects that the location indicator read for said potential location comprises an end indicator. The use of an end indicator improves the efficiency of the process by stopping it from continuing once the predetermined value furthest from the first end has been found.
Indexing the information in the above manner is an efficient way of accessing all the relevant information.
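The read out of steps (a) to (f) may be sketched in Python as follows, using the stored indicators for the sequence 10010000 discussed above. The sketch is illustrative only: END is a hypothetical end marker (None is used here as an assumption), and read_locations is a hypothetical name.

```python
END = None  # hypothetical end marker for this sketch


def read_locations(first, next_table):
    """Follow the chain of location indicators, starting at the first location."""
    locations = []
    current = first                    # step (a): read the first location indicator
    while current is not END:
        locations.append(current)
        current = next_table[current]  # steps (b) to (e): index by location, read next
    return locations


# Stored indicators for the sequence 10010000: entries 0 to 3 all point to 4.
first = 4
next_table = [4, 4, 4, 4, 7, 7, 7, END]
print(read_locations(first, next_table))  # prints [4, 7]
```

Only entries 4 and 7 of the table are read; the redundant entries at locations 0, 1, 2 and 3 are never visited.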
In preferred embodiments, the analyser comprises a number of value locating circuits, each value locating circuit being arranged to determine a location of said predetermined value. Advantageously, the analyser comprises a value locating circuit corresponding to each of said number of potential locations, and a value locating circuit for determining said first location, preferably the value locating circuits being arranged in parallel. By having a number of value locating circuits, each arranged to determine a particular location of a predetermined value, the circuits can operate for their particular locations independently of each other. By arranging the circuits in parallel they can operate simultaneously.
Although in some embodiments the first location indicator of the predetermined value is stored separately to the other location indicators for other instances of said predetermined value, in other embodiments it is stored alongside them in the same storage means. This storage means may be located locally within the data processing circuit or it may be located separately to it, the data processing circuit transferring data to this external storage means.
Preferably, the analyser is operable to write said potential locations and said location indicators to said third store in the form of a table, each entry in the table comprising a potential location and its associated next location indicator. This is a convenient and easy way to store and access the data.
In preferred embodiments, the value locating circuits are synchronous circuits and said data processing circuit further comprises a clock for clocking said synchronous circuits. This data processing circuit lends itself particularly well to synchronous circuits, whereby owing to the independent nature of the steps, the analyser may identify and write said locations to said third data store during one cycle of said clock.
Although the data processing circuit may find the location of any predetermined value, in preferred embodiments it finds logic one values. There are many applications where the positions of logic one values are required to be known and the data processing circuit of an embodiment of the present invention is particularly well adapted to locating them.
Although the data processing circuit of embodiments of the present invention may be arranged to find locations from either end of a data sequence, preferably it finds occurrences of the predetermined value starting from the end of the sequence representing the least significant bit.
Preferably, upon determination of no further occurrences of said predetermined value in said sequence of data bits, said analyser is operable to generate an end indicator and to store said end indicator as said location indicator in said third store in association with the corresponding potential location.
By recording an end indicator once the location for the predetermined value furthest from the first end has been found, further steps to look for further values may be avoided and thus the efficiency of the data processing circuit improved.
In preferred embodiments, said location indicators comprise a string of bits and said end indicator comprises a string of zeros of the same length as said string of bits.
The end indicator can take any form provided that the data processing circuit is adapted to recognise it. As it is stored in the position of a location indicator, it preferably has the same form as a location indicator. In preferred embodiments the location indicators comprise four bit numbers. In this case, therefore any four bit number that is not used as a location indicator in the third store would be appropriate as an end indicator. Where the first end is the least significant bit, the only location indicator that may reference the location of the least significant bit (bit zero) is the one for the first location, which in many embodiments is stored separately to the other location indicators. Thus, in these embodiments the location indicator for bit zero (e.g. 0000) could not occur in the table of the other locations, and it is therefore a good choice as an end indicator. Clearly other values that could not occur could also be used.
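The suitability of 0000 as an end indicator may be illustrated by the following Python sketch for a 16 bit sequence with four bit location indicators, where the first end is the least significant bit. It is a minimal model under those assumptions; the names END_INDICATOR and next_table_4bit are hypothetical.

```python
END_INDICATOR = 0b0000  # all-zero string, same length as a location indicator


def next_table_4bit(vector: int) -> list[int]:
    """Next-location indicators for a 16 bit vector, one per potential location.

    An entry of 0b0000 means no further logic one exists beyond that location.
    """
    table = []
    for potential in range(16):
        # Keep only the bits strictly above the potential location.
        above = vector & (0xFFFF << (potential + 1)) & 0xFFFF
        if above:
            table.append((above & -above).bit_length() - 1)
        else:
            table.append(END_INDICATOR)
    return table
```

A genuine next location is always further from bit zero than its potential location, so it is at least 1 and can never be confused with the end indicator 0000. The location of bit zero itself can only arise as the first location, which in this model would be held separately.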
In preferred embodiments, the sequence of data bits is embedded in a microprocessor instruction, for example a microprocessor LOAD or STORE instruction.
The data processing circuit of the present invention is particularly well adapted for block loads from memory and block stores to memory, wherein it is necessary to find the location of a plurality of logic one values in a sequence of data bits in as short a time as possible.
Viewed from a second aspect the present invention provides a method of determining locations of a predetermined value in a sequence of data bits comprising the steps of: (a) determining a first location of said predetermined value nearest a first end of said sequence of data bits; (b) storing a first location indicator identifying said first location; (c) identifying potential locations for said predetermined value in said sequence of data bits; (d) for each identified potential location (i) determining the next location of said predetermined value further from said first end of said sequence than said potential location; (ii) storing in association with said potential location a location indicator identifying said next location.
The above, and other objects, features and advantages of this invention will be apparent from the following detailed description of illustrative embodiments which is to be read in connection with the accompanying drawings.