1. Field of the Invention
The present invention relates to signal processing, and, in particular, to bit-level processing of a binary vector.
2. Description of the Related Art
FIG. 1 shows a simplified block diagram of one implementation of a prior-art apparatus 100 for generating index values for bits of a binary vector that have a value of 1. Apparatus 100 has vector memory 102, comprising n addresses, each of which stores one bit of a binary vector u. As used in this specification, the term “address” refers to an individual storage location within memory. During each clock cycle, vector memory address counter 106 generates an index value corresponding to one of the addresses of vector memory 102 (i.e., one of the n bits of binary vector u). The index values, which (i) begin at 0, (ii) increment by 1 during each clock cycle, and (iii) end at n−1 (i.e., range from 0, . . . , n−1), may be represented in binary form, where at least log2(n) bits are used to represent each of the n index values.
Each index value is provided to write port 112 of index memory 110 and is written to index memory 110 only if the corresponding bit of binary vector u has a value of 1. This is accomplished by supplying the corresponding bit from read port 104 of vector memory 102 to both (i) write-enable port 114 of index memory 110 and (ii) index memory counter 108. If the corresponding bit value is 1, then write-enable port 114 enables write port 112 so that the index value may be written to an address of index memory 110 that is supplied by index memory counter 108. If the corresponding bit value is 0, then write-enable port 114 does not enable write port 112 and the index value is not written to index memory 110.
Index memory 110 comprises a number wmax of addresses, where wmax is the maximum possible Hamming weight of binary vector u (i.e., the maximum number of index values that may be stored in index memory 110). Each address is capable of storing one index value that comprises at least log2(n) bits, and thus, the total size of index memory 110 is equal to at least wmax addresses×log2(n) bits per address. Index memory counter 108 generates an index memory address for each index value that is written to index memory 110 (i.e., that corresponds to a bit having a value of 1). The index memory addresses begin at 0, increment by 1 every time index memory counter 108 receives a bit value of 1, and end at wmax−1 (i.e., range from 0, . . . , wmax−1). After the index values are stored in index memory 110 for all bits having a value of 1, the index values may be output through read port 116, one index value at a time. To further illustrate the operation of apparatus 100, consider FIGS. 2 and 3.
FIG. 2 shows Table I, which illustrates (i) an exemplary binary vector u having n=256 bits and (ii) index values that may be generated by apparatus 100 for each of the 256 bits. As shown, vector memory address counter 106 generates a first index value of 0 for the first bit, a second index value of 1 for the second bit, a third index value of 2 for the third bit, a fourth index value of 3 for the fourth bit, a fifth index value of 4 for the fifth bit, and so on. The index values, although shown in decimal units, are provided to index memory 110 as binary numbers, where at least 8 bits (i.e., log2(256)) are used to represent each of the 256 index values.
FIG. 3 shows Table II, which illustrates how the index values from FIG. 2 may be stored in index memory 110 of FIG. 1. In this example, suppose that the maximum possible Hamming weight wmax is 64, and thus, the total size of index memory 110 is 512 bits (i.e., wmax×log2(n)=64×log2(256)). During the first clock cycle (i.e., clock cycle 0), the first bit from FIG. 2 has a value of 0, so write port 112 of index memory 110 is not enabled and the first index value (i.e., 0) is not stored in index memory 110. During the second clock cycle (i.e., clock cycle 1), the second bit has a value of 1, so write port 112 is enabled and the second index value (i.e., 1) is stored in the first address (i.e., 0) of index memory 110 as shown in FIG. 3. During the third and fourth clock cycles (i.e., clock cycles 2 and 3), the third and fourth bits have a value of 0, so the third and fourth index values (i.e., 2 and 3) are not stored in index memory 110. During the fifth clock cycle (i.e., clock cycle 4), the fifth bit has a value of 1; thus, index memory counter 108 increments by 1 so that the fifth index value (i.e., 4) is stored in the second address (i.e., 1) of index memory 110 as shown in FIG. 3. The next two index values written to index memory 110 are index values 7 and 9, and these values are stored in the third and fourth addresses (i.e., addresses 2 and 3) of index memory 110. This process is repeated until index values have been generated for all bits of binary vector u having a value of 1. Here, as shown in FIG. 2, the last bit having a value of 1 is the 255th bit, for which an index value of 254 is generated. Index value 254 is written to the last address (i.e., 63) of index memory 110 as shown in FIG. 3.
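The serial write behavior described above may be illustrated by a short software model. The following Python sketch is not part of apparatus 100; the function and variable names are hypothetical, the input is limited to the first ten bits implied by the description of FIGS. 2 and 3, and the Hamming weight of the input is assumed not to exceed wmax.

```python
def serial_index_generation(u, w_max):
    """Software model (an assumption, not the hardware) of apparatus 100.

    One bit of u is examined per loop iteration, standing in for one clock cycle.
    """
    index_memory = [None] * w_max   # models index memory 110 (w_max addresses)
    index_memory_address = 0        # models index memory counter 108
    # enumerate() stands in for vector memory address counter 106
    for index_value, bit in enumerate(u):
        if bit == 1:                # the bit acts as the write-enable signal
            index_memory[index_memory_address] = index_value
            index_memory_address += 1
    # Return only the addresses that were actually written
    return index_memory[:index_memory_address]


# First ten bits of the exemplary vector, as implied by the description above
u_prefix = [0, 1, 0, 0, 1, 0, 0, 1, 0, 1]
stored = serial_index_generation(u_prefix, w_max=64)
print(stored)  # [1, 4, 7, 9]
```

In this model, index values 1, 4, 7, and 9 land in index memory addresses 0 through 3, matching the first four entries described for FIG. 3.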
Apparatus 100 is capable of determining index values for all bits of a binary vector u that have a value of 1 in as few as wmax clock cycles or in as many as n clock cycles, depending on the arrangement of bits in the binary vector. In the example above, where wmax equals 64, if the first 64 bits of binary vector u had a value of 1, then apparatus 100 could determine all 64 index values corresponding to the first 64 bits in the first 64 clock cycles. On the other hand, if the last bit (i.e., the 256th bit) of binary vector u had a value of 1, then apparatus 100 would not determine the last index value until the 256th clock cycle. To reduce the number of clock cycles performed by apparatus 100, a semi-parallel architecture may be implemented that considers more than one bit during each clock cycle.
FIG. 4 shows a simplified block diagram of one implementation of an apparatus 400 that employs a semi-parallel architecture to determine index values for bits of a binary vector that have a value of 1. Apparatus 400 has a parallelization factor of M, indicating that it generates index values for a number M of bits during each clock cycle. Apparatus 400 comprises vector memory 402, which, similar to vector memory 102 of FIG. 1, is capable of storing a binary vector u having n total bits. However, unlike vector memory 102, which comprises n addresses and stores only one bit per address, vector memory 402 comprises ceil(n/M) addresses, and each address stores a sub-vector s of binary vector u that comprises M bits. The ceiling function ceil(n/M) represents the smallest integer that is greater than or equal to n/M.
During each clock cycle, vector memory address counter 406 generates an address corresponding to one of sub-vectors s stored in vector memory 402. The addresses, which (i) begin at 0, (ii) increment by 1 during each clock cycle, and (iii) end at ceil(n/M)−1, may be represented in binary form, where the number of bits used to represent each of the addresses is at least log2(ceil(n/M)). Each address generated is provided to M computation blocks 418(0), . . . , 418(M−1), which calculate M index values based on each address using Equation (1) below:
index value = (bc × M) + m,  (1)
where bc represents the vector memory address provided by vector memory address counter 406. Each of the index values corresponds to one of the bits m of a sub-vector s of binary vector u where m=0, . . . , M−1. The index values may be represented in binary form, where the number of bits used to represent each index value is at least log2(n). To further understand the generation of index values in relation to apparatus 400, suppose that the parallelization factor M of apparatus 400 is equal to 8 and that vector memory 402 is capable of storing n=256 bits.
FIG. 5 shows Table III, which illustrates an exemplary binary vector u that is divided into 32 sub-vectors s ranging from 0, . . . , 31, where each sub-vector s comprises eight bits. The first eight bits 0, 1, 0, 0, 1, 0, 0, and 1 (i.e., sub-vector 0) of binary vector u may be stored in the first address (i.e., 0) of vector memory 402, the second eight bits 0, 1, 0, 1, 0, 0, 1, and 0 (i.e., sub-vector 1) may be stored in the second address (i.e., 1) of vector memory 402, the third eight bits 0, 1, 1, 0, 0, 1, 0, and 0 (i.e., sub-vector 2) may be stored in the third address (i.e., 2) of vector memory 402, and so on. To store all 32 sub-vectors, vector memory 402 comprises at least 32 addresses, where each address may be represented by at least five bits (i.e., log2(32)).
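The division of binary vector u into sub-vectors may be sketched in software as follows. This Python fragment is illustrative only: the function name is hypothetical, only the first 24 bits are taken from the description of FIG. 5, and the remaining bits are zero placeholders chosen solely to reach n=256. How a final, partially filled sub-vector would be handled when n is not a multiple of M is not stated above, so the sketch simply leaves it shorter.

```python
def split_into_sub_vectors(u, M):
    """Split u into ceil(n/M) sub-vectors of M bits each (hypothetical helper)."""
    num_addresses = -(-len(u) // M)  # integer form of ceil(n/M)
    return [u[a * M:(a + 1) * M] for a in range(num_addresses)]


# Sub-vectors 0, 1, and 2 as listed above; the rest is placeholder data
known_prefix = [0, 1, 0, 0, 1, 0, 0, 1,
                0, 1, 0, 1, 0, 0, 1, 0,
                0, 1, 1, 0, 0, 1, 0, 0]
u = known_prefix + [0] * (256 - len(known_prefix))

sub_vectors = split_into_sub_vectors(u, M=8)
# Bits needed to address the sub-vectors; equals ceil(log2(count)) here
address_bits = (len(sub_vectors) - 1).bit_length()

print(len(sub_vectors), address_bits)  # 32 5
print(sub_vectors[1])                  # [0, 1, 0, 1, 0, 0, 1, 0]
```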
FIG. 6 shows Table IV, which illustrates the index values that may be generated by apparatus 400 for the bits of the exemplary binary vector u of FIG. 5. During the first clock cycle (i.e., clock cycle 0), the first vector memory address (i.e., address 0 from FIG. 5) is provided to computation blocks 418(0), . . . , 418(7), which calculate the first eight index values using Equation (1). For example, first computation block 418(0) calculates an index value of 0 (i.e., (bc×M)+m=(0×8)+0=0), which corresponds to the first bit of first sub-vector 0 shown in FIG. 5. Second computation block 418(1) calculates an index value equal to 1 (i.e., (0×8)+1=1), which corresponds to the second bit of first sub-vector 0. Computation blocks 418(2), . . . , 418(7) calculate index values 2, . . . , 7, respectively, in a similar manner for the third through eighth bits of first sub-vector 0. During the second clock cycle (i.e., clock cycle 1), the second address (i.e., address 1 from FIG. 5) is provided to computation blocks 418(0), . . . , 418(7). First computation block 418(0) calculates an index value equal to 8 (i.e., (1×8)+0=8), which corresponds to the first bit of second sub-vector 1 shown in FIG. 5. Second computation block 418(1) calculates an index value equal to 9 (i.e., (1×8)+1=9), which corresponds to the second bit of the second sub-vector 1. Computation blocks 418(2), . . . , 418(7) calculate index values 10, . . . , 15, respectively, in a similar manner for the third through eighth bits of second sub-vector 1. This process is repeated for subsequent clock cycles to generate further index values (up to 255). Although these index values are shown in FIG. 6 in decimal units, they may be provided to index memories 410(0), . . . , 410(7) as binary numbers, where each index value is represented by at least eight bits (i.e., log2(256)).
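The calculations performed by the M computation blocks may be checked with a brief sketch of Equation (1). The function name below is hypothetical, and the list comprehension merely stands in for the M computation blocks operating in parallel.

```python
def index_values_for_address(bc, M):
    """Apply Equation (1), index value = (bc * M) + m, for m = 0, ..., M-1."""
    return [bc * M + m for m in range(M)]


print(index_values_for_address(0, 8))       # [0, 1, 2, 3, 4, 5, 6, 7]
print(index_values_for_address(1, 8))       # [8, 9, 10, 11, 12, 13, 14, 15]
print(index_values_for_address(31, 8)[-1])  # 255
```

The last line corresponds to the final vector memory address (i.e., 31) and the last bit position (i.e., m=7), giving the largest index value of 255 for n=256.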
Referring back to FIG. 4, each computation block (i.e., 418(0), . . . , 418(M−1)) provides the index value that it generates during each clock cycle to the write port (i.e., 412(0), . . . , 412(M−1)) of its corresponding index memory (i.e., 410(0), . . . , 410(M−1)). Similar to apparatus 100 of FIG. 1, only those index values corresponding to bits of binary vector u that have a value of 1 are written to index memory. This is accomplished by enabling the write port of each index memory in a manner similar to that used by apparatus 100. In particular, during each clock cycle, bits m=0, . . . , M−1 of sub-vector s are provided to (i) write-enable ports 414(0), . . . , 414(M−1) of index memories 410(0), . . . , 410(M−1), respectively, and (ii) index memory counters 408(0), . . . , 408(M−1), respectively. If any of bits 0, . . . , M−1 have a value of 1, then the write ports (i.e., 412(0), . . . , 412(M−1)) of the index memories corresponding to those bits are enabled so that the index values provided by the corresponding computation blocks are written to an address of the index memories. The index memory addresses supplied by each index memory counter (i.e., 408(0), . . . , 408(M−1)) begin at 0, increment by 1 every time that the index memory counter receives a bit value of 1, and end at wmax−1. To further understand how the index values are written to the index memories, consider FIG. 7, which expands on the example provided above.
FIG. 7 shows Table V, which illustrates how the index values of FIG. 6 may be stored in the index memories of apparatus 400 of FIG. 4. The index values shown correspond to the first ten sub-vectors s of FIG. 6. Suppose that the maximum possible Hamming weight wmax that may be processed by apparatus 400 is equal to 64. During the first clock cycle (i.e., clock cycle 0), bit values 0, 1, 0, 0, 1, 0, 0, and 1 (i.e., first sub-vector 0 from FIG. 5) are provided to index memories 410(0), . . . , 410(7), respectively, and index values 0, 1, 2, 3, 4, 5, 6, and 7 (i.e., the index values from FIG. 6 corresponding to sub-vector 0) are provided to index memories 410(0), . . . , 410(7), respectively. Index values 0, 2, 3, 5, and 6 correspond to bits having a value of 0, so they are not written to the corresponding index memories (i.e., 410(0), 410(2), 410(3), 410(5), and 410(6), respectively). Index values 1, 4, and 7, on the other hand, correspond to bits having a value of 1, so they are written to the corresponding index memories (i.e., 410(1), 410(4), and 410(7)). Index values 1, 4, and 7 are written to the first addresses (i.e., 0) of index memories 410(1), 410(4), and 410(7), respectively, as shown in FIG. 7, since no index values were previously written to these index memories.
During the second clock cycle (i.e., clock cycle 1), bit values 0, 1, 0, 1, 0, 0, 1, and 0 (i.e., sub-vector 1 from FIG. 5) are provided to index memories 410(0), . . . , 410(7), respectively, and index values 8, 9, 10, 11, 12, 13, 14, and 15 (i.e., the index values from FIG. 6 corresponding to sub-vector 1) are provided to index memories 410(0), . . . , 410(7), respectively. Index values 8, 10, 12, 13, and 15 correspond to bits having a value of 0, so they are not written to the corresponding index memories (i.e., 410(0), 410(2), 410(4), 410(5), and 410(7), respectively). Index values 9, 11, and 14, on the other hand, correspond to bits having a value of 1, so they are written to index memories 410(1), 410(3), and 410(6), respectively. Index values 11 and 14 are written to the first index memory address (i.e., 0) of index memories 410(3) and 410(6), respectively, as shown in FIG. 7, since no index values were previously written to these index memories. Since index value 1 was written to the first address (i.e., 0) of index memory 410(1), the index memory address is incremented by 1 by index memory counter 408(1) so that index value 9 is written to the second address (i.e., 1) of index memory 410(1) as shown in FIG. 7.
During the third clock cycle (i.e., clock cycle 2), bit values 0, 1, 1, 0, 0, 1, 0, and 0 (i.e., sub-vector 2 from FIG. 5) are provided to index memories 410(0), . . . , 410(7), respectively, and index values 16, 17, 18, 19, 20, 21, 22, and 23 (i.e., the index values from FIG. 6 corresponding to sub-vector 2) are provided to index memories 410(0), . . . , 410(7), respectively. Index values 16, 19, 20, 22, and 23 correspond to bits having a value of 0, so they are not written to the corresponding index memories (i.e., 410(0), 410(3), 410(4), 410(6), and 410(7), respectively). Index values 17, 18, and 21, on the other hand, correspond to bits having a value of 1, so they are written to index memories 410(1), 410(2), and 410(5), respectively. Index values 18 and 21 are written to the first addresses (i.e., 0) of index memories 410(2) and 410(5), respectively, as shown in FIG. 7, since no index values were previously written to these addresses. Since index values 1 and 9 were written to the first two addresses (i.e., 0 and 1) of index memory 410(1), the index memory address is incremented by 1 by index memory counter 408(1) so that index value 17 is written to the third address (i.e., 2) of index memory 410(1) as shown in FIG. 7.
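The three clock cycles described above may be reproduced with a small software model of the semi-parallel write path. The Python sketch below is an illustration under stated assumptions rather than a description of apparatus 400: the names are hypothetical, each index memory is modeled as a growing list whose length plays the role of the corresponding index memory counter, and the wmax capacity limit is not enforced.

```python
def semi_parallel_index_generation(sub_vectors, M):
    """Software model (an assumption) of the write path of apparatus 400."""
    # One list per index memory 410(0), ..., 410(M-1)
    index_memories = [[] for _ in range(M)]
    # Each outer iteration stands in for one clock cycle; bc is the address
    # supplied by vector memory address counter 406
    for bc, sub_vector in enumerate(sub_vectors):
        for m, bit in enumerate(sub_vector):
            if bit == 1:  # bit m write-enables index memory 410(m)
                index_memories[m].append(bc * M + m)  # Equation (1)
    return index_memories


# Sub-vectors 0, 1, and 2 as listed in the description of FIG. 5
sub_vectors = [
    [0, 1, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0, 0],
]
memories = semi_parallel_index_generation(sub_vectors, M=8)
for m, contents in enumerate(memories):
    print(f"410({m}): {contents}")
```

After these three cycles, the model holds index values 1, 9, and 17 in the list for index memory 410(1), one index value each in the lists for 410(2) through 410(7), and nothing in the list for 410(0), which agrees with the walkthrough above.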
As shown in FIG. 7, apparatus 400 might not distribute the index values uniformly to the index memories. As a result, some index memories, such as index memory 410(1), might store more index values than others. Depending on the arrangement of bits within a binary vector u, it is possible that all index values corresponding to the bits of binary vector u that have a value of 1 could be distributed to one index memory. To accommodate this possibility, all of the index memories are designed to store the maximum possible number of index values wmax. Thus, the number of addresses in each index memory is equal to the maximum possible Hamming weight wmax of binary vector u, and the total size of each index memory is equal to at least wmax addresses×log2(n) bits per address. Since there are M total index memories, the combined size of the index memories is equal to at least M×wmax×log2(n) bits. In the example above, the combined size of index memories 410(0), . . . , 410(7) is equal to 8×64×8=4,096 bits.
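The memory sizes stated for the two apparatuses may be verified with a short calculation. The helper below is hypothetical; it rounds the number of bits per index value up to a whole number, which coincides with log2(n) when n is a power of two, as in the example.

```python
def index_memory_bits(n, w_max, M=1):
    """Minimum combined index memory size in bits for M index memories."""
    # Bits needed to represent index values 0..n-1; equals ceil(log2(n))
    bits_per_index = (n - 1).bit_length()
    return M * w_max * bits_per_index


print(index_memory_bits(256, 64))       # 512  (single index memory 110)
print(index_memory_bits(256, 64, M=8))  # 4096 (index memories 410(0)..410(7))
```

Under these assumptions, the semi-parallel arrangement requires M times the index memory of the serial arrangement for the same n and wmax.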
As described above, prior-art apparatus 100 of FIG. 1 may determine the index values for all bits of a binary vector u that have a value of 1 in wmax to n clock cycles (i.e., 64 to 256 clock cycles in the example for FIG. 1) depending on the arrangement of bits in the binary vector u. Apparatus 400, on the other hand, may determine the index values for all bits of a binary vector u that have a value of 1 in wmax/M to n/M clock cycles (i.e., 8 to 32 clock cycles in the example for FIG. 4) depending on the arrangement of bits in the binary vector u. In considering the same binary vector u, apparatus 400 may determine the index values for all bits of the binary vector u that have a value of 1 up to M times faster than apparatus 100.
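The best-case and worst-case cycle counts quoted above may be illustrated with the following sketch. It assumes, for illustration only, that the cycle count of interest is the clock cycle during which the last bit having a value of 1 is examined; the function name and the two test vectors are hypothetical.

```python
def cycles_until_last_one(u, M=1):
    """Clock cycles elapsed when the last 1-bit of u has been examined.

    M=1 models the serial case; M>1 models the semi-parallel case.
    Returns 0 if u contains no bits having a value of 1.
    """
    last_one = max((i for i, bit in enumerate(u) if bit == 1), default=-1)
    return last_one // M + 1


n, w_max, M = 256, 64, 8
best_case = [1] * w_max + [0] * (n - w_max)  # all 64 ones at the start
worst_case = [0] * (n - 1) + [1]             # a one in the 256th bit

print(cycles_until_last_one(best_case), cycles_until_last_one(best_case, M))    # 64 8
print(cycles_until_last_one(worst_case), cycles_until_last_one(worst_case, M))  # 256 32
```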