In U.S. patent application Ser. No. 08/695,059, filed by Burrows on Aug. 9, 1996, a method for encoding delta values is described. Delta encoding is a way to compress an ordered sequence of integers. Each delta value is the difference between a current integer and the previous integer in the sequence. Subsequently, any integer in the sequence can be recovered by sequentially adding the delta values, starting from the beginning of the sequence, up to the position of that integer. As disclosed by Burrows, the integers are encoded as delta values as follows.
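The relationship between a sorted sequence and its delta values can be illustrated with a minimal sketch. The function names `to_deltas` and `from_deltas` are hypothetical, and taking the first delta relative to zero is an assumption made here for illustration; the text does not state how the first integer is stored.

```python
from itertools import accumulate


def to_deltas(sorted_ints):
    """Return the difference between each integer and its predecessor.

    Assumption: the first delta is taken relative to zero.
    """
    deltas = []
    previous = 0
    for value in sorted_ints:
        deltas.append(value - previous)
        previous = value
    return deltas


def from_deltas(deltas):
    """Recover the original integers as running sums of the deltas."""
    return list(accumulate(deltas))


addresses = [1000, 1003, 1010, 1500]
deltas = to_deltas(addresses)   # [1000, 3, 7, 490]
restored = from_deltas(deltas)  # [1000, 1003, 1010, 1500]
```

When the integers are close together, the deltas are small numbers, which is what makes the byte encoding described next compact.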
First, each delta value is written in a binary form. The binary form of each delta value is then partitioned into groups of seven bits, padding up to a multiple of seven bits when necessary. The number of groups required depends on the size of the delta value. If the integers in the sequence are relatively close together, then most delta values can be expressed in two groups. Only occasionally are one group or more than two groups of seven bits required.
After grouping, an additional "continuation" bit is added to each group of seven bits to form eight bit bytes. The continuation bit is set to one when the delta value continues into the next byte, and zero when the byte stores the last group of bits of the delta value. The continuation bit is set as either the high-order bit or the low-order bit for reasons described below. The list of thus encoded bytes can be stored as a file in a memory of a computer system.
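The grouping and continuation-bit steps might be sketched as follows. This is an illustration under stated assumptions, not the encoder disclosed by Burrows: the name `encode_delta` is hypothetical, the continuation bit is placed in the high-order position of every byte, and the least significant group is written first.

```python
def encode_delta(delta):
    """Encode a positive integer as bytes of seven payload bits each.

    Assumptions: the continuation bit is the high-order bit of every byte,
    and the least significant seven-bit group is emitted first.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    out = bytearray()
    while True:
        group = delta & 0x7F          # next seven bits of the value
        delta >>= 7
        if delta:
            out.append(0x80 | group)  # continuation bit one: more groups follow
        else:
            out.append(group)         # continuation bit zero: last group
            return bytes(out)


one_byte = encode_delta(3)        # b'\x03'
two_bytes = encode_delta(490)     # b'\xea\x03'
three_bytes = encode_delta(16384) # b'\x80\x80\x01'
```

Under these assumptions, values up to 127 take one byte and values up to 16,383 take two bytes.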
Burrows describes one application where his method for delta encoding can be used. On the Internet, the World Wide Web is used to exchange information as Web pages. A Web page is composed as a sequence of tokens or "words." Words are sets of characters separated by some type of separator character, for example, a space character. In Web pages, the words can be any arbitrary set of characters.
An index to the words of the Web pages can be built in a search engine, such as Digital Equipment Corporation's AltaVista search engine. The collection of pages to be indexed can be acquired by a Web "spider." As the Web pages are collected, integers are sequentially assigned to each word of all of the collected Web pages to form pairs, a word and an integer. The integers effectively are the locations or addresses of the words.
The pairs can be sorted according to the collating order of the words. The sort can be some lexicographic sort on the characters of the words. The sort results in a file having a plurality of entries, one entry for each unique word, followed by all addresses where the word appears in the Web pages. The integers representing the addresses of the word are delta encoded as described above. Each entry in the file has two components, a word field, and the list of associated delta values. No word entry appears more than once in the file.
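The pairing, sorting, and grouping steps can be sketched as below. The function name `build_entries` is hypothetical. Splitting on whitespace, numbering locations from one, and taking the first delta of each word relative to zero are all assumptions made for this illustration.

```python
from itertools import groupby
from operator import itemgetter


def build_entries(pages):
    """Return one (word, deltas) entry per unique word, in sorted word order."""
    pairs = []
    location = 0
    for page in pages:
        for word in page.split():     # assumption: whitespace separates words
            location += 1             # assumption: locations start at one
            pairs.append((word, location))

    pairs.sort()                      # lexicographic by word, then by location

    entries = []
    for word, group in groupby(pairs, key=itemgetter(0)):
        addresses = [loc for _, loc in group]
        # Assumption: the first delta is taken relative to zero.
        deltas = [b - a for a, b in zip([0] + addresses, addresses)]
        entries.append((word, deltas))
    return entries


entries = build_entries(["the cat saw the dog", "the dog ran"])
# [('cat', [2]), ('dog', [5, 2]), ('ran', [8]), ('saw', [3]), ('the', [1, 3, 2])]
```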
The words are represented by a common prefix count followed by a sequence of non-zero characters that represents the suffix portion of the word. The prefix count is the number of characters of the word that are identical to a word in a previous entry. The prefix count is also an integer number that can be delta encoded as described above. The prefix count is stored as a value one greater than the actual prefix count. As a consequence, the representation of a prefix count will never be zero, and zero can be used as a terminator.
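A minimal sketch of this word representation follows. The names `front_code` and `front_decode` are hypothetical, and the sketch keeps each entry as a Python tuple rather than packing it into bytes; it only illustrates the prefix count stored as one greater than the actual count.

```python
def common_prefix_len(a, b):
    """Count the leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_code(sorted_words):
    """Represent each word as (prefix count + 1, suffix) relative to the previous word."""
    coded = []
    previous = ""
    for word in sorted_words:
        shared = common_prefix_len(previous, word)
        coded.append((shared + 1, word[shared:]))  # stored count is never zero
        previous = word
    return coded


def front_decode(coded):
    """Rebuild the words from (stored count, suffix) pairs."""
    words = []
    previous = ""
    for stored_count, suffix in coded:
        word = previous[:stored_count - 1] + suffix
        words.append(word)
        previous = word
    return words


coded = front_code(["cat", "catalog", "dog"])
# [(1, 'cat'), (4, 'alog'), (1, 'dog')]
```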
The delta values are byte packed into the file, and the end of the list of delta values is indicated by a zero byte. The next word entry, if any, begins in the following byte.
Because there are many Web pages, frequently used words, for example, "the," might occur at hundreds of millions of positions in the Web pages. By encoding the addresses as delta values, memory storage is greatly reduced.
After the index has been created, users of client computers connected to the Internet can pose queries to the index of the search engine to locate Web pages that match the queries. An integer address of a word on any Web page is recovered as follows. The bytes are scanned sequentially according to the continuation bits of the bytes. This locates the bytes that represent a particular delta value. The continuation bits are discarded, and the groups of seven bits are reconstructed. Finally, the bits are concatenated to form the binary delta value, which is added to the previous address to give the next integer address of the word.
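The scan described above might look like the following sketch. The name `decode_addresses` is hypothetical. The sketch assumes the continuation bit is the high-order bit of every byte, that the least significant group comes first, that the first delta is relative to zero, and that the list ends with the zero byte mentioned earlier; it raises `IndexError` if the terminator is missing.

```python
def decode_addresses(data):
    """Decode a zero-terminated list of delta bytes into integer addresses."""
    addresses = []
    address = 0
    pos = 0
    while data[pos] != 0:                 # a zero byte ends the list
        delta = 0
        shift = 0
        while True:
            byte = data[pos]
            pos += 1
            delta |= (byte & 0x7F) << shift   # drop the continuation bit
            shift += 7
            if not byte & 0x80:               # continuation bit zero: last group
                break
        address += delta                  # running sum recovers the address
        addresses.append(address)
    return addresses


packed = bytes([0xE8, 0x07, 0x03, 0x07, 0xEA, 0x03, 0x00])
addresses = decode_addresses(packed)  # [1000, 1003, 1010, 1500]
```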
Decoding is accelerated for the following reason. As stated above, most delta values are represented by two bytes. Thus, the most common encoding consists of a byte whose continuation bit is one, followed by a byte whose continuation bit is zero. By making the continuation bit in the first byte of any delta value the low-order bit, and the continuation bits in subsequent bytes the high-order bit, the two-byte quantity representing the integer becomes a 16-bit word whose low-order bit is set to one and whose high-order bit is set to zero. This holds for computers that use the "little-endian" convention, as most modern computers do. If the decoding method reads two bytes at a time, i.e., 16-bit words, then the recovery of the original integers can be performed more quickly. Integers whose delta values are represented by one, three, or more bytes take more instructions, but these should be rare.
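The bit layout just described, and the 16-bit test it enables, can be sketched as follows. The names `encode_mixed` and `decode_mixed` are hypothetical, and placing the least significant seven-bit group in the first byte is an assumption. Python does not expose the instruction-level benefit; the sketch only shows that one mask-and-compare on a little-endian 16-bit word identifies the two-byte case, after which a single shift yields the value.

```python
import struct


def encode_mixed(delta):
    """First byte: continuation in the low-order bit. Later bytes: high-order bit."""
    out = bytearray()
    group = delta & 0x7F
    delta >>= 7
    out.append((group << 1) | (1 if delta else 0))
    while delta:
        group = delta & 0x7F
        delta >>= 7
        out.append(group | (0x80 if delta else 0))
    return bytes(out)


def decode_mixed(data, pos):
    """Return (delta, next_pos) for the delta value starting at data[pos]."""
    if pos + 2 <= len(data):
        (word,) = struct.unpack_from("<H", data, pos)  # little-endian 16-bit read
        if (word & 0x8001) == 0x0001:   # low bit one, high bit zero: two bytes
            return word >> 1, pos + 2   # bits 1..14 hold the value
    # Slow path: one byte, or three or more bytes.
    first = data[pos]
    pos += 1
    delta = first >> 1
    shift = 7
    more = first & 1
    while more:
        byte = data[pos]
        pos += 1
        delta |= (byte & 0x7F) << shift
        shift += 7
        more = byte & 0x80
    return delta, pos


value, next_pos = decode_mixed(encode_mixed(490), 0)  # (490, 2)
```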
However, there is a problem with the above described delta value decoding technique. Modern processors (CPUs) are able to issue multiple instructions during each CPU clock cycle. In addition, modern processors have an instruction pipeline that includes multiple pipeline stages, for example, a fetch stage, a map stage, an issue stage, an execution stage, and a retire stage.
As a result, many additional instructions can be fetched into the pipeline before an earlier fetched instruction completes. This tends to make modern computer systems run faster, provided that they have a linear sequence of instructions to execute. When the processor encounters a conditional branch instruction, the next instruction to be executed may be the next sequential instruction appearing after the conditional branch when the branch is not taken, or the next instruction may be some other instruction at some displaced address when the branch is taken. In the latter case, the instructions that were fetched into the pipeline will be wrong. The pipeline will need to be "flushed," after which fetching can resume at the branch taken address. Resetting the pipeline consumes processor cycles and degrades performance.
In order to improve throughput, most modern processors try to predict whether or not a branch is going to be taken. Typically, this is done using branch prediction logic. This way, instructions can continue to be fetched along the predicted execution path while the outcome of the branch is being computed. The prediction is based on the previous behavior of the execution flow. If previous execution flows indicate that the branch is usually taken, then the fetching of the next instructions will be predicted accordingly, or vice versa.
If the logic predicts a branch correctly, then instructions continue to be executed at the normal rate. However, if a branch is mispredicted, then the processor must discard the entire state in its execution pipeline because it has started to execute instructions that must not be allowed to complete. Because several instructions may be in each pipeline stage, and the pipeline has several stages, this means that many instruction execution opportunities are wasted.
In the decoding technique described above, a conditional branch instruction must test each 16-bit word that is fetched to ensure that its low-order continuation bit is a one and its high-order continuation bit is a zero, as would be the case for most two-byte delta encoded values. If the branch is not taken, then the instructions handle the frequently occurring groups of two bytes quite well. Only in the case where the delta value is encoded in one byte or in more than two bytes does the technique suffer from degraded performance.
Normally, the branch will not be taken because the 2-byte delta values are most common. Branch mispredictions, on the average, will occur about one-third of the time. In the preferred implementation of the above decoding, it takes exactly six instructions to recover 2-byte delta values, so one might expect to execute about 18 instructions between branch mispredictions, on the average. On very recent processors, the number of instruction execution opportunities lost on a branch mispredict can exceed 18. On such a processor, most of the time taken to decode a sequence of integers is lost in branch mispredict processing.
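The arithmetic above can be checked with a small sketch. The function names are hypothetical, and the cost model is a simplifying assumption made here: the share of execution opportunities lost is taken to be the mispredict penalty divided by the sum of that penalty and the useful instructions executed between mispredicts.

```python
from fractions import Fraction


def instructions_between_mispredicts(instructions_per_value, mispredict_rate):
    """Average useful instructions executed per branch mispredict."""
    return instructions_per_value / mispredict_rate


def lost_fraction(instructions_per_value, mispredict_rate, penalty):
    """Share of execution opportunities lost to mispredicts (assumed model)."""
    useful = instructions_between_mispredicts(instructions_per_value, mispredict_rate)
    return penalty / (penalty + useful)


rate = Fraction(1, 3)                              # mispredict rate from the text
between = instructions_between_mispredicts(6, rate)  # 18
half = lost_fraction(6, rate, 18)                  # Fraction(1, 2)
```

Under this model, a penalty of 18 lost opportunities per mispredict corresponds to half of all opportunities being wasted, and any larger penalty pushes the share above one half.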
In general, branch mispredicts can consume half of the instruction execution opportunities in the inner loop of the procedure for scanning the list of delta values. It is the intent of the present invention to increase pipeline utilization by minimizing branch mispredicts.