1. Field of the Invention
The invention relates to an indexing device and a method for providing an index for a stream of data. More particularly, it relates to a method for providing a compressed index for a stream of binary records, and to such a compressed index.
2. Related Art
In and between computer networks, communication may have to be recorded for security, management and maintenance reasons. In order to post-process the saved communication, data indices can be used which summarize the information in certain fields of the records that form the communication.
A data index can be implemented as a bitmap index, which is a matrix with one column for each distinct value a field can take and one row for each record. For a particular record, the column that matches the value in a predetermined field of the record is filled with a binary 1 while the other columns are filled with binary 0s. The columns of the bitmap index are then encoded with a run-length encoding. The encoding is chosen such that pattern matching with search patterns containing Boolean operators such as “AND” and “OR” can be carried out on the compressed columns, e.g. “records in which the sender's address is in range X AND the recipient's address is in range Y”. One such coding is known as Word Aligned Hybrid Code (“WAH”) and is published in U.S. Pat. No. 6,831,575.
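By way of illustration only, the bitmap index described above can be sketched as follows. The sketch assumes a toy stream of six records with hypothetical "sender" and "recipient" fields, uses a plain run-length encoding rather than the WAH coding, and for simplicity evaluates the Boolean "AND" on the uncompressed columns. The names build_bitmap_index and run_lengths are introduced here for illustration and are not taken from any cited method.

```python
from itertools import groupby


def build_bitmap_index(values):
    """Return {distinct value: list of 0/1 bits}, one bit per record."""
    columns = {v: [0] * len(values) for v in sorted(set(values))}
    for row, v in enumerate(values):
        # Exactly one column receives a 1 for each record.
        columns[v][row] = 1
    return columns


def run_lengths(bits):
    """Plain run-length encoding as (bit, run length) pairs (not WAH)."""
    return [(bit, len(list(group))) for bit, group in groupby(bits)]


# Hypothetical field values of six consecutive records.
senders = ["A", "A", "B", "A", "C", "C"]
recipients = ["X", "Y", "Y", "X", "X", "X"]

sender_index = build_bitmap_index(senders)
recipient_index = build_bitmap_index(recipients)

# Query: sender is "A" AND recipient is "X" (bitwise AND of two columns).
hits = [a & b for a, b in zip(sender_index["A"], recipient_index["X"])]
matching_rows = [row for row, bit in enumerate(hits) if bit]

print(run_lengths(sender_index["A"]))  # [(1, 2), (0, 1), (1, 1), (0, 2)]
print(matching_rows)                   # [0, 3]
```

A coding such as WAH differs from this sketch in that the Boolean operation is applied to the compressed columns themselves, without first expanding them.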
In order to minimize bitmap index sizes, a method called FastBit, which implements the WAH coding, supports an optional offline sorting of records before they are encoded. In an online system, where a potentially endless stream of records must be indexed and processed quickly, offline sorting is computationally expensive and can be applied only to more or less arbitrary chunks of records, which can degrade sorting quality.
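The effect of sorting on index size can be illustrated with a small sketch, given here under simplifying assumptions: a single field with three hypothetical values, and the total number of runs over all bitmap columns used as a rough proxy for the size of a run-length encoded index. The helper names total_runs and sort_in_chunks are illustrative only and do not reproduce the FastBit implementation.

```python
from itertools import groupby


def total_runs(values):
    """Count the runs of equal bits over all bitmap columns of one field."""
    total = 0
    for v in set(values):
        bits = [1 if x == v else 0 for x in values]
        total += sum(1 for _ in groupby(bits))
    return total


def sort_in_chunks(values, chunk_size):
    """Sort each chunk independently, as an online system would have to."""
    out = []
    for start in range(0, len(values), chunk_size):
        out.extend(sorted(values[start:start + chunk_size]))
    return out


# Hypothetical field values of eight consecutive records.
stream = ["B", "A", "C", "A", "B", "C", "A", "B"]

unsorted_runs = total_runs(stream)                    # no sorting
chunked_runs = total_runs(sort_in_chunks(stream, 4))  # sorting per chunk of 4
sorted_runs = total_runs(sorted(stream))              # full offline sorting

print(unsorted_runs, chunked_runs, sorted_runs)  # 17 13 7
```

In this toy example, full sorting yields the fewest runs, while sorting within chunks of four records recovers only part of that benefit, which is consistent with the degradation noted above.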