Canadian Patent No. 1,338,601, which is fully incorporated herein by reference, is directed to a system and method for representing relational databases using binary representations. Data in a relational database may be described as structured data since the data may be organized into structured columns, rows, and the like.
Unstructured data, on the other hand, is data that is stored as a document and is not contained in the tables of a database. The document may be a memo, book, e-mail message, design specification, or the like.
Current mechanisms for representing and searching unstructured data are inefficient and costly. One mechanism uses suffix tries. A suffix trie is a trie that represents a given string by including the string and all of its suffixes. For example, the string “This is a cat” would be represented in the trie by the strings “This is a cat,” “is a cat,” “a cat,” and “cat.” However, a document must be indexed as a single string, or duplicate strings might occur, and each string in the trie must necessarily be distinct. This representation may take up a lot of space. Although mechanisms exist for compressing the information, the strings must nonetheless be represented, and compressing the trie comes at the cost of increased search time. Accordingly, there is a need for representing and searching for unstructured data in an efficient and cost-effective manner.
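By way of illustration only, the suffix trie described above may be sketched as follows. The sketch assumes word-level suffixes, as in the “This is a cat” example, and uses nested dictionaries as trie nodes. The function names (word_suffixes, build_suffix_trie, contains_phrase, count_nodes) are hypothetical and are not taken from any particular prior-art system.

```python
def word_suffixes(text):
    """Return every word-level suffix of text as a list of words."""
    words = text.split()
    return [words[i:] for i in range(len(words))]


def build_suffix_trie(text):
    """Insert each suffix of text into a trie of nested dicts (assumed layout)."""
    root = {}
    for suffix in word_suffixes(text):
        node = root
        for word in suffix:
            # Reuse the child node if it exists, otherwise create it.
            node = node.setdefault(word, {})
    return root


def contains_phrase(trie, phrase):
    """Check whether phrase occurs in the indexed text by walking from the root."""
    node = trie
    for word in phrase.split():
        if word not in node:
            return False
        node = node[word]
    return True


def count_nodes(trie):
    """Count trie nodes, excluding the root, as a rough measure of space."""
    return sum(1 + count_nodes(child) for child in trie.values())


trie = build_suffix_trie("This is a cat")
print(contains_phrase(trie, "is a"))    # phrase present in the text
print(contains_phrase(trie, "cat is"))  # phrase absent from the text
print(count_nodes(trie))                # nodes needed for a four-word string
```

In this sketch the four-word string requires ten nodes, because none of its four suffixes shares a leading word with another. This illustrates how the space taken up by a suffix trie may grow much faster than the length of the indexed text.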