Contemporary computer applications, such as Microsoft SQL Server, create an index of the content of documents in order to allow fast resolution of various types of queries about the indexed content. Because information is ever-changing and increasing, the content of a document may be updated several times during the document's life span, resulting in multiple indexes, each referring to a different version of the same document.
Many current content indexing applications store indexing information in memory-mapped files. A memory-mapped file maps all or part of a file on disk to a specific range of addresses in the virtual memory of a computer system. FIG. 1 illustrates a memory-mapped indexing array 100. A memory-mapped indexing array, which is stored in continuous virtual memory, correlates each indexed document with a result index by associating each document identifier (“Doc_ID”) 102 with a corresponding index identifier (“Index_ID”) 104. For example, the memory-mapped indexing array 100 illustrated in FIG. 1 associates the document represented by Doc_ID F1 with the index represented by Index_ID 1, the document represented by Doc_ID F2 with the index represented by Index_ID 4, and the document represented by Doc_ID F3 with the index represented by Index_ID 6.
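The array of FIG. 1 can be illustrated with a minimal sketch. It assumes each entry is a fixed eight-byte record (a four-byte Doc_ID followed by a four-byte Index_ID) and represents the Doc_IDs F1, F2, and F3 as the integers 1, 2, and 3. The names `PAIR`, `write_index_array`, `read_entry`, and `demo` are hypothetical and chosen only for illustration; they do not describe any particular indexing application.

```python
import mmap
import os
import struct
import tempfile

# Assumed layout: each entry is a 4-byte Doc_ID followed by a 4-byte Index_ID.
PAIR = struct.Struct("<II")


def write_index_array(path, pairs):
    """Write (doc_id, index_id) pairs to a file as fixed-size records."""
    with open(path, "wb") as f:
        for doc_id, index_id in pairs:
            f.write(PAIR.pack(doc_id, index_id))


def read_entry(path, slot):
    """Map the whole file into memory and read the pair stored at `slot`."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return PAIR.unpack_from(view, slot * PAIR.size)


def demo():
    # Doc_IDs F1, F2, F3 from FIG. 1 are represented here as integers 1, 2, 3.
    pairs = [(1, 1), (2, 4), (3, 6)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "index_array.bin")
        write_index_array(path, pairs)
        return [read_entry(path, slot) for slot in range(len(pairs))]
```

Because the whole file is mapped in a single call, the mapping needs one unbroken range of virtual addresses as large as the file itself.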
A 32-bit computer system may have up to four gigabytes of virtual memory space. Usually, the virtual memory space is highly fragmented; hence, it is hard to find a large block of continuous virtual memory space. Meanwhile, due to the explosion of information and the fast development of computer technology, a computer application, such as the next version of Microsoft SQL Server, can easily index two hundred million documents, scalable to two billion documents. Using a memory-mapped indexing array 100 such as the one illustrated in FIG. 1 to store indexing data for such a large number of documents requires more memory space than the virtual memory of most contemporary 32-bit computer systems can accommodate. For example, assuming each Doc_ID 102 takes four bytes of virtual memory, and each Index_ID 104 takes another four bytes of virtual memory, then each pair of Doc_ID and Index_ID needs eight bytes of virtual memory. One million such pairs require eight megabytes of continuous virtual memory space to host the memory-mapped indexing array. Eight megabytes of continuous virtual memory space is sometimes difficult for a normal 32-bit computing system to provide due to inherent memory address space fragmentation. Moreover, an indexing array for two billion documents requires about sixteen gigabytes of continuous virtual memory space, which exceeds the four-gigabyte virtual memory space of a 32-bit computing system.
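The arithmetic above can be checked with a short sketch. It assumes the four-byte Doc_ID and four-byte Index_ID stated in the example and treats a megabyte as one million bytes; the function names are hypothetical.

```python
BYTES_PER_PAIR = 4 + 4            # 4-byte Doc_ID plus 4-byte Index_ID
ADDRESS_SPACE_32_BIT = 2 ** 32    # four gigabytes of virtual address space


def array_bytes(document_count):
    """Bytes needed for one Doc_ID/Index_ID pair per document."""
    return document_count * BYTES_PER_PAIR


def fits_in_32_bit_address_space(document_count):
    """True if the array is no larger than the entire 32-bit address space.

    This is only an upper bound: fragmentation usually leaves a far smaller
    block of continuous addresses actually available.
    """
    return array_bytes(document_count) <= ADDRESS_SPACE_32_BIT


one_million = array_bytes(1_000_000)        # 8,000,000 bytes
two_billion = array_bytes(2_000_000_000)    # 16,000,000,000 bytes
```

Under these assumptions, the two-billion-document array is several times larger than the whole 32-bit address space, before fragmentation is even considered.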
Another way to work with such a large memory-mapped array is to implement it as a file and operate on it through a small number of memory-mapped sections. The oldest memory-mapped section would be unmapped when a new section is needed. This is exactly the way virtual memory is extended to a pagefile in modern operating systems. But this technique would prove to be very inefficient if the array were accessed in a totally random order, because different sections of the array would then have to be constantly mapped and remapped.
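The sectioned approach and its weakness under random access can be sketched as follows. This is an illustrative sketch only: it assumes eight-byte Doc_ID/Index_ID records, a section size equal to the platform's mapping granularity, and a limit of two mapped sections. The class `SectionedArray` and the helper `count_map_calls` are hypothetical names.

```python
import mmap
import os
import struct
import tempfile
from collections import OrderedDict

PAIR = struct.Struct("<II")                    # assumed 8-byte record
SECTION_BYTES = mmap.ALLOCATIONGRANULARITY     # mmap offsets must be multiples of this
PAIRS_PER_SECTION = SECTION_BYTES // PAIR.size


class SectionedArray:
    """Reads a file-backed array through a few mapped sections at a time."""

    def __init__(self, path, max_sections):
        self._file = open(path, "rb")
        self._max_sections = max_sections
        self._mapped = OrderedDict()   # section number -> mmap, oldest first
        self.map_calls = 0             # how many times a section was mapped

    def get(self, slot):
        section, within = divmod(slot, PAIRS_PER_SECTION)
        view = self._mapped.get(section)
        if view is None:
            if len(self._mapped) >= self._max_sections:
                # Unmap the oldest mapped section to make room for the new one.
                _, oldest = self._mapped.popitem(last=False)
                oldest.close()
            view = mmap.mmap(self._file.fileno(), SECTION_BYTES,
                             access=mmap.ACCESS_READ,
                             offset=section * SECTION_BYTES)
            self._mapped[section] = view
            self.map_calls += 1
        return PAIR.unpack_from(view, within * PAIR.size)

    def close(self):
        for view in self._mapped.values():
            view.close()
        self._mapped.clear()
        self._file.close()


def count_map_calls(slots, total_sections=8, max_sections=2):
    """Build a test array, read the given slots, and return the mapping count."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "array.bin")
        with open(path, "wb") as f:
            for slot in range(total_sections * PAIRS_PER_SECTION):
                f.write(PAIR.pack(slot, slot))
        array = SectionedArray(path, max_sections)
        try:
            for slot in slots:
                array.get(slot)
            return array.map_calls
        finally:
            array.close()
```

Reading every slot in order maps each section exactly once, whereas passing the same number of slots in a shuffled or randomly drawn order forces the two-section window to be remapped again and again, which is the inefficiency described above.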
Therefore, there is a need for a method of content indexing that can store indexing information in patches of virtual memory space, instead of requiring a block of continuous virtual memory. Further, there is a need for a method of content indexing that efficiently determines whether an index references the freshest version of a document, when there are one or more indexes, each of which references a different version of the document. More broadly stated, there is a need for a method of indicating the freshness of changeable data, such as a document, associated with a container, such as an index. A container is associated with an item of changeable data by either containing or referencing this item of changeable data. There is also a need for a method of determining whether a container is associated with the latest or freshest version of changeable data. The present invention is directed to addressing these needs.