Many types of search engine indexing algorithms utilize inverted indexes. An inverted index is a data structure that stores a mapping between terms and the locations of those terms within a database, a document, or a set of documents. For instance, an inverted index may store a mapping between words and the World Wide Web (“Web”) pages in which those words appear. Data identifying the particular location at which each term appears within a document might also be stored in an inverted index. The list of documents in which a particular term appears is commonly referred to as a posting list.
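The following is a minimal sketch of an in-memory inverted index that records, for each term, a posting list of documents together with the positions at which the term appears. It assumes whitespace tokenization, lowercase normalization, and integer document identifiers. The function name `build_inverted_index` is hypothetical and is not part of the original text.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a posting list of (doc_id, positions) entries.

    Assumes (hypothetically) that docs is a dict of {doc_id: text} and that
    terms are whitespace-separated tokens, compared case-insensitively.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            # Record every position at which the term occurs in this document.
            index[term].setdefault(doc_id, []).append(pos)
    # Sort each posting list by document identifier for deterministic order.
    return {term: sorted(postings.items()) for term, postings in index.items()}


docs = {
    1: "the dog ate the cake",
    2: "a cat ate fish",
}
index = build_inverted_index(docs)
```

Here the posting list for "ate" lists both documents, each with the position of the word, while the posting list for "the" records two positions within document 1.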
Some types of indexing algorithms generate a separate entry in the inverted index for each semantic role in which a term occurs. This results in a separate posting list, and a separate entry in the index to the posting lists (called the lexicon), for each term-role pair. For instance, one posting list may be created in the index for the word “dog” in the role “subject,” and another posting list may be created for the word “cake” in the role “object.” To identify documents in which a dog is the subject and a cake is the object, such as documents describing a dog eating a cake, an intersection operation is performed between the two posting lists. Semantically based search engines may utilize this type of indexing and document retrieval.
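A term-role index and the intersection step might be sketched as follows. The sketch assumes, hypothetically, that each document has already been annotated as a list of (term, role) pairs by some upstream semantic parser, and that posting lists are kept as sorted lists of document identifiers so that two of them can be intersected with a linear-time merge. The function names `build_role_index` and `intersect` are illustrative only.

```python
def build_role_index(annotated_docs):
    """Build a lexicon keyed by (term, role), each mapping to a posting list.

    annotated_docs is a hypothetical input format: {doc_id: [(term, role), ...]}.
    Each posting list is a sorted list of document identifiers without duplicates.
    """
    index = {}
    for doc_id in sorted(annotated_docs):
        for term, role in annotated_docs[doc_id]:
            postings = index.setdefault((term, role), [])
            # Documents are visited in sorted order, so a repeat is always last.
            if not postings or postings[-1] != doc_id:
                postings.append(doc_id)
    return index


def intersect(a, b):
    """Intersect two sorted posting lists with a linear-time merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


docs = {
    1: [("dog", "subject"), ("eat", "predicate"), ("cake", "object")],
    2: [("cake", "subject"), ("fall", "predicate")],
    3: [("dog", "subject"), ("chase", "predicate"), ("cat", "object")],
    4: [("girl", "subject"), ("eat", "predicate"), ("cake", "object")],
}
index = build_role_index(docs)
matches = intersect(index[("dog", "subject")], index[("cake", "object")])
```

Because “cake” as a subject and “cake” as an object are stored under different lexicon keys, document 2 does not appear in the “cake”/“object” posting list, and only document 1 satisfies both constraints.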
Because inverted indexes can grow very large, they are often stored on disk. Portions of the inverted index may be read from disk into main memory for quicker access. Regardless of the physical storage medium on which an inverted index is stored, it is often the case that no particular assumption is made about the layout of posting lists on that medium relative to one another. However, an arbitrary layout of posting lists on a physical storage medium can lead to poor performance, especially in systems that compute, at runtime, intersections of the posting lists for multiple terms related to each other in a strict dominance relation, such as semantically based search engines.
It is with respect to these considerations and others that the disclosure made herein is presented.