In computer science, an inverted index, also referred to as a postings file or an inverted file, is an index data structure that stores a mapping from content (e.g., words or numbers) to its locations or positions in a database file, a document, or a set of documents. The purpose of an inverted index is to allow fast full-text searches, albeit at the cost of increased processing when a document is added to the database. It is the most popular data structure used in document retrieval systems.
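To make the mapping concrete, here is a minimal Python sketch of a positional inverted index. It assumes a simple lowercase whitespace tokenizer and an in-memory dictionary. The names `tokenize` and `build_inverted_index` are hypothetical and not taken from any particular system. Each token maps to a postings list of (document id, position) pairs. The sketch also shows where the indexing cost comes from: adding a document touches the postings list of every token that document contains.

```python
from collections import defaultdict


def tokenize(text):
    # Hypothetical tokenizer (assumption): lowercase, split on whitespace.
    return text.lower().split()


def build_inverted_index(docs):
    """Map each token to a postings list of (doc_id, position) pairs.

    docs: dict from document id to document text.
    """
    index = defaultdict(list)
    for doc_id, text in docs.items():
        # Every token of a new document is appended to its own postings list;
        # this per-token work is the cost paid when a document is added.
        for position, token in enumerate(tokenize(text)):
            index[token].append((doc_id, position))
    return dict(index)


docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs and the fox",
}
index = build_inverted_index(docs)
print(index["fox"])    # [(1, 3), (3, 5)]
print(index["quick"])  # [(1, 1), (3, 0)]
```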
One application of an inverted index is in network search engines. The Internet provides a vast amount of information. The individual pieces of information are often referred to as “network resources” or “network content” and may have various formats, such as, for example and without limitation, text, audio, video, images, web pages, documents, executables, etc. The network resources are stored at many different sites around the world, such as on computers and servers, in databases, etc. These sites are communicatively linked to the Internet through various network infrastructures. Anyone may access the publicly available network resources via a suitable network device (e.g., a computer, a smart mobile telephone, etc.) connected to the Internet.
However, due to the sheer amount of information available on the Internet, it is impractical, if not impossible, for a person to manually search the Internet for specific pieces of information. Instead, most people rely on various computer-implemented tools to help them locate the desired network resources. One of the most widely used of these tools is a search engine, such as the search engines provided by Microsoft® Corporation (http://www.bing.com), Yahoo!® Inc. (http://search.yahoo.com), and Google™ Inc. (http://www.google.com). To search the Internet for information relating to a specific subject matter or topic, a person typically submits to a search engine a short phrase or a few keywords (which may be words or numbers) describing the subject matter, often referred to as a “search query” or simply a “query”. The search engine conducts a search based on the query using various search algorithms and generates search results identifying the network resources most likely to be related to the query. The identified network resources are presented to the person, often as a list of links, each link pointing to a different network document (e.g., a web page) that contains some of the identified network resources.
Each network document may include any number of words, numbers, or symbols, and each word, number, or symbol may be referred to as a token. An inverted index may be used to store a mapping from the tokens in the network documents to their positions in those documents. Upon receiving a search query from a person, a search engine uses the inverted index to determine which network documents contain all, most, or some of the query keywords. Because the index already lists, for each token, the documents in which it appears, the search engine need not scan the documents themselves at query time, which greatly speeds up the search process. Typically, the position data (i.e., the individual positions of the tokens within the documents) account for about two-thirds of an inverted index.
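The lookup described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not any search engine's actual implementation. It assumes whitespace tokenization and an in-memory index of the form token → {document id: [positions]}. The names `build_positional_index`, `docs_with_all`, and `docs_with_any` are hypothetical. `docs_with_all` returns the documents containing every query token by intersecting postings, and `docs_with_any` returns the documents containing at least one.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Return {token: {doc_id: [positions]}} using lowercase whitespace tokens (assumption)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            index[token][doc_id].append(pos)
    # Convert to plain dicts so lookups of unknown tokens do not create entries.
    return {token: dict(postings) for token, postings in index.items()}


def docs_with_all(index, query):
    """Ids of documents containing every query token (conjunctive, AND query)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    # Process the rarest token first so intermediate results stay small.
    postings = sorted((set(index.get(t, {})) for t in tokens), key=len)
    result = postings[0]
    for doc_ids in postings[1:]:
        result &= doc_ids
    return result


def docs_with_any(index, query):
    """Ids of documents containing at least one query token (disjunctive, OR query)."""
    result = set()
    for t in query.lower().split():
        result |= set(index.get(t, {}))
    return result


docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs and the fox",
}
index = build_positional_index(docs)
print(docs_with_all(index, "quick fox"))  # {1, 3}
print(docs_with_all(index, "the dog"))    # {2}
print(docs_with_any(index, "lazy fox"))   # {1, 2, 3}
print(index["the"])                       # {1: [0], 2: [0], 3: [4]}
```

Because this sketch does no stemming, "dogs" in document 3 does not match the query token "dog". The per-document position lists are the position data the paragraph refers to. They are stored here but not used by the AND/OR lookups.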
Sophisticated search engines implement many other functionalities beyond merely identifying the network resources as part of the search process. For example, a search engine usually ranks the identified network resources by their relative degrees of relevance to the search query, so that the more relevant resources are ranked higher and are consequently presented to the person requesting the search before the less relevant ones. The search engine may also provide a short summary of each identified network resource. There are continuous efforts to improve the quality of the search results generated by search engines; accuracy, completeness, presentation order, and speed are but a few of the performance aspects that may be improved.
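The following sketch illustrates relevance ranking by scoring documents with a summed TF-IDF weight: term frequency multiplied by the logarithm of inverse document frequency. TF-IDF is a standard textbook scheme chosen here as an assumption. The text above does not say which ranking function any particular search engine uses. The helper names `build_tf_index` and `rank` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_tf_index(docs):
    """Return {token: {doc_id: term_frequency}} using lowercase whitespace tokens (assumption)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for token, count in Counter(text.lower().split()).items():
            index[token][doc_id] = count
    return dict(index)


def rank(index, num_docs, query):
    """Return (doc_id, score) pairs sorted best first, using summed TF-IDF (assumed scheme)."""
    scores = defaultdict(float)
    for token in set(query.lower().split()):
        postings = index.get(token)
        if not postings:
            continue  # token appears in no document
        # Rarer tokens get a larger weight; a token found in every document gets 0.
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by doc id for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = {1: "fox fox hunting", 2: "the fox", 3: "the dog"}
index = build_tf_index(docs)
for doc_id, score in rank(index, len(docs), "fox hunting"):
    print(doc_id, round(score, 3))
```

With this query, document 1 ranks above document 2 because it contains both query terms, one of them twice. Document 3 contains neither term and is not returned.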