The quantity of information on the Internet has increased vastly as the Internet has become more prevalent. The emergence of search engine technology has enabled people to search these vast quantities of information quickly and conveniently to find the various kinds of information they need.
Character index systems are already widely used to search for information. Character index systems include large numbers of preset index tables. FIG. 1 shows an example of such a preset index table. In the example, index table 100 includes three main columns of data. The left column includes the indexed character(s) (e.g., a character may be indexed by itself or with one or more other characters; for example, each indexed single Chinese character or character combination may comprise a phrase or a saying). The middle column includes the number of documents that include the corresponding indexed character/combination of characters. The right column includes the address associated with each of the corresponding documents that include the corresponding indexed character/combination of characters (e.g., each of “A1, A2 . . . ” represents an address in a database, for example, where an indexed document may be found). While Chinese characters are indexed in the example of index table 100, English words and morphemes of any other language may also be the subject of indexing. An indexed document is a document that has been indexed such that a reference to and/or a portion of the document, such as an address of the location at which the document is stored, may be stored to quickly retrieve/identify the document. For example, a webpage that has been processed by a web crawler may be an indexed document. In response to a search query, an index table such as index table 100 may be queried. For example, first, one or more indexed single characters and/or combinations of characters may be extracted from the search query. Then index tables such as index table 100 may be queried for indexed documents that include the indexed characters extracted from the search query. The indexed documents may be returned to the querying user.
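The three-column layout described above may be sketched as follows. This is a minimal illustration only, not the structure of index table 100 itself: the names `IndexEntry`, `build_index_table`, and `lookup`, the in-memory dictionary, and the sample documents and addresses are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IndexEntry:
    """One row of a hypothetical index table."""
    doc_count: int        # middle column: number of documents containing the key
    addresses: List[str]  # right column: addresses of those documents


def build_index_table(documents: Dict[str, str], keys: List[str]) -> Dict[str, IndexEntry]:
    """Map each indexed key (left column) to its document count and addresses.

    `documents` maps an address to document text; both are illustrative inputs.
    """
    table: Dict[str, IndexEntry] = {}
    for key in keys:
        # Collect the address of every document whose text contains the key.
        addresses = sorted(addr for addr, text in documents.items() if key in text)
        table[key] = IndexEntry(doc_count=len(addresses), addresses=addresses)
    return table


def lookup(table: Dict[str, IndexEntry], key: str) -> List[str]:
    """Return the addresses stored for an indexed key, or an empty list."""
    entry = table.get(key)
    return entry.addresses if entry is not None else []


# Illustrative data only: three documents stored at assumed addresses A1..A3.
documents = {
    "A1": "search engine index",
    "A2": "index table",
    "A3": "search query",
}
table = build_index_table(documents, keys=["search", "index"])
```

In this sketch, a query for an indexed key is a single dictionary lookup, which reflects why storing addresses alongside each key allows documents to be identified quickly.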
FIG. 2 is a diagram showing an example of conducting a search using an index table. Index table 100 may be used in this example. The example process includes the following steps: 1) Receive a search query from a user and segment the search query into character combinations (e.g., phrases that include one or more characters); 2) Separate the character combinations into single index characters and query index tables (e.g., index table 100) for indexed documents that include the single index characters (e.g., referring to FIG. 1, the indexed document set for the single character “Zhe” includes 10 documents that each include that character); 3) Perform set intersection operations on the indexed document sets returned for the single index characters belonging to the same character combination, such that the resulting indexed document set for that character combination includes documents that each include all the single characters belonging to that character combination; and 4) Perform set intersection operations on the indexed document sets determined for each character combination, such that the final search result set includes documents that each include all the character combinations of the original search query. Typically, there is a large volume of indexed documents for each single index character. Because there are usually several single index characters in each character combination, and also several character combinations in each search query, repeatedly performing set intersection operations on all of the retrieved indexed document sets may be very inefficient.
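The four steps above may be sketched as follows. This is a simplified illustration under stated assumptions: the function names `segment`, `intersect_all`, and `search` are hypothetical, segmentation is reduced to splitting on whitespace, Latin letters stand in for single index characters, and the index is a small in-memory dictionary with made-up addresses.

```python
from typing import Dict, List, Set


def segment(query: str) -> List[str]:
    """Step 1 (assumed segmenter): split the query into character combinations."""
    return query.split()


def intersect_all(sets: List[Set[str]]) -> Set[str]:
    """Intersect every set in the list; an empty list yields an empty set."""
    if not sets:
        return set()
    return set.intersection(*sets)


def search(query: str, index: Dict[str, Set[str]]) -> Set[str]:
    """Return addresses of documents containing every combination in the query."""
    per_combination: List[Set[str]] = []
    for combination in segment(query):
        # Step 2: look up the indexed document set for each single character.
        char_sets = [index.get(ch, set()) for ch in combination]
        # Step 3: intersect the sets of characters in the same combination.
        per_combination.append(intersect_all(char_sets))
    # Step 4: intersect the per-combination sets to get the final results.
    return intersect_all(per_combination)


# Illustrative single-character index: character -> addresses of documents.
INDEX = {
    "a": {"A1", "A2", "A3"},
    "b": {"A1", "A2"},
    "c": {"A2", "A3"},
    "d": {"A2", "A4"},
}
results = search("ab cd", INDEX)
```

Each character lookup returns a full document set, and every set participates in an intersection, so the work grows with both the number of characters per combination and the number of combinations per query. This is the inefficiency noted above.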