Query processing can be an important component of search-enabled applications, such as web search. Prior approaches to processing queries for information have focused on query segmentation, syntactic parsing, query classification, and query log mining.
Query segmentation generally relates to separating a query into a number of smaller units. The types of segmented units are often restricted, which limits what the method can do. Syntactic parsing generally focuses on identifying the linguistic structure of a query. Query classification generally falls into two groups: classification according to search intent, such as informational, navigational, or transactional; and classification according to the semantics of a query, such as “shopping” or “living.” With either type of query classification, the query is generally classified as a whole, and its internal structure is usually not analyzed further.
Query log mining has generally involved acquiring the named entities of a specific class from a query log. A named entity in this context may be a name within the query log that refers to a particular entity, such as a real or imaginary person, group, organization, company, item, or the like. Query log mining has often been performed using templates specific to that class. This approach is generally deterministic and usually works only when each named entity belongs to a single class.
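To make the template-based approach concrete, the following Python sketch mines entities from a toy query log. The class names, templates (with `#` marking the entity slot), and queries are hypothetical illustrations, and regular-expression matching is an assumed stand-in for however a given system applies its templates. Because each match is a hard yes/no decision, an entity such as “harry potter” that fits templates of two classes is simply recorded under both, with no indication of which class a particular query intends.

```python
import re

# Hypothetical class templates; "#" marks the slot where a named entity appears.
TEMPLATES = {
    "Movie": ["# trailer", "# showtimes", "watch # online"],
    "Book": ["# author", "# paperback"],
}


def template_to_regex(template):
    """Compile a one-slot template such as '# trailer' into an anchored regex."""
    prefix, _, suffix = template.partition("#")
    return re.compile("^" + re.escape(prefix) + r"(.+?)" + re.escape(suffix) + "$")


def mine_entities(query_log, templates):
    """Return {entity: set of classes} for every query matching a class template.

    Matching is deterministic: an entity is either in a class or not, and an
    entity matched by templates of several classes is simply listed under all.
    """
    found = {}
    for cls, patterns in templates.items():
        regexes = [template_to_regex(t) for t in patterns]
        for query in query_log:
            for rx in regexes:
                m = rx.match(query)
                if m:
                    found.setdefault(m.group(1), set()).add(cls)
    return found


# Hypothetical query log.
log = [
    "harry potter trailer",
    "harry potter paperback",
    "inception showtimes",
    "cheap flights",
]
entities = mine_entities(log, TEMPLATES)
for entity, classes in sorted(entities.items()):
    print(entity, sorted(classes))
# harry potter ['Book', 'Movie']
# inception ['Movie']
```

The query “cheap flights” matches no template and is ignored, while a bare query “harry potter” could not be resolved to one class from these results alone.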
Additionally, Named Entity Recognition (NER) has been performed on text documents (generally natural-language texts) using a defined set of rules based on sentence structure. The rules may test whether certain identifiable features are present in a document, such as whether the term “Mr.” occurs before a word or whether the first letter of a word is capitalized. Such features may indicate the presence of a named entity. However, directly applying these traditional approaches to Named Entity Recognition in Query (NERQ) may not be effective, because search queries are usually short (e.g., 2 to 3 words long) and not well formed (e.g., not complete sentences and not properly capitalized, perhaps with all words in lower case). Thus, these identifiable features may be absent from most search queries.
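The following Python sketch illustrates why surface rules of this kind transfer poorly to queries. It assumes just two hypothetical rules, a token that follows a title such as “Mr.” and a capitalized token that is not sentence-initial; practical rule-based NER systems use many more features. The rules find candidates in a well-formed sentence but nothing in short, lower-case queries.

```python
from typing import List

# Hypothetical title cues; a token right after one of these is flagged.
TITLES = {"Mr.", "Mrs.", "Ms.", "Dr."}


def rule_based_candidates(text: str) -> List[str]:
    """Flag tokens that look like parts of named entities using two surface rules."""
    tokens = text.split()
    candidates = []
    for i, raw in enumerate(tokens):
        if raw in TITLES:
            continue  # the title is a cue for the next token, not a candidate itself
        word = raw.strip(".,;:!?\"'")
        if not word:
            continue
        after_title = i > 0 and tokens[i - 1] in TITLES
        # Skip position 0: sentence-initial words are capitalized regardless.
        capitalized = i > 0 and word[0].isupper()
        if after_title or capitalized:
            candidates.append(word)
    return candidates


sentence = "Yesterday Mr. Smith met Alice Johnson in Paris."
query = "harry potter walkthrough"
print(rule_based_candidates(sentence))  # ['Smith', 'Alice', 'Johnson', 'Paris']
print(rule_based_candidates(query))     # []
```

Even a query such as “mr smith phone number,” which does name a person, yields no candidates, because the lower-case “mr” does not match the title cue and no token is capitalized.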