The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
A compiler is a computer program that translates a series of statements (source code) written in one language (called the source language) into output in another language (often called the object or target language). The output produced by a compiler typically takes the form of code that may be executed by a computer or a virtual machine.
To be processed correctly by the compiler, source code must conform to the source language. Source languages typically define a set of operators. Both the set of operators that may be used in the language and the way the compiler translates each operator are typically hard-coded into the compiler. Consequently, a new operator can be added to the language only by rewriting the compiler to support it. Rewriting a compiler to extend a language is a difficult task, and usually only the developer of the compiler can perform it, because the compiler developer is usually the only party with access to the compiler's source code. Thus, parties that use the compiler are severely limited in their ability to extend the source language that the compiler supports.
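To illustrate what "hard-coded" means in practice, the following minimal sketch shows a hypothetical toy query compiler. The names `compile_query` and `Query`, the tuple-based query representation, and the operators `AND`, `OR`, and `NOT` are all assumptions chosen for illustration. The compiler translates each operator into a predicate over a document's set of terms. Because the operator set lives inside the function body, a caller who wants a new operator (here, a hypothetical `NEAR`) cannot add one without editing the compiler itself.

```python
from typing import Callable, Union

# Hypothetical query representation: a bare term, or a tuple (operator, operand, ...).
Query = Union[str, tuple]
# A compiled query is a predicate over the set of terms in a document.
Predicate = Callable[[frozenset], bool]


def compile_query(query: Query) -> Predicate:
    """Translate a query tree into a predicate (hypothetical toy compiler)."""
    if isinstance(query, str):
        term = query
        return lambda doc: term in doc
    op, *args = query
    parts = [compile_query(a) for a in args]
    # The operator set and each operator's translation are hard-coded here;
    # supporting a new operator means editing this function.
    if op == "AND":
        return lambda doc: all(p(doc) for p in parts)
    if op == "OR":
        return lambda doc: any(p(doc) for p in parts)
    if op == "NOT":
        if len(parts) != 1:
            raise ValueError("NOT takes exactly one operand")
        return lambda doc: not parts[0](doc)
    # Any operator the compiler's author did not anticipate is rejected.
    raise ValueError(f"unsupported operator: {op}")
```

For example, `compile_query(("AND", "cat", ("NOT", "dog")))` yields a predicate that is true for a document containing "cat" but not "dog", while `compile_query(("NEAR", "cat", "dog"))` raises `ValueError`. Only someone with access to the compiler's source can change that outcome.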
Compilers are used in a variety of contexts. For example, compilers are used within search engines to compile search queries received from search applications. A search application (also referred to herein simply as an “application”) and a search engine are often developed and managed by different parties. To interact with third-party applications, search engines typically have a well-defined API for receiving and responding to search queries. Through the API, applications submit search queries to the search engine. To be properly processed by the search engine, the queries must conform to the query language supported by the search engine.
The process by which a query is received and executed typically begins with an end user inputting a query (i.e., “end user query”) into a search field of an interface generated by an application. The end user queries themselves typically do not conform to the query language supported by the compiler used by the search engine. Consequently, the application converts the end user query into an “application query” that conforms to the query language supported by the compiler used by the search engine. The application then sends the application query to the search engine, where the application query is compiled and executed.
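The two-stage flow described above can be sketched as follows. This is an illustrative model only. The query syntax `AND(term, term, ...)`, the function `to_application_query`, and the class `SearchEngine` are hypothetical and do not correspond to any particular engine's API. The application side turns free text into an application query. The engine side accepts only strings in its own query language, compiles them, and executes them against a small inverted index.

```python
import re

# Hypothetical query language accepted by the engine: AND(term, term, ...)
_APP_QUERY = re.compile(r"AND\(([a-z0-9]+(?:, [a-z0-9]+)*)\)")
_TERM = re.compile(r"[a-z0-9]+")


def to_application_query(end_user_query: str) -> str:
    """Application side: convert an end user query into an application query."""
    terms = _TERM.findall(end_user_query.lower())
    if not terms:
        raise ValueError("query contains no searchable terms")
    return "AND(" + ", ".join(terms) + ")"


class SearchEngine:
    """Engine side: compiles an application query and executes it."""

    def __init__(self, docs: dict[str, str]):
        # Build a simple inverted index: term -> set of document ids.
        self.index: dict[str, set[str]] = {}
        for doc_id, text in docs.items():
            for term in _TERM.findall(text.lower()):
                self.index.setdefault(term, set()).add(doc_id)

    def _compile(self, app_query: str) -> list[str]:
        # Reject anything that does not conform to the engine's query language.
        match = _APP_QUERY.fullmatch(app_query)
        if match is None:
            raise ValueError(f"not in the engine's query language: {app_query!r}")
        return match.group(1).split(", ")

    def search(self, app_query: str) -> list[str]:
        terms = self._compile(app_query)
        # Execute: intersect the posting sets of all query terms.
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        return sorted(hits)
```

With `docs = {"d1": "Cheap flights to Paris", "d2": "Paris hotels"}`, the end user query `"cheap Paris flights!"` becomes the application query `"AND(cheap, paris, flights)"`, which the engine executes. Sending the raw end user text directly to `search` raises `ValueError`, because it does not conform to the engine's query language.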
Because the queries sent to the search engine are generated by the search application, the developer of the search application can be considered the “user” of the compiler. The application developer must therefore design the search application around the limitations of the query language supported by the search engine's compiler. Because the application developer cannot rewrite the compiler, the application developer usually has no way to extend the language in which application queries are formulated.
The compiler of a search engine includes routines for processing the operators included in the query language supported by the compiler. The operators that are initially supported by the compiler are referred to herein as “pre-supported operators”. Unfortunately, the pre-supported operators may not provide all of the functionality desired by a search application developer. Theoretically, an application developer who wants additional functionality could extend the query language supported by a search engine by modifying the source code of the compiler used by that search engine. However, as mentioned above, search applications and search engines are typically developed by different parties. Thus, a search application developer is unlikely to have access to the source code of the search engine with which the developer's application interacts.
In addition to supporting only rigid query languages that are not easily extended, current search engines do not provide general access to low-level primitives, such as document selection and scoring operators, and they offer only a small, fixed set of ways to handle queries. Values for such primitives are generated and used by routines implemented within the search engine. However, the interface that the search engine exposes to applications provides no mechanism by which those applications can see or use those values.
Large-scale search engines could potentially support many different applications, such as user-adaptive query processing, data mining, and complex algorithmic query execution for better relevance. However, large-scale search engines are currently programmed to handle only a single search application or a fixed number of search applications.
Because different applications may serve widely diverse needs, they may require, for instance, different ranking functions. Unfortunately, document selection and ranking functions in current search engines are tightly coupled and therefore not easily customized. For example, there is currently no way for application developers to have application queries refer to document selection and ranking functions; such functions are accessible only to routines internal to the search engine. Thus, application developers cannot define new ways to select and rank documents.
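The coupling described above can be made concrete with a deliberately simplified sketch. The class `FixedEngine`, its summed term-frequency score, and its method names are hypothetical and are not taken from any real engine. Selection (a document must contain every query term) and ranking (a fixed score) happen in one internal pass. The public `search` method accepts only terms, so an application has no way to refer to either step, supply its own ranking function, or read the scores that the engine computes.

```python
from collections import Counter


class FixedEngine:
    """Hypothetical engine whose selection and ranking are fused and internal."""

    def __init__(self, docs: dict[str, str]):
        # Per-document term frequencies; a missing term counts as 0.
        self._docs = {d: Counter(text.lower().split()) for d, text in docs.items()}

    def _select_and_rank(self, terms: list[str]) -> list[tuple[str, int]]:
        results = []
        for doc_id, tf in self._docs.items():
            # Selection: a document qualifies only if it contains every term.
            if all(tf[t] > 0 for t in terms):
                # Ranking: a fixed score (summed term frequency) in the same pass.
                results.append((doc_id, sum(tf[t] for t in terms)))
        # Highest score first; ties broken by document id.
        results.sort(key=lambda r: (-r[1], r[0]))
        return results

    def search(self, terms: list[str]) -> list[str]:
        # The public API takes only terms: there is no parameter through which an
        # application could pass its own selection predicate or ranking function,
        # and the computed scores are not returned to the caller.
        return [d for d, _ in self._select_and_rank([t.lower() for t in terms])]
```

An application that wanted, for example, to rank by recency or by a learned model would have no hook for doing so short of modifying `_select_and_rank` itself.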
Some search engines are publicly available. One such search engine is Lucene, which provides a relatively simple query language with Boolean operators and simple filters. However, as with commercial search engines, Lucene's query language is fixed. Indri is a publicly available research search engine that provides a more complex query language. Indri's query language, however, is also fixed, and its retrieval model (i.e., document selection) is closely tied to its relevance model (i.e., document ranking). Thus, current search engines are not easily extensible or customizable with respect to selection and ranking operators.