Searches are typically conducted using a linear or flat technique. In other words, an end-user constructs a search request by providing a number of search terms or phrases and passing them to a search engine. Each search term or phrase is treated as equally important and is compared by the search engine against a target data corpus, which can include a variety of matching terms and phrases. Generally, within the data corpus, matching terms and phrases are associated with data files often referred to as documents. The search terms and phrases are further constrained by a search logic, which defines how the terms and phrases in the search request are to be matched against documents appearing in the target data corpus. For example, a search engine can employ Boolean logic consisting of logical operators, such as “AND,” “OR,” “NOT,” and the like.
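As an illustration only, the flat Boolean matching described above can be sketched with a small inverted index. The corpus contents and the names `CORPUS`, `build_index`, `search_and`, `search_or`, and `search_not` are hypothetical and are not taken from any particular search engine.

```python
# Toy corpus: document id -> text. All names and contents are hypothetical.
CORPUS = {
    "doc1": "search engines rank documents",
    "doc2": "boolean logic constrains a search request",
    "doc3": "documents are stored in a data corpus",
}


def build_index(corpus):
    """Map each lowercase term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in corpus.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


def search_and(index, terms):
    """Documents that contain every one of the terms."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, terms):
    """Documents that contain at least one of the terms."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.union(*sets) if sets else set()


def search_not(index, all_ids, term):
    """Documents that do not contain the term."""
    return set(all_ids) - index.get(term.lower(), set())


INDEX = build_index(CORPUS)
```

In this sketch every term carries equal weight, which is the flat behavior the paragraph describes: `search_and(INDEX, ["search", "documents"])` matches only `doc1`, while `search_or` with the same terms matches all three documents.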
Accordingly, with some search engines the end-user also provides the search logic for the search request by using Boolean constructions to associate the terms and phrases included within the search request. In other search engines, if the end-user includes no search logic, the search engine defaults to an “OR” logic and implements a searching algorithm to further constrain the results of the search request when performing a search against the target data corpus. For example, some search engines include only those matching documents within the target data corpus in which the matched terms or phrases recur with at least a pre-defined frequency.
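A minimal sketch of a default “OR” search constrained by a frequency threshold follows. The function name `default_or_search`, the parameter `min_frequency`, and the sample corpus are assumptions made for illustration; real engines apply their own, typically more elaborate, constraints.

```python
# Hypothetical corpus used only for this sketch.
SAMPLE_CORPUS = {
    "d1": "search search results",
    "d2": "search once",
    "d3": "ranking results ranking",
}


def default_or_search(corpus, terms, min_frequency=2):
    """Implicit OR: keep a document if the search terms, taken together,
    occur in it at least min_frequency times."""
    wanted = {t.lower() for t in terms}
    results = set()
    for doc_id, text in corpus.items():
        tokens = text.lower().split()
        hits = sum(1 for token in tokens if token in wanted)
        if hits >= min_frequency:
            results.add(doc_id)
    return results


STRICT = default_or_search(SAMPLE_CORPUS, ["search", "ranking"], min_frequency=2)
LOOSE = default_or_search(SAMPLE_CORPUS, ["search", "ranking"], min_frequency=1)
```

Here `STRICT` excludes `d2`, which matches a term only once, while `LOOSE` behaves as a plain “OR” and keeps all three documents.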
Other search engines permit more structural or hierarchical search requests to be provided by an end-user. Typically, these search engines allow the end-user to include structural syntax in the search request that identifies where in a target document a specific term or phrase is to occur. For example, a target document can be represented structurally as a document with a title and an author, and the end-user may desire only documents in which the terms or phrases occur within the title and which have a specific author. In this example, a search engine providing a hierarchical or structural search request would permit the end-user to define the title and author limitations in the initial search request to constrain the search.
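The title and author example above might be sketched as follows. The document records, the field names `title` and `author`, and the function `structured_search` are hypothetical; they stand in for whatever structural syntax a given engine exposes.

```python
# Hypothetical documents represented structurally by title and author.
DOCUMENTS = [
    {"title": "Ranking search results", "author": "A. Smith"},
    {"title": "Indexing a data corpus", "author": "A. Smith"},
    {"title": "Search precision and recall", "author": "B. Jones"},
]


def structured_search(documents, title_terms, author):
    """Return documents whose title contains every term and whose
    author field equals the requested author."""
    wanted = [t.lower() for t in title_terms]
    matches = []
    for doc in documents:
        title_tokens = doc["title"].lower().split()
        if all(t in title_tokens for t in wanted) and doc["author"] == author:
            matches.append(doc)
    return matches


RESULT = structured_search(DOCUMENTS, ["search"], "A. Smith")
```

Only the first document satisfies both limitations: the third also has “search” in its title but a different author.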
Additionally, search engines can provide a variety of other useful features to the end-user, such as search term morphology, where the singular and plural forms of search terms are recognized by the search engine when conducting the search on behalf of the end-user. Generally, morphology is achieved by the search engine using stemming techniques, where search terms received from the end-user are decomposed into their root word forms, and the target data corpus is indexed on only the root word forms of its terms. Furthermore, searches can be augmented to include synonyms or thesaurus terms associated with the search terms.
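The stemming approach can be illustrated with a deliberately naive suffix stripper. This is not the Porter algorithm or any production stemmer; `naive_stem`, `build_stemmed_index`, `stemmed_search`, and the sample corpus are hypothetical names chosen for this sketch, and the rules handle only a few regular English plurals.

```python
def naive_stem(term):
    """Reduce a few regular English plurals to a root form (toy rules only)."""
    term = term.lower()
    if term.endswith("ies") and len(term) > 4:
        return term[:-3] + "y"          # queries -> query
    if term.endswith("es") and term[:-2].endswith(("s", "x", "ch", "sh")):
        return term[:-2]                # boxes -> box, searches -> search
    if term.endswith("s") and not term.endswith("ss") and len(term) > 3:
        return term[:-1]                # documents -> document
    return term


def build_stemmed_index(corpus):
    """Index the corpus on root word forms only."""
    index = {}
    for doc_id, text in corpus.items():
        for token in text.split():
            index.setdefault(naive_stem(token), set()).add(doc_id)
    return index


def stemmed_search(index, term):
    """Stem the query term the same way before looking it up."""
    return index.get(naive_stem(term), set())


# Hypothetical corpus for illustration.
STEM_CORPUS = {
    "d1": "the engine indexes documents",
    "d2": "one document per query",
    "d3": "many queries",
}
STEM_INDEX = build_stemmed_index(STEM_CORPUS)
```

Because both the index and the query are reduced to root forms, a search for “documents” also finds the document containing only the singular “document”.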
In general, when a search is performed against a data corpus, an answer set is returned. The answer set includes matching documents found within the target data corpus as constrained by the search terms and search engine processing logic. Search precision is defined as the total number of truly relevant documents included within the answer set divided by the total number of all documents included within the answer set. Search recall is defined as the total number of truly relevant documents included within the answer set divided by the total number of relevant documents that exist in the target data corpus and truly satisfy the initial search request. Search accuracy is defined as the average of the search precision and the search recall. Generally, as search engines increase search precision, search recall tends to decrease, and vice versa.
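A short worked sketch of these measures follows. It assumes the conventional definitions, in which precision is the number of relevant documents in the answer set divided by the size of the answer set, and recall is the number of relevant documents in the answer set divided by the number of relevant documents in the corpus; accuracy is taken as the simple average of the two, as stated above. The function names and the sample identifiers are hypothetical.

```python
def precision(answer_set, relevant):
    """Relevant documents in the answer set / all documents in the answer set."""
    answer_set, relevant = set(answer_set), set(relevant)
    if not answer_set:
        return 0.0
    return len(answer_set & relevant) / len(answer_set)


def recall(answer_set, relevant):
    """Relevant documents in the answer set / all relevant documents in the corpus."""
    answer_set, relevant = set(answer_set), set(relevant)
    if not relevant:
        return 0.0
    return len(answer_set & relevant) / len(relevant)


def accuracy(answer_set, relevant):
    """Average of precision and recall, as defined in the text."""
    return (precision(answer_set, relevant) + recall(answer_set, relevant)) / 2


# Hypothetical example: 4 documents returned, 2 of them relevant,
# out of 8 relevant documents that exist in the corpus.
ANSWER_SET = {"d1", "d2", "d3", "d4"}
RELEVANT = {"d1", "d2", "d5", "d6", "d7", "d8", "d9", "d10"}
```

For this example the precision is 2/4 = 0.5, the recall is 2/8 = 0.25, and the accuracy is 0.375.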
Once an answer set is returned from a search occurring against a target data corpus, the documents within the answer set are ranked according to a variety of ranking algorithms. When the answer set is ranked, the documents within the answer set appear in a generated order, where the most relevant document appears first and the least relevant document appears last within the answer set. Some ranking algorithms rank documents based on dates or other metadata features associated with the documents included within the answer set.
Further, some ranking algorithms rank documents based on the frequency and location of matched terms and phrases occurring within a single document of the answer set, as compared to the average frequency of matched terms and phrases for a document appearing within the answer set. Some Internet ranking algorithms rank documents within the answer set based on extrinsic factors, such as the total number of external websites that link to a document included within the answer set. In these algorithms, the document's popularity, as opposed to its true relevancy, controls how the document is ranked within the answer set.
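The two ranking styles described above can be contrasted in a minimal sketch. It scores each document by its matched-term frequency relative to the answer-set average, ignores term location for brevity, and separately orders the same documents by an inbound link count. The names `rank_by_relative_frequency` and `rank_by_inbound_links`, the sample answer set, and the link counts are all hypothetical.

```python
def term_frequency(text, terms):
    """Count occurrences of the search terms in a document's text."""
    tokens = text.lower().split()
    return sum(tokens.count(t.lower()) for t in terms)


def rank_by_relative_frequency(answer_set, terms):
    """Order documents by term frequency divided by the answer-set average."""
    freqs = {doc_id: term_frequency(text, terms)
             for doc_id, text in answer_set.items()}
    average = sum(freqs.values()) / len(freqs) if freqs else 0.0
    scores = {doc_id: (f / average if average else 0.0)
              for doc_id, f in freqs.items()}
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


def rank_by_inbound_links(answer_set, inbound_links):
    """Order documents by an extrinsic popularity signal (link count)."""
    return sorted(answer_set, key=lambda d: (-inbound_links.get(d, 0), d))


# Hypothetical answer set and link counts.
ANSWERS = {
    "d1": "search search engine",
    "d2": "search engine engine engine",
    "d3": "engine",
}
LINKS = {"d1": 3, "d2": 10, "d3": 50}

BY_FREQUENCY = rank_by_relative_frequency(ANSWERS, ["search"])
BY_LINKS = rank_by_inbound_links(ANSWERS, LINKS)
```

With these made-up inputs the two orderings are exactly reversed: the frequency-based ranking places `d1` first, whereas the link-based ranking places `d3` first even though it never mentions the search term, which illustrates how popularity can diverge from relevancy.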
World Wide Web (WWW) search engines are provided to end-users via a browser interface connected to the Internet. These search engines typically have good recall but poor search precision and poor search ranking techniques. As a result, end-users are often extremely disappointed with WWW search engines and are often inundated with irrelevant and undesirable documents within the answer set, which the end-user is forced to inspect in search of documents that are relevant and important to the end-user. Moreover, when an end-user views a selected document within an answer set provided by an Internet search engine, the end-user is not presented with the document in a format that assists the end-user in rapidly ascertaining whether the selected document is useful.
Furthermore, the browser interfaces and search request construction techniques employed by existing WWW search engines are cumbersome, cryptic, and largely linear. Accordingly, the end-user is constrained in the manner in which the initial search request is provided to any WWW search engine. Also, the end-user is often not permitted to select and define the target data corpora against which a search request is submitted for processing, since the target data corpora are generally predefined within the search engine. Correspondingly, the end-user is not capable of limiting the target data corpora in an attempt to reduce the total number of irrelevant documents received from a search.
Accordingly, as is apparent to one of ordinary skill in the art, a variety of limitations with existing WWW search engines frustrate the end-user. These limitations include the difficulty associated with constructing search requests and with visualizing, digesting, and analyzing search answer sets.
As is apparent, there exists a need for providing improved search construction, submission, and processing techniques. More specifically, there exist needs for providing improved non-linear search construction, improved target data corpus selection and submission, and improved search processing and ranking.