A search domain is a self-contained set of information pages, usually specific to a subject or function. Web sites that provide search functionality are frequently directed to a specific search domain. For example, a shopping web site may allow searching in a “product” domain, a music download web site may allow searching in a “music” domain, a web site focused on medical information may allow users to look up medical topics, and a financial web site may allow users to search for products or services related to managing finances. Typically, at each of these sites, the information pages, together with their structure and indexing information, are stored in a data repository.
Search engines may be used to index a large amount of information. Web sites that include search engines typically provide an interface that can be used to search the indexed information by entering certain words or phrases (keywords) to be queried. The information indexed by a search engine may be referred to as information pages, content, or documents. These terms are often used interchangeably.
A searchable item is a logical representation of an information page or piece of content that is maintained within a search engine platform. Search engines help users locate searchable items. Sometimes a searchable item represents an electronic document, such as a white paper, or other content, such as a video that can be streamed over a network connection or downloaded to a computer system for local viewing. Other times, the searchable item is a description and representation of something in the physical world, such as a person or a product for sale. Thus, searchable items can describe either electronic or physical items.
Search engines may analyze the searchable items within a repository, extracting categorization information and constructing indexes that are used to find relevant data when a search is requested. Using a search engine, a user can enter one or more search query terms and obtain a list of search results whose subject matter contains, or is associated with, those terms. When a user performs a search, the set of pages found and presented to the user, along with other search and navigation hints, is called the “search results.” Each page listed in the search results is called a “hit.” When a user selects a content page for viewing, that event is called a “click” because the selection is usually, though not always, made by clicking a mouse button.
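The indexing and retrieval described above can be illustrated with a minimal inverted index. This is a sketch under stated assumptions, not the method of any particular search engine: pages are plain strings keyed by hypothetical IDs, terms come from naive whitespace splitting, and a page counts as a hit only if it contains every query term. A hypothetical click counter shows how selection events might be recorded.

```python
from collections import Counter, defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercased term to the set of page IDs containing it (an inverted index)."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():  # naive whitespace tokenization (assumption)
            index[term].add(page_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the search results: sorted IDs of pages containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # Each page ID that survives the intersection is one "hit".
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


clicks: Counter[str] = Counter()  # hypothetical click log: page ID -> number of clicks


def record_click(page_id: str) -> None:
    """Record that a user selected (clicked) a page from the search results."""
    clicks[page_id] += 1


if __name__ == "__main__":
    pages = {  # hypothetical sample pages
        "p1": "medical malpractice case law",
        "p2": "medical symptoms and treatments",
        "p3": "federal case law",
    }
    index = build_index(pages)
    results = search(index, "medical law")
    print(results)  # ['p1']
    record_click(results[0])
    print(clicks["p1"])  # 1
```

A production engine would add stemming, ranking, and persistent storage; the sketch keeps only the term-to-page mapping that makes lookup fast.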
In addition to storing representations of content and responding to user requests to find content, a search engine platform must be able to respond appropriately when the organization of the content repository changes.
One example of a search engine is a vertical domain search engine, which provides searching over a specific search domain. Examples include search engines that search a legal database or a medical database. Within each of these examples, the content searched has a common subject (law or medicine, respectively) and is assigned categories and attributes relevant to that subject by the domain experts who manage the content. For example, categories supported by a legal search engine might include State or Federal Case Law, State or Federal Statutes, Treatises, Legal Dictionaries, and Form Books, with attributes such as publication date, legal topic, and history. A medical search engine might have the categories Symptoms, Diagnostic Procedures, Treatments, and Drugs, with attributes such as the part of the body affected, having potential values such as respiratory, circulatory, or nervous system. The repository for each of these vertical domains is highly structured, but the structure of one domain differs from the structure of domains pertaining to different subject matter.
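To make the notion of a per-domain structure concrete, the following sketch models each vertical domain as its own fixed set of categories and attributes, using the examples from the paragraph above. The class name `DomainSchema`, attribute keys such as `body_system`, and the validation rule (a value must come from the allowed set when one is defined) are hypothetical choices for illustration only.

```python
from dataclasses import dataclass


@dataclass
class DomainSchema:
    """Hypothetical per-domain structure: categories plus attributes with optional allowed values."""
    name: str
    categories: set[str]
    attributes: dict[str, set[str]]  # attribute -> allowed values; empty set means free-form

    def validate(self, category: str, attrs: dict[str, str]) -> bool:
        """Accept an item only if its category and every attribute fit this domain's structure."""
        if category not in self.categories:
            return False
        for key, value in attrs.items():
            if key not in self.attributes:
                return False  # attribute is not part of this domain's structure
            allowed = self.attributes[key]
            if allowed and value not in allowed:
                return False
        return True


MEDICAL = DomainSchema(
    name="medicine",
    categories={"Symptoms", "Diagnostic Procedures", "Treatments", "Drugs"},
    attributes={"body_system": {"respiratory", "circulatory", "nervous system"}},
)

LEGAL = DomainSchema(
    name="law",
    categories={"State Case Law", "Federal Case Law", "State Statutes",
                "Federal Statutes", "Treatises", "Legal Dictionaries", "Form Books"},
    attributes={"publication_date": set(), "legal_topic": set(), "history": set()},
)

if __name__ == "__main__":
    print(MEDICAL.validate("Treatments", {"body_system": "respiratory"}))  # True
    print(MEDICAL.validate("Treatments", {"legal_topic": "torts"}))        # False
```

Because the two schemas share no attributes in this example, each maps cleanly onto its own fixed structure, while an item from one domain does not fit the other.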
When a search domain is managed in isolation from other domains, it is convenient to use a commercial database management system for storing, searching, and maintaining the content, for several reasons. First, the managed data is highly structured, and the structure is uniform across the domain, so it maps easily to a fixed database schema. Second, a vertical search domain typically does not need to handle the high volume of query traffic of a general, domain-independent search engine, and the range of distinct search queries is constrained by the limited taxonomy of the domain. A problem faced by companies that own and operate vertical domain search engines, however, is that in addition to managing the structure of the repository, they must also manage the search engine platform, including database management. Domain experts are not necessarily experts in IT management, which can be very complex.
Another example of a search engine is a general, domain-independent search engine. The World Wide Web (Web) provides access to millions of pages of information that are often poorly organized, and it can be difficult for users to locate the particular Web pages that contain information of interest to them. This kind of search engine must be extremely scalable, able to handle millions of concurrent queries and hundreds of thousands of distinct queries. The Web pages indexed by this kind of search engine are not highly structured, so there is no expectation of a common taxonomy across an arbitrary collection of Web pages.
Whether or not a commercial database is used to store the data repository, modern search systems commonly operate using at least two parallel search repositories. Whenever the structure of the repository changes, system downtime is required: one of the parallel systems is taken offline and completely re-indexed, and once the changed repository comes back online, another system is taken offline and modified in the same way.
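The one-at-a-time update of parallel repositories can be sketched as follows. This is a simplified illustration, not a description of any particular system: `Replica`, `rolling_reindex`, and the `rebuild` callback that performs the full re-index are hypothetical names, and the sketch only tracks which copy is offline so that at least one copy keeps serving queries throughout.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Replica:
    """One of the parallel search repositories (hypothetical model)."""
    name: str
    schema_version: int
    online: bool = True


def rolling_reindex(replicas: list[Replica], new_version: int,
                    rebuild: Callable[[Replica], None]) -> list[str]:
    """Re-index parallel replicas one at a time; return the order in which they were updated."""
    if len(replicas) < 2:
        raise ValueError("a rolling re-index needs at least two parallel replicas")
    order: list[str] = []
    for replica in replicas:
        replica.online = False  # take this copy out of service
        if not any(r.online for r in replicas):
            raise RuntimeError("no replica left to serve queries")
        rebuild(replica)  # hypothetical callback: full re-index under the new structure
        replica.schema_version = new_version
        replica.online = True  # bring it back before touching the next copy
        order.append(replica.name)
    return order
```

For example, with two replicas `A` and `B`, the function re-indexes `A` while `B` serves queries, then re-indexes `B` while `A` serves queries, and returns `['A', 'B']`. Each copy still experiences downtime; the parallelism only hides it from users.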
To avoid the need for each company to maintain its own vertical search engine, multiple companies may try to combine their search engines. For example, a legal search engine might be combined with a medical search engine so that a user searching for information on medical malpractice would find content from both with a single search request. One way to do this would be to define a common database schema for all vertical domains and incorporate all domain repositories into the same database. The new schema would need a place for each unique attribute in the union of attributes across all hosted domains. Given a very large domain and/or many hosted domains, the unified schema could require millions of unique attributes, most of which would be empty for any given record. Such an approach would not be scalable.
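The scaling problem with a unified schema can be seen with simple arithmetic. The sketch below, which uses hypothetical attribute names and domain sizes, computes the unified column set as the union of every hosted domain's attributes, and the fraction of those columns that a single record leaves empty.

```python
def unified_columns(domain_attrs: dict[str, set[str]]) -> set[str]:
    """Columns of the hypothetical unified schema: the union of all domains' attributes."""
    columns: set[str] = set()
    for attrs in domain_attrs.values():
        columns |= attrs
    return columns


def empty_fraction(domain_attrs: dict[str, set[str]], domain: str) -> float:
    """Fraction of unified columns left empty by a record from `domain`."""
    columns = unified_columns(domain_attrs)
    if not columns:
        return 0.0
    return 1 - len(domain_attrs[domain]) / len(columns)


if __name__ == "__main__":
    small = {
        "law": {"publication_date", "legal_topic", "history"},
        "medicine": {"publication_date", "body_system"},
    }
    print(len(unified_columns(small)))        # 4
    print(empty_fraction(small, "medicine"))  # 0.5

    # Hypothetical scale: 200 domains, each with 500 attributes of its own.
    large = {f"d{i}": {f"d{i}_attr{j}" for j in range(500)} for i in range(200)}
    print(len(unified_columns(large)))                 # 100000
    print(round(empty_fraction(large, "d0"), 3))       # 0.995
```

The column count grows with the total number of attributes across all domains, while each record still uses only its own domain's attributes, so the table becomes wider and emptier as domains are added.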
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.