This invention relates in general to locating information in a database, and more particularly to using an index that includes tags and metafiles to locate the desired information.
There is an ever-increasing amount of recorded and searchable information. To efficiently search for specific information, information retrieval systems have been developed. Information retrieval systems (“IR systems”) are systems for finding, organizing, and delivering information. A computerized IR system typically responds to data inquiries or search requests by routing messages and files between a user interface and a search engine for a database in order to perform a search of the database for desired information.
A goal of an IR system is to locate the requested information as quickly as possible. However, one problem with IR systems is that the search results returned do not always include the information requested. If the search results do not include the information requested, then the user must repeat the search using a different search request. One reason that the search results returned may not include the information requested is that the IR system incorrectly interpreted the search request. This may happen if the search request uses an ambiguous term. The search request may be ambiguous because a term used in the search request has multiple meanings. For example, if the search request includes the term “Ford”, it may be unclear whether the request is directed to the Ford Company, the Ford Theater, or the FORD brand of vehicles. Thus, there is a need in the art for a method that eliminates any ambiguity in the search request.
Another problem is that too much information can be returned to the user. If the user enters a broad search request, then the user may be overwhelmed by the amount of information returned and may not be able to locate the desired information in the search results. For example, if the search request specifies the FORD brand of vehicles, the search results returned may include information on every Ford vehicle, including automobiles, trucks, vans, and vehicles that are no longer in production, as well as information on the repair and sale of FORD brand vehicles. If the user only wanted information about a particular model of automobile, the user must sort through the search results to locate the desired information. Thus, there is a need for a method that focuses a search so that only the most relevant information is returned or that queries a user for additional search criteria so that the information desired by the user is provided.
Due to the number of databases, it is possible that information stored in one database is repeated in another database. The same information may be stored in multiple databases to accommodate the requirements of different types of IR systems. To eliminate the need to maintain multiple databases that contain the same information, a universal search vocabulary is needed. If a universal search vocabulary is used to create a database, then any IR system that uses the universal vocabulary can locate information in the database.
Even though there are a multitude of databases, the requested information may not be located in a single database. If a user requests information that is stored in separate, unrelated databases, then the user may need to conduct multiple searches using different IR systems to locate all of the desired information. To eliminate the need to conduct multiple searches, a universal search vocabulary is needed to search any number of separate, unrelated databases to locate the desired information.
Accordingly, there is a need in the art for an improved method of searching that uses a universal search vocabulary. The method should eliminate ambiguity in the search request, focus the search on the most relevant information, perform the search in the most efficient manner and support searching multiple databases. The method should also support a hierarchy that can be used to query a user for additional search criteria in an efficient and intelligent manner.
The present invention meets the needs described above by providing a method for locating information stored in a database using an index that includes tags and metafiles to locate the desired information. In general, an index is essentially a guide that is used to locate information stored in a database. Preferably, the index includes tags that correspond to categories and domains. A category includes a group of terms. A term may appear in more than one category, but a term may only appear once in any given category. For example, the term “American” may appear in the Cuisine category and in the Brand category, but may only appear once in the Cuisine category.
A domain is generally described as a grouping of categories. For example, the Restaurant domain may include the Cuisine category and, therefore, the terms “Mexican” and “American.” The domains, categories, and terms are used to locate information within the database.
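By way of illustration only, the relationship among terms, categories, and domains described above might be modeled as follows. This is a minimal sketch, not the disclosed implementation: the class names `Category` and `Domain` and their methods are hypothetical, and storing terms in a set is simply one assumed way to ensure that a term appears only once in a given category.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Category:
    """Hypothetical model of a category: a named group of terms."""
    name: str
    # A set guarantees that a term appears at most once per category.
    terms: Set[str] = field(default_factory=set)

    def add_term(self, term: str) -> None:
        self.terms.add(term)


@dataclass
class Domain:
    """Hypothetical model of a domain: a named grouping of categories."""
    name: str
    categories: Dict[str, Category] = field(default_factory=dict)

    def add_category(self, category: Category) -> None:
        self.categories[category.name] = category

    def terms(self) -> Set[str]:
        # A domain exposes every term of every category it groups.
        collected: Set[str] = set()
        for category in self.categories.values():
            collected |= category.terms
        return collected


cuisine = Category("Cuisine")
cuisine.add_term("American")
cuisine.add_term("Mexican")
cuisine.add_term("American")  # a repeated term is not stored twice

brand = Category("Brand")
brand.add_term("American")    # the same term may appear in another category

restaurant = Domain("Restaurant")
restaurant.add_category(cuisine)
```

In this sketch the Restaurant domain reaches the terms “Mexican” and “American” only through the Cuisine category, mirroring the example given above.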
The index is created so that a tag is associated with each domain (a domain tag) and with each term associated with a category (a category tag). A tag is associated with data or text and conveys information about the data or text. In one aspect of the invention, the tags are XML (eXtensible Markup Language) tags. For example, an XML tag is created for the Restaurant domain and another XML tag is created for the American Cuisine category. In addition, many of the tags have an associated metafile. A metafile provides additional information about the tag. A metafile typically includes a list of related tags, such as domain tags and category tags. A metafile also implements a hierarchy between the tags in the metafile.
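The tags and metafiles described above can be sketched as follows. The sketch rests on several assumptions that are not part of the description: the XML element and attribute names (`domain`, `category`, `name`, `term`) are invented for illustration, the `Tag` and `Metafile` classes are hypothetical, and the hierarchy is assumed to be a simple parent-to-children mapping.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Tag:
    """Hypothetical tag; the element and attribute names are assumptions."""
    kind: str        # "domain" or "category"
    name: str        # e.g. "Restaurant" or "Cuisine"
    term: str = ""   # set only for category tags, e.g. "American"

    def to_xml(self) -> str:
        attrs = {"name": self.name}
        if self.term:
            attrs["term"] = self.term
        return ET.tostring(ET.Element(self.kind, attrs), encoding="unicode")


@dataclass
class Metafile:
    """Hypothetical metafile: related tags plus a hierarchy among them."""
    owner: Tag
    # Assumed representation of the hierarchy: parent tag -> child tags.
    children: Dict[Tag, List[Tag]] = field(default_factory=dict)

    def add_child(self, parent: Tag, child: Tag) -> None:
        self.children.setdefault(parent, []).append(child)

    def related_tags(self) -> List[Tag]:
        # Collect every tag reachable from the owner, visiting each once.
        ordered: List[Tag] = []
        seen = {self.owner}
        stack = [self.owner]
        while stack:
            parent = stack.pop()
            for child in self.children.get(parent, []):
                if child not in seen:
                    seen.add(child)
                    ordered.append(child)
                    stack.append(child)
        return ordered


restaurant_tag = Tag("domain", "Restaurant")
american_tag = Tag("category", "Cuisine", "American")
mexican_tag = Tag("category", "Cuisine", "Mexican")

restaurant_meta = Metafile(restaurant_tag)
restaurant_meta.add_child(restaurant_tag, american_tag)
restaurant_meta.add_child(restaurant_tag, mexican_tag)
```

Here the metafile for the Restaurant domain tag lists the two Cuisine category tags beneath it, which is one plausible reading of a metafile that both lists related tags and implements a hierarchy between them.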
Each record of an exemplary database includes an Alpha Component and an XML Index Component. The Alpha Component contains identifying information for the record, and the XML Index Component includes the XML tags that are associated with the record. When a search request is received, a set of tags that corresponds to the request is identified. The set of tags is compiled as a key, and the key is used to search the database to locate records that include the set of tags.
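The record layout and key-based lookup described above can be illustrated with a short sketch. It assumes, purely for illustration, that each XML tag is represented by a plain string such as `"domain:Restaurant"`, that a key is a set of such strings, and that a record matches when its XML Index Component contains every tag in the key. The `Record` class, the `search` function, and the sample records are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Record:
    """Hypothetical record with the two components named in the description."""
    alpha: str                 # Alpha Component: identifying information
    xml_index: FrozenSet[str]  # XML Index Component: tags for this record


def search(records: Iterable[Record], key: FrozenSet[str]) -> List[Record]:
    # A record matches when its index includes every tag in the key.
    return [record for record in records if key <= record.xml_index]


# Invented sample data; the tag strings stand in for XML tags.
records = [
    Record("Joe's Diner",
           frozenset({"domain:Restaurant", "category:Cuisine/American"})),
    Record("Casa Luna",
           frozenset({"domain:Restaurant", "category:Cuisine/Mexican"})),
]

key = frozenset({"domain:Restaurant", "category:Cuisine/Mexican"})
matches = search(records, key)
```

A linear scan is used only to keep the sketch short; nothing in the description limits how the key is matched against the stored records.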
A search is generally initiated by an information request. The information request can be received from a user or can be generated from an agent search. The information request is parsed to identify terms in the request. The terms are predetermined and correspond to the domains and categories of the index. The terms are mapped to XML tags. Once the terms are mapped to the XML tags, a determination is made as to whether the XML tags indicate that the request is ambiguous. The XML tags can indicate that the request is ambiguous if a single term in the request is related to more than one XML tag. If the XML tags indicate that the request is ambiguous, then the XML tags are used to conduct a query to determine the appropriate XML tags. The query may include querying the user for additional information.
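The parsing, mapping, and ambiguity test described above can be sketched as follows. The term table `TERM_TO_TAGS`, the string form of the tags, and the function names are assumptions made for illustration; the sketch treats a request as ambiguous when any single recognized term maps to more than one tag.

```python
import re
from typing import Dict, List

# Hypothetical table of predetermined terms and the tags they map to.
TERM_TO_TAGS: Dict[str, List[str]] = {
    "american": ["category:Cuisine/American", "category:Brand/American"],
    "mexican": ["category:Cuisine/Mexican"],
    "restaurant": ["domain:Restaurant"],
}


def parse_terms(request: str) -> List[str]:
    # Keep only the words that are predetermined terms of the index.
    words = re.findall(r"[a-z]+", request.lower())
    return [word for word in words if word in TERM_TO_TAGS]


def map_terms(terms: List[str]) -> Dict[str, List[str]]:
    # Map each recognized term to its candidate tags.
    return {term: TERM_TO_TAGS[term] for term in terms}


def ambiguous_terms(mapping: Dict[str, List[str]]) -> List[str]:
    # A term is ambiguous when it is related to more than one tag.
    return [term for term, tags in mapping.items() if len(tags) > 1]


terms = parse_terms("Find an American restaurant")
mapping = map_terms(terms)
needs_query = ambiguous_terms(mapping)
```

In this example the term “American” maps to both a Cuisine tag and a Brand tag, so it is the term for which a follow-up query, possibly directed to the user, would be needed.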
Once the appropriate XML tags are identified, then the metafiles that correspond to those XML tags are identified. Each metafile is examined to determine whether the XML tags in the metafile indicate that there are any related domains or categories. If there are a number of related XML tags in a metafile and the request does not clearly identify one of the related XML tags, then the metafile is used to supply information to a disambiguation process that identifies the tags that should be used to conduct the search. Once the query has been conducted to identify one of the XML tags, then that XML tag is combined with the other XML tags identified by the metafile and any other queries to create a unique key. The key is used to search the database to locate records that include the XML tags in their XML Index Component. Once the records are located, the records are delivered to the requesting user or search agent.
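The metafile step and the composition of the key might be sketched as shown below. This is an illustrative reading of the description rather than the disclosed implementation. It assumes that a metafile can be reduced to a list of related tag strings, that the disambiguation process can be represented by a callback `ask` (standing in for a query to the user), and that a sorted tuple of tags serves as the key. The function name `build_key` and the sample data are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def build_key(
    chosen: List[str],
    metafiles: Dict[str, List[str]],
    ask: Callable[[str, Sequence[str]], str],
) -> Tuple[str, ...]:
    """Combine the chosen tags with tags resolved through their metafiles."""
    key = set(chosen)
    for tag in chosen:
        related = metafiles.get(tag, [])
        if not related or key.intersection(related):
            # No related tags, or the request already identifies one of them.
            continue
        if len(related) == 1:
            key.add(related[0])
            continue
        # Several related tags and none identified: run the (assumed) query.
        answer = ask(tag, related)
        if answer not in related:
            raise ValueError(f"{answer!r} is not a related tag of {tag!r}")
        key.add(answer)
    # Sorting makes the key independent of the order of discovery.
    return tuple(sorted(key))


# Invented metafile contents; the tag strings stand in for XML tags.
metafiles = {
    "domain:Restaurant": [
        "category:Cuisine/American",
        "category:Cuisine/Mexican",
    ],
}


def pick_mexican(tag: str, options: Sequence[str]) -> str:
    # Stand-in for asking the user which related tag was intended.
    return "category:Cuisine/Mexican"


key = build_key(["domain:Restaurant"], metafiles, pick_mexican)
```

The resulting key would then be compared against the XML Index Component of each record, and the matching records delivered to the requesting user or search agent.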
These and other aspects, features and advantages of the present invention may be more clearly understood and appreciated from a review of the following detailed description of the disclosed embodiments and by reference to the appended drawings and claims.