Internet portals and search engines, such as MSN®, serve as gateways for Internet users by accumulating and categorizing information and by providing a wide array of services. A portal may perform a search based on a query entered by a visitor to the portal. In an existing method for performing a search, affiliated data providers submit data to be searched directly to the portal. The data may be imported into the portal's database management system. In this existing method, a portal may regularly receive data from hundreds of providers. Each data provider may submit a provider data file including multiple schemas.
In addition to submitting multiple schemas, each data provider may submit corresponding files of phrases. A phrase is a word or words identifying specific content corresponding to a particular node in a schema. For example, a particular data provider may have data about flights departing from Chicago. The provider may have a flight schema with a departure node indicating the city from which a flight departs. “Chicago” may be content corresponding to the departure node. The content “Chicago” may be identified by the phrases “Chicago”, “Windy City”, and “Cook County”.
Another data provider may have data about limousine services available in Chicago. The provider may have a limousine schema with a location node indicating the city in which the limousine service is available. “Chicago” may be content corresponding to the location node. The provider may submit a file of phrases identifying the content “Chicago.”
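The two provider examples above can be sketched as data, together with the per-provider matching that the existing method implies. This is a minimal illustration only: the `PhraseEntry` type, the provider names, the substring matching rule, and the limousine provider's phrase list are assumptions made for the example and are not taken from any actual portal implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PhraseEntry:
    """One line of a provider's phrase file (hypothetical layout)."""
    provider: str             # hypothetical provider name
    schema: str               # schema the node belongs to
    node: str                 # node within that schema
    content: str              # content corresponding to the node
    phrases: Tuple[str, ...]  # phrases identifying the content


PHRASE_FILES = [
    PhraseEntry("flight_provider", "flight", "departure", "Chicago",
                ("Chicago", "Windy City", "Cook County")),
    # The phrase list for this provider is assumed for illustration.
    PhraseEntry("limo_provider", "limousine", "location", "Chicago",
                ("Chicago",)),
]


def match_per_provider(query: str, entries: List[PhraseEntry]):
    """Check the query against every phrase of every provider entry."""
    q = query.lower()
    hits = []
    for entry in entries:
        # Assumed rule: a phrase matches if it appears in the query text.
        if any(phrase.lower() in q for phrase in entry.phrases):
            hits.append((entry.provider, entry.schema, entry.node, entry.content))
    return hits
```

With this layout, a query such as "flights from the Windy City" matches only the flight provider's departure node, while "limousine in Chicago" matches both providers. The work grows with the number of providers and phrase files, since every entry is inspected for every query.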
In existing methods for performing a search, words in a query are matched to phrases submitted by each individual data provider identifying particular content in a particular schema. The efficiency of a search is greatly improved if, rather than matching words in a query to phrases submitted by each provider, words in a query are matched to a single phrase identifying particular content across multiple provider schemas. Such a single phrase identifying content across multiple provider schemas may be referred to as a “synset.” Phrases from individual providers must be matched to the synset.
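The synset idea can be sketched as follows. The example assumes, purely for illustration, that provider phrases are matched to a synset by grouping entries that share the same content string; the text does not specify how that matching is done. The `Synset` type, the tuple layout of the entries, and the provider names are likewise hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# (provider, schema, node, content, phrases); names are hypothetical.
ENTRIES = [
    ("flight_provider", "flight", "departure", "Chicago",
     ("Chicago", "Windy City", "Cook County")),
    ("limo_provider", "limousine", "location", "Chicago", ("Chicago",)),
]


@dataclass
class Synset:
    """A single phrase identifying content across provider schemas."""
    label: str
    nodes: List[Tuple[str, str, str]] = field(default_factory=list)
    provider_phrases: Set[str] = field(default_factory=set)


def build_synsets(entries) -> Dict[str, Synset]:
    """Match provider phrases to synsets (assumed rule: same content)."""
    synsets: Dict[str, Synset] = {}
    for provider, schema, node, content, phrases in entries:
        synset = synsets.setdefault(content.lower(), Synset(content))
        synset.nodes.append((provider, schema, node))
        synset.provider_phrases.update(p.lower() for p in phrases)
    return synsets


def search(query: str, synsets: Dict[str, Synset]):
    """Match the query once per synset rather than once per provider."""
    q = query.lower()
    return {s.label: s.nodes for key, s in synsets.items() if key in q}


SYNSETS = build_synsets(ENTRIES)
```

Here `search("limousine in Chicago", SYNSETS)` makes one comparison against the synset "Chicago" and returns the matching nodes of both providers, instead of walking each provider's phrase file separately.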