1. Field of the Invention
The present invention relates generally to search engines. More specifically, the present invention relates to a method and an apparatus for identifying “standalone locations,” which can be unambiguously identified by their names alone without additional location specifiers.
2. Related Art
Standalone locations are locations that can be unambiguously identified by their names alone, either within a specific geographic region or globally. For example, the name “San Francisco” usually refers to “San Francisco, California, United States” even without additional location specifiers such as “California” and “United States” (so it is a standalone location). However, the name “Washington” as a location could refer to the “City of Washington” in the state of Missouri, “Washington, D.C.” or “Washington State”, so it is not strictly a standalone location in the United States. Moreover, a large number of locations are not standalone because they do not have names that uniquely identify them; an extreme case is the city of “Orange” in the state of Texas: given its name alone, most people would not recognize it as a location.
Formally, a standalone location can be defined as follows: given a geographic range R, a location L is standalone if and only if any location query on L can be unambiguously formulated by the query template {Query} {L} or {L} {Query} in R. For example, no matter where users are located, a search by a user for the hotels in “San Francisco” can be safely represented as “San Francisco Hotels” or “Hotels San Francisco”. In contrast, “Orange Hotels”/“Hotels Orange” is confusing; very few people would understand or actually use such queries.
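The query template above can be illustrated with a minimal sketch. It assumes that a list of standalone location names for the range R is already available; the set `STANDALONE_LOCATIONS` and the function `split_location_query` are hypothetical names introduced only for illustration and are not part of the described invention.

```python
from typing import Optional, Set, Tuple

# Hypothetical standalone-location list for some geographic range R.
# The entries are illustrative assumptions, not data from the source.
STANDALONE_LOCATIONS: Set[str] = {"san francisco", "new york"}


def split_location_query(
    query: str, standalone: Set[str] = STANDALONE_LOCATIONS
) -> Optional[Tuple[str, str]]:
    """Match a query against the templates {L} {Query} and {Query} {L}.

    Returns (location, remaining_query) when a leading or trailing run of
    tokens is a known standalone location name, otherwise None.
    """
    tokens = query.lower().split()
    # Try every split point so that multi-word names are also considered.
    for i in range(1, len(tokens)):
        head = " ".join(tokens[:i])
        tail = " ".join(tokens[i:])
        if head in standalone:   # template {L} {Query}
            return head, tail
        if tail in standalone:   # template {Query} {L}
            return tail, head
    return None
```

Under these assumptions, both “San Francisco Hotels” and “Hotels San Francisco” resolve to the location “san francisco” with the remaining query “hotels”, whereas “Orange Hotels” yields no match because “orange” is not in the assumed list.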
The ability to identify names of standalone locations within a search query has a large impact on the quality of the search results generated by the query. Without such knowledge, the query processor cannot tell the difference between an obvious location query such as “new york pizza” (new york is a location) and an obvious non-location query such as “orange juice” (orange could be a location, but not here).
Furthermore, empirical measurements indicate that when users include location information in queries, more than 90% of the time this location information is specified using standalone location names. Hence, the ability to identify standalone location names in queries is of primary importance if location information is to be used while processing queries.
However, it is a hard problem to automatically determine whether or not a location is a standalone location. In general, the difficulty arises from the following two aspects: (1) there exists no appropriate knowledge base upon which to perform inferences; and (2) the concept itself has some ambiguity and it is consequently hard to formulate any uniform rules for determining whether a location is a standalone location. Note that this problem is difficult even for human beings because different people can have different criteria for determining whether a location is a standalone location.
To facilitate searches involving standalone locations, search engines presently use standalone city lists. However, the tasks of internationalizing and maintaining these standalone city lists are performed through labor-intensive and error-prone manual processes.
Hence, what is needed is a method and an apparatus for generating and maintaining a list of standalone locations without the above-described problems.