1. Field of the Invention
This invention pertains in general to extracting information from a network and in particular to building a set of structured information from electronic documents on the network.
2. Description of the Related Art
Information on the Internet or another network can be difficult to find. Search engines allow users to locate content having specified characteristics. In some cases, however, the effectiveness of search engines is undermined by the sheer volume of information available on the Internet. For example, a person searching for a restaurant with a common name, such as “Tom's Restaurant,” will receive a large number of matching results through which the person must wade to find the correct restaurant.
One way to remedy the “too much information” problem is to enable searching on a smaller set of information. A search engine can allow a person to search a directory specific to a particular city or other geographic area. That way, a person looking for “Tom's Restaurant” in New York, N.Y., can specify that the search should be limited to restaurants in New York City. As a result, there are likely to be fewer search results, and it will be easier for the searcher to find the correct result. Moreover, the local directory can provide additional features, such as a map showing the location of the restaurant.
Building a directory with robust functionality is a complex process. Certain types of information, such as names, addresses, and telephone numbers for restaurants and other enterprises within a city, are relatively easy to obtain. Telephone companies and other data providers often sell information of this type. However, in order to be effective, the directory should include additional information that is not available from standard information providers, such as business hours, reservations policies, payment options, and whether parking is available. Ideally, the directory would maintain this information in a structured format that supports complex queries such as “find restaurants open past midnight on Tuesdays” and “show restaurants with valet parking that take reservations.” Directories of this type have not been created due to the difficulties in gathering and representing the information.
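By way of illustration only, the following minimal sketch shows one way a structured format could support the two example queries above. The `Listing` record, its field names, the day codes, and the sample entries (including the hours shown) are hypothetical and are not taken from any actual directory.

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    """Hypothetical structured directory entry (all field names are assumptions)."""
    name: str
    # Maps a day code to (open_hour, close_hour) on a 24-hour clock.
    # A close_hour above 24 means the business stays open past midnight.
    hours: dict = field(default_factory=dict)
    takes_reservations: bool = False
    parking: frozenset = frozenset()


# Made-up sample data, used only to exercise the queries.
LISTINGS = [
    Listing("Tom's Restaurant", {"TU": (6, 25.5)}, True, frozenset({"street"})),
    Listing("Hypothetical Bistro", {"TU": (17, 23)}, True, frozenset({"valet"})),
]


def open_past_midnight(listings, day):
    """Names of listings whose closing time on `day` falls after midnight."""
    return [
        entry.name
        for entry in listings
        if day in entry.hours and entry.hours[day][1] > 24
    ]


def valet_with_reservations(listings):
    """Names of listings that offer valet parking and take reservations."""
    return [
        entry.name
        for entry in listings
        if "valet" in entry.parking and entry.takes_reservations
    ]
```

Because each attribute is stored as a typed field rather than as free text, each query reduces to a simple filter over the records.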
Oftentimes, the information needed to build such a directory is available on the Internet. A restaurant might have its own web page that provides important details like its hours and reservations policy. Similarly, there might be one or more existing web directories that include entries for restaurants. Usually, though, this information is either unstructured or structured in an unsuitable manner. For example, the restaurant's web page might describe its business hours by using the phrase “closed Mondays” while the existing local directory specifies the same information as “Open: T W TH F S.” This variety of ways to express the same information makes it difficult to build a unified directory having structured information acquired from a variety of different sources.
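The difficulty can be made concrete with a minimal sketch that reduces the two phrasings above to one common representation. The function name, the day codes, and the abbreviation table are assumptions made for illustration. In particular, the sketch assumes that a lone “S” stands for both Saturday and Sunday, which is the reading under which the two phrases describe the same schedule. A source using a different convention would need its own table.

```python
import re

DAYS = ("MO", "TU", "WE", "TH", "FR", "SA", "SU")

# Assumed abbreviation table; a lone "S" is read as both weekend days.
ABBREVIATIONS = {
    "M": {"MO"}, "T": {"TU"}, "W": {"WE"}, "TH": {"TH"},
    "F": {"FR"}, "S": {"SA", "SU"},
}

DAY_NAMES = {
    "monday": "MO", "tuesday": "TU", "wednesday": "WE", "thursday": "TH",
    "friday": "FR", "saturday": "SA", "sunday": "SU",
}


def normalize_open_days(text):
    """Return the set of day codes on which the business is open.

    Returns None when the text matches neither of the two handled forms.
    """
    cleaned = text.strip().rstrip(".").lower()

    # Form 1: "closed Mondays" -> every day except the named one.
    match = re.fullmatch(r"closed\s+([a-z]+?)s?", cleaned)
    if match and match.group(1) in DAY_NAMES:
        return frozenset(DAYS) - {DAY_NAMES[match.group(1)]}

    # Form 2: "Open: T W TH F S" -> union of the listed abbreviations.
    match = re.fullmatch(r"open:\s*(.+)", cleaned)
    if match:
        tokens = match.group(1).upper().split()
        if all(token in ABBREVIATIONS for token in tokens):
            return frozenset().union(*(ABBREVIATIONS[t] for t in tokens))

    return None
```

Under these assumptions, `normalize_open_days("closed Mondays")` and `normalize_open_days("Open: T W TH F S.")` yield the same set of six day codes. Only two phrasings are handled here, and any other wording returns `None`, which hints at how many source-specific rules a unified directory would require.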
Therefore, there is a need in the art for a way to build a structured, or at least partially structured, collection of information for a directory.