1. Field
The present invention relates generally to machine learning and, more specifically, to detecting neighborhoods from text in geocoded web documents.
2. Description of the Related Art
Geographic information systems are generally used for a number of purposes, including storing, organizing, and providing information about geographic areas. Examples of information stored in geographic information systems include names and geographic locations of geographically distributed entities, such as streets, points of interest, businesses, natural features, neighborhoods, cities, counties, states, provinces, countries, and the like. Geographic information systems are used in a variety of contexts, including identifying features to be displayed in interactive online maps, supporting navigation systems, and identifying hierarchical relationships between entities (e.g., indicating which neighborhoods are in a city and which businesses are in a neighborhood). This information is used, for instance, by search engines to respond to a search query for a particular type of restaurant within a neighborhood named in the query, or by an interactive map to identify points of interest within an area being displayed.
Updating and populating records in geographic information systems is relatively expensive and difficult, particularly for systems describing larger areas in relatively fine detail, e.g., an entire country at the level of neighborhoods or local businesses. Over time, the names of neighborhoods change, and new neighborhoods are named. Often, among those living within an area, new names for geographic areas will emerge without an official body defining names or boundaries of those areas. For instance, a new name of a neighborhood may arise from the attributes of a relatively small area changing, such as a collection of similar businesses moving into an area, creating, e.g., a new restaurant district or fashion district. Or locals may develop colloquial names for areas by shortening formal names in unpredictable ways. Because new names arise frequently and often describe a large number of relatively small neighborhoods, documenting the newly named areas can be difficult. At larger scales, such as spanning an entire country or the planet, manually cataloging new neighborhoods with human surveyors is generally prohibitively expensive due to the size of the area, the frequency with which new neighborhood names arise, and the number of languages in which areas are described.