This specification relates to link-based locale identification for domains and domain content.
A search service, e.g., a web-based search service, will generally receive a search query from a user through a user interface that the service presents in a web browser on the user's personal computing device. Upon receiving a search query, the search service will generally direct the query to a search engine for a specific corpus of resources. The search engine produces results based on the query and initially ranks them according to one or more criteria, including the relevance of each result to the query in the context of the corpus to which the query was directed.
A Uniform Resource Locator (URL) is a string of characters that identifies a resource (e.g., an addressable web document or file) on a computer network. A URL provides a means for locating a resource by describing the resource's location on the network. Each URL includes a hostname. A hostname is a unique name by which a network naming system identifies a particular device or group of devices that are attached to the network. Each hostname is associated with at least one Internet Protocol (IP) address.
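The extraction of a hostname from a URL can be illustrated with a minimal Python sketch. It assumes absolute URLs of the `scheme://host/path` form and uses the standard library's `urllib.parse.urlsplit`; the helper name `hostname_of` is hypothetical and chosen for illustration only.

```python
from typing import Optional
from urllib.parse import urlsplit


def hostname_of(url: str) -> Optional[str]:
    """Hypothetical helper: return the lowercase hostname of a URL, or None.

    Assumes an absolute URL such as "https://host/path". URLs without a
    network location (e.g. "mailto:...") have no hostname, so None is returned.
    """
    # urlsplit separates scheme, network location, path, query and fragment;
    # the .hostname attribute drops any userinfo and port and lowercases the host.
    return urlsplit(url).hostname


print(hostname_of("https://www.random.com/products/index.html?lang=fr"))
# prints www.random.com
```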
Each hostname ends in a top level domain name. The top level domain name can be, for example, a generic top level domain name, e.g., “.com” or “.gov”. Alternatively, the top level domain name can be a country code top level domain (ccTLD) name, e.g., “.fr” or “.ca”, which is associated with a particular country or territory. Hostnames also include a second level domain name immediately to the left of the top level domain name. The second level domain name can indicate a particular organization that is associated with the content on the domain. For example, the hostname “www.random.com” may indicate that the content is associated with an organization named Random, Inc. Hostnames having the same second level domain name but different top level domain names may be unrelated: for example, “www.random.be” and “www.random.com” may well be associated with distinct organizations.
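The relationship between the top level and second level domain names can be sketched as simple label splitting. This is a simplified illustration under stated assumptions: the function `split_hostname` and the `HostnameParts` type are hypothetical, and treating any two-letter ASCII top level label as a ccTLD is a heuristic. The sketch also ignores multi-label public suffixes such as “co.uk” (for which it would report “co” as the second level domain name) and internationalized ccTLDs; a production system would more likely consult a maintained list such as the Public Suffix List.

```python
from typing import NamedTuple, Optional


class HostnameParts(NamedTuple):
    top_level: str               # rightmost label, e.g. "com" or "fr"
    second_level: Optional[str]  # label immediately left of the top level label
    is_country_code: bool        # heuristic guess at whether the TLD is a ccTLD


def split_hostname(hostname: str) -> HostnameParts:
    """Hypothetical helper: split a hostname into its top two levels.

    Simplifying assumptions: the rightmost label is the top level domain name,
    and a two-letter ASCII alphabetic top level label is treated as a ccTLD.
    Multi-label suffixes such as "co.uk" are NOT handled.
    """
    # Normalize case and drop a trailing root dot ("www.random.fr.").
    labels = hostname.lower().rstrip(".").split(".")
    tld = labels[-1]
    sld = labels[-2] if len(labels) >= 2 else None
    is_cc = len(tld) == 2 and tld.isascii() and tld.isalpha()
    return HostnameParts(tld, sld, is_cc)


print(split_hostname("www.random.be"))
# prints HostnameParts(top_level='be', second_level='random', is_country_code=True)
```

Applied to the example hostnames, “www.random.com” and “www.random.be” share the second level label `random` but differ in top level label and ccTLD status; as the text notes, the shared label alone does not establish that both belong to the same organization.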
Each result produced by the search engine identifies a resource and may include one or more of the following attributes: the title of a webpage, a hyperlink to the webpage, a snippet of text with the search terms shown in bold, the size of the webpage, a hyperlink to similar webpages, and a hyperlink to a cached version of the webpage. After the search engine produces the results, the search service presents them to the user.
The identified resources correspond to one or more domains. In this specification, the term “domain” will be used to refer to the collection of Internet resources that are addressable through URLs sharing the same hostname. A domain may include a very large number of resources and IP addresses, or it may include only a few resources and a single IP address. Under this definition, a domain will always be identified using its hostname: the hostname “www.random.com”, for example, will be used to indicate those resources addressable through that hostname.
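Under this definition, grouping resources into domains amounts to keying URLs by their exact hostname. A minimal sketch, assuming the URLs are absolute; the function name `group_by_domain` is hypothetical and used only for illustration:

```python
from collections import defaultdict
from typing import Dict, Iterable, List
from urllib.parse import urlsplit


def group_by_domain(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: map each hostname (a domain) to the URLs it addresses."""
    domains: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        host = urlsplit(url).hostname  # lowercase hostname, or None
        if host is not None:           # skip URLs with no hostname component
            domains[host].append(url)
    return dict(domains)


urls = [
    "http://www.random.com/a",
    "http://www.random.com/b",
    "http://www.random.be/a",
    "http://random.com/c",
]
for host, members in group_by_domain(urls).items():
    print(host, len(members))
```

Because the key is the full hostname, “random.com” and “www.random.com” fall into separate domains, which matches the definition above rather than grouping by second level domain name.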