1. Technical Field
The present disclosure relates to a system and method of using machine learning to automatically discover the home page of an entity (e.g., a company, an organization, or a person) based on a specified descriptor such as the name of the entity.
2. Discussion of Related Art
It can be a challenge to discern whether a company has a website and, if it does, to identify the correct home page. A URL (Uniform Resource Locator) is the unique address for a file that is accessible on the Internet. The term “website” (alternatively, web site or Web site) refers to a related collection of World Wide Web (WWW) files that includes a main file called a home page.
There are instances where it is useful to know the Internet home page or URL for a specified set of companies. If the companies are large, such as Fortune 500 companies, the task of finding their home pages can be accomplished by submitting each company name to an Internet search engine and selecting from the returned results. This approach is less reliable for smaller companies because the returned results often do not include the home page address. The smaller a company's Internet presence, the more difficult it becomes to identify the home page.
Registration of domain names within Internet top-level domains is coordinated by the Internet Corporation for Assigned Names and Numbers (ICANN). A domain name within a top-level domain (TLD), sometimes referred to as a top-level domain name (TLDN), can be registered through domain-name registrars that have been accredited by ICANN. A number of companies have been accredited by ICANN to act as registrars in one or more TLDs, including, for example, .biz, .com, .info, .net and .org.
It is known that TLD registration lists can be used to determine the home page address for a given company. Using TLD registration lists, it is possible to determine if a specified domain is currently registered, and if so, the name of the entity that registered the domain.
However, the conventional method of domain lookup can lead to incorrect results for companies with a small Internet presence. Many of these companies rely on other companies to build, host, and maintain their websites. The company that develops the website may register the domain under its own name rather than the name of the requesting company. For this reason, domain registration data does not reliably match a company name to its website. For example, the company Michigan Capital Finance has a home page associated with a given domain name. If this domain name is looked up in a domain registration list (there are websites that support such a query), the named registrant is an entity ZWBALLCO, which is a different company that offers website hosting services to other companies. Hence, domain lookup cannot be relied on to produce correct results.
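The failure mode described above can be illustrated with a minimal Python sketch. It assumes a registration list is available as a simple in-memory mapping from domain to registrant name; the mapping, the helper names `normalize` and `registrant_matches`, and the placeholder domain `michigan-capital.example` are hypothetical and chosen only for illustration, since the actual domain name is not given here.

```python
from typing import Dict, Optional


def normalize(name: str) -> str:
    """Lowercase a name and keep only letters and digits for comparison."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


# Hypothetical registration list: domain -> registrant name.
# The domain is a placeholder; the registrant is the hosting company,
# as in the Michigan Capital Finance example.
REGISTRATIONS: Dict[str, str] = {
    "michigan-capital.example": "ZWBALLCO",
}


def registrant_matches(
    company: str, domain: str, registrations: Dict[str, str]
) -> Optional[bool]:
    """Return None if the domain is not in the list; otherwise report
    whether the registrant name equals the company name."""
    registrant = registrations.get(domain)
    if registrant is None:
        return None
    return normalize(registrant) == normalize(company)


# The domain really does belong to the company's home page, yet the
# lookup reports a mismatch because the hosting company registered it.
result = registrant_matches(
    "Michigan Capital Finance", "michigan-capital.example", REGISTRATIONS
)
print(result)
```

In this sketch the lookup rejects a correct company-to-domain pairing solely because the registrant field names the hosting company, which is the unreliability the conventional method suffers from.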
Therefore, a need exists for a system and method of matching an Internet home page to a specified entity.