The Internet continues to grow in value for users and businesses alike. Individuals and businesses depend on their online presences, particularly their websites, to deliver current and useful information to customers, readers, and other Internet users. Websites are made available to visitors online via domain names that the visitors type into Internet browsing software. A domain name comprises at least two labels separated by periods: a top-level domain (TLD) as the rightmost label, a second-level domain (SLD) immediately to the left of the TLD, and any further subordinate levels, called subdomains, extending to the left. For example, in the domain name “blog.example.com,” “com” is the TLD, “example” is the SLD, and “blog” is a subdomain.
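The label structure described above can be illustrated with a short sketch. The function name `split_domain` and its return shape are assumptions introduced here for illustration only; the sketch simply treats the rightmost label as the TLD and the next label as the SLD, as in the “blog.example.com” example.

```python
def split_domain(name: str) -> dict:
    """Split a domain name into its TLD, SLD, and subdomain labels.

    Hypothetical helper: the rightmost label is taken as the TLD and the
    label to its left as the SLD; anything further left is a subdomain.
    """
    labels = name.strip().rstrip(".").lower().split(".")
    # A domain name needs at least two non-empty labels.
    if len(labels) < 2 or any(not label for label in labels):
        raise ValueError(f"not a valid domain name: {name!r}")
    return {
        "tld": labels[-1],
        "sld": labels[-2],
        "subdomains": labels[:-2],  # kept in left-to-right order
    }


parts = split_domain("blog.example.com")
print(parts)
```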
A domain name is unique: there can be only one instance of a particular combination of SLD and TLD registered for use on the Internet. An SLD can be a string of up to 63 characters containing any combination of letters, numbers, and dashes. The SLD is typically a word or combination of words with or without dashes separating the words. The composition of a TLD, on the other hand, is restricted; the set of TLDs is finite, although it is currently growing. TLDs are divided into country-code TLDs (ccTLDs), which are two-letter TLDs designating a specific country, and generic TLDs (gTLDs), which contain three or more letters. The foundational gTLDs .com, .net, and .org were the only gTLDs available to businesses and individuals until about the year 2000, and are still the most commonly used gTLDs. Since 2000, and particularly since 2012, many more gTLDs have become available and include abbreviations (.biz, .info) as well as words up to eight letters in length (.shop, .arts, .clothing). However, .com and other older gTLDs remain the most sought-after due to familiarity, solid registry infrastructure, and other reasons.
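The composition rules in this paragraph can be expressed as a minimal sketch. The function names `is_valid_sld` and `classify_tld` are hypothetical. The sketch also rejects a leading or trailing dash, which is a general DNS hostname rule rather than something stated in the paragraph, and it does not check a TLD against any actual registry list.

```python
import re

# Letters, numbers, and dashes, 1 to 63 characters. The lookarounds reject a
# leading or trailing dash (a DNS hostname rule, added here as an assumption).
SLD_PATTERN = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def is_valid_sld(sld: str) -> bool:
    """Return True if the string satisfies the SLD composition rule."""
    return SLD_PATTERN.fullmatch(sld.lower()) is not None


def classify_tld(tld: str) -> str:
    """Classify a TLD by length alone: two letters is a ccTLD, three or more a gTLD."""
    tld = tld.lstrip(".").lower()
    if len(tld) < 2 or not (tld.isascii() and tld.isalpha()):
        raise ValueError(f"unexpected TLD: {tld!r}")
    return "ccTLD" if len(tld) == 2 else "gTLD"


print(is_valid_sld("my-example-shop"), classify_tld(".uk"), classify_tld("clothing"))
```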
The exhaustibility of domain names has given rise to a domain name aftermarket where registered domain names, or those with expired registrations, are bought and sold, often for high sums. As an illustration, Wikipedia maintains a list of the most expensive domain names, based on published sale prices. The top entry on the list is currently Insurance.com, sold for $35.6 million in 2010; the 26th entry on the list is currently Whisky.com, sold for $3.1 million in 2013. The domain name aftermarket includes domain name auctions as well as set pricing. Additionally, some registrars maintain “premium” pricing for certain available domain names, based on metrics such as age and use (i.e., traffic) of the domain name and popularity of terms within the domain name. Premium prices can range from hundreds to thousands of dollars per year. Nevertheless, businesses and individuals will pay the premium price in order to serve their web presences from those valuable properties.
Domain name service providers, such as registrars and website hosting providers, facilitate a user's identification and registration of a domain name via a domain search system. The system includes a user interface in which the user enters her desired domain name or search terms, and a back-end server or network of servers that processes the user input to determine if the domain name is available. The domain search system can further generate suggestions, referred to herein as “candidate domain names,” that are similar to the input domain name or search terms. This gives the user flexibility in case the exact desired domain name is unavailable or too expensive, or in case the user does not know exactly which terms she wants included in the SLD, or which TLD to choose. The user may also want to register multiple similar domain names to capture additional traffic or prevent others from using too similar a domain name. The process of generating candidate domain names is known in the art as “spinning.” Typically, spinning begins with identifying known words, or “tokens,” within the domain search input. Then, variations on the word combinations are generated using one or several techniques, including without limitation rearranging tokens, pluralizing tokens, concatenating characters, truncating or abbreviating words, and finding semantically similar words such as synonyms and spelling variants. Several algorithms exist for ranking the resulting candidate domain names according to one or more metrics that indicate relevance to the domain search terms or to the user.
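As a rough illustration of spinning, the sketch below tokenizes a search input against a small word list and then applies three of the techniques named above: rearranging tokens, pluralizing, and substituting synonyms. The vocabulary, the synonym table, and the function names `tokenize` and `spin` are all hypothetical, and greedy longest-match segmentation is only one simple way to find tokens; a production system would likely use a much larger dictionary and a more robust segmenter.

```python
from itertools import permutations

VOCAB = {"best", "city", "bike", "shop"}             # hypothetical word list
SYNONYMS = {"shop": ["store"], "bike": ["cycle"]}    # hypothetical synonym table


def tokenize(text, vocab=VOCAB):
    """Segment the input into known words using greedy longest match."""
    text = text.lower()
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No known word starts here; keep the character as its own token.
            tokens.append(text[i])
            i += 1
    return tokens


def spin(text, tlds=("com", "net")):
    """Generate candidate domain names from the search input."""
    tokens = tokenize(text)
    if not tokens:
        return []
    slds = {"".join(tokens)}
    if len(tokens) <= 4:  # cap the factorial growth of rearrangements
        slds.update("".join(order) for order in permutations(tokens))
    if not tokens[-1].endswith("s"):
        slds.add("".join(tokens) + "s")  # naive pluralization
    for i, token in enumerate(tokens):
        for synonym in SYNONYMS.get(token, []):
            slds.add("".join(tokens[:i] + [synonym] + tokens[i + 1:]))
    return sorted(f"{sld}.{tld}" for sld in slds for tld in tlds)


candidates = spin("bikeshop", tlds=("com",))
print(candidates)
```

The sketch stops at generation; it does not check availability or rank the candidates, both of which the paragraph treats as separate steps.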
The selected domain name is likely to become valuable to the registrant as she develops her web presence or simply holds the registration with the intent to resell it. The speed of the domain search and the quality of candidate domain names factor significantly into the user's ability to secure the most valuable domain name(s). One problem that impacts the speed of a domain search is the complexity involved in spinning candidate domain names. It would be advantageous to minimize the amount of time the system needs to identify candidate domain names and confirm they are available for registration.
One problem that impacts the quality of candidate domain names is the fact that the candidates may be records in disparate domain data sources. For example, the system may spin 100 candidate domain names from the search terms, and may also identify another 100 candidates in the domain aftermarket index. A solution is needed that allows the system to rank the candidate domain names from these disparate sources in a single list. Current domain spinning algorithms employ rule-based “blending” of disparately sourced results. A system blends the disparate sets of search results by applying rules that are essentially quotas. For example, a system that has access to an aftermarket index and a ccTLD index and also spins a set of candidate domain names in real time may form a set of top ten candidate domain names by selecting the four highest-ranking spun domain names, the three highest-ranking candidate domain names from the aftermarket index, and the three highest-ranking candidate domain names from the ccTLD index. This approach does not truly rank the candidate domain names across sets, and may exclude more valuable or relevant candidate domain names in order to satisfy the quotas.
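The quota-based blending described above, and the way it can exclude stronger candidates, can be seen in a small sketch. The function name `blend_by_quota`, the candidate names, and the scores are all made up for illustration. The sketch also assumes, purely for the sake of the comparison, that scores from different sources are directly comparable, which is precisely what cannot be taken for granted in practice.

```python
def blend_by_quota(sources, quotas):
    """Take a fixed number of top-ranked candidates from each source."""
    blended = []
    for source_name, quota in quotas.items():
        ranked = sorted(sources[source_name], key=lambda c: c[1], reverse=True)
        blended.extend(ranked[:quota])
    return blended


# Hypothetical candidates as (domain name, relevance score) pairs.
sources = {
    "spun": [("bikeshops.com", 0.62), ("shopbike.com", 0.40), ("bikestore.net", 0.35)],
    "aftermarket": [("bikes.com", 0.95), ("bikeshop.com", 0.90), ("cycle.com", 0.88)],
    "cctld": [("bikeshop.de", 0.30), ("bikeshop.fr", 0.20)],
}
quotas = {"spun": 2, "aftermarket": 1, "cctld": 1}

blended = blend_by_quota(sources, quotas)

# For contrast: the four best candidates if all sources were ranked together.
everything = [c for candidates in sources.values() for c in candidates]
top_overall = sorted(everything, key=lambda c: c[1], reverse=True)[:4]

print([name for name, _ in blended])
print([name for name, _ in top_overall])
```

With these made-up numbers, the quota rule admits “shopbike.com” (0.40) while leaving out “bikeshop.com” (0.90), because the aftermarket quota is already filled.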