The present invention relates generally to the field of computerized databases and, in particular, to a new and useful method of identifying collections of Internet web sites.
The Internet is a large network of interconnected computers. A particular computer or a file containing information on such a computer may be found through an "address." The address is a long combination of numbers; for example, the numeric address for a specific computer connected to the Internet might be 192.168.255.1.
The addresses identify computers containing files in specific informational or interactive formats, such as Hypertext Markup Language ("HTML"). The informational and interactive portions are combined to form what are now commonly referred to as web pages. Users have difficulty finding needed information on web pages for various reasons.
Aliases for the numeric addresses, called domain names, are usually easier to remember. The aliases often have intrinsic meaning that facilitates identification of a particular computer or World Wide Web site ("web site").
Presently, Internet domain names available for use by the general public in the United States include a second level domain and a top level domain ("TLD"), which together correspond to a numeric address and identify a particular computer. Second level domains within the TLDs can currently be registered under the authority of a body known as ICANN.
In the United States, five top level domains presently exist. The general public uses three of the five top level domains: .com, .org and .net. The remaining two top level domains are reserved for educational institutions and government bodies. The .com domain is the most popular. Other top level domains are available outside the United States. These use a two-letter country code designator as the top level domain, which reduces their popularity.
Second level domain names can include any combination of letters, numerals and hyphens; certain characters, such as "/" and ".", among others, have special meanings and cannot be used. It is common for companies to use their company name, an important trademark, or both as a second level domain. Many of these registrations are under the .com top level domain.
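By way of illustration only, the naming conventions described above can be sketched in Python. The sketch assumes the conventional host-name rule (letters, digits and hyphens, 1 to 63 characters, no leading or trailing hyphen) and handles only names of the form second-level.TLD; the function names and the set of public TLDs are hypothetical and do not represent any registry's interface.

```python
import re

# Assumed rule (conventional host-name syntax): letters, digits and hyphens,
# 1 to 63 characters, not beginning or ending with a hyphen.
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

# The three generic TLDs described as open to the general public.
PUBLIC_TLDS = {"com", "org", "net"}


def split_domain(name: str) -> tuple[str, str]:
    """Split a name such as 'Wine.com' into ('wine', 'com')."""
    parts = name.lower().rstrip(".").split(".")
    if len(parts) != 2:
        raise ValueError(f"expected second-level.tld, got {name!r}")
    return parts[0], parts[1]


def is_valid_second_level(label: str) -> bool:
    """True if the label uses only the permitted characters."""
    return LABEL_RE.fullmatch(label) is not None


def is_public_generic(name: str) -> bool:
    """True if the name falls under .com, .org or .net."""
    _, tld = split_domain(name)
    return tld in PUBLIC_TLDS


print(split_domain("Wine.com"))                  # ('wine', 'com')
print(is_valid_second_level("hewlett-packard"))  # True
print(is_valid_second_level("a/b"))              # False
```

A "/" or "." inside the second level label is rejected because neither character is in the assumed permitted set.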
Second level domain names that use the .com TLD are popular for an additional reason. Internet users can use such second level domain names to bypass search engines, such as Yahoo, Excite, and Lycos, and directly access a company's web site by entering the business name as the second level domain followed by a .com extension. Internet users also use this shortcut to find companies or desired information intuitively, by entering company names or product names.
The Internet has become a tool for the sale of information, products and services. Many web sites offer commercial products for sale and provide the means for purchasing the desired goods by credit card with delivery by mail. It is estimated that over 200 million people will use the Internet to obtain information or goods during 1999. A business with no distribution network outside a single location can use the Internet and the postal system to become a national company overnight. Thus, a generic or descriptive term, such as "wine," "clothes" or "tickets," used as a second level domain can be a valuable marketing mechanism, reaching a larger market than is available at the marketer's physical location.
Nearly every two-, three-, and four-letter combination is presently registered as a second level domain. These combinations are popular for use because they are abbreviations of longer company names, like HP for Hewlett-Packard, or, in some cases, are actual company names, such as MCI. Similarly, many common English and foreign words and names have also been registered. Even with the advent of several new top level domains, the ".com" domain is likely to remain popular as the first choice of users finding Internet web sites by this shortcut method.
Despite their popularity, the short second level domain names do not usually provide any denotative reference "clues" about the content of a site. Although they can provide a connotation of a particular product or topic, the common word second level domain names do not provide descriptive information about the topic implied or popularly associated with their names. And, where many companies in different trades use similar terms for their names, a user cannot be certain which company can be found at a web site using the similar term. Wine.com, for example, may or may not have information about wine and may not relate to on-line sales of wine. United Airlines is not found at united.com.
Many articles have been written about the large amount of time that people spend "surfing" the Internet. One reason for this is the inability of search engines to present lists of the web pages most relevant to a search. Search engines often return results filled with repeated listings of the same web site, inactive web sites, or web sites containing information only loosely related to the desired topic.
To save time and effort, Internet users need a consistent method for searching organized listings of commercial web sites by topic such as product categories.
In prior computer programs and computerized databases, codes have been surrounded with non-letter characters or separated by a common character, as in a comma-delimited database file.
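A minimal sketch of the two conventions mentioned, using only the Python standard-library csv and re modules. The sample data and the underscore convention for surrounding a code are hypothetical and serve only to show how such delimiters are consumed by a program rather than read by a user.

```python
import csv
import io
import re

# Hypothetical comma-delimited file: "," is the common separator character.
DATA = "id,name,category\n1,Acme Wines,wine\n2,Ticket Hub,tickets\n"


def read_records(text: str) -> list[dict[str, str]]:
    """Parse comma-delimited text into one dict per record."""
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical convention: a code is surrounded by underscores, e.g. "_title_".
CODE_RE = re.compile(r"_([A-Za-z]+)_")


def find_codes(text: str) -> list[str]:
    """Return the code names found between surrounding underscores."""
    return CODE_RE.findall(text)


records = read_records(DATA)
print(records[1]["category"])               # tickets
print(find_codes("see _title_ and _end_"))  # ['title', 'end']
```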
U.S. Pat. No. 5,745,899, for example, discloses a search engine for Internet web pages that indexes words by pairing each word with a numeric location. Within the index database, words are separated by special characters not typically used in words, such as "@#., <>?!". These characters serve as word separators; they are neither indexed in the database nor paired. Other non-letter characters are used in pairs to set off attribute designations associated with a page: a non-letter character, such as a space or an underline, is placed on each end of the attribute name, and these characters act as separators indicating the presence of the named attribute. The attribute designations identify encoded word and location pairs located in specific sections of a page, such as the title or the end of the page. Because the database is encoded, the attribute information is not directly accessible by, or visible to, a user entering the attribute name surrounded by non-letter characters. The indexing program uses the attribute designators only to increase or decrease the weight given to search terms found in association with those attributes.
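The indexing arrangement attributed to that patent might be approximated as follows. This is a simplified sketch, not the patented implementation: the separator characters are taken from the text, while the dictionary layout, the "_title_"-style attribute markers and the weight values are assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Separator characters listed in the text, plus whitespace; never indexed.
SEPARATORS = re.compile(r"[@#.,<>?!\s]+")

# Assumed weights applied to hits inside attribute-tagged sections.
WEIGHTS = {"_title_": 2.0, "_body_": 1.0}


def tokenize(text: str) -> list[str]:
    """Split on separator characters and drop empty strings."""
    return [t for t in SEPARATORS.split(text) if t]


def build_index(sections: dict[str, str]) -> dict[str, list[tuple[int, str]]]:
    """Map each word to (position, attribute) pairs.

    `sections` maps a section name such as "title" to its text; the
    section name is stored as an attribute designation like "_title_".
    """
    index: dict[str, list[tuple[int, str]]] = defaultdict(list)
    position = 0
    for name, text in sections.items():
        for word in tokenize(text.lower()):
            index[word].append((position, f"_{name}_"))
            position += 1
    return index


def score(index: dict[str, list[tuple[int, str]]], term: str) -> float:
    """Sum attribute weights over every occurrence of a term."""
    return sum(WEIGHTS.get(attr, 1.0) for _, attr in index.get(term.lower(), []))


idx = build_index({"title": "Wine shop", "body": "Red wine, white wine!"})
print(idx["wine"])         # [(0, '_title_'), (3, '_body_'), (5, '_body_')]
print(score(idx, "Wine"))  # 4.0
```

As in the description, the attribute markers only change the weight of a hit; a user querying the index never types them.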
U.S. Pat. No. 5,797,008 discloses a similar indexing system. The patent contains the same information regarding the use of the non-letter characters as separators for data fields within the index database.
A database system that adds a division code to each end of a string of data and a division code to each record in the string is disclosed by U.S. Pat. No. 5,870,750. The division codes are separators that indicate the breaks between the individual data segments forming the database. The data between the separators does not otherwise identify a further collection of data.
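A sketch of division codes under stated assumptions: the ASCII group-separator and record-separator control characters stand in for whatever codes that patent actually uses, records are assumed never to contain either character, and the `encode`/`decode` names are hypothetical.

```python
# Assumed division codes: ASCII group-separator and record-separator control
# characters, used here only for illustration. Records must not contain them.
STRING_CODE = "\x1d"  # placed at each end of the data string
RECORD_CODE = "\x1e"  # placed after each record


def encode(records: list[str]) -> str:
    """Join records into one string bounded by division codes."""
    body = "".join(record + RECORD_CODE for record in records)
    return STRING_CODE + body + STRING_CODE


def decode(data: str) -> list[str]:
    """Recover the records; the codes mark breaks and carry no other meaning."""
    if len(data) < 2 or not (data.startswith(STRING_CODE) and data.endswith(STRING_CODE)):
        raise ValueError("missing string division codes")
    body = data[1:-1]
    if body and not body.endswith(RECORD_CODE):
        raise ValueError("last record lacks its division code")
    return body.split(RECORD_CODE)[:-1]


packed = encode(["alpha", "beta"])
print(decode(packed))  # ['alpha', 'beta']
```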
U.S. Pat. No. 5,873,087 shows a method for using the greater than and less than symbols, ">" and "<", as brackets around data field labels. The labels are tags which identify specific information in each record. The symbols used in the labels are used only to designate the existence of the field. The labels only identify a single data item within each record.
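A minimal parser for bracketed field labels, assuming (for illustration only) that each field is written as "<label>value" and that a value runs until the next "<". The record layout and function name are hypothetical; the sketch shows that each label identifies exactly one data item.

```python
import re

# Assumed layout: "<label>value" repeated; a value runs up to the next "<".
FIELD_RE = re.compile(r"<([^<>]+)>([^<]*)")


def parse_record(record: str) -> dict[str, str]:
    """Map each bracketed label to the single data item that follows it."""
    return {label: value.strip() for label, value in FIELD_RE.findall(record)}


print(parse_record("<name>Acme Wines<city>Napa"))
# {'name': 'Acme Wines', 'city': 'Napa'}
```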
Other patents disclose methods of identifying data in a hierarchical configuration relating data to different keywords, or identifiers.
U.S. Pat. No. 4,318,184 is for an information storage and retrieval system using a hierarchical tree system to classify and identify data. Keywords are used to identify particular data within each tree. Keywords can be combined to select a group of data for a particular item defined by the data. The keywords do not have any special identifiers connected to them.
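Keyword selection over a hierarchical tree might be sketched as follows. The node structure and the rule that a node is selected only when it carries every requested keyword are assumptions for illustration, not details taken from that patent. Note that the keywords are plain strings with no special identifiers attached.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """One entry in the classification tree, with its keywords."""
    name: str
    keywords: set[str] = field(default_factory=set)
    children: list["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Visit the node and its descendants depth-first."""
    yield node
    for child in node.children:
        yield from walk(child)


def select(root: Node, *keywords: str) -> list[str]:
    """Names of nodes carrying every requested keyword (assumed AND rule)."""
    wanted = set(keywords)
    return [n.name for n in walk(root) if wanted <= n.keywords]


tree = Node("products", children=[
    Node("wine", {"beverage", "alcohol"},
         [Node("red wine", {"beverage", "alcohol", "red"})]),
    Node("juice", {"beverage"}),
])
print(select(tree, "beverage", "alcohol"))  # ['wine', 'red wine']
```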
U.S. Pat. No. 5,257,183 teaches an interactive, cross-referenced database system using a hierarchy of topics and subtopics to organize categories of information units. The identifiers for each of the topics, subtopics, and categories do not have any special form. Each information unit may have one or more qualifiers associated with the unit that can be used to select only information units having the desired qualifiers. As disclosed in the application, the qualifiers are single characters that are not part of, or used directly with, the information unit topic, subtopic, and category identifiers; instead, the qualifiers are stored as part of the database.
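A sketch of qualifier-based selection in the manner described, assuming each information unit stores its topic, subtopic and category identifiers as plain strings and its qualifiers as a set of single characters held alongside, not inside, those identifiers. The class, function and sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class InfoUnit:
    """An information unit; qualifiers are stored with it, not in its identifiers."""
    topic: str
    subtopic: str
    category: str
    text: str
    qualifiers: frozenset[str] = frozenset()  # single-character flags


def lookup(units: Iterable[InfoUnit], topic: str,
           subtopic: Optional[str] = None, category: Optional[str] = None,
           require: str = "") -> list[InfoUnit]:
    """Filter by the hierarchy, then keep units having every required qualifier."""
    need = set(require)
    return [u for u in units
            if u.topic == topic
            and (subtopic is None or u.subtopic == subtopic)
            and (category is None or u.category == category)
            and need <= u.qualifiers]


units = [
    InfoUnit("beverages", "wine", "red", "Cabernet notes", frozenset("R")),
    InfoUnit("beverages", "wine", "white", "Chardonnay notes", frozenset("RS")),
    InfoUnit("beverages", "beer", "ale", "Pale ale notes"),
]
print([u.text for u in lookup(units, "beverages", "wine", require="S")])
# ['Chardonnay notes']
```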
In each of the prior art systems, the code characters are used only to separate words or data within a database and indicate the presence of a new word or data. The code characters are not used by a person interacting with the database to obtain information. Further, while it is common to use keywords to index information in databases, none of the prior art requires the user to enter additional special characters to obtain the information linked to the keyword.