1. Field of the Invention
The present invention relates generally to the field of generating sales leads. In particular, embodiments of the present invention relate to a system and method for increasing the number of sales leads generated from part numbers of numerous types resident on various Web sites, computer servers, memory devices, and/or other devices accessible via the Internet and the World Wide Web.
2. Description of the Related Art
There are more than a billion documents available on the World Wide Web (the “Web”), the collection of hyperlinked documents accessible over the Internet, and this number continues to increase rapidly. These documents (“Web pages”) are stored as files on Web servers. Each of these Web pages has a unique Web address. These addresses are also called Uniform Resource Locators (URLs) or Universal Resource Locators (URLs). URLs are more fully explained in RFC 1738, “Uniform Resource Locators (URL),” by Berners-Lee, Masinter & McCahill.
An Internet device, such as a computer using a Web browser, typically accesses a specific Web page by providing its unique Web address (e.g., a URL). That Web page is a file stored on a Web server. The file is simply downloaded, without change, to the requesting Internet device and displayed by a Web browser that can render its source code. Every device accessing the file sees the same results. The stored file remains unchanged until an authorized user actively modifies the file. These types of Web pages are typically called “static.” A typical URL for a static Web page looks like this: “http://subdomain.domain.com/pagename.htm”. The “http://” is the value of the scheme field, and it identifies the protocol scheme used to transmit data over the Internet. For the Web, the protocol scheme typically is HyperText Transfer Protocol (HTTP). The “subdomain.domain.com” is the value of the hostname field, and it identifies the domain (or the Web server) that hosts the Web page addressed by the static URL. The actual format of this field depends upon the domain name conventions observed. Typically, the format includes a domain name and an extension (e.g., microsoft.com).
The “pagename” is the value of the path field and/or the file-name field. It may include a path to the specific Web page. It includes the file name of the specific Web page. The “.htm” or “.html” is the value of the file-extension field and it identifies the language of the file. In this example, the language of the static file is the most common format for a Web page: HyperText Markup Language (HTML).
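By way of illustration, the static-URL fields described above may be separated programmatically. The following sketch uses Python's standard urllib.parse module and the hypothetical example URL from the text:

```python
from urllib.parse import urlparse

# Hypothetical static URL from the example above.
url = "http://subdomain.domain.com/pagename.htm"
parts = urlparse(url)

print(parts.scheme)    # scheme field: "http"
print(parts.hostname)  # hostname field: "subdomain.domain.com"
print(parts.path)      # path/file-name field: "/pagename.htm"
```

The file-extension field (“.htm”) is simply the suffix of the path component.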
Dynamic Web Pages and Dynamic Addressing
The opposite of a static Web page is a “dynamic” Web page. A dynamic Web page is one that is created the moment the page is accessed and it is usually created based upon data in a database. Unlike a static Web page, a dynamic Web page that a viewer sees is not stored intact on a Web Server. Instead, a dynamic Web page is generated anew each time it is accessed.
A dynamic Web page is generated based upon a stored file containing instructions and an associated database. Therefore, each instance of a generated dynamic Web page may be different from a previously generated page using the same address. There are many different implementations of dynamic Web pages. These implementations differ from each other in the set of instructions used in the stored file on the Web server and in the type of database accessed. An example of such an implementation is Active Server Pages (ASP) by the Microsoft Corporation. A typical URL for a dynamic Web page may look like this: “http://subdomain.domain.com/pagename.asp?parm1=val1&parm2=val2”. This example uses an ASP implementation. The protocol scheme, hostname, path, and filename fields are the same as those fields in the static URL. However, there are fields in a dynamic address that are different from fields in a static address.
The extension “.asp” is a value of a file-extension field and identifies the language of the dynamic-page-generation instructions. The extension “.asp” indicates that the page is formatted as an Active Server Page (ASP) and is generated by using the “asp” script engine on the server. The “?” symbol is a signal that the URL points to a dynamic page and it separates the portion of the dynamic URL referring to a specific file and the portion of the URL containing parameters.
The “parm1=” and “parm2=” elements identify the names of parameters. The values of these parameters are used to generate the dynamic Web page. “val1” and “val2” are the values of the parameters. The values are typically used to access items in a database. A parameter consists of a parameter name and its associated value. There can be a series of many parameters. The “&” symbol separates each parameter from the other parameters.
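The parameter structure described above may likewise be illustrated with a short sketch, again using Python's standard urllib.parse module and the hypothetical dynamic URL from the text. The “?” separates the file portion from the parameter portion, and parse_qs splits the name=value pairs at each “&”:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical dynamic URL from the example above.
url = "http://subdomain.domain.com/pagename.asp?parm1=val1&parm2=val2"
parts = urlparse(url)

# Everything after the "?" is the query string of parameters.
params = parse_qs(parts.query)
print(params)  # {'parm1': ['val1'], 'parm2': ['val2']}
```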
Web Search Engines and Spiders
No central bibliographic authority exists to catalog the information found on the tens of millions of Web sites on the Internet. Generally, two basic approaches are available for finding the proverbial needle in this immense Web haystack: a subject directory or a search engine. Subject directories, such as “DMOZ” and “MSN” (MSN being a hybrid engine), catalog Web pages and organize them by subject. Each Web page is manually (or automatically) analyzed and categorized. Users can browse through the various categories and subcategories in the subject directories to find a Web site on a particular topic. Typically, Web pages are categorized and added to the directory by professional Web searchers or by user submissions.
A search engine provides a searchable database of indexed keywords. A search engine examines Web pages for specified keywords and returns a list of the Web pages where the keywords were found. Although search engines are a general class of programs, the term is often used specifically to describe systems such as “Google,” “Yahoo,” and “Live Search (MSN)” that enable users to search for Web pages on the Web.
A search engine includes two main parts: an index searcher and an index generator. An index searcher includes a database of indexing keywords of Web pages and logic for searching that database. An index generator includes a “spider” for gathering Web pages and an “indexer” for generating an index into those pages.
Typically, a search engine works by sending out the spider to fetch as many pages as possible. The indexer then reads these pages and creates an index based on the words contained in each page. Each search engine typically uses a proprietary algorithm to create its indices such that, ideally, only meaningful results are returned for each query.
Spiders are sometimes referred to as “Web-spiders,” “robots,” “Web wanderers,” “crawlers,” “Web-crawlers,” etc. These alternative names refer to programs that share the same basic functionality: visiting Web sites by requesting documents from them.
A spider will “crawl” a Web page by following links found on the page. Normal Web browsers (e.g., “Internet Explorer”) are not spiders, because they are operated by humans and do not automatically retrieve referenced documents. Provided with a page by a spider, an indexer parses the document and inserts selected keywords into the database with references back to the original location of the source page. How this is accomplished depends on the indexer. Some indexers index the titles of the Web pages or the first few paragraphs. Some parse the entire contents and index all words. Some parse the meta-tags or other special hidden tags.
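The crawl-and-index cycle described above may be sketched as follows. This minimal illustration, using Python's standard html.parser module, assumes pages have already been fetched as strings (a real spider would retrieve them over HTTP); the page content and URLs are hypothetical:

```python
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Collects hyperlinks (for the spider to follow) and text (for the indexer)."""
    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Record the target of each <a href="..."> link found on the page.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Record the visible words on the page for indexing.
        self.words.extend(data.lower().split())

def index_page(url, html, index):
    """Parse one page, record each keyword -> source URL, and return its links."""
    parser = PageParser()
    parser.feed(html)
    for word in parser.words:
        index.setdefault(word, set()).add(url)
    return parser.links  # links for the spider to crawl next

# Hypothetical page containing a part number and a link to another page.
index = {}
links = index_page("http://example.com/a.htm",
                   '<a href="b.htm">widget</a> 12345 part', index)
```

After this call, the index maps each keyword (including the part number “12345”) back to the page on which it was found, and the returned links give the spider its next pages to visit.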
Meta-tags are special HTML tags that provide information about a Web page. Unlike normal HTML tags, meta-tags do not affect how the page is displayed. Instead, they provide information such as who created the page, how often it is updated, what the page is about, and which keywords represent the page's content. Many search engines use this information when building their indices.
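An indexer that parses meta-tags might extract them as follows. This sketch again uses Python's standard html.parser module; the page content and keyword values are hypothetical:

```python
from html.parser import HTMLParser

class MetaTagParser(HTMLParser):
    """Collects name/content pairs from <meta> tags, as an indexer might."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if "name" in d and "content" in d:
                self.meta[d["name"]] = d["content"]

# Hypothetical page header; the meta-tags do not affect the displayed page.
page = ('<html><head>'
        '<meta name="keywords" content="aviation parts, sales leads">'
        '<meta name="description" content="Inventory of part numbers">'
        '</head><body>...</body></html>')

parser = MetaTagParser()
parser.feed(page)
print(parser.meta["keywords"])  # aviation parts, sales leads
```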
When visiting a Web site, most spiders will check a file called the “robots.txt” file. This file informs the spider whether the spider is authorized to search the site and, if so authorized, which pages on the site it may retrieve. Single-destination Web sites called “portals” are often a combination of a “subject directory” and a “search engine” (a hybrid). These portals include a search engine (with its spider and indexer) or are closely associated with a third-party search engine. These portals often include an organized and customized subject directory.
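The robots.txt check described above may be sketched using Python's standard urllib.robotparser module. The robots.txt content and site URLs here are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt disallowing one directory for all spiders.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A well-behaved spider consults can_fetch() before retrieving each page.
print(rp.can_fetch("*", "http://example.com/index.htm"))      # True
print(rp.can_fetch("*", "http://example.com/private/x.htm"))  # False
```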
The Invisible Web is made up of information stored in Web databases. Unlike pages on the visible Web, information in these databases is generally inaccessible to the spiders that compile search engine indices. Search engines typically index the Web by visiting Web pages and indexing their content. In particular, the spiders use the links found on pages to find new Web pages. The links include static URLs. Most spiders tend to ignore the content of a dynamic Web address and, thus, the contents of the referenced dynamic Web page. These dynamic Web pages are often ignored because the format of their dynamic URL is typically different from the URL format of a static Web page. A dynamic URL includes parameters, which a spider can recognize. Spiders are often specifically programmed to ignore dynamic addresses because of the complexity of navigating through dynamic pages and the difficulty of keeping clean and accurate data. As a result, the information found in the databases of dynamic Web sites is typically not indexed by search engines, and these dynamic Web sites are not found by those using search engines to search the Web. This huge, unmapped region of the Internet is called the “Invisible Web.”
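The dynamic-address filtering described above can be sketched very simply: a spider may treat any URL containing the “?” parameter separator as dynamic and skip it. The URLs below are hypothetical:

```python
def is_dynamic(url):
    # Many spiders treat any URL containing a query string as dynamic.
    return "?" in url

urls = [
    "http://example.com/page.htm",                       # static: crawled
    "http://example.com/page.asp?parm1=val1&parm2=val2", # dynamic: skipped
]
static_only = [u for u in urls if not is_dynamic(u)]
print(static_only)  # ['http://example.com/page.htm']
```

A spider programmed this way never reaches the database content behind the dynamic address, which is how that content comes to reside in the “Invisible Web.”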
Accordingly, there is a need for systems and methods that entice Web crawlers to index Web sites containing large databases of inventoried part numbers, such as aviation part numbers, book ISBNs, automotive part numbers, electronic part numbers, phone numbers, zip codes, driver's license numbers, document numbers, etc. (which a search engine reads as basic keyword text), such that increased sales leads are generated for those Web sites and/or subscribers thereto and/or authorized users thereof when a potential customer employs a search engine to search for one or more of the part numbers.