The Internet has experienced exponential growth, and the number of interconnected computers is quickly approaching one billion worldwide. As such, the Internet provides unprecedented access to massive volumes of information and resources. An entity, such as a company, organization or periodical, makes information available on the Internet by uploading the information to a server that is connected to one of the interconnected networks and has a registered Internet Protocol (IP) address. Often, an entity organizes its information on the server as a hierarchy of pages composed in hypertext markup language (HTML). Along with general information, each page may contain links to other informative items, including graphics, documents or even other websites. Users can easily access an entity's information using a graphical software program referred to as a browser. Because the Internet is essentially a vast web of interconnected computers, databases, systems and networks, an entity's information is often referred to as its “website”. For this reason, the Internet and its interconnected websites are often referred to as the World Wide Web. Finding relevant information on the Internet, among the millions of websites and the billions of individual web pages, is a difficult task that has been inadequately addressed.
Many companies have developed search engines in an attempt to ease the location and retrieval of information from the Internet. Examples of current search systems include the AltaVista™ search engine developed by Digital Equipment Corp., Lycos™, Infoseek™, Excite™ and Yahoo™. Most conventional search systems consist of two components. The first, a data gathering component known as a webcrawler or robot, systematically traverses the Internet and retrieves information from various websites. Often, the webcrawler moves from website to website, traversing every link found. As each website is accessed, each of its pages is retrieved, analyzed and stored for subsequent searching and retrieval. After retrieving and examining each page of a website, the webcrawler moves on to another site on the Internet. While traversing the websites and retrieving their pages, the webcrawler indexes the information presented by each page and stores a link to each page, together with the corresponding index information, in a repository such as a database.
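By way of illustration only, the data gathering component described above might be sketched as follows. The sketch is not taken from any of the named search systems. It assumes a hypothetical in-memory table `PAGES` standing in for the Internet (a real webcrawler would retrieve pages over HTTP), and the names `PageParser`, `crawl` and the `example` URLs are likewise assumptions introduced for the example. Each page is retrieved, its links are queued for traversal, and its words are recorded as index information that maps each word to the links of the pages containing it.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical stand-in for the Internet: URL -> HTML source of the page.
PAGES = {
    "http://a.example/": (
        '<html><body>Welcome to Acme widgets '
        '<a href="http://a.example/products">products</a> '
        '<a href="http://b.example/">partner</a></body></html>'
    ),
    "http://a.example/products": "<html><body>Acme widgets and gears</body></html>",
    "http://b.example/": (
        '<html><body>Partner site selling gears '
        '<a href="http://a.example/">Acme</a></body></html>'
    ),
}


class PageParser(HTMLParser):
    """Collects the outgoing links and the lowercased words of one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Record the target of every anchor tag that carries an href.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Treat every whitespace-separated token of page text as an index term.
        self.words.extend(word.lower() for word in data.split())


def crawl(start_url, pages):
    """Traverse links breadth-first; return (index, visited URLs)."""
    index = {}       # repository: index term -> set of links to pages
    visited = set()
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in visited or url not in pages:
            continue  # already processed, or page cannot be retrieved
        visited.add(url)
        parser = PageParser()
        parser.feed(pages[url])
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        queue.extend(link for link in parser.links if link not in visited)
    return index, visited


index, visited = crawl("http://a.example/", PAGES)
```

In this sketch the repository is a plain dictionary; a conventional system would typically persist the same link and index information in a database.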
The second component of conventional search systems is the search engine. The search engine provides an interface for selecting the links stored in the repository in order to identify web pages with desired content. For example, the above-mentioned search engines allow a user to enter various search criteria. The search engine probes the stored index information generated by the webcrawler according to the search criteria. The search engine then presents to the user any stored links whose corresponding index information satisfies the entered search criteria. The user is able to view the actual page located on the original website by following the link to that website.
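The probing of stored index information described above might be sketched, purely as an illustration, as follows. The repository contents, the function name `search` and the rule that a link must match every entered term are assumptions made for the example; conventional search engines may apply different or more elaborate matching criteria.

```python
# Hypothetical repository contents: index term -> set of stored links.
REPOSITORY = {
    "acme": {"http://a.example/", "http://a.example/products"},
    "widgets": {"http://a.example/", "http://a.example/products"},
    "gears": {"http://a.example/products", "http://b.example/"},
}


def search(query, repository):
    """Return the stored links whose index information matches every term."""
    terms = query.lower().split()
    if not terms:
        return []  # no search criteria entered
    result = None
    for term in terms:
        matches = repository.get(term, set())
        # Keep only the links that satisfy all terms seen so far.
        result = set(matches) if result is None else result & matches
    return sorted(result)


hits = search("Acme gears", REPOSITORY)
```

Here `hits` holds only the links to present to the user; the pages themselves remain on the original websites and are viewed by following those links.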