1. Field of the Invention
The invention disclosed broadly relates to the field of computer networks, and more particularly relates to the field of search methods for the World-Wide Web (WWW or simply, the Web).
2. Description of the Related Art
The Internet is a global network of computers and computer networks that are linked and communicate by means of the Internet Protocol (IP). IP is a packet-switched communications protocol. In such protocols, the information to be transmitted is broken up into a series of packets (i.e., sets of data). Each packet acts as a sort of electronic envelope: it includes a portion called a header, with fields identifying the source of the transmission, the destination, and other information about the data to be delivered to the destination (called the payload). A popular application of the Internet is access to the Web, in which client units connect to Web servers using a protocol called HTTP (HyperText Transfer Protocol). A client unit (e.g., a microcomputer with a communication subsystem connected to the Internet) invokes HTTP simply by typing an "http://" prefix with the desired Web address. Once the connection is made to the desired Web site, the user (or client) can access any document stored on that site that is available to that user. The interface used by the client is an application program called a Web browser (the Netscape and Explorer browsers are popular examples). The browser establishes hypertext links to the subject server, enabling the user to view graphical and textual representations of information provided by the server.
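The packet concept can be sketched as a toy model. This is an illustration under stated assumptions, not a real IP implementation: the `Packet` fields are a simplified stand-in for an IP header (which carries many more fields, such as version, length, time-to-live and checksum), and the `index` field used to restore order is a hypothetical simplification, since ordering of a byte stream on the Internet is normally handled by a higher-layer protocol such as TCP. `packetize` and `reassemble` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    """Toy packet: a simplified header plus a slice of the data (the payload)."""
    source: str       # header field: sender address
    destination: str  # header field: receiver address
    index: int        # hypothetical header field, used here only to restore order
    payload: bytes    # the data carried by this packet


def packetize(data: bytes, source: str, destination: str, max_payload: int) -> List[Packet]:
    """Break data into packets whose payloads are at most max_payload bytes."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [
        Packet(source, destination, i, data[offset:offset + max_payload])
        for i, offset in enumerate(range(0, len(data), max_payload))
    ]


def reassemble(packets: List[Packet]) -> bytes:
    """Rebuild the original data at the destination, even if packets arrive out of order."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.index))
```

For example, `packetize(b"hello world", "192.0.2.1", "198.51.100.2", 4)` yields three packets with payloads `b"hell"`, `b"o wo"` and `b"rld"`, each carrying the same source and destination in its header.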
The Web relies on HTML (HyperText Markup Language), a markup language that Web-compliant browsers interpret to render text, graphics, images, audio, real-time video, etc. HTML is independent of client operating systems, so the same HTML content can be rendered across a wide variety of software and hardware platforms. Software platforms include Windows 3.1, Windows NT, Apple's Copeland and Macintosh, IBM's AIX and OS/2, HP Unix, etc. Popular compliant Web browsers include Microsoft's Internet Explorer, Netscape Navigator, Lynx and Mosaic. The browser interprets references to files, images, sound clips, etc. through the use of hypertext links. Upon user invocation of a hypertext link to a Web page, the browser initiates a network request to receive the desired Web page.
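To make the notion of a hypertext link concrete, the following sketch uses Python's standard `html.parser` module to collect the targets of `<a href="...">` links in a page, resolving relative targets against the page's own address with `urllib.parse.urljoin`. It is an assumed, simplified illustration of how links in HTML can be identified, not a description of how any particular browser works; `LinkExtractor` and `extract_links` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the targets of <a href="..."> hypertext links, resolved to absolute URLs."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> is also matched.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative links such as "guide.html" are resolved against the page address.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url: str, html: str) -> List[str]:
    """Return the hypertext link targets found in an HTML page, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Anchors without an `href` attribute (for example `<a name="top">`) are skipped, because they do not point to another resource.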
Users of the Internet are faced with an ever-increasing number of sites, each containing varied information. This results in difficulty finding the desired information. Among commonly used tools for locating information are the so-called search engines or portals to the Internet. These sites provide various indexes to other sites. Search engines use crawlers or spiders, programs having their own sets of rules, to index pages on the Web. Some of these follow every link on every page they find. Others ignore some kinds of links.
A common problem with general Internet searches is that too many result pages are often returned, many of which have low relevance to the search request issued by the end user. The search engines used on corporate sites are typically less powerful than Internet search engines and provide less information than is desirable.
Finding information on the Internet, or on corporate intranets, can be a daunting task. Even targeted searches frequently result in hundreds or thousands of hits. Many producers of Web pages intentionally use techniques that cause their pages to be displayed in the results of searches to which they are not really pertinent. This results in too much information, much of it not useful. In addition, many Web domains have other links buried within their pages, and restricting a search to a specific Web domain means ignoring the information reachable through these links. This results in too little information. Thus, there is a need for a search process that produces more directly usable results.
Corporate sites frequently employ a search engine to allow users to search their corporate pages. These search engines are often less effective than desirable or lack the advanced features of more generic search engines. At times, end users want information that resides on related sites, such as those of business partners, which is not contained within the corporate pages and therefore will not be displayed as a result of a corporate page search. Some search engines, such as HotBot, allow a user to specify a domain but do not then search the related sites.
Accordingly, there is a need for a system for searching the Internet that limits the search results, overcomes the above problems, and produces more directly useful search results.
Briefly, according to the invention, a method for searching for data in a data network comprising hyperlinked pages comprises the steps of: (1) receiving an initial set of network addresses for pages in the data network; (2) receiving a non-negative integer, N, specifying a chain length; (3) receiving a set of at least one search argument comprising search criteria; and (4) performing a search wherein all pages linked to said initial set of addresses by a chain of links of length less than or equal to N are examined for compliance with the search criteria, and all pages meeting such criteria are returned as successful objects of the search.
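The four steps above can be illustrated with a breadth-first traversal that starts from the initial set of addresses, examines each page it reaches, and stops following links once a page is N links away from that set. This is a minimal sketch under stated assumptions, not the claimed implementation: the network is modeled as a hypothetical in-memory dictionary mapping each address to its text and outgoing links (a real system would fetch and parse pages over HTTP), and the search criterion is assumed to be a simple case-insensitive all-terms keyword match. The names `chain_limited_search` and `all_terms` are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical in-memory network: address -> (page text, outgoing hyperlinks).
Web = Dict[str, Tuple[str, List[str]]]


def chain_limited_search(
    web: Web,
    initial_addresses: Iterable[str],
    n: int,
    matches: Callable[[str], bool],
) -> List[str]:
    """Return every page within a chain of at most n links of the initial set that meets the criteria."""
    if n < 0:
        raise ValueError("chain length N must be non-negative")
    # Breadth-first order records each page at its shortest chain distance.
    distance: Dict[str, int] = {}
    queue: deque = deque()
    for url in initial_addresses:
        if url not in distance:
            distance[url] = 0  # pages in the initial set are at distance 0
            queue.append(url)
    results: List[str] = []
    while queue:
        url = queue.popleft()
        page = web.get(url)
        if page is None:
            continue  # unknown or unreachable address: nothing to examine
        text, links = page
        if matches(text):
            results.append(url)  # page complies with the search criteria
        if distance[url] == n:
            continue  # chain length reached: do not follow this page's links
        for link in links:
            if link not in distance:
                distance[link] = distance[url] + 1
                queue.append(link)
    return results


def all_terms(*terms: str) -> Callable[[str], bool]:
    """Hypothetical search criterion: the page text must contain every term (case-insensitive)."""
    lowered = [t.lower() for t in terms]
    return lambda text: all(t in text.lower() for t in lowered)


if __name__ == "__main__":
    # Toy network: a corporate home page linking to its own spec page and to a partner site.
    web: Web = {
        "https://corp.example/": ("Widget catalog", ["https://corp.example/specs", "https://partner.example/"]),
        "https://corp.example/specs": ("Widget specs and pricing", ["https://corp.example/"]),
        "https://partner.example/": ("Partner widget pricing", ["https://partner.example/deep"]),
        "https://partner.example/deep": ("More widget pricing", []),
    }
    hits = chain_limited_search(web, ["https://corp.example/"], 1, all_terms("widget", "pricing"))
    print(hits)  # ['https://corp.example/specs', 'https://partner.example/']
```

With N = 1 the partner page is found even though it lies outside the corporate domain, while the page two links away is not examined; raising N to 2 would include it. Recording each page at its shortest distance also means a page reachable by both a short and a long chain is examined exactly once.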
According to optional embodiments, the foregoing method can be implemented as a computer-readable medium with instructions for performing the above steps, as an application program, or as a browser resident at an end user's computer system. It can also be implemented as a special-purpose information handling system.