The invention generally relates to searching a network for text and non-text data, and providing for storing and forwarding search results. More particularly, the invention relates to incorporating search capabilities provided by private or special purpose network resources into a general searching mechanism.
In the discussion below and claims that follow, an artificial distinction is made between traditional (or “public”) search resources such as AltaVista-com, Excite-com, NorthernLight-com, Yahoo-com, and meta-search businesses such as SurfWax-com, Go2net-com, Dogpile-com, and the like, and “private” or “special purpose” search resources provided by individual corporate web pages, university sites, government sites (e.g., IBM-com, PCConnection-com, FindLaw-com, USPTO-gov, Harvard-edu), and the like. The former are businesses in the business of helping searchers locate information (e.g., on the Internet or an intranet), and are referenced hereafter as “public search resources.” The latter, even if providing extensive search abilities, are not in the searching business, and are referenced hereafter as “special purpose search resources.” However, it will be appreciated that since the distinction is somewhat artificial, the techniques disclosed below for manipulating special purpose search resources are also applicable to public search resources. (Please note that periods within uniform resource locators (URLs) have been replaced with hyphens to prevent hypertext links in an online copy of this application.)
Recently there has been a vast proliferation of networking connection options, for business and general users alike, for connecting to networks such as intranets and the Internet. Many such businesses and users position themselves as an end point, or point of interest (hereafter generally “web sites”), to which others can connect and obtain information and other material. After several years of such end points becoming accessible over the networks, an enormous amount of information and other material is now available in an online electronic format.
A typical method for locating and reviewing such information is by way of a “web browser,” such as Netscape Navigator, Internet Explorer, Opera, and other network application programs (hereafter generally “browsers”). Unfortunately, the very richness of available information has made finding anything specific an enormously complex and tedious task.
Typical search methods employ either data categorization or keyword searching. In the former, a well known example is the public search resource www-Yahoo-com, which provides broad categories and successively narrower topic areas. In the latter, there are typically two types. The first are traditional search engines such as NorthernLight-com, AltaVista-com, Excite-com, and the like, which “crawl” web sites and index the words found therein. The second are “meta” search engines, such as SurfWax-com, DogPile-com, and the like, which execute a search across multiple search engines, and provide options for collating results.
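For illustration only, the two keyword approaches described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any named search resource: the function names (build_index, keyword_search, meta_search), the whitespace tokenization, and the first-seen de-duplication used for collating are all hypothetical choices made for this example.

```python
from collections import defaultdict


def build_index(pages):
    """Index crawled pages: map each lowercase word to the URLs containing it.

    `pages` is a hypothetical stand-in for crawl output: {url: page_text}.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def keyword_search(index, query):
    """Return, in sorted order, the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    # index.get avoids creating empty entries for words that were never indexed.
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


def meta_search(engines, query):
    """Run one query on several engines and collate results without duplicates.

    Each engine is assumed to be a callable taking a query and returning URLs.
    """
    seen = set()
    collated = []
    for engine in engines:
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                collated.append(url)
    return collated
```

In such a sketch, each engine passed to meta_search could simply wrap keyword_search over its own index, for example `lambda q: keyword_search(index_a, q)`. Note that the index reflects only the page text as it stood when build_index was run.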
Unfortunately, both categorization and keyword searching have significant drawbacks. Categorization requires intervention to place a site within a relevant category or categories. Such categorization is subjective, and therefore may result in significant omissions or misleading results when a searcher drills down to detailed categories. In addition, categorization is resource intensive, and therefore few web sites are categorized. Typically, only “mainstream” (e.g., popular) sites are categorized.
Although keyword searching does not suffer the subjective effects of categorization, such searching is based only on content identified from crawling and indexing a particular location at a web site; consequently, such content is likely to be stale and/or incomplete. Network sites, such as file servers, Internet web pages, etc., are subject to frequent changes and reorganizations of data storage that render crawled results inaccurate. In addition, much of a site's content is often only indirectly accessible through interaction with the site; hence crawling will not find this content. Relatedly, some sites provide content through dynamically generated web pages; such pages will not exist for long after indexing. An additional problem is that crawled sources are blindly indexed without regard to origination language; thus search terms popular in a foreign language often retrieve irrelevant or incomprehensible results.
Thus, a better technique is needed for reliably indexing and retrieving data from extant search sources.