The World Wide Web (or “Web”) contains a vast amount of information in the form of hyperlinked documents (e.g., web pages) that are loosely organized and accessed through a data communication network (or “Internet”). Diverse computer networks use a communication protocol to coordinate the exchange of information. For example, access to the Internet typically uses the Transmission Control Protocol/Internet Protocol (TCP/IP) with a client-server model of computer hierarchy. The server provides information, commonly presented in the form of viewable web pages, and the client is a computer that retrieves the information (i.e., selects desired web pages for display). A hierarchical collection of related web pages is commonly referred to as a web site. Web pages may contain electronic documents, images, sounds, video, etc.
One of the reasons for the virtually explosive growth in the number of hyperlinked documents on the Web is that just about anyone can upload hyperlinked documents and other information, organized in any number of different structures. A vast majority of the information includes hyperlink “shortcuts” to other information located in other hyperlinked documents. The unstructured nature and sheer volume of data available via the Internet make it difficult to navigate efficiently through related information while avoiding unrelated information. A user often uses a computerized search engine to sort through the large quantity of information accessible via the data network.
A search engine attempts to return relevant information in response to a request from the user. This request usually comes in the form of a query (e.g., a set of words that are related to a desired topic). A common way of searching the Web is to find web pages containing all or many of the words included in the query; such a method is typically referred to as text-based searching. Search engines typically respond to such a query by returning a display of links associated with web pages and a brief description of the content provided by the web pages. Because the number of pages on the Web is typically very large, ensuring that the returned pages are the most relevant to the topic sought by the user is a central problem in Web searching.
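By way of illustration only, the text-based searching described above can be sketched in a few lines of Python. The function names (`tokenize`, `text_search`), the `require_all` flag, and the sample pages and URLs are hypothetical and introduced solely for this example; a practical search engine would typically rely on a prebuilt index and more elaborate relevance ranking.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def text_search(query, pages, require_all=False):
    """Rank pages by how many distinct query words they contain.

    pages: dict mapping a URL to the page text (hypothetical sample data).
    require_all: if True, keep only pages containing every query word.
    Returns a list of (url, matched_word_count), best match first.
    """
    query_words = tokenize(query)
    results = []
    for url, text in pages.items():
        matched = len(query_words & tokenize(text))
        if matched == 0:
            continue  # page shares no words with the query
        if require_all and matched < len(query_words):
            continue  # page is missing at least one query word
        results.append((url, matched))
    # Sort by match count (descending), then by URL for a stable order.
    results.sort(key=lambda item: (-item[1], item[0]))
    return results


# Hypothetical sample pages, for illustration only.
pages = {
    "http://example.com/a": "Library catalog of printed journals and magazines",
    "http://example.com/b": "Printed magazines for sale",
    "http://example.com/c": "Audio books and DVDs",
}

hits = text_search("printed journals", pages)
strict_hits = text_search("printed journals", pages, require_all=True)
```

In this sketch, `hits` lists page `a` (both query words) ahead of page `b` (one query word) and omits page `c`, while `strict_hits` contains only page `a`. Counting matched words is the simplest possible ordering and does not by itself ensure that the returned pages are the most relevant to the topic sought by the user.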
While the Web platform is an invaluable research tool, one should not overlook the usefulness of more conventionally available media such as printed media, CDs, DVDs, audio books, and the like. Significant time-sensitive information is still published and disseminated in these more conventional forms. Printed material, for example, includes special editions on recent important events and periodicals such as magazines, newspapers, and journals. Information that was generated before widespread use of the Web is often available only in printed media form and, although indexes of hard-copy printed materials are increasingly available for searching by computer methods, the printed material is frequently not directly available for viewing through the Web. Conventionally, searching printed media via the Web involves directing a search engine to find web sites having printed document indices, and subsequently searching the index within the web site for relevant printed materials using another dedicated, intra-web-site search engine.
Frequently, the printed media itself is not viewable through the data network, particularly if the search result is a book or magazine published and sold in hard copy for profit. Therefore, search results from an intra-web-site search engine typically do not include further hyperlinks to the actual printed media, but rather a citation to the hard copy document. To determine the relevance of printed media cited by a web-based search, a researcher is often required to physically retrieve and review a printed hard copy from a depository, such as a library.
As the pool of researchable media continues to increase, so does the need for more efficient searching and viewing tools.