The Internet has become the world's information retrieval system. One of the distinguishing features of Internet (and intranet) documents is the use of embedded document links. Such a link is a portion of a source document that links to a target document: another document, or a different section of the same document. The other document may be on any computer system on a network supporting the appropriate communication protocols. Selecting a link navigates from the source document to the target document.
A web site is a collection of linked documents accessible through the World Wide Web, a part of the Internet. Such documents are commonly called web pages. Typically a web site has a “home page” that is the entry document into the site. The World Wide Web is commonly referred to as “the web”.
Web pages commonly use a description language such as HTML (hypertext markup language) or XML (extensible markup language) to embed links and provide document formatting.
A link on a web page is by convention expressed as a uniform resource locator (URL). A link is often associated with a word or phrase in a source document, hence the common nomenclature: hypertext link. But a link may also be associated with images, or controls such as buttons, menus, and the like.
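As a minimal sketch of how links embedded in an HTML page might be recovered as URLs together with their associated hypertext, the following uses only the Python standard library. The class name `LinkExtractor`, the helper `extract_links`, and the `example.com` address are hypothetical and chosen for illustration; the sketch handles only plain `<a href>` anchors, not images or controls.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects (absolute URL, anchor text) pairs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # URL of the source document
        self.links = []           # list of (target URL, hypertext) pairs
        self._href = None         # target of the anchor currently open, if any
        self._text = []           # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the source document's URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, base_url):
    """Return every hypertext link found in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Under these assumptions, a link to another document and a link to a different section of the same document both resolve to full URLs, the latter differing from the source document's URL only by its fragment.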
A web browser is a program for displaying web pages. Examples of popular web browsers include Microsoft Internet Explorer and Netscape Navigator.
Web browsers allow users to create and maintain directories of web page links. Such directories are commonly represented as folders or, sometimes, tabs.
New web pages or web sites are commonly found by following links in known documents, or by keyword search. Users typically group links to related documents by topic in directories they title themselves, the directory title being the common topic of the links within it.
Web sites are often extensive enough that a site typically offers its own search facility; commercial web sites almost always offer site search. Search refers to inquiry based upon one or more keywords (search terms). Search engines that search a multitude of sites abound on the web. A good search engine provides a commercial advantage. Some search engines, and some commercial products such as Copernic® from Copernic Technologies, tap into multiple search engines to aggregate search results.
Based upon keywords, quality search engines glean the pages most likely to be related using a confluence of linguistic analysis methods. Word location analysis is based upon the assumption that the topic of a document is specified in the title, headings, or the early paragraphs of text. Word frequency analysis counts the number of times search terms appear in a document. Syntactic analysis processes the grammatical structure of a document, serving to indicate nouns and verbs. Semantic analysis interprets the contextual meaning of words by examining word relationships. Morphological analysis reduces verbs and nouns to their base form, providing a basis for direct word matching. At least one commercial product, LinguistX® from Inxight Software, provides advanced natural language text analysis.
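Two of these methods, word frequency analysis and word location analysis, are simple enough to sketch directly. The following is an illustrative toy, not a description of how any particular search engine works: the function names, the weights (2 for a title match, 1 for a match in the early text), and the 50-word cutoff for "early" text are all assumptions made for the example.

```python
import re


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_frequency(term, body):
    """Word frequency analysis: count occurrences of a term in the body."""
    return tokenize(body).count(term.lower())


def word_location_score(term, title, body, early_words=50):
    """Word location analysis: reward terms in the title or early text.

    The weights and the early_words cutoff are arbitrary assumptions.
    """
    term = term.lower()
    score = 0
    if term in tokenize(title):
        score += 2
    if term in tokenize(body)[:early_words]:
        score += 1
    return score


def score_document(terms, title, body):
    """Combine both analyses into one relevance score for a document."""
    return sum(
        word_frequency(t, body) + word_location_score(t, title, body)
        for t in terms
    )
```

Ranking candidate pages by `score_document` in descending order would favor documents that both mention the search terms often and mention them prominently. Syntactic, semantic, and morphological analysis are beyond what a sketch of this size can show.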
In spite of software sophistication, as every experienced web user knows, user-initiated keyword search can be vexing: searches commonly return a plethora of pages, many unrelated to the desired topic. Search for ‘watch’, for example, thinking of timepieces, and you'll likely end up with a bushel of pages about voyeurism. Careful application of search terms yields more relevant links, but the process and results are problematic: beyond searching for “this ‘and’ that”, search Boolean logic is not exactly intuitive; different search engines have different syntaxes for search Boolean logic, and different ways to apply it, making that bit of business even less amenable; a bit of search pruning still leaves an abundance of junk, while a search result leaving out the chaff probably leaves out a good bit of wheat too.
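The trade-off between junk and lost results can be illustrated with a toy keyword matcher. This is a sketch under stated assumptions: the three page titles are invented sample data, and `search_any` and `search_all` are hypothetical helpers standing in for Boolean "or" and "and" search, not the syntax of any real search engine.

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric words in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_any(pages, terms):
    """Boolean OR: pages containing at least one of the search terms."""
    wanted = {t.lower() for t in terms}
    return [name for name, text in pages.items() if wanted & tokens(text)]


def search_all(pages, terms):
    """Boolean AND: pages containing every one of the search terms."""
    wanted = {t.lower() for t in terms}
    return [name for name, text in pages.items() if wanted <= tokens(text)]


# Invented sample pages, keyed by a short name.
pages = {
    "a": "Wrist watch repair guide",
    "b": "Neighborhood watch meeting",
    "c": "Antique clock and timepiece repair",
}

broad = search_any(pages, ["watch"])             # matches "a" and "b"
narrow = search_all(pages, ["watch", "repair"])  # matches only "a"
```

Here the broad search returns an unrelated page ("b"), while the narrowed search drops it but still never finds page "c", which is on topic yet does not contain the word "watch".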
The technologies of document linking, search, and software-based linguistic analysis are well established. Recent advances enhance utility in locating desired information. For example, the subject of U.S. Pat. No. 6,122,647 is dynamically linguistically analyzing the text of a user-selected portion of a target document and generating new links to related documents. The subject of U.S. Pat. No. 6,184,886 is allowing a user to generate and maintain a list of prioritized bookmarks (links) that allow later access to selected sites (documents). The subject of U.S. Pat. No. 6,182,133 is pre-fetching pages for later viewing, thus saving a user time retrieving documents.