Electronic documents typically include information content, such as text, tables, equations, graphics, images or other objects such as video or audio clips, and layout or formatting information that defines the visual appearance of the document—that is, how the content is to be arranged and represented for display. Thus, for example, a page of an electronic document can include text arranged in two columns taking up a top portion of the page and an image that occupies the bottom portion of the page. Electronic documents can be generated and viewed using any of a variety of word processing programs such as Microsoft® Word, page layout programs such as Adobe PageMaker®, or other content management programs such as Adobe Acrobat® or web browsers such as Microsoft® Internet Explorer.
Electronic documents can include a collection of one or more pages, which can represent bounded subunits of content (e.g., a page in a typical word processing document sized to fit on a particular type of paper) or unbounded subunits of content (e.g., a web page that can contain essentially any amount of content). Electronic documents, or pages within such documents, can include references or links to other locations in a page or document, or to other pages or documents, which may be available locally or at remote locations on a network, such as web pages located on the World Wide Web.
Links can be associated with any item displayed and selectable within a document, such as text, graphics or images. Electronically, links are often displayed in some distinguishing manner such as by underlining text with which the link is associated. A reader viewing a document that includes links can jump to a linked document or a different part of the same document by activating the text or a hot spot associated with a link. Links can be activated through a graphical user interface by, for example, clicking on the associated text or hot spot using a mouse or other pointing device, or through keyboard commands in a textual interface, or potentially using voice commands.
Links can be implemented as hyperlinks embedded within a document or page. A hyperlink can be implemented, for example, as HTML code that contains the address, such as the URL (uniform resource locator), of another document, such as a web page, or of another location within the same web page. A hyperlink can be embedded in a text string, an image, or a portion of an image (commonly called a hot spot). Typically, a web page contains a plurality of hyperlinks, and a user navigates from one web page to another linked web page by activating the desired hyperlink.
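As an illustration of hyperlinks embedded in HTML code, the following is a minimal sketch, not taken from any particular program, that collects the addresses carried by text or image links (`<a>` elements) and image hot spots (`<area>` elements) on a page. The names `LinkCollector` and `extract_links` and the example.com addresses are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the target addresses of <a> and <area> elements.

    Hypothetical helper for illustration only.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # <a> covers text and image links; <area> covers image hot spots.
        if tag in ("a", "area"):
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative addresses against the page address.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute addresses of all hyperlinks found in html."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

A link such as `#top`, which points to another location within the same page, resolves to the page's own address followed by the fragment, while a link to another document resolves to that document's URL.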
A web site is a collection of web pages that are managed by or for a single entity. Typically, the web site has a start page or index page which contains links to a first level of web pages, the first level of web pages containing links to a second level of web pages, and so forth.
Content management programs, such as Adobe Acrobat®, allow users to capture or download web pages for viewing at a later time. For example, a commuter can use such a program to download articles of an online newspaper to a laptop before leaving home and later read the latest news while sitting on a train with no web connection.
A web site typically contains a wide variety of content. For example, an online newspaper may contain not only news content, but advertising content as well. Even within the news content, there can be a wide range of subjects, and the user may be interested in only some of them. Unfortunately, conventional programs provide limited options for selecting what content to capture. The user can explicitly choose which links to capture by manually clicking on those links while viewing a page in the program display. However, this option is tedious and time-consuming. Alternatively, some programs permit the user to specify that all links from a given page should be followed down to a specified level. While this relieves the user from manually selecting which links to follow, it can require substantial memory and processing time to capture multiple levels. It can also result in the capture of large numbers of unwanted pages. Finally, in some programs, the user can specify a particular server or path, and the program will follow only links corresponding to the specified server or path. However, the user may not always want every page that shares the same server or path, or may find that the server or path is too restrictive.
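The two automated options described above, following all links down to a specified level and following only links on a specified server or path, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the behavior of any actual program: the function names, the `links_of` callback, and the sample site `SITE` with its example.com addresses are all hypothetical, and the sketch walks an in-memory table of links instead of downloading pages.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical sample site: each page address maps to the links it contains.
SITE = {
    "http://news.example.com/": [
        "http://news.example.com/sports/",
        "http://news.example.com/world/",
        "http://ads.example.net/banner",
    ],
    "http://news.example.com/sports/": ["http://news.example.com/sports/game1"],
    "http://news.example.com/world/": [
        "http://news.example.com/world/story1",
        "http://news.example.com/",
    ],
}


def matches_server_path(url, server, path_prefix="/"):
    """True if url is on the given server and under the given path."""
    parts = urlparse(url)
    return parts.netloc == server and parts.path.startswith(path_prefix)


def pages_to_capture(start_url, links_of, max_level, server=None, path_prefix="/"):
    """Breadth-first walk from start_url, following links up to max_level.

    links_of(url) returns the addresses linked from url. If server is given,
    only links on that server and under path_prefix are followed. The start
    page itself is always included.
    """
    seen = {start_url}
    order = [start_url]
    queue = deque([(start_url, 0)])
    while queue:
        url, level = queue.popleft()
        if level == max_level:
            continue  # do not follow links below the specified level
        for target in links_of(url):
            if target in seen:
                continue
            if server is not None and not matches_server_path(
                target, server, path_prefix
            ):
                continue
            seen.add(target)
            order.append(target)
            queue.append((target, level + 1))
    return order


def sample_links(url):
    """Look up the links of a page in the hypothetical SITE table."""
    return SITE.get(url, [])
```

On this sample site, `pages_to_capture("http://news.example.com/", sample_links, 2)` selects all six pages, including the advertising page, which illustrates how a level limit alone can capture unwanted content. Adding `server="news.example.com", path_prefix="/sports/"` narrows the selection to the start page and the two sports pages, but it would also exclude any world news story the user wanted.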