Almost all current Web-based technologies assume that the basic unit of Web information is a “Web page”. Web browsers (for example Netscape Navigator and Internet Explorer) provide facilities for displaying, printing, and saving individual Web pages. Web search engines (for example Yahoo and Google) maintain indexes and can provide links to those Web pages. This bias towards providing content as largely static Web pages favors the original authors and publishers of the Web content, who are free to determine exactly what constitutes a Web page. However, the actual users (i.e. the readers) of the Web page are often interested in only a portion of a Web page (for example, a portion that includes the desired information but excludes any unwanted advertisements). Additionally, for many tasks, a user may be interested in comparing information between different portions of different Web pages. For example, when shopping online for a product or service, the information required to make a truly informed purchase may be distributed over several different pages, and at several different Web sites. A prospective purchaser may like to compare this information in a convenient manner, before making a purchase.
Many current Web browsers allow users to save links to selected Web pages, typically in the form of Uniform Resource Locators (URLs), in a “bookmarks” or “favorites” file. Although the standard URL specification supports the use of named anchors (pointers that indicate an offset into a particular Web page), these anchors are determined by the Web page author when the page is created. Neither a traditional URL nor an author-defined anchor can therefore indicate which portion of the Web page a user might actually be interested in.
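As a brief illustration of the limitation noted above, the following minimal sketch separates a bookmarked URL into its page address and its named anchor. The helper name `split_anchor` and the example address are hypothetical and are used only for illustration. The sketch shows that the only sub-page information a URL can carry is the name of an anchor the author chose to define.

```python
from urllib.parse import urlsplit


def split_anchor(url):
    """Return (page_url, anchor_name) for a bookmarked URL.

    The anchor name is whatever follows '#'. It only resolves to a
    location if the page author defined an anchor with that name.
    """
    parts = urlsplit(url)
    page_url = parts._replace(fragment="").geturl()
    return page_url, parts.fragment


# Hypothetical bookmark pointing at an author-defined anchor.
page, anchor = split_anchor("http://www.example.com/catalog.html#prices")
print(page)    # address of the whole page
print(anchor)  # name of the author-defined anchor, if any
```

A reader who is interested in a region of the page for which the author defined no anchor has no way to record that interest in the URL itself.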
As an alternative, users may take notes by highlighting, copying, and pasting portions of the Web page text into a separate application or text editor. However, textual copying alone does not capture the visual context of the Web page. There are many instances in which preserving the manner in which the Web page is rendered may be critically important. For example, the colors, fonts, point sizes, column widths, graphical layout, image sizes, and word spacing may be important to preserve for historical or legal reasons. Graphic designers may be more interested in the graphical renderings of Web pages than in the actual content. For some pages with similar content, such as newspaper sites carrying major news stories, the graphical layout may be the expected way to distinguish brands. In addition, a rendered image may be a better way to store a document clip for human interaction. The visual context of a Web page layout may provide contextual cues for helping users to remember why they made a note in the first place. All of these factors suggest that a system that uses images of rendered documents is more generally useful than one that does not.
The above discussion largely describes Web pages as one form of document; however, the techniques used for Web pages can also be used with other technologies. In general, many digital applications can be thought of as managing both a rendered document and its underlying structure. A spreadsheet program, for example, maintains an internal representation of the spreadsheet, while simultaneously supporting a user interface that lets users view and edit the spreadsheet contents or data. Users read, print, and interact with the rendered view of the spreadsheet, while the system translates selections and other interactions with the rendered representation into operations on the internal structure.
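The relationship between a rendered view and its underlying structure can be sketched as follows. This is a simplified illustration under stated assumptions, not a description of any particular spreadsheet program: the class name `RenderedGrid`, the uniform cell dimensions, and the method `cells_in_selection` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RenderedGrid:
    """Toy spreadsheet: 'cells' is the internal structure, and the
    pixel dimensions describe how that structure is rendered."""
    cells: list            # rows of cell values (internal structure)
    col_width: int = 80    # assumed uniform column width, in pixels
    row_height: int = 20   # assumed uniform row height, in pixels

    def cells_in_selection(self, x0, y0, x1, y1):
        """Translate a pixel rectangle selected in the rendered view
        into the corresponding block of the internal structure."""
        first_col, last_col = x0 // self.col_width, x1 // self.col_width
        first_row, last_row = y0 // self.row_height, y1 // self.row_height
        return [row[first_col:last_col + 1]
                for row in self.cells[first_row:last_row + 1]]


grid = RenderedGrid([["a", "b", "c"],
                     ["d", "e", "f"],
                     ["g", "h", "i"]])
# A drag from pixel (0, 0) to pixel (100, 30) covers two columns and two rows.
print(grid.cells_in_selection(0, 0, 100, 30))
```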
Some Web-based technologies allow users to take notes that preserve some of the graphical attributes of rendered Web pages, by copying a portion of the underlying Web page's HTML code. For example, a portion of a Web page may be highlighted in a Web browser application and then pasted into a Web page editor application such as Microsoft Word or Microsoft FrontPage. Portions of multiple pages can be similarly copied into the editor, and the resulting Web page can then be saved on a Web server and shared with multiple users over the Web. However, this note-taking method does not preserve the actual rendered layout of the original source pages. The method is also inconvenient for users because it requires them to engage in a process of Web page authoring, when ideally they should be allowed to focus simply on the task of reading and understanding the Web information. For example, care must be taken to copy and paste important portions of the underlying HTML code, such as the URLs associated with each source page and the CSS styles and JavaScript functions that may be required to make the resulting HTML fragments render correctly.
In addition to the editor-style process described above, a number of technologies exist to allow users to clip and reuse just a portion of a Web page. Screen capture programs such as the SnagIt application allow for capturing a portion of a rendered document exactly as it appears on a user's display. Such screen capture programs, however, merely capture an image and have no ability to capture the underlying structure of any displayed document.
Note-taking and annotation systems, such as Microsoft's OneNote, NetSnippets, and the Xerox XLibris system, allow users to copy, save, and organize portions of documents, and to publish collections of “notes” or “snippets” as a Web page. These note-taking systems are much like the “what you see is what you get” (WYSIWYG) HTML editors, such as Netscape's Composer, which allow a user to create a new Web page out of portions of existing Web pages. However, none of these systems provides any ability to simultaneously store an image of a portion of the rendered source document together with the underlying structure, which as described above may be critical in certain applications.
What is needed is a means by which portions or fragments of online content may be clipped for repurposing, augmenting, and reassembling to create new or modified documents. As used herein, the term “repurposing” includes packaging the online information so that it can be re-used by subsequent users for subsequent applications. Additionally, what is needed is a means for storing both the fragment of the underlying document structure and the image of the rendered document. Furthermore, the means for controlling such a system should be easily accessible to the user. The augmented or reassembled documents should be readily available in a collaborative environment, for subsequent review and re-clipping by the original and other users.
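By way of illustration only, the stated need to store a fragment of the underlying document structure together with an image of the rendered document might be pictured as a record of the following kind. This is a sketch under stated assumptions rather than a description of any particular implementation: the names `Clip` and `ClipCollection`, their fields, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Clip:
    """One clipped portion of a document (hypothetical record layout)."""
    source_url: str   # where the clip was taken from
    structure: str    # fragment of the underlying markup
    image: bytes      # image of the fragment as it was rendered


@dataclass
class ClipCollection:
    """A reassembled document built from clips, which may itself be
    shared and re-clipped by other users."""
    title: str
    clips: list = field(default_factory=list)

    def add(self, clip):
        self.clips.append(clip)

    def sources(self):
        """Distinct source addresses, in sorted order."""
        return sorted({clip.source_url for clip in self.clips})


# Placeholder values; a real image field would hold encoded image data.
notes = ClipCollection("Camera comparison")
notes.add(Clip("http://www.example.com/a.html", "<td>Model A</td>", b"image-a"))
notes.add(Clip("http://www.example.org/b.html", "<td>Model B</td>", b"image-b"))
print(notes.sources())
```

Keeping the structure and the image in the same record means that neither the visual context nor the underlying content is lost when a clip is later reviewed or reused.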