The present invention relates to the field of digital content extraction (e.g., cut, copy, paste, etc.) from electronic documents and, more particularly, to automated and user-customizable content retrieval from a collection of linked documents (e.g., a fragmented document) into a single target document.
There is an increasingly large number of environments and formats in which digitally encoded information can be stored. For instance, information can take the form of a Web-based document held in a database that is accessible through a Web site. In particular, Web-based information is becoming increasingly fragmented, as data is often retrieved from disparate locations. Even when located in a single data store, a set or collection of discrete electronic documents, referred to herein as a fragmented document, is often used to represent one unified concept. That is, what appears from a user's perspective to be a single document can actually be a set of two or more different electronic documents, which are linked (e.g., via hyperlinks) to one another.
Content of a fragmented document is frequently dispersed and/or separated into paragraphs, sections, titles, and the like, each of which may correspond to a different electronic document. For instance, it is not uncommon for portions of important information to be presented via links (e.g., URLs) without the actual information being displayed in the initial document. Thus, the user must access the content through the presented link by invoking a navigation action. This approach has advantages in a distributed computing and/or Web context, as only portions of a fragmented document need to be conveyed at a time, which decreases delivery and load time and conserves bandwidth. One inherent drawback, however, is that users frequently need to perform numerous navigation actions to acquire the fragmented document or a substantial portion thereof.
This repeated navigation can be particularly frustrating when a user wishes to copy, print, or otherwise output a user-desired portion of a fragmented document. For example, a user wanting to copy several non-contiguous sections of a fragmented document would have to navigate through the links of the fragmented document. During this navigation process, the user has to select each desired section, some of which may be longer than a single screenful of information. The user then cuts the desired content and pastes it into a target document. This process is repeated until all desired content from the fragmented document is included in the target document, where it can be handled in a unified manner (e.g., saved, printed, etc.). The aforementioned process makes the task of copying portions (or all) of a fragmented document a time-consuming and error-prone endeavor.
One existing solution for obtaining all information from a Web-based fragmented document is to use some form of Web crawler that follows links and downloads all sub-information. While some of these pre-fetching solutions can be crudely tailored to follow links of a certain type (e.g., images, video, etc.), the content of the fragmented document is ultimately pre-fetched in the same format in which it was presented within a browser. In other words, no convenient means exists for pulling specific sub-parts of the hierarchical structure of a fragmented document into the flat structure of a single electronic document.