In these times, documents typically exist in two forms: hard or soft. These forms may also be called hardcopy or softcopy; physical or electronic; molecules or electrons; analog or digital; paper or electronically stored; and the like. Herein, for the purposes of clarity, these forms are called “paper” documents and “digital” documents, respectively.
Generally, the so-called “paper” documents are visible, physical, permanent media having visible, physical, permanent markings (i.e., indicia). Such permanent media is not limited to paper, but can include other media that serves the same general purpose. For example, other such media may include film, transparencies, and the like. The markings typically include some form of content (e.g., data or information), which is persisted at the direction of a person or machine.
Generally, the so-called “digital” documents are electronic representations presented on a computer display screen. Such representations are stored on or transmitted via computer-readable media (e.g., diskette, hard drive, wire, etc.).
Often, content in one form is converted to another form. Digital documents may be converted to paper documents by printing on a printer (e.g., printouts). The typical goal of word processing and desktop publishing applications is to produce high-quality paper versions of the digital versions of a document.
Conversely, the content of paper documents may be input into a computer to generate digital documents. Data may be manually entered. A photograph may be scanned. An article may be scanned and processed by an OCR (optical character recognition) program to pull text back into a digital document so that it is manipulable again.
Format of Digital Documents
Generally, the format of digital documents depends upon the intended purpose of such document and/or the source of the content in the document. Examples of generic formats of digital documents include character-based and image-based.
Character-Based
A character-based digital document (or simply character-based document) is one where the primary addressable data object is a character (e.g., letter, symbol, punctuation, etc.). Typically, these character-based documents include some control codes and formatting codes. However, the fundamental manipulable and addressable object is a character.
For example, a word processor primarily generates digital documents with character-based data. The format of this data is highly readable and manipulable by humans. A human can manipulate each character in such a document by using a word processor.
These characters are typically encoded. An example of such encoding is ASCII (American Standard Code for Information Interchange). It is a standard code for representing English characters as numbers.
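The character-to-number mapping described above can be sketched in a few lines of Python. The sample string is an assumption chosen purely for illustration; `ord`, `chr`, and `str.encode` are standard Python built-ins.

```python
# Illustrative only: the sample text is an assumption, not taken from any document.
text = "Page 1"

# Each character is individually addressable and maps to one ASCII code.
codes = [ord(ch) for ch in text]

# The same codes are what is stored when the text is encoded as ASCII bytes.
encoded = text.encode("ascii")

# Decoding the numbers restores the original characters.
restored = "".join(chr(code) for code in codes)

print(codes)
print(restored)
```

Because every character is a separate number, a character-addressable application can change one character without disturbing the others.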
Image-based
An image-based digital document (or simply image-based document) is one where the primary addressable data object is something other than a character. Two common varieties of image-based digital documents include “raster-oriented” and “vector-oriented.”
Raster-Oriented. A raster-oriented image-based document may consist of a grid (e.g., a raster) of values. This may also be called a “raster,” a “bitmap-oriented,” or a “bitmap” image-based document. The fundamental manipulable and addressable object is a pixel on the raster to represent images. A pixel may also be called a point, a dot, an intersection, or a bit.
With a bitmap, an image is composed of a pattern of dots. Examples of common document formats that are raster-oriented include: BMP, GIF, PCX, and TIFF.
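As a minimal sketch of the raster idea, the following Python fragment stores a letter “T” as a 5×5 grid of one-bit pixels. The grid size, the pixel values, and the helper name `render` are all assumptions made for illustration; real raster formats add headers, color depth, and compression.

```python
# Hypothetical 5x5 one-bit raster: 1 is a dark pixel, 0 is background.
raster = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

def render(grid):
    """Return the grid as text, one character per pixel."""
    return "\n".join("".join("#" if pixel else "." for pixel in row) for row in grid)

# The addressable object is a pixel, reached by row and column.
top_middle_pixel = raster[0][2]
dark_pixel_count = sum(sum(row) for row in raster)

print(render(raster))
```

Note that the raster holds only a pattern of dots; nothing in it records that the pattern is the character “T”.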
Vector-Oriented. A vector-oriented image-based document may consist of a set of “drawing” instructions. This may also be called a “vector” or an “object-oriented” image-based document. The fundamental manipulable and addressable object is drawing instructions (including geometrical formulas) to represent images.
Examples of common document formats that are vector-oriented include: CGM, DXF, EPS, and WMF.
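By way of contrast with a raster, a vector-oriented document can be sketched as a short list of drawing instructions that is turned into pixels only when it is displayed. The instruction tuple layout, the unit-square coordinate system, and the function `rasterize` below are hypothetical and are not the instruction set of CGM, DXF, EPS, or WMF.

```python
# Hypothetical instruction set: ("line", (x0, y0), (x1, y1)) in a unit square,
# with x running left to right and y running top to bottom.
instructions = [
    ("line", (0.0, 0.0), (1.0, 0.0)),  # top bar of a letter "T"
    ("line", (0.5, 0.0), (0.5, 1.0)),  # vertical stem
]

def rasterize(instructions, size):
    """Draw the instructions onto a size-by-size grid of 0/1 pixels."""
    grid = [[0] * size for _ in range(size)]
    for op, (x0, y0), (x1, y1) in instructions:
        if op != "line":
            raise ValueError(f"unsupported instruction: {op}")
        steps = size * 2  # sample densely enough to touch every pixel on the line
        for i in range(steps + 1):
            t = i / steps
            col = min(int((x0 + (x1 - x0) * t) * size), size - 1)
            row = min(int((y0 + (y1 - y0) * t) * size), size - 1)
            grid[row][col] = 1
    return grid

small = rasterize(instructions, 5)
large = rasterize(instructions, 10)
```

The same two instructions produce a 5×5 or a 10×10 image, which is why the addressable object here is the instruction rather than the pixel.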
Fixed Digital Documents
By their nature, the content of character-based digital documents is largely textual. Likewise, the content of image-based digital documents is largely graphical. However, there is a significant and growing segment of the body of image-based digital documents wherein the content is largely textual. These documents are image-based digital documents caught in an intermediate stage of conversion from/to paper documents to/from character-based digital documents.
Herein, these documents are called “fixed” digital documents (or simply fixed documents). The “fixed” terminology refers to the immutable nature of the visible characters at a character-addressable level. In other words, the content of a fixed document—in particular, the characters and words—cannot be simply modified using a character-based application (such as a word processor). To modify a fixed document, it is typically converted to character-based data (using technology like OCR). In addition, a fixed document may be immutable for non-technical reasons (e.g., legal reasons).
Transition from Character-Based Digital Documents to Paper
Why would one want character-based documents to be in a fixed form, but not on paper? This is desirable when one wants some of the characteristics of publishing on paper to be part of an electronic document. Specifically, such characteristics include consistency and immutability. Typically, these types of documents are vector documents.
Typically, fixed documents print in the same manner on all output devices (e.g., printers). With character-based documents, a printout can and does vary depending upon the output devices (e.g., printers) and the computers involved.
Typically, fixed documents are unchangeable (i.e., immutable). Although security may be employed to prevent modification, the unchangeable nature of fixed documents is focused, herein, on the ease of change rather than the prevention of change. Generally, the content of a fixed document is not easily altered using a character-addressable application (such as a word processor or desktop publishing application).
Common examples of formats of fixed documents that are likely in this transition (from character-based document to paper document) include: Portable Document Format (PDF) and PostScript™.
PDF is a popular standard format for electronic document distribution worldwide. PDF is a near universal file format that preserves all of the fonts, formatting, colors, and graphics of any source document, regardless of the application and platform used to create it. PDF documents can be shared, viewed, navigated, and printed exactly as intended.
Similarly, PostScript™ is a popular standard format for desktop publishing because it is supported by imagesetters, which are the very high-resolution printers used by service bureaus to produce camera-ready copy.
Transition from Paper to Character-Based Digital Documents
Why would one want paper documents to be in a fixed electronic form, and not on paper? This is desirable when one wants to electronically store information that is on paper.
To go from paper to digital document, the paper document may be scanned using imaging equipment (such as a scanner or digital camera). Typically, these types of documents are image documents.
Common examples of formats of fixed documents that are likely in this transition (from paper to character-based documents) include: TIFF and JPEG.
Physical Pages, Screen Pages, and Virtual Pages
The concepts of physical pages, screen pages, and virtual pages are discussed below and illustrated in FIGS. 1-5. These concepts are related to, but different from, each other.
Screen Page
FIG. 1 illustrates a typical computer monitor 100 and, more particularly, a typical “screen page” 110 of such a monitor. The screen page is the viewable real estate of a screen of the monitor 100. Typically, the dimensions of the screen page 110 have a standard ratio of relative height (H) to relative width (W). Most screen pages have a landscape orientation, where the height is less than the width (H<W).
Physical Page
FIG. 2 illustrates a typical physical page 130. Examples of what a physical page represents include a page of an actual paper document and a page of a fixed document. The dimensions of a physical page correspond to those of an actual paper document and of a fixed document.
Typically, the dimensions of the physical page have a standard ratio of relative length (L) to relative width (W). Most physical pages have a portrait orientation, where the length is greater than the width (L>W).
Although a physical page may have any orientation and size, a portrait-oriented letter-sized (8.5″×11″) page is ubiquitous in the United States. The physical pages (e.g., page 130) of FIGS. 1-5 and FIGS. 7-10 are illustrated to approximately represent this standard U.S. page size.
Although electronic, fixed documents are typically formatted for output on a physical page of paper. Herein, the fixed size and fixed orientation of a fixed document is also called a “physical page.”
Virtual Page
FIG. 3 illustrates a typical virtual page 140. A virtual page is the portion of the physical page 130 viewed through the screen page 110 of the monitor 100. In other words, the virtual page 140 is the mapping of the screen page 110 onto the physical page 130 (or vice versa).
As shown in FIG. 3, the relative dimensions of the physical page 130 typically do not match the relative dimensions of the screen page 110. Although the relative widths (W) are comparable, the relative length (L) of the physical page 130 does not match the relative height (H) of the screen page 110.
It is possible to reduce the overall size of the physical page 130 so that the entire page is viewable on the screen page 110. However, this is not desirable because the content (e.g., text) of the physical page is difficult to read on a typical computer monitor. The content effectively becomes illegible.
To maximize legibility, it is common to display only a portion of the physical page 130 on the screen page 110 at any one time. Typically, the entire width of the physical page 130 is viewed in the screen page 110, but only a portion of the length of the physical page 130 is viewed in the screen page 110. This portion is called the virtual page 140. An unviewed portion 142 of the physical page 130 is illustrated in FIG. 3 as a shaded box.
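The mapping of a screen page onto a physical page can be sketched numerically. The function name `virtual_page_length` and the 1024×768 (4:3) screen are assumptions made for illustration; only the 8.5″×11″ page size comes from the discussion above.

```python
def virtual_page_length(page_width, page_length, screen_width, screen_height):
    """Length of the physical page visible when its full width fills the screen width.

    Page dimensions are in inches and screen dimensions are in pixels.
    """
    # Scaling the page so its width fills the screen fixes the visible length.
    visible = screen_height * page_width / screen_width
    # A screen taller than the scaled page cannot show more than the whole page.
    return min(visible, page_length)

# Letter-sized physical page on an assumed 1024x768 landscape screen.
visible_length = virtual_page_length(8.5, 11.0, 1024, 768)
unviewed_length = 11.0 - visible_length

print(visible_length, unviewed_length)
```

Under these assumed numbers, a little more than half of the page length is visible at one time, and the remainder corresponds to the unviewed portion 142.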
Virtual Paging Paradigm
A virtual paging paradigm is a technique used to determine the appropriate manner to display one or more physical pages of a fixed document on a screen page so that the relative dimensions of physical pages fit within the screen page and the content of the physical pages remains comfortably legible. This is also called “virtual pagination.”
In other words, a virtual paging paradigm is how a fixed document is divided into multiple virtual pages.
In addition to maintaining comfortable legibility, these techniques may also maintain aspect ratio and good margins. Generally speaking, being “comfortably legible” and having “good margins” on a computer screen are a subjective determination. However, those of ordinary skill in the art understand and appreciate how to make these subjective determinations by using objective and/or subjective observations.
Of course, if the relative dimensions of the physical pages of a fixed document fit within a screen page while the contents remain comfortably legible, then virtual pagination is trivial. The challenge arises when the physical pages of a fixed document do not fit within a screen page while the contents remain comfortably legible. By a large margin, that is the most common situation.
The virtual paging paradigm may also be called “VP paradigm.”
Conventional Virtual Paging Paradigm
The conventional VP paradigms are illustrated in FIG. 4 and FIG. 5. With both conventional paradigms, a reader typically “scrolls,” “pans,” and/or “zooms” to view different virtual pages.
These conventional VP paradigms may also zoom a view of a fixed document. Zooming increases the size (and thus the legibility) of the viewed portion of a document, and panning changes the view displayed on the screen. Consequently, these conventional VP paradigms may be called “zoom-and-pan” paradigms.
FIG. 4 illustrates an example of a conventional VP paradigm. Specifically, it illustrates a “multiple virtual page within physical page boundary with overlap” VP paradigm. In short, that is the multiple VP w/in PP boundary w/overlap VP paradigm.
More specifically, FIG. 4 illustrates the physical page 130. That page is divided into two virtual pages, 142a and 142b. In this example, the virtual pages do not cross a boundary of the physical page 130. In other words, a virtual page does not display portions of more than one physical page at a time.
With this conventional VP paradigm, overlap 152 is a portion of the physical page 130 that appears in both virtual pages. Specifically, overlap 152 is the portion of the physical page 130 that is displayed at the bottom of virtual page 142a and displayed again at the top of virtual page 142b.
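One way such an overlap can arise is sketched below. The sketch assumes that a viewer advances by one full screen at a time and aligns the last virtual page with the bottom of the physical page; that behavior, the function name `paginate_within_page`, and the 6.375-inch visible length are assumptions for illustration rather than a description of any particular viewer.

```python
def paginate_within_page(page_length, view_length):
    """Split one physical page into virtual pages that never cross its boundary.

    Returns a list of (top, bottom) positions measured from the top of the page.
    """
    if view_length >= page_length:
        return [(0.0, page_length)]
    pages = []
    top = 0.0
    while top + view_length < page_length:
        pages.append((top, top + view_length))
        top += view_length
    # The last virtual page is aligned with the bottom of the physical page,
    # so it repeats part of what the previous virtual page already showed.
    pages.append((page_length - view_length, page_length))
    return pages

virtual_pages = paginate_within_page(11.0, 6.375)
overlap = virtual_pages[0][1] - virtual_pages[1][0]

print(virtual_pages, overlap)
```

With these assumed dimensions, an 11-inch page yields two virtual pages that share 1.75 inches of content.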
FIG. 5 illustrates another example of a conventional VP paradigm. Specifically, it illustrates a “virtual page across physical page boundary with overlap” VP paradigm. In short, that is the VP over PP boundary w/overlap VP paradigm.
More specifically, FIG. 5 illustrates physical pages 130 and 132. These pages are divided into three virtual pages: 144a, 144b, and 144c. The virtual pages may cross a boundary of the physical pages. In other words, a virtual page may display portions of more than one physical page at a time. For example, virtual page 144b includes portions of physical page 130 and physical page 132.
This paradigm also has overlap between virtual pages. However, the overlap is typically less pronounced. Overlap 154ab is the portion of the physical page 130 that is displayed at the bottom of virtual page 144a and displayed again at the top of virtual page 144b. Overlap 154bc is the portion of the physical page 132 that is displayed at the bottom of virtual page 144b and displayed again at the top of virtual page 144c.
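A paradigm in which virtual pages cross physical page boundaries can be sketched by treating the physical pages as one continuous strip and advancing by slightly less than a full screen. The function name `paginate_across_pages`, the fixed 0.5-inch overlap, and the 6.375-inch visible length are assumptions for illustration.

```python
def paginate_across_pages(num_pages, page_length, view_length, overlap):
    """Split a strip of physical pages into virtual pages with a fixed overlap.

    Returns (top, bottom) positions measured from the top of the first page.
    """
    if not 0 <= overlap < view_length:
        raise ValueError("overlap must be non-negative and smaller than the view")
    total = num_pages * page_length
    step = view_length - overlap  # each advance repeats `overlap` of the prior view
    pages = []
    top = 0.0
    while top + view_length < total:
        pages.append((top, top + view_length))
        top += step
    # Align the final virtual page with the end of the document.
    pages.append((max(total - view_length, 0.0), total))
    return pages

virtual_pages = paginate_across_pages(2, 11.0, 6.375, 0.5)
second_top, second_bottom = virtual_pages[1]
crosses_boundary = second_top < 11.0 < second_bottom

print(virtual_pages, crosses_boundary)
```

Under these assumptions, the second virtual page spans the boundary between the two physical pages, and consecutive virtual pages share only the fixed 0.5 inch.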
Overlap
Why do the conventional VP paradigms include overlap? Why repeat textual information from one page to the next?
With the VP over PP boundary w/overlap VP paradigm of FIG. 5, the primary reason for overlap is to ensure that each line of text (on the physical page) is displayed in its entirety. The overlap avoids splitting a line of text.
For example, if there were no overlap, the bottom of virtual page 144b of FIG. 5 would split a line of text. Since there is overlap 154bc, that line of text is displayed in both virtual pages 144b and 144c.
If a line of text were split, the top of the line would be displayed at the bottom of one virtual page and the bottom of the line would be displayed at the top of the next virtual page. Of course, a line of text split in this manner is very difficult to read. The conventional solution to this problem is to display an overlap large enough that splitting is unlikely.
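The line-splitting problem can be made concrete with a small sketch. The line geometry (lines 0.2 inch tall on a 0.25-inch pitch), the cut position, and the helper names `split_line` and `overlap_needed` are assumptions for illustration; a conventional viewer typically does not know where the lines are, which is why it uses a generous fixed overlap instead.

```python
def split_line(lines, cut):
    """Return the (top, bottom) of the text line that a cut passes through, if any."""
    for top, bottom in lines:
        if top < cut < bottom:
            return (top, bottom)
    return None

def overlap_needed(lines, cut):
    """Smallest overlap that lets the next virtual page show the split line whole."""
    line = split_line(lines, cut)
    return 0.0 if line is None else cut - line[0]

# Hypothetical page: 40 lines, each 0.2 inch tall, spaced 0.25 inch apart.
lines = [(i * 0.25, i * 0.25 + 0.2) for i in range(40)]

cut = 6.375  # assumed bottom edge of a virtual page, in inches
hit = split_line(lines, cut)
needed = overlap_needed(lines, cut)

print(hit, needed)
```

Here the cut passes through the line that starts at 6.25 inches, so the next virtual page must begin at least 0.125 inch higher to show that line in full.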
Conventional Experience of Reading Fixed Documents
However, this conventional solution introduces a new problem: the overlap hinders a person's reading experience because the reader must search for unread text. Although this may be a trivial task, the cumulative effect of repeating this task for each virtual page is likely to make the reading experience less enjoyable than the natural reading of a paper document.
Accordingly, what is needed is a new virtual paging paradigm that enhances the reading experience that a person has when reading virtual pages of a fixed document. The reading experience with this new paradigm corresponds to the natural reading experience that a person has with a paper document.