1. Field of the Invention
The present invention relates to a system and method for processing page description language (PDL) files, and more specifically, for removing redundant information from a PDL file.
2. Description of Related Art
Printing refers to the reproduction of words and pictures on a page or document. Today, the high-volume production machines used in the major printing processes are presses, which use plates (or other types of image carriers) to transfer ink onto paper or another substrate. These processes are often used to serve markets such as commercial printing, magazines, newspapers, catalogs, books, business forms, greeting cards, maps, labels, packaging, and other printed products.
One common type of production printing process is offset printing, which uses an intermediate blanket cylinder to transfer an image from the image carrier to the substrate. The offset printing process, and in particular its prepress operations, involves intricate manual operations that are time-consuming and costly and that require highly skilled, expensive professionals.
With the development of digital image processing, digital printing systems may be used to improve the productivity, quality, and efficiency of many printing operations. Many digital printing systems use a plateless printing process. Common plateless digital printing processes include electrophotography, ink-jet, and thermal transfer. Digital printing systems are often preferred over printing press processes because (1) most of the equipment is suitable for an office environment; (2) they can vary the printed content from one impression to the next; and (3) they require less manual skill than printing on conventional plate presses.
As the printing industry moves from conventional printing press operations to digital printing operations to take advantage of technological advances in digital imaging, it becomes possible to provide a more automated printing process. One approach to a more automated digital printing process is to store, back up, recover, and print a multiple-page document as a single file, particularly when the document has hundreds or thousands of pages.
The multiple-page document file is typically written in a page description language (PDL) or some other programming language that can be recognized by an output device or processing device. PDL generally refers to a computer language designed to describe how type and graphic elements should be produced by an output device (e.g., a printer). PostScript®, developed by Adobe Systems, Inc., is a widely adopted PDL that can be used to specify the contents of a page to be printed. PostScript is a registered trademark of Adobe Systems, Inc. Each PostScript file is a purely text-based description of a page that uses the ASCII character set and can be generated on every widely used operating system. The biggest advantage of PostScript is device independence: graphics are defined not according to the characteristics of a particular device (e.g., page size, color depth, or resolution) but independently of any device. In other words, a PostScript file can be output with more or less identical results on various machines, the only visible difference being that reproduction quality improves as resolution increases. Virtually every application program running on a desktop computer can output PostScript, and virtually every type of printer accepts PostScript-coded files.
It has been observed that PDL or PostScript files, particularly large files, often contain redundant information. For example, a PostScript page often devotes about 20% of its source to page content and about 80% to prolog material, located at the beginning of the file, that prepares the page environment (e.g., macro definitions). Often, the same macros and other prolog information are repeated in the prologs of multiple pages within a multiple-page document. Because the same macros and material are repeated in multiple page prologs, the multiple-page document file consumes considerably more storage space than it would if the redundant information were consolidated. Furthermore, a PDL or PostScript file that repeats the same information in multiple page prologs may print less efficiently, because the output device must process the same prolog material repeatedly.