A page description language (PDL) is a method of describing printed pages in a printer-independent format. A PDL establishes a common interface language between a print driver or client and a print server or printer. No single standard PDL presently exists; instead, a number of industry standards have emerged. Existing PDL standards include PDF from Adobe, PostScript (“PS”), Hewlett Packard Printer Control Language (“HP-PCL”) and Interpress Page Description Language, among others.
Commercially available PDLs define the construction of various typefaces for characters and numerals. There are other conventions for organizing image data independent of any typefaces therein. These “image formats” include TIFF and CALS, as well as image formats associated with facsimile transmission, such as CCITT fax Group 3 and fax Group 4. An image format is a system of shorthand commands that enables raw image data, e.g., a set of binary numbers corresponding to black and white pixels, to be compressed into a more manageable form. To take one basic example, an image format such as TIFF or CALS may include an instruction within a data set corresponding to “print a white line” in lieu of a long string of numbers, such as zeros, each number corresponding to one individual pixel in the white line. In this way, image data may be retained in smaller memory spaces than would be required if every single pixel in an image had its own bit of memory. As used herein, the term “image data” shall apply to image data in either image format or PDL, and an “image data set” shall mean a meaningful quantity of such data, such as data for an image or a connected series of images.
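By way of illustration only, the “print a white line” shorthand can be sketched as a simple run-length encoding in Python. This is not the actual TIFF, CALS, or CCITT encoding; the function name `run_length_encode` and the (pixel value, run length) pair representation are assumptions made for this example, with 0 denoting a white pixel and 1 a black pixel.

```python
from itertools import groupby

def run_length_encode(pixels):
    """Collapse a row of 0/1 pixels into (value, run_length) pairs.

    Hypothetical helper for illustration; real image formats use
    more elaborate codes than plain (value, count) pairs.
    """
    return [(value, len(list(run))) for value, run in groupby(pixels)]

# One scan line: 1000 white pixels, 24 black pixels, then 1000 white pixels.
row = [0] * 1000 + [1] * 24 + [0] * 1000
encoded = run_length_encode(row)
print(encoded)  # [(0, 1000), (1, 24), (0, 1000)]
```

In this sketch, a scan line of 2024 individual pixel values is represented by three pairs, which is the kind of saving the shorthand commands are intended to provide.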
With any PDL or image format, there will inevitably be a step of translating the PDL or image format data into a form usable by an output device, such as a printer. Printing hardware requires an input stream of binary data. Thus, the instructions within the image format, such as to “print a white line,” will eventually have to be translated into the actual binary code. This code can then be applied to the modulation of a laser source in a raster output scanner, or applied sequentially to individual ejectors in an inkjet printer.
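The translation step may be sketched, under stated assumptions, as follows. The command vocabulary (`"white"` and `"black"` with a pixel count), the mapping of white to 0 and black to 1, and the most-significant-bit-first packing are all hypothetical choices made for this example and do not correspond to any particular image format or printing hardware.

```python
def expand_to_bits(commands):
    """Expand hypothetical (color, count) commands into one bit per pixel."""
    levels = {"white": 0, "black": 1}  # assumed mapping: 0 = white, 1 = black
    bits = []
    for color, count in commands:
        bits.extend([levels[color]] * count)
    return bits

def pack_bits(bits):
    """Pack pixel bits MSB-first into bytes, zero-padding the final byte."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        chunk += [0] * (8 - len(chunk))  # pad a short final chunk
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)

# "Print 6 white pixels, 4 black pixels, 6 white pixels" as one scan line.
commands = [("white", 6), ("black", 4), ("white", 6)]
bits = expand_to_bits(commands)
stream = pack_bits(bits)
print(stream.hex())  # 03c0
```

The resulting byte stream stands in for the binary data that would drive the modulation of a laser source or the firing of individual ejectors.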
A PDL Guesser is a set of programmatic instructions that determines the page description language (PDL) or Image Format in which a print job is written by analyzing a sample of the data stream received from a data source. The PDL Guesser is essential to an electronic print system since it determines whether the system can print a specific job that it receives and, if so, the appropriate format for such job. Once a PDL Guesser in an electronic print system determines the PDL or Image Format of a print job, the print system can print the print job in accordance with the supported PDL or Image Format.
Approaches to determining the PDL or Image Format include looking for specific character strings at the beginning of a very small portion of the image data, as taught in U.S. Pat. No. 5,526,469, commonly assigned with the present application and the disclosure of which is incorporated herein by reference in its entirety. Another PDL guesser searches for a specific command instruction at the beginning of each print job or recognizes unique command instructions, as described in U.S. Pat. No. 5,493,635, commonly assigned with the present application and the disclosure of which is incorporated herein by reference in its entirety. Other guessers recognize a PDL or Image Format by its “signature” string or by the frequency of the occurrence of certain “operators”, as explained in U.S. Pat. No. 5,402,527, commonly assigned with the present application and the disclosure of which is incorporated herein by reference in its entirety. Yet another PDL guesser utilizes statistical analysis to recognize a PDL or Image Format, as detailed in U.S. Pat. No. 5,293,466, the disclosure of which is incorporated herein by reference in its entirety. U.S. Pat. No. 6,525,831, the disclosure of which is incorporated herein by reference in its entirety, describes a PDL guesser that samples a print job stream and verifies that, in the data sample, all command operator strings, their associated parameters and interspersed data are valid for a particular PDL or Image Format.
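For illustration, two of the general approaches described above, signature matching and operator frequency, may be sketched as follows. This sketch does not reproduce the method of any of the cited patents. The function name `guess_pdl`, the signature table, the operator list, and the threshold `min_operator_hits` are assumptions made for this example; the byte prefixes shown are commonly seen at the start of PDF, PostScript, TIFF, and PCL data, but are not guaranteed to appear in every job.

```python
# Illustrative prefixes only; the table is an assumption, not an exhaustive list.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"%!", "PostScript"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF header
    (b"MM\x00*", "TIFF"),  # big-endian TIFF header
    (b"\x1bE", "PCL"),     # PCL printer reset (ESC E)
]

# A few PostScript operators used for the frequency fallback.
PS_OPERATORS = (b"moveto", b"lineto", b"showpage", b"setfont")

def guess_pdl(sample, min_operator_hits=3):
    """Return a guessed format name for a byte sample, or None if unknown."""
    # Approach 1: look for a signature string at the start of the sample.
    for magic, name in SIGNATURES:
        if sample.startswith(magic):
            return name
    # Approach 2: count occurrences of characteristic operators.
    hits = sum(sample.count(op) for op in PS_OPERATORS)
    if hits >= min_operator_hits:
        return "PostScript"
    return None

print(guess_pdl(b"%PDF-1.4\n"))                                   # PDF
print(guess_pdl(b"100 100 moveto 200 200 lineto stroke showpage"))  # PostScript
print(guess_pdl(b"hello world"))                                   # None
```

The final call returns `None` because the sample contains neither a known prefix nor enough recognized operators, which is the situation in which a guesser of this kind has nothing to go on.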
The aforementioned PDL guessing methods and systems, however, are not sufficient or effective to identify and process all PDLs and Image Formats. For example, a PDL or Image Format may not have an identified “signature” string present in every print job. Further, there may not be a representative sample of command operators that can be guaranteed to exist in every print job. Similarly, the frequency of occurrence of certain PDL or Image Format command operator strings may not be guaranteed across all print jobs. Still further, the PDL or Image Format may not lend itself to statistical analysis since no characteristic sequences may occur in a significant portion of the print job samples.
Different types of print drivers use different page description languages (PDLs), which are directed to specific printer devices. Devices can therefore easily identify the PDL created by their own drivers; however, if the print stream is corrupted or was created using a wrong or unknown print driver, the device may be unable to detect the PDL and will pass the print stream through as plain text, potentially resulting in reams of illegible text being printed and thus substantial amounts of wasted paper.
There is therefore a need for a method and system that can detect when a PDL has been misidentified by the PDL guesser, and can then accurately identify the intended PDL type of a document to correct the error. Such a method can be used for terminating a job with a misidentified PDL before it prints in its entirety, thereby saving resources and energy that would otherwise be wasted.