1. Field of the Invention
This invention relates generally to the control of a printer and to the compression and encoding of information sent to a printer. In this context, a printer is envisaged not only as a conventional computer printer but also as a remote fax printer or another type of display or presentation device. This invention particularly relates to a system and method for compressing and encoding numeric data represented in a clear text Standard Page Description Language (SPDL) to the $N/2^r$ binary format.
2. Discussion of the Background
A large amount of information is sent every day to printers throughout the world. It is often desired to have information printed faster and ready for use sooner in today's hectic office environments. Data compression systems are known in the prior art that encode a stream of digital data signals into compressed digital code signals and decode the compressed digital code signals back into the original data. The object of data compression systems is to effect a savings in the amount of storage required to hold or the amount of time required to transmit a given body of digital information. By decreasing the required memory for data storage or the required time for data transmission, compression results in a monetary savings. If tapes or disks are utilized to store data files, then fewer tapes or disks are required for storing compressed files. If telephone lines or satellite links are utilized for transmitting digital information, lower costs result when the data is compressed before transmission as a smaller amount of transmission time is required.
For example, it may be desired to transmit the contents of a daily newspaper or weekly newsletter via satellite link or telephone line to a remote location for printing. Appropriate devices may convert the contents of the newspaper or newsletter into a data stream of characters for transmission via the communication link. If the symbols comprising the contents of the newspaper or newsletter were compressed before transmission and reconstituted at the receiver, a significant amount of transmission time would be saved.
As a further example, when an extensive collection of documents, such as wordprocessing data files on a network file server of a large corporation, is stored for archival purposes, a significant amount of storage space would be saved if the totality of symbol signals comprising the documents were compressed prior to storage and reexpanded from the stored compressed files for later use.
A fundamental requirement for compression of digital data is that the compression system must be reversible. That is, it must be possible to reexpand or decode the compressed data back into its original form without any alteration or loss of information. The decoded and the original data must be identical and indistinguishable with respect to each other.
One of the best known and most widely used general purpose data compression procedures is the Lempel, Ziv et al. system, hereinafter the "LZ" system, disclosed in U.S. Pat. No. 4,464,650. The LZ system is a lossless, dynamic, on-line system for compression of textual data. The LZ system stores frequently appearing strings of characters in memory. When a string of characters appears in the input stream that matches a stored string of characters, a pointer to the stored string is transmitted in place of the full string. When a string of characters appears in the input stream that matches a stored string but is followed by one or more new characters, the pointer for the matched string is transmitted along with the first of the new characters, which is transmitted in uncompressed form.
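The pointer-plus-character scheme described above can be illustrated with a minimal sketch. It follows the general LZ78 style and is only assumed to approximate the system of the cited patent; the function names `lz78_encode` and `lz78_decode`, and the convention that pointer 0 means "no stored string", are hypothetical choices made for this illustration.

```python
def lz78_encode(text):
    """Encode text as (pointer, char) pairs; pointer 0 means 'no stored string'."""
    dictionary = {}          # stored string -> pointer (1-based), an assumed layout
    output = []
    current = ""             # longest stored string matched so far
    for ch in text:
        if current + ch in dictionary:
            current += ch    # keep extending the match
        else:
            # Emit the pointer for the matched string plus the first new
            # character, which travels in uncompressed form.
            output.append((dictionary.get(current, 0), ch))
            dictionary[current + ch] = len(dictionary) + 1
            current = ""
    if current:              # input ended inside a match: pointer only
        output.append((dictionary[current], ""))
    return output


def lz78_decode(pairs):
    """Rebuild the original text from (pointer, char) pairs."""
    strings = [""]           # index 0 is the empty string
    pieces = []
    for pointer, ch in pairs:
        entry = strings[pointer] + ch
        pieces.append(entry)
        strings.append(entry)   # mirror the encoder's dictionary growth
    return "".join(pieces)
```

In this sketch the decoder rebuilds the same table of strings as the encoder, so no dictionary needs to be transmitted, and decoding the encoder's output returns the input unchanged.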
However, a problem with the LZ data compression system is that the technique typically requires large amounts of computer memory and processing time. Additionally, once the memory which stores the strings of characters which are often repeated is full, the Lempel and Ziv method has difficulty in dynamically changing the repeated strings of characters.
Storer discloses in U.S. Pat. No. 4,876,541 a data compression system which overcomes the problems in the Lempel et al patent and enables updating of frequently appearing strings of characters which are compressed. In Storer's data compression system, both the encoder and decoder have dictionaries for storing frequently appearing strings of characters. Each string is identified by a unique pointer. The input data string is parsed and matched with strings in the encoder dictionary using a matching algorithm. The pointer associated with the matched string is then transmitted to a remote location for storage or decoding. Thereafter, using the update algorithm, the encoder dictionary is updated to include new strings of data based on the matched string of data.
An alternative form of data compression is set forth in U.S. Pat. No. 5,027,376 by Friedman et al. and assigned to Microcom Systems, Inc. Friedman's compression system is used in MNP (Microcom Networking Protocol) modems. In Friedman's system, the compressing modem receives an input data stream and each character in the data stream is recodified with a compressed character code, the length of which is dependent on the frequency of the characters in the data stream. A frequency table is maintained so that changes in the relative frequency of characters in the data stream will be recognized by the compressing modem and the compressed codes representing such characters will be exchanged accordingly. A decompressing modem, connected over communication lines to the compressing modem, processes the compressed character code in a reverse order from the manner in which the compressing modem processes the codes. The decompressing modem also has a relative frequency table and, as the relative frequencies of the various characters change, the actual characters represented by the compression codes must also be changed.
While the above described data compression systems may be very efficient in compressing data for transmission or storage, they suffer from several drawbacks. First, the compressed files are stored in a format which is not directly readable. That is, the files must be decompressed before being used for any purpose. This decompression requires processor time so that the compressed information can be returned to its original form. Second, as the data is being compressed, a large amount of memory is often required to store the frequently used character strings that are represented by codes in the compressed file.
Data compression routines are particularly useful for wordprocessing files. One way in which a wordprocessing file can be stored is in a page description language, which contains the text of the file together with formatting commands. Page description languages (PDLs) are currently used to control the operation of computer printers and are ideally suited for printing documents containing both text and graphics. Examples of PDLs are PostScript® from Adobe Systems, Inc., and Interpress® from Xerox. The present invention is discussed with regard to the Standard Page Description Language (SPDL), which is currently a proposal in draft form before a section of the International Standards Organization ("ISO") as ISO/IEC DIS 10180 and is available through the American National Standards Institute ("ANSI") in New York.
A document in SPDL is a final form of the document. That is, the document has been created, edited, and all composition, formatting and positioning decisions pertaining to the document have been made. An SPDL document is ready for presentation by a presentation device such as a printer or computer monitor which displays the final form of the document. An SPDL document can also be transmitted to another computer.
There are two primary parts to an SPDL document: structure and content. The structure of a document is independent of its content. However, the structure of a document establishes the context of interpretation for the content. For example, if text is to be set in bold face, the context of interpretation would include the bold face font, while the content would indicate that the bold face typeface should be used and would also contain the text itself. Examples of structure elements are documents, pictures, dictionary generators and tokensequences, as set forth in co-pending U.S. patent application Ser. No. 07/876,601. A tokensequence is a special type of structure element which contains document content. SPDL uses Abstract Syntax Notation 1 (ASN.1) for binary structure encoding, as defined in ISO 8824:1990 and ISO 8825:1990, both of which are incorporated herein by reference. ASN.1 defines tokensequences as Octet Strings, which are sequences of 8 bit bytes.
An SPDL document can be represented in two types of formats: a clear text format and a binary format. The clear text format represents a document using high level language programming commands similar to English, which can be understood by one familiar with the SPDL page description language. The binary SPDL format, on the other hand, represents a document using binary instructions and operands and is a machine language type of representation. While clear text SPDL files may be readable by both a human and a computer, clear text SPDL files require a large amount of physical storage space, are slow to be transmitted over communication lines, and require a large amount of processing time to print. However, a clear text SPDL file and a binary SPDL file are fully equivalent. That is to say, any functionality that can be expressed in one can be expressed in the other with a syntactic transformation.
A conventional method of representing a binary floating point number is the IEEE 754 Standard for Binary Floating-Point Arithmetic. There are two formats under the IEEE Standard: the single precision format, which uses 4 bytes to represent a number, and the double precision format, which uses 8 bytes to represent a number. For both formats, three fields are used to represent a number: a one-bit sign field s, a biased exponent e, and a mantissa or fraction part f.
For the single precision representation of a number, the first bit in the binary representation is the sign bit s. The next 8 bits are for the exponent e, and the last 23 bits are for the fraction f, as illustrated in FIG. 1A. For the single precision format, a non-zero number X is represented in the IEEE standard by the following rule:
If $0 < e < 255$, then $X = (-1)^{s} \times 2^{e-127} \times 1.f$
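The single precision rule can be checked with a short sketch that splits a number into its three fields and rebuilds it from them. The helper name `decode_single` is hypothetical, and the sketch deliberately handles only normalized numbers, that is, the case where e lies strictly between 0 and 255.

```python
import struct


def decode_single(x):
    """Split x into IEEE 754 single precision fields and rebuild its value."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # the 32 raw bits
    s = bits >> 31                 # 1-bit sign
    e = (bits >> 23) & 0xFF        # 8-bit biased exponent
    f = bits & 0x7FFFFF            # 23-bit fraction
    if not 0 < e < 255:
        raise ValueError("this sketch handles only normalized numbers")
    # X = (-1)^s * 2^(e-127) * 1.f, where 1.f is 1 plus f scaled by 2^-23
    value = (-1) ** s * 2.0 ** (e - 127) * (1 + f / 2 ** 23)
    return s, e, f, value
```

For instance, `decode_single(-6.5)` gives a sign of 1, a biased exponent of 129 and a rebuilt value of -6.5, since 6.5 is 1.625 times 2 squared.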
Eight bytes are used to represent a number in the double precision format of the IEEE 754 standard. The first bit in the binary representation is the sign bit s. The next 11 bits are for the biased exponent e, and the last 52 bits are for the mantissa or fraction part f, as illustrated in FIG. 1B. For the double precision format, a non-zero number X is represented in the IEEE standard using the following rule:
If $0 < e < 2047$, then $X = (-1)^{s} \times 2^{e-1023} \times 1.f$
For both the single and double precision formats of the IEEE 754 standard, when the sign bit is 0, the number is positive and when the sign bit is 1, the number is negative. The biased exponent e is used for shifting the binary point of the mantissa or fraction part f. A biased exponent is used so that the exponent field can represent both positive and negative exponents. For instance, when 8 bits are used to represent the exponent e, there are 256 ($2^{8}$) different codes available to represent both positive and negative exponents. An approximately equal range is $-126 \le \text{exponent} \le +127$. In the IEEE 754 standard, the smallest biased exponent used for a normalized number (1) represents the smallest exponent (-126) and the largest (254) represents the largest exponent (+127); the codes of eight 0s and eight 1s are reserved, which is why the rules above require e to lie strictly between 0 and the maximum code. Therefore, the exponent bias which must be added to the actual exponent is +127 for single precision numbers and +1023 for double precision numbers. Consequently, if the exponent of a number has a value of 0, its representation should be 127 for the single precision representation and 1023 for the double precision representation.
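As a rough illustration of the bias, the stored exponent field can be read directly from the encoded bytes. The helper `biased_exponent` is a hypothetical name introduced for this sketch; it relies only on the field widths given above.

```python
import struct


def biased_exponent(x, double=False):
    """Return the stored (biased) exponent field of x."""
    if double:
        (bits,) = struct.unpack(">Q", struct.pack(">d", x))
        return (bits >> 52) & 0x7FF     # 11-bit field, bias 1023
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return (bits >> 23) & 0xFF          # 8-bit field, bias 127


# 1.0 has an actual exponent of 0, so the stored field equals the bias.
single_bias = biased_exponent(1.0)
double_bias = biased_exponent(1.0, double=True)
```

Here `single_bias` is 127 and `double_bias` is 1023. In single precision, 2 to the power -126 is stored with an exponent field of 1 and 2 to the power 127 with a field of 254, consistent with the condition that e lies strictly between 0 and 255 for normalized numbers.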
While the IEEE 754 floating point standard can represent most floating point numbers, there are two problems with its representation. First, even using the double precision IEEE standard, precision can be lost when the number to be represented has more than 15 digits, because the mantissa or fraction part is limited to 52 bits, or roughly 15 significant decimal digits. For example, both 8388607.9660937499 and 8388607.996093501 are represented in the IEEE standard using double precision as 8388607.99609375. Second, the IEEE standard requires 4 or 8 bytes to represent a number, for the single and double precision representations, respectively, and 4 or 8 bytes may be an unnecessary waste of memory in representing a number.
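Both problems can be observed with a small sketch. The values used here are illustrative choices made for this sketch and are not the example values given in the text: the first pair shows two different decimal inputs collapsing to one double, and the second part shows the fixed 4 and 8 byte sizes even for a number as simple as 1.5.

```python
import struct

# Illustrative values chosen for this sketch, not taken from the text.
a = float("9007199254740992")   # 2**53, exactly representable as a double
b = float("9007199254740993")   # 2**53 + 1, needs 54 significant bits
same_double = (a == b)          # both decimal inputs map to the same double

single_size = len(struct.pack(">f", 1.5))   # bytes used by single precision
double_size = len(struct.pack(">d", 1.5))   # bytes used by double precision

# 1.5 already survives a round trip through 4 bytes, so the extra
# 4 bytes of the double precision form add nothing for this value.
round_trip = struct.unpack(">f", struct.pack(">f", 1.5))[0]
```

In this sketch `same_double` is true, and the encoded sizes are always 4 and 8 bytes regardless of how few digits the number actually needs.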