1. Field of the Invention
This invention relates generally to the control of a printer and to the compression and encoding of information sent to a printer. In this context, a printer is not only envisaged as a conventional computer printer but also as a remote fax printer or other types of display or presentation devices. This invention also relates to binary encoding of a computer language, and more particularly, to a system and method for efficient binary encoding of nested procedures in which the binary representation contains a field pertaining to the length of the procedure.
2. Discussion of the Background
A large amount of information is sent every day to printers throughout the world. It is often desired to have information printed faster and ready for use sooner in today's hectic office environments. Data compression systems are known in the prior art that encode a stream of digital data signals into compressed digital code signals and decode the compressed digital code signals back into the original data. The object of data compression systems is to effect a savings in the amount of storage required to hold or the amount of time required to transmit a given body of digital information. By decreasing the required memory for data storage or the required time for data transmission, compression results in a monetary savings. If tapes or disks are utilized to store data files, then fewer tapes or disks are required for storing compressed files. If telephone lines or satellite links are utilized for transmitting digital information, lower costs result when the data is compressed before transmission as a smaller amount of transmission time is required.
For example, it may be desired to transmit the contents of a daily newspaper or weekly newsletter via satellite link or telephone line to a remote location for printing. Appropriate devices may convert the contents of the newspaper or newsletter into a data stream of characters for transmission via the communication link. If the symbols comprising the contents of the newspaper or newsletter were compressed before transmission and reconstituted at the receiver, a significant amount of transmission time would be saved.
As a further example, when an extensive collection of documents such as word processing data files on a network file server of a large corporation are stored for archival purposes, a significant amount of storage space would be saved if the totality of symbol signals comprising the documents were compressed prior to storage and reexpanded from the stored compressed files for later use.
A fundamental requirement for compression of digital data is that the compression system must be reversible. That is, it must be possible to reexpand or decode the compressed data back into its original form without any alteration or loss of information. The decoded and the original data must be identical and indistinguishable with respect to each other.
One of the best known and most widely used general purpose data compression procedures is the Lempel, Ziv et al system, hereinafter the "LZ" system, disclosed in U.S. Pat. No. 4,464,650. The LZ system is a lossless, dynamic, on-line system for compression of textual data. The LZ system involves storing frequently appearing strings of characters in memory. When a string of characters appears in the input stream that matches a stored string of characters, a pointer to the stored string is transmitted in place of the full string. When a string of characters appears in the input stream that matches a stored string of characters but also includes one or more additional characters, the pointer for the matched string is transmitted along with the first character of the new string of characters, with that first character being transmitted in uncompressed form.
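The pointer-plus-character scheme described above can be illustrated with a minimal sketch. It follows the general textbook LZ78 approach and is not the specific system of the cited patent; the function names and the use of integer dictionary indices as "pointers" are assumptions made for illustration only. The round trip also illustrates the reversibility requirement stated earlier.

```python
from typing import List, Tuple


def lz78_compress(text: str) -> List[Tuple[int, str]]:
    """Encode text as (pointer, next character) pairs; pointer 0 is the empty string."""
    dictionary = {"": 0}          # stored strings -> pointer (index)
    output = []
    current = ""                  # longest stored string matched so far
    for ch in text:
        if current + ch in dictionary:
            current += ch         # keep extending the match
        else:
            # Emit the pointer for the match plus the first unmatched character,
            # then store the extended string for later reuse.
            output.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:
        # Input ended in the middle of a match: emit its prefix pointer and last character.
        output.append((dictionary[current[:-1]], current[-1]))
    return output


def lz78_decompress(pairs: List[Tuple[int, str]]) -> str:
    """Rebuild the original text by replaying the same dictionary growth."""
    strings = [""]
    pieces = []
    for pointer, ch in pairs:
        s = strings[pointer] + ch
        strings.append(s)
        pieces.append(s)
    return "".join(pieces)
```

Because the decoder rebuilds the same dictionary from the pairs it receives, no dictionary needs to be transmitted; the cost is that both sides must hold the growing table in memory.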
However, a problem with the LZ data compression system is that the technique typically requires large amounts of computer memory and processing time. Additionally, once the memory which stores the strings of characters which are often repeated is full, the Lempel and Ziv method has difficulty in dynamically changing the repeated strings of characters.
Storer discloses in U.S. Pat. No. 4,876,541 a data compression system which overcomes the problems in the Lempel et al patent and enables updating of frequently appearing strings of characters which are compressed. In Storer's data compression system, both the encoder and decoder have dictionaries for storing frequently appearing strings of characters. Each string is identified by a unique pointer. The input data string is parsed and matched with strings in the encoder dictionary using a matching algorithm. The pointer associated with the matched string is then transmitted to a remote location for storage or decoding. Thereafter, using the update algorithm, the encoder dictionary is updated to include new strings of data based on the matched string of data.
An alternative form of data compression is set forth in U.S. Pat. No. 5,027,376 by Friedman et al and assigned to Microcom Systems, Inc. Friedman's compression system is used in MNP (Microcom Networking Protocol) modems. In Friedman's system, the compressing modem receives an input data stream and each character in the data stream is recodified with a compressed character code, the length of which is dependent on the frequency of the characters in the data stream. A frequency table is maintained so that changes in the relative frequency of characters in the data stream will be recognized by the compressing modem and the compressed characters representing such characters will be exchanged accordingly. A decompressing modem, connected over communication lines to the compressing modem, processes the compressed character code in a reverse order from the manner in which the compressing modem processes the codes. The decompressing modem also has a relative frequency table and as the relative frequencies of the various characters change, the actual characters represented by the compression codes must also be changed.
While the above described data compression systems may be very efficient in compressing data for transmission or storage, they suffer from several drawbacks. First, the compressed files are stored in a format which is not directly readable. That is, the files must be decompressed before being used for any purpose. This decompression of the information requires processor time so that the compressed information can be returned to its original form. Second, as the data is being compressed, a large amount of memory is often required to store frequently used character strings stored as codes in the compressed file.
Data compression routines are particularly useful for word processing files. One way in which a data processing file can be stored is in a page description language which contains information of the text of the file and formatting commands. Page description languages (PDLs) are currently used to control the operation of computer printers and are ideally suited for printing documents containing both text and graphics. Examples of PDLs are PostScript® from Adobe Systems, Inc., and Interpress® from Xerox. The present invention is discussed with regard to the Standard Page Description Language (SPDL) which is currently a proposal in draft form before a section of the International Standards Organization ("ISO") as ISO/IEC DIS 10180 and is available through the American National Standards Institute ("ANSI") in New York.
A document in SPDL is a final form of the document. That is, the document has been created, edited, and all composition, formatting and positioning decisions pertaining to the document have been made. An SPDL document is ready for presentation by a presentation device such as a printer or computer monitor which displays the final form of the document. An SPDL document can also be transmitted to another computer or to a facsimile device over a communications link.
There are two primary parts to an SPDL document: structure and content. The structure of a document is independent of its content. However, the structure of a document establishes the context of interpretation for the content. For example, if bold face is available, the context of interpretation would indicate that a bold face typeface is in the available resources but the text itself would be the content. Examples of structure elements are documents, pictures, dictionary generators and tokensequences, as set forth in co-pending U.S. patent application Ser. No. 07/876,601 now U.S. Pat. No. 5,319,748. A tokensequence is a special type of structure element which contains document content. SPDL uses Abstract Syntax Notation 1 (ASN.1), as defined in ISO 8824:1990 and ISO 8825:1990, both of which are incorporated herein by reference, for binary structure encoding, which defines tokensequences as Octet Strings, each of which is a sequence of 8 bit bytes. Standard Generalized Markup Language ("SGML"), as defined in ISO 8879:1986, is used for the clear text structure encoding.
An SPDL document can be represented in two types of formats: a clear text format and a binary format. The clear text format represents a document using human readable form which can be understood by one familiar with the SPDL page description language. The binary SPDL format, on the other hand, represents a document using a binary machine readable form. A drawback of binary encoding is that it cannot be read by a human. Therefore, if there is an error in the binary encoding of an SPDL file, the error cannot be easily corrected. For example, if a document is transmitted to another system and subsequently cannot be printed, a user of the system which receives the document would not be able to correct the binary form of the document. While clear text SPDL files may be readable by both a human and a computer, clear text SPDL files require a large amount of physical storage space, are slow to be transmitted over communication lines, and require a large amount of processing time to print. However, a clear text SPDL file and a binary SPDL file are fully equivalent. That is to say, any functionality that can be expressed in one can be expressed in the other with a syntactic transformation. SPDL files use different encoding schemes than Interpress and PostScript. Interpress uses only binary encoded files and these files are not equivalent to binary SPDL files. PostScript originally used only a clear text representation but now supports both clear text and binary encodings. However, the clear text and binary encodings of PostScript are different from the clear text and binary encodings of SPDL.
Clear text SPDL documents can contain procedures represented by information between a "{" and a "}". When a procedure is represented in the binary format, the beginning of the procedure is represented by 67H, for example, followed by a two byte representation of the number of bytes required to represent the procedure, e.g., 00H 08H to represent the procedure as being 8 bytes long, followed by a binary representation of the procedure. As a clear text procedure is being converted to a binary representation, each clear text operand or instruction is written into a storage buffer as that operand or instruction is being converted to binary. When the entire clear text routine has been converted to binary and written into the storage buffer, 67H is written into the binary SPDL file followed by a two byte field for the number of bytes required to represent the routine, followed by the contents of the storage buffer which contains the binary representation of the clear text routine. Therefore, before a clear text procedure can be written into a binary representation, it is necessary to know the number of bytes in the binary representation of the procedure. A major problem is that a buffer to store the binary representation of a procedure is needed for each procedure, and many nested procedures require the use of many buffers, which can result in memory management problems.
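The framing just described (a 67H marker, a two byte length, then the procedure body) can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the byte order of the length field is inferred from the "00H 08H" example for an 8 byte procedure, and the body is taken as already-converted binary because the text does not give the binary encoding of individual operands or instructions.

```python
PROCEDURE_TAG = 0x67  # marker value given in the text ("67H, for example")


def frame_procedure(body: bytes) -> bytes:
    """Prefix an already-encoded procedure body with the tag and a two byte length.

    The body must be complete before this is called, because the length
    field precedes the body in the output.
    """
    if len(body) > 0xFFFF:
        raise ValueError("procedure body does not fit in a two byte length field")
    # High-order byte first, matching the 00H 08H example for 8 bytes.
    return bytes([PROCEDURE_TAG]) + len(body).to_bytes(2, "big") + body
```

Calling `frame_procedure` on an 8 byte body yields 11 bytes beginning with 67H 00H 08H. The need to pass a finished `body` is exactly why the conversion must be buffered before anything is written to the file.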
An explanation of how the clear text nested procedures "{1 add {2 add {3 add}}}" are encoded into a binary representation according to a simple method follows. The above procedure is used so as to provide a simple and clear illustration of the operation and problems of a conventional method. When "{1 add" is encountered, a buffer is created which holds the binary representation of "1 add". When the second "{" is encountered, there is a nested procedure and a second buffer is created which holds the binary representation of "2 add". When the third "{" is encountered, a third buffer is created which holds the binary representation of "3 add".
When the first "}" is encountered, the third procedure containing "3 add" has ended. Then, 67H, followed by two bytes representing the length of the third buffer, followed by the contents of the third buffer are appended to the second buffer.
When the second "}" is encountered, the second nested routine which was converted to a binary format has ended. Then, 67H, followed by a two byte number giving the length of the second routine (i.e., the length of the binary representation of "2 add" plus the length of the encoded third routine already appended to the second buffer), followed by the contents of the second buffer, are appended to the first buffer. A similar procedure occurs for the third "}" and the encoded nested procedures are written into a file or other storage location.
This conventional method is an inefficient use of memory in that multiple unfilled buffers are needed. Further, these multiple buffers must be managed, especially when there are a plurality of nested routines.
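The conventional buffer-per-procedure method walked through above can be sketched as follows. The sketch is illustrative only: `encode_token` is a hypothetical placeholder (one length byte followed by the ASCII text of the token) standing in for the actual SPDL binary token encoding, which is not given here, and the function names are assumptions. The sketch also reports how many buffers are open at once to make the memory cost visible.

```python
from typing import Tuple

PROCEDURE_TAG = 0x67  # procedure marker from the text


def encode_token(token: str) -> bytes:
    """Hypothetical stand-in for the real binary token encoding."""
    data = token.encode("ascii")
    return bytes([len(data)]) + data


def encode_nested(clear_text: str) -> Tuple[bytes, int]:
    """Return (binary encoding, maximum number of procedure buffers open at once)."""
    tokens = clear_text.replace("{", " { ").replace("}", " } ").split()
    stack = [bytearray()]          # stack[0] stands in for the output file
    max_buffers = 0
    for tok in tokens:
        if tok == "{":
            stack.append(bytearray())            # one new buffer per nesting level
            max_buffers = max(max_buffers, len(stack) - 1)
        elif tok == "}":
            if len(stack) < 2:
                raise ValueError("unbalanced '}'")
            body = stack.pop()                   # procedure finished: length now known
            if len(body) > 0xFFFF:
                raise ValueError("procedure too long for a two byte length")
            # Append tag, two byte length and body to the enclosing buffer.
            stack[-1] += bytes([PROCEDURE_TAG]) + len(body).to_bytes(2, "big") + body
        else:
            stack[-1] += encode_token(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '{'")
    return bytes(stack[0]), max_buffers
```

For "{1 add {2 add {3 add}}}", three buffers are open at the same time when "3 add" is being converted, and each inner buffer is copied into its parent when its "}" is reached, which is the buffer management burden noted above.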