1. Field of the Invention
This invention relates to data compression for black-white documents which contain both symbol and graphic portions.
2. Description of the Prior Art
Digital facsimile systems commonly use compression techniques to minimize the number of bits required to fully describe a document. The International Telegraph and Telephone Consultative Committee (CCITT) has selected what is known as the Modified READ code as the optional two-dimensional algorithm for Group 3 facsimile equipment. This Modified READ code was subsequently adopted by the Electronic Industries Association on Apr. 3, 1981 as EIA Standard RS-465. Standards for Group 4 facsimile equipment are currently being developed; these will utilize a version of the Modified READ code as the standard data compression algorithm, in combination with some form of "mixed-mode" algorithm as an option.
Almost all documents which are transmitted are a mix of both symbols and graphics. Modified READ codes will handle both; however, the number of bits required to describe a document containing only symbols is much greater than that required by non-facsimile equipment designed for symbol-only transmission. Such equipment, for example communicating word processors or some form of Teletex, utilizes symbol-only methods which cannot transmit graphics such as signatures and other non-standard items.
Mixed-mode algorithms have the capability of transmitting both symbols and graphics efficiently. Prior art techniques utilizing mixed-mode algorithms include the Combined Symbol Matching (CSM) algorithm as well as what is known as the Extended Teletex algorithm.
The CSM algorithm, which will be referred to hereinafter as the "Symbol Removal/Scan Line" algorithm, deals with each symbol in the text on a more or less individual basis with regard to its location. In this approach the document is scanned in the normal fashion, line-by-line from top-to-bottom and from left-to-right, until a group of black pels (picture elements) is encountered which matches a symbol in a stored library. All black pels within the rectangular symbol space are then changed to white, and the symbol code and position are recorded. After the symbols have been "removed", the document is re-scanned and the remaining portions are encoded using Modified READ code. The detected symbol codes are inserted before the READ code of the scan line in which the top of the symbol occurs. The presence of symbol codes ahead of the READ code is indicated by a single bit at the beginning of every scan line. If the bit indicates that there are symbols within the particular scan line, an 8-bit symbol code follows. This 8-bit symbol code is followed, in turn, by an 11-bit horizontal position code word (2^11 = 2,048, which is greater than the 1,728 pels in the scan line). This pair may be followed by additional symbol/horizontal-position code pairs for any other symbols that have been detected on the scan line, in order of horizontal position. Lastly, the symbol data is terminated by a special 8-bit symbol code which indicates that there are no more symbols on the scan line. Following this special symbol code, the Modified READ code for that particular line is transmitted.
In this particular Symbol Removal/Scan Line technique, the recognized symbols will be encoded as they are first encountered by the scanning process regardless of the location of their appearance relative to other symbols or graphics. The vertical position of the symbols is implied from the scan line on which the particular symbol code appears.
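The per-scan-line framing described above can be illustrated with a minimal Python sketch. It is not taken from the CSM specification: the flag polarity (1 meaning "symbols present"), the terminator value `END_OF_SYMBOLS`, and the function names are assumptions made for illustration, and the Modified READ data is treated as an opaque, already-encoded bit string.

```python
# Illustrative sketch of the Symbol Removal/Scan Line framing.
# Assumptions (not from the source): flag bit 1 = symbols present,
# terminator code 0xFF, READ data supplied as a string of '0'/'1'.
SYMBOL_BITS = 8
POSITION_BITS = 11          # 2**11 = 2048 > 1728 pels per scan line
LINE_WIDTH_PELS = 1728
END_OF_SYMBOLS = 0xFF       # hypothetical reserved symbol code


def encode_scan_line(symbols, read_bits):
    """Frame one scan line.

    symbols   -- list of (symbol_code, x_position) whose top row is this line
    read_bits -- Modified READ code for the line, as a bit string
    """
    if not symbols:
        return "0" + read_bits            # flag bit: no symbols on this line
    out = ["1"]                           # flag bit: symbol data follows
    # Pairs are sent in order of horizontal position.
    for code, x in sorted(symbols, key=lambda s: s[1]):
        if not 0 <= code < END_OF_SYMBOLS:
            raise ValueError("symbol code out of range")
        if not 0 <= x < LINE_WIDTH_PELS:
            raise ValueError("horizontal position out of range")
        out.append(format(code, "08b"))   # 8-bit symbol code
        out.append(format(x, "011b"))     # 11-bit horizontal position
    out.append(format(END_OF_SYMBOLS, "08b"))  # no more symbols on this line
    out.append(read_bits)                 # READ code for the residual image
    return "".join(out)


def symbol_overhead_bits(n_symbols):
    """Bits spent on flag, symbol/position pairs, and terminator."""
    if n_symbols == 0:
        return 1
    return 1 + n_symbols * (SYMBOL_BITS + POSITION_BITS) + SYMBOL_BITS
```

Under these assumptions each symbol costs 19 bits (8 + 11), and every line that carries symbols pays a further 9 bits for the flag and terminator, which shows how conveying each location independently adds to the bit count.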
In the second mixed-mode approach, which has recently been proposed and which has been referred to as "Extended Teletex", the entire document is divided into character spaces except for the areas which are defined as being graphics. All character symbols, including blanks, are transmitted using 8-bit symbol codes. The graphics are transmitted by Modified READ code as they occur within a particular line of symbols. To transmit a graphic in the "Extended Teletex" method, a special 8-bit symbol code is first sent to designate the transition from symbol codes to graphics. This is followed by an 11-bit code giving the width of the graphics area, with the height of the graphics area being defined by the height of the symbol font. The Modified READ code for the graphic is then sent. Because the length of the Modified READ code is defined by the width and height of the graphics area, the transition back to symbol codes does not require a separate code.
In the Extended Teletex method, instead of transmitting a series of "blank" symbol codes at the right of the symbol line, a special 8-bit code can be designated which performs the carriage-return and line-feed functions. Obviously this special 8-bit code for carriage-return and line-feed must be to the right of any graphics which appear on the particular line. The code designating the last symbol on the line also directs the receiver to start on the next line of symbols.
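As a rough illustration of the Extended Teletex line structure described above, the following Python sketch serializes one line of symbols and graphics. The reserved code values `GRAPHICS_ESCAPE`, `CRLF`, and `BLANK`, the `Graphic` type, and the function name are hypothetical; the source states only that such special 8-bit codes exist. The Modified READ data is again treated as an opaque bit string.

```python
# Illustrative sketch of one Extended Teletex symbol line.
# Assumptions (not from the source): the reserved code values below,
# and READ data supplied as a string of '0'/'1'.
from dataclasses import dataclass
from typing import List, Union

GRAPHICS_ESCAPE = 0xFE   # hypothetical: symbols -> graphics transition
CRLF = 0xFD              # hypothetical: carriage-return plus line-feed
BLANK = 0x20             # hypothetical code for a blank character space
WIDTH_BITS = 11


@dataclass
class Graphic:
    width_pels: int      # width of the graphics area; height = font height
    read_bits: str       # Modified READ code covering width x font height


Item = Union[int, Graphic]


def encode_symbol_line(items: List[Item]) -> str:
    """Serialize one line; trailing blanks are replaced by a single CRLF."""
    items = list(items)
    # Drop blanks at the right end of the line. Graphics are never dropped,
    # so the CRLF code always lands to the right of any graphics.
    while items and isinstance(items[-1], int) and items[-1] == BLANK:
        items.pop()
    out = []
    for item in items:
        if isinstance(item, Graphic):
            if not 0 <= item.width_pels < 2 ** WIDTH_BITS:
                raise ValueError("graphics width does not fit in 11 bits")
            out.append(format(GRAPHICS_ESCAPE, "08b"))
            out.append(format(item.width_pels, "011b"))
            # No closing code: the receiver knows the area from width/height.
            out.append(item.read_bits)
        else:
            if item in (GRAPHICS_ESCAPE, CRLF) or not 0 <= item < 256:
                raise ValueError("invalid or reserved symbol code")
            out.append(format(item, "08b"))
    out.append(format(CRLF, "08b"))
    return "".join(out)
```

In this sketch a line such as `[0x41, BLANK, BLANK]` costs 16 bits rather than 24, since the two trailing blanks collapse into the single carriage-return/line-feed code.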
One of the drawbacks of the prior art systems described above is that the symbols must either be organized into lines, or else the location of each symbol must be conveyed independently of all other locations. Either one of these two requirements reduces the amount of compression which can be accomplished.