This invention relates to a method and devices to be used in Arabic-Farsi teleprinters, typewriters, typesetting control, computer input/output terminals, and displays. In addition the devices and method may be applied to similar terminals which may combine Arabic with other languages.
2. State of the Prior Art
Arabic scripts used for languages such as Arabic, Persian and Urdu (Arabic-Farsi languages) generally contain many more characters and character forms than are found in Roman script used for English, French, etc. Accordingly, coding techniques developed for transmitting, receiving, typesetting, and the like in connection with languages based upon Roman scripts may not be directly applicable for use in encoding and decoding of languages employing Arabic scripts.
A prime example of a coding technique used for transmission of the English language is the 5-bit Baudot code used in teleprinting throughout the world on the International exchange system. This 5-bit code can accommodate Roman script because only 26 letters are involved, and the 26 letters plus 10 numerals and various punctuation marks, symbols and function keys all fit within the Baudot code. By contrast, it has been thought that the 5-bit Baudot code cannot accommodate the 60 or more characters and character forms that might be required for good quality transmission of Arabic-Farsi languages by teleprinter. Accordingly, various compromises have been suggested, as well as various coding techniques that require more than 5 bits and thus are not compatible with the existing International exchange requirements.
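The capacity argument above can be illustrated with a rough count. The sketch below is not taken from any cited patent. It assumes the common two-shift (letters/figures) arrangement for 5-bit teleprinter codes, with six code combinations reserved in both shifts for functions such as space, carriage return, line feed and the shift codes themselves. The function name and parameters are hypothetical.

```python
def printable_capacity(bits: int, shift_states: int, reserved_codes: int) -> int:
    """Count printable positions in a shifted fixed-width code.

    bits           -- code width (5 for a Baudot-type code)
    shift_states   -- number of shift cases (assumed 2: letters and figures)
    reserved_codes -- combinations kept for functions in every shift (assumed 6)
    """
    total_combinations = 2 ** bits
    return shift_states * (total_combinations - reserved_codes)


# Assumed two-shift, six-function layout: 2 * (32 - 6) = 52 printable positions.
capacity = printable_capacity(bits=5, shift_states=2, reserved_codes=6)
print(capacity)
```

Under these assumptions the count comes out below the 60 or more character forms mentioned above, which is consistent with the view that a one-code-per-form assignment would not fit.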
One solution, offered by M. S. Chaudhry in U.S. Pat. No. 3,998,310, does not provide all the character forms, does not take into consideration the requirements for numerals, arithmetic signs, punctuation, and diacritical marks, and expands coding requirements so as to be incompatible with existing teleprinter systems. Chaudhry reduces the number of letters on a keyboard by dividing Arabic letters into two forms, short form and full form, ignoring the other forms described hereinafter. Characters having both full and short forms are stored in short form when followed by another character and in full form when followed by a space. Chaudhry also expands the coding requirements by using a 6-bit code with a seventh bit for "checking". Although it is suggested that other codes may be used, there is no disclosure of a system that provides for transmission and reception of complete Arabic-Farsi languages over standard teleprinter systems.
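The short-form/full-form rule described above can be sketched as follows. This is only an illustration of the rule as summarized here, not code from the Chaudhry patent. The function name, the form labels, the use of Latin letters as stand-ins for Arabic letters, and the treatment of end-of-text as a space are all assumptions.

```python
def select_forms(text, two_form_letters):
    """Label each character 'short' or 'full' from the character that follows it.

    text             -- input string; ' ' is the word space
    two_form_letters -- set of letters that have both a short and a full form
    """
    labelled = []
    for i, ch in enumerate(text):
        if ch == " ":
            labelled.append((ch, "space"))
            continue
        if ch not in two_form_letters:
            # Letters with only one form are passed through unchanged.
            labelled.append((ch, "single"))
            continue
        # Assumption: the end of the text behaves like a following space.
        following = text[i + 1] if i + 1 < len(text) else " "
        labelled.append((ch, "full" if following == " " else "short"))
    return labelled


# Stand-in letters: 'a' and 'b' have two forms, 'c' has one.
print(select_forms("ab ca", {"a", "b"}))
```

Because the decision looks only at the following character, the rule cannot distinguish the further forms that depend on the preceding character, which is the limitation noted above.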
Hanson U.S. Pat. No. 3,513,968 discloses a typesetting control system in which 6-bit signals representing Arabic characters and space units are stored in a first shift register and successively decoded to classify the data into one of three classes for storage in a second shift register. A second decoder determines the form of each character from the classifications of the characters immediately preceding and following it. That classification information and the character form are then used to address a memory to select the character in its desired form.
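Selecting a form from the classes of the neighboring characters can be sketched in general terms as below. The summary above does not name the three classes, so the ones used here (dual-joining, right-joining, and non-joining such as a space) are assumptions drawn from ordinary Arabic joining behavior, and the four form names are likewise illustrative. Nothing in this sketch is taken from the Hanson patent itself.

```python
# Hypothetical character classes (not named in the summarized patent).
DUAL = "dual"    # joins to both the preceding and the following letter
RIGHT = "right"  # joins only to the preceding letter
NONE = "none"    # joins to nothing (for example, a space)


def form_of(prev_cls, cls, next_cls):
    """Pick a form from the classes of the previous, current and next items."""
    if cls == NONE:
        return "space"
    joins_prev = prev_cls == DUAL                      # predecessor can join forward
    joins_next = cls == DUAL and next_cls in (DUAL, RIGHT)
    if joins_prev and joins_next:
        return "medial"
    if joins_prev:
        return "final"
    if joins_next:
        return "initial"
    return "isolated"


def shape(classes):
    """Apply form_of across a sequence, treating both ends as non-joining."""
    padded = [NONE] + list(classes) + [NONE]
    return [form_of(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]


print(shape([DUAL, DUAL, RIGHT, DUAL]))
```

In a hardware realization of this kind, the resulting form together with the character code would serve as the address into a memory holding the glyphs.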
Hyder U.S. Pat. No. 3,938,099 discloses a printing system in which Arabic characters are coded using 8 bits and 11 bits. An analyzer is provided to analyze the concatenation properties applicable to each character using Boolean equations based on knowledge of the variables of the preceding and following characters. This information from the analyzer combined with the character representation code and the composite code is then converted into a code suitable for driving output means.
Other approaches have been undertaken to reduce the number of required characters on machines such as teleprinters by omitting some Arabic character forms and deleting the arithmetic signs and punctuation marks so that the remaining number of characters and operations can be coded in the standard 5-bit binary Baudot coding. Another approach has been to use the English (i.e. Latin or Roman) alphabet to transmit Arabic on English teleprinters.
None of the above approaches solves the problem of transmitting good quality Arabic plus the numerals, arithmetic signs, etc. over the International exchange networks which use Telex and Gentex Exchange systems and utilize standardized 5-bit binary Baudot coding. The elimination of characters greatly diminishes the quality of the Arabic language transmission and much of the expression may be lost or at least may be difficult to read.
Past approaches have needed many more than 5 binary bits per character to reach the desired quality level. As a result, considerably more computer storage is required when Arabic script rather than Roman script languages are used in conjunction with computer systems. Furthermore, the transmission energy required for a given message falls as the number of bits per character falls, so such a reduction is very desirable.