1. Technical Field
The present invention relates in general to text editors which support double-byte character sets (DBCS). More specifically, the present invention relates to text editors for mixed DBCS and single-byte character set (SBCS) text in systems with SO-SI control characters or with emulated SO-SI control characters or without SO-SI control characters, and to a method, for use therein, of accomplishing text replacement while maintaining the columnar integrity of the text.
2. Description of the Related Art
Characters in computer systems are typically stored as a numeric code which is interpreted by the computer system in accordance with a code table. Common examples of such codes are the ASCII code, which is used in most Roman alphabet (A to Z) based personal computers, and the EBCDIC code used in many mainframe computers. Typically, each character defined in the ASCII or EBCDIC code requires a single byte of storage space and thus up to 256 (2^8) unique characters (including alphanumerics, symbols and control or other reserved characters) may be defined.
When other, non-Roman alphabet character sets are desired, for example Japanese Kanji or Chinese pictograph characters, more than 256 unique characters may be required. To enable the necessary number of characters to be defined, double-byte character sets (DBCS) have been employed wherein each character is defined by two bytes of storage space. This allows over 65,000 (2^16) unique characters to be defined.
In countries such as China and Japan that use ideographic characters, it is important that DBCS and SBCS text can be intermixed in a single document. For example, a database program may be originally written in English and a non-English version may be sold wherein Roman alphabet prompts and other user interface information are replaced with non-Roman alphabet prompts (e.g., Japanese Kanji characters), allowing mixed DBCS and SBCS text to be stored in the database program.
Previously, when producing such a program or text file with mixed SBCS and DBCS characters on an EBCDIC terminal, a user would switch the terminal from SBCS mode to DBCS mode by employing a Shift Out (SO)-Shift In (SI) sequence which places a DBCS identifier before (SO) and after (SI) the DBCS characters. When the computer encounters an SO identifier (control character), it knows that the next byte is the first byte of a two-byte DBCS character and not an SBCS character. The terminal continues to interpret the data as two-byte DBCS characters until an SI control character is encountered.
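The SO-SI interpretation described above can be sketched as a small scanner. This is only an illustration, not the method of any particular terminal: the function name `split_so_si` is hypothetical, and the control code values 0x0E (SO) and 0x0F (SI) are assumed to be the ones used by the system at hand.

```python
SO = 0x0E  # Shift Out: following bytes are two-byte DBCS characters (assumed value)
SI = 0x0F  # Shift In: following bytes are single-byte SBCS characters (assumed value)

def split_so_si(data: bytes):
    """Split a byte string into ('SBCS', 1 byte) and ('DBCS', 2 bytes) units.

    The SO and SI control bytes only switch the mode; they are not returned.
    """
    units = []
    dbcs_mode = False
    i = 0
    while i < len(data):
        b = data[i]
        if b == SO:
            dbcs_mode = True
            i += 1
        elif b == SI:
            dbcs_mode = False
            i += 1
        elif dbcs_mode:
            if i + 1 >= len(data):
                raise ValueError("truncated DBCS character")
            units.append(("DBCS", data[i:i + 2]))
            i += 2
        else:
            units.append(("SBCS", data[i:i + 1]))
            i += 1
    return units
```

For instance, the six bytes C1 0E 82 48 0F C2 would be read as one SBCS character, one DBCS character (82 48) and one further SBCS character, even though the stored line is six bytes long.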
For example, to place the Chinese pictograph in a text string, the user would Shift Out (SO) of SBCS mode by pressing a Shift Out key (or predefined Shift Out key sequence such as ALT-ESC) on the terminal, compose the desired pictograph character by pressing a predefined key sequence or by entering a predefined number code such as 8248 (comprising two two-digit hexadecimal values, 82 and 48, representing the two bytes identifying the DBCS character) and then re-enter SBCS mode by pressing a Shift In key or key sequence. Depending upon the particular display and computer system in use, the symbol may be displayed on screen or representative code numbers may be displayed in its place. Further, a representative symbol may be displayed indicating the location of the SO and SI control characters.
More recently, personal computer operating systems such as the IBM OS/2 operating system (IBM and OS/2 are registered trade marks) have provided code page support which allows DBCS characters to be stored without requiring a SO-SI sequence. In such systems, the first byte of the DBCS character is a byte which is not used within the SBCS character set and thus identifies the following byte as a member of a DBCS character set. Several different first bytes (each not used in the SBCS character set) may be used to refer to several different "pages" of 256 unique DBCS characters. The actual first byte values selected vary between languages and systems. For example, in one IBM system the values 129 through 252 (which are not used for Roman alphabet characters) are the first bytes of DBCS character sets for Chinese pictographs.
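The lead-byte scheme can be sketched in the same way. The range 129 through 252 is taken from the example above; the function name `split_lead_byte` is hypothetical, and the sketch assumes that any byte value may follow a lead byte, which real code pages typically restrict further.

```python
# First-byte values that introduce a DBCS character, per the example in the text.
LEAD_BYTES = range(129, 253)  # 129 through 252 inclusive

def split_lead_byte(data: bytes):
    """Split a byte string into one-byte SBCS and two-byte DBCS characters."""
    chars = []
    i = 0
    while i < len(data):
        if data[i] in LEAD_BYTES:
            if i + 1 >= len(data):
                raise ValueError("lead byte without a second byte")
            chars.append(data[i:i + 2])  # lead byte plus following byte
            i += 2
        else:
            chars.append(data[i:i + 1])  # ordinary single-byte character
            i += 1
    return chars
```

No SO or SI bytes are stored in this scheme, so a DBCS character occupies exactly two bytes on the line.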
Depending upon the particular system in use, DBCS characters may be displayed as two, two-digit hex numbers (e.g. 8248) or as the actual pictogram (i.e. ). Additionally, depending upon whether SO-SI characters are required, not required, or emulated, the DBCS may be preceded and followed by SO-SI symbols as appropriate (i.e. ).
Despite the above-mentioned capabilities of computer systems, a problem exists with text containing a mixture of SBCS and DBCS characters in that, for many applications, the columnar positioning of the text is critical. For example, when programming in languages such as RPG, RPG II or Fortran, the position of command and data text on a line (i.e., its columnar positioning) is crucial to the correct interpretation of that text within the program. When the text is a mixture of SBCS characters and DBCS characters, it is difficult and onerous to replace (edit) the text while maintaining its columnar integrity.
Some examples of these difficulties are: replacing " ", which occupies four bytes (one for the Shift Out control character, two for the DBCS character code and one for the Shift In control character) with a SBCS character which requires a single byte; replacing "A", which requires one byte with " " which requires two bytes of storage space (or as many as four bytes if SO and SI control characters are required). In each example, the columnar integrity of the text on the line, after the site of the replacement, will be affected.
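The byte counts in these examples can be worked through with a short sketch. The helper names `stored_width` and `column_shift` are hypothetical, and the four-byte figure assumes a lone DBCS character with its own SO and SI bytes; a run of adjacent DBCS characters would share one SO-SI pair.

```python
def stored_width(is_dbcs: bool, so_si_required: bool) -> int:
    """Bytes occupied on the line by a single, isolated character."""
    if not is_dbcs:
        return 1                      # SBCS: one byte
    return 4 if so_si_required else 2  # DBCS: two bytes, plus SO and SI if needed

def column_shift(old_is_dbcs: bool, new_is_dbcs: bool, so_si_required: bool) -> int:
    """Bytes by which the text after the replacement site moves (negative = left)."""
    return (stored_width(new_is_dbcs, so_si_required)
            - stored_width(old_is_dbcs, so_si_required))
```

Under these assumptions, replacing a bracketed DBCS character with an SBCS character gives `column_shift(True, False, True) == -3`, and replacing "A" with a DBCS character gives a shift of 1 byte without SO-SI or 3 bytes with it. Any non-zero shift displaces the following text from its columns unless it is compensated for.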
Text editors capable of editing mixed SBCS and DBCS text do not presently maintain the columnar integrity (i.e., columnar positioning) of text on a line after an editing operation has been performed. Thus, the user must manually adjust the text to maintain columnar integrity by inserting spaces or deleting characters. A failure to do so, or doing so incorrectly, results in the subsequent characters being moved out of their desired columnar positions, leading to the program misinterpreting the command and/or data text on that line. Further, in systems wherein a fixed line length is set, an edit operation may be aborted by the system because it would result in the maximum line length being exceeded, even in situations where the maximum line length is exceeded only as an intermediate editing step and the final line length would be acceptable. It can be most confusing and frustrating to a user to attempt an edit operation which is refused by the system for no apparent reason, especially when the same edit was performed on a preceding (albeit shorter) line without difficulty.
Also, when a replace edit operation is performed, it is often difficult for the user to determine the extent to which the edit operation will affect subsequent text. For example, replacing the "A" in "CABLE" with a DBCS character such as " " will result in the "B" after the "A" also being overwritten, to give "C LE". When SO-SI characters are present in the text to be edited, replacement operations become still more confusing and difficult to implement properly.
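The "CABLE" example can be reproduced with a byte-level overwrite. This is a minimal sketch of the behaviour described above for a system without SO-SI characters, not a proposed solution; the function name `overwrite` and the sample DBCS bytes 82 48 are illustrative assumptions.

```python
def overwrite(line: bytes, col: int, new: bytes) -> bytes:
    """Overwrite len(new) bytes of line starting at byte offset col.

    The line length is preserved, so a two-byte character written over a
    one-byte character also consumes the byte that follows it.
    """
    return line[:col] + new + line[col + len(new):]

dbcs_char = bytes([0x82, 0x48])            # stand-in for a two-byte DBCS character
edited = overwrite(b"CABLE", 1, dbcs_char)  # replaces "A" and, as a side effect, "B"
```

The result keeps "L" and "E" in their original columns, but only because "B" has been silently lost, which is the effect the user has difficulty anticipating.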
It is therefore desirable to provide a method of editing mixed SBCS and DBCS text which maintains the columnar integrity of the edited text. It is also desirable to provide an indication to the user of the extent to which a character replacement will affect subsequent text on a line, and to avoid the problems of exceeding maximum line lengths.