The present invention relates generally to the field of coding “constructs” as ordered binary bit sets, and more particularly to coding alphanumeric characters as ordered binary bit sets.
Many coding schemes used in computing convert pieces of data into ordered binary bit sets (sometimes herein referred to as “bit sequences”). In this document, pieces of data susceptible to encoding as bit sequences will be referred to as “code constructs.” Some examples of code constructs are as follows: machine logic functions (for example, calculating an arithmetic mean), operators (for example, the union operator for dealing with sets), numbers, portions of image data (for example, pixel properties), portions of audio data, alphanumeric characters, computer operation functions (for example, SAVE), etc. When coded binary bit sequences are stored and/or transmitted, they are typically subject to errors and/or data degradation. Different types of data degradation tend to have different characteristics, and a given error-handling code is often designed around one of them. For example, Hamming encoding is designed to correct isolated single-bit errors. As a further example, when binary data is stored on tape storage media, it tends to undergo “unidirectional bit rot,” where binary ones may degrade into zeros, but binary zeros do not tend to degrade into ones.
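The unidirectional bit rot described above can be modeled in a few lines. The following Python sketch is illustrative only: it assumes each 1 bit decays independently with a fixed probability `p`, a simplification that is not taken from the text, and the function names are hypothetical. It also checks the defining property of this degradation model, namely that a degraded value never has a bit set that the original lacked.

```python
import random


def unidirectional_rot(bits: int, width: int, p: float, rng: random.Random) -> int:
    """Simulate unidirectional bit rot on a width-bit value.

    Assumption: each 1 bit independently decays to 0 with probability p.
    0 bits are never changed.
    """
    result = bits
    for i in range(width):
        mask = 1 << i
        if bits & mask and rng.random() < p:
            result &= ~mask  # a one degrades into a zero
    return result


def is_unidirectional_change(original: int, degraded: int) -> bool:
    """True if degraded only clears bits that were set in original."""
    return (degraded & ~original) == 0


rng = random.Random(42)
word = 0b1011_0110
rotted = unidirectional_rot(word, 8, 0.3, rng)
print(f"{word:08b} -> {rotted:08b}", is_unidirectional_change(word, rotted))
```

Under this model an ordinary bit flip from 0 to 1 is impossible, which is why codes tuned for unidirectional errors can differ from general-purpose codes such as Hamming codes.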
Several features of conventional binary coding will now be discussed in the following paragraphs.
First, references to ordering of binary bit sequences do not necessarily refer to the order in which the bits are transmitted through a transmission medium and/or stored on a storage medium. Rather, the ordering and/or sequence refers to the ordering and/or sequence used when code constructs are encoded into and/or decoded from binary bits. When machine logic stores and/or transmits the bits, the ordering can be changed for various reasons, so long as the ordering needed to decode the code constructs can be reconstructed from the transmitted and/or stored data.
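As a minimal sketch of this point, the Python below encodes a 7-bit value most-significant-bit first, sends it over a hypothetical link that transmits least-significant bit first, and lets the receiver restore the encoding order before decoding. The bit-reversal scheme and all function names are assumptions chosen for illustration; any reordering works so long as the receiver can undo it.

```python
def to_bits_msb_first(value: int, width: int) -> list[int]:
    """Encoding order: most significant bit first."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def transmit_lsb_first(bits: list[int]) -> list[int]:
    """Hypothetical link that sends bits least significant first."""
    return list(reversed(bits))


def receive_lsb_first(wire: list[int]) -> list[int]:
    """Receiver restores the encoding order by undoing the reversal."""
    return list(reversed(wire))


def from_bits_msb_first(bits: list[int]) -> int:
    """Decode a most-significant-bit-first sequence back to an integer."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


encoded = to_bits_msb_first(ord("C"), 7)  # [1, 0, 0, 0, 0, 1, 1]
wire = transmit_lsb_first(encoded)        # [1, 1, 0, 0, 0, 0, 1]
decoded = chr(from_bits_msb_first(receive_lsb_first(wire)))
print(decoded)  # C
```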
Second, this document largely focuses on types of coding (sometimes herein referred to as code construct coding) where each discrete code construct is coded into a bit sequence. This is to be distinguished from types of coding, such as certain types of lossy data compression, where binary bit sequences representing multiple code constructs (for example, multiple alphanumeric characters) are further coded to represent a larger piece of data (for example, a book length piece of text).
Third, some code construct coding uses a fixed bit length for every bit sequence that represents a code construct. An example of fixed length code construct coding is ASCII (American Standard Code for Information Interchange) coding for alphanumeric characters, where each alphanumeric character is represented by a seven-bit sequence. Other code construct coding schemes assign bit sequences of different lengths to different code constructs. An example of variable length code construct coding is Unicode16 coding for alphanumeric characters, where some alphanumeric characters (called singletons) are each represented by an eight-bit sequence, and other alphanumeric characters are represented by longer assigned bit sequences. In this document, “code construct coding” collectively refers to both fixed length code construct coding and variable length code construct coding.
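The contrast between fixed and variable length coding can be seen with Python's built-in codecs. This sketch uses 7-bit ASCII for the fixed-length case and, as a plainly swapped-in example of a widely documented variable-length scheme, UTF-8 (which uses one to four bytes per character); it does not implement the “Unicode16” scheme named above. The helper names are hypothetical.

```python
def ascii_code(ch: str) -> str:
    """Fixed-length coding: each ASCII character as a 7-bit sequence."""
    code_point = ord(ch)
    if code_point > 0x7F:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code_point, "07b")


def utf8_code(ch: str) -> str:
    """Variable-length coding: UTF-8 uses 1 to 4 bytes per character."""
    return " ".join(format(byte, "08b") for byte in ch.encode("utf-8"))


for ch in ["A", "é", "€"]:
    print(repr(ch), "UTF-8:", utf8_code(ch))
print("ASCII 'A':", ascii_code("A"))  # ASCII 'A': 1000001
```

Every ASCII character occupies the same seven bits, while the UTF-8 output grows from one byte for “A” to two for “é” and three for “€”.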
Fourth, some code construct coding schemes include unassigned bit sequences (sometimes herein referred to as “unassigned code-points”). As a simple example, assume a coding scheme where: (i) the only code constructs “code-able” into the code are alphanumeric characters “A,” “B,” and “C”; (ii) alphanumeric character “A” is represented by the two-bit sequence 00; (iii) alphanumeric character “B” is represented by the two-bit sequence 01; and (iv) alphanumeric character “C” is represented by the two-bit sequence 10. In such an encoding scheme, the bit sequence 11 is an unassigned bit sequence. Unicode16 and Unicode32 are code construct (specifically, alphanumeric code construct) coding schemes that include unassigned bit sequences.
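The three-character example above can be written out directly. This Python sketch enumerates every two-bit sequence, reports the ones with no assigned code construct, and decodes a bit string two bits at a time. Treating an unassigned chunk as a decoding error is an illustrative assumption, not something the example itself specifies; the names are hypothetical.

```python
# Toy two-bit code from the example: only "A", "B", and "C" are assigned.
CODE = {"A": "00", "B": "01", "C": "10"}
DECODE = {bits: ch for ch, bits in CODE.items()}
WIDTH = 2


def unassigned_code_points(code: dict[str, str], width: int) -> list[str]:
    """List every width-bit sequence that no code construct is assigned to."""
    assigned = set(code.values())
    all_sequences = (format(n, f"0{width}b") for n in range(2 ** width))
    return [seq for seq in all_sequences if seq not in assigned]


def decode(bitstring: str) -> str:
    """Decode fixed-width chunks; here an unassigned chunk is treated as an error."""
    out = []
    for i in range(0, len(bitstring), WIDTH):
        chunk = bitstring[i:i + WIDTH]
        if chunk not in DECODE:
            raise ValueError(f"unassigned code point {chunk!r} at bit {i}")
        out.append(DECODE[chunk])
    return "".join(out)


print(unassigned_code_points(CODE, WIDTH))  # ['11']
print(decode("000110"))                     # ABC
```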
Fifth, in some code construct coding schemes (herein referred to as “redundant code construct coding schemes”), each assigned bit sequence includes redundant bits. Two examples of redundant code construct coding schemes will respectively be discussed in the following two paragraphs.
Computer Networks: Fundamentals & Applications by R. S. Rajesh states as follows: “Error correcting codes . . . . Techniques covered so far deal with error detection only. When error-detecting techniques are used, and the receiver receives the data with error, the receiver discards the data and asks for retransmission. On the other hand, error-correcting codes are used to identify the error bits in the received data and correct them. The main problem with error-correcting codes is that they require more redundancy bits than the error-detecting codes. This leads to wastage of transmission bandwidth. Single-bit error correction . . . . The key issue in error-correction is to identify the position of [an] invalid error bit, in order to correct it. For example, when 7-bit ASCII code is transmitted, the error-correcting code must identify the position of the bit that contains an error. Hence, at least three redundant bits are used to identify the possibility of error in the seven positions in an ASCII character. However, if an error occurs at the redundant bits themselves, to identify it, additional bits are required. Hence the total number of bits in the transmitted data contain m+k bits. M is the number of message bits and K is the number of redundant bits. The calculation of the total number of redundant bits for single bit error correction is straightforward. One bit is used for ensuring that the received data is error-free. Other bits are used to indicate one out of M message and K redundant bits that may contain an error. Hence, the value of K must be chosen such that $2^{K} \geq M + K + 1$. For example to correct [a] single bit error in 7-bit ASCII code, at least 4 redundant bits are needed. Hence, the transmitted data contains 11 bits for each data units [sic].”
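The bound in this passage requires K redundant bits with 2^K at least M + K + 1 for M message bits; for M = 7 it gives K = 4 and an 11-bit codeword. The Python sketch below computes K and then builds a single-error-correcting Hamming code for a 7-bit ASCII character. Placing parity bits at power-of-two positions is the standard Hamming layout, assumed here rather than taken from the quoted text, and all function names are hypothetical.

```python
def min_redundant_bits(m: int) -> int:
    """Smallest K satisfying 2**K >= M + K + 1 for M message bits."""
    k = 0
    while 2 ** k < m + k + 1:
        k += 1
    return k


def hamming_encode(data: list[int]) -> list[int]:
    """Put data bits at non-power-of-two positions (1-indexed), then set each
    power-of-two position to even parity over the positions it covers."""
    n = len(data) + min_redundant_bits(len(data))
    code = [0] * (n + 1)  # index 0 unused so positions are 1-indexed
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two, so a data position
            code[pos] = next(bits)
    p = 1
    while p <= n:
        parity = 0
        for pos in range(1, n + 1):
            if pos & p and pos != p:
                parity ^= code[pos]
        code[p] = parity
        p <<= 1
    return code[1:]


def hamming_correct(received: list[int]) -> tuple[list[int], int]:
    """Return (corrected codeword, error position or 0 if none found)."""
    code = [0] + list(received)
    syndrome = 0
    for pos in range(1, len(code)):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    if 0 < syndrome < len(code):
        code[syndrome] ^= 1  # flip the single bit the syndrome points at
    return code[1:], syndrome


def extract_data(codeword: list[int]) -> list[int]:
    """Recover the message bits from the non-power-of-two positions."""
    return [b for pos, b in enumerate(codeword, start=1) if pos & (pos - 1)]


message = [1, 0, 0, 0, 0, 1, 1]  # 7-bit ASCII "C"
sent = hamming_encode(message)
print(min_redundant_bits(7), len(sent))  # 4 11
corrupted = sent.copy()
corrupted[4] ^= 1  # flip the bit at position 5
fixed, pos = hamming_correct(corrupted)
print(pos, extract_data(fixed) == message)  # 5 True
```

A nonzero syndrome equals the position of a single flipped bit, whether that bit is a message bit or one of the redundant bits themselves, which matches the quoted point that errors in the redundant bits must also be locatable.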
Data Communications and Computer Networks: A Business User's Approach by Curt White (8th Edition) states as follows: “For a data code such as ASCII to perform forward error correction, redundant bits must be added to the original data bits. These redundant bits allow a receiver to look at the received data and, if there is an error, recover the original data using a consensus of the received bits . . . . For a simple example, . . . transmit three identical copies of a single bit (majority operation). Thus, to send a 1, 111 will be transmitted. Next consider what would happen if the three bits received have the values 101. In forward error correction, the receiver would assume that the 0 bit should be a 1 because the majority of bits are 1.”
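The majority-vote example in this passage corresponds to a repetition code. The Python sketch below sends each bit as three identical copies and decodes each group of three by majority, so the received group 101 decodes to 1. The function names and the `copies` parameter are hypothetical; an odd number of copies is assumed so that a vote cannot tie.

```python
def repetition_encode(bits: list[int], copies: int = 3) -> list[int]:
    """Transmit each bit as `copies` identical copies."""
    return [b for b in bits for _ in range(copies)]


def repetition_decode(received: list[int], copies: int = 3) -> list[int]:
    """Decode each group of `copies` bits by majority vote."""
    decoded = []
    for i in range(0, len(received), copies):
        group = received[i:i + copies]
        decoded.append(1 if sum(group) * 2 > len(group) else 0)
    return decoded


print(repetition_encode([1]))        # [1, 1, 1]
print(repetition_decode([1, 0, 1]))  # [1]
```

The cost is steep: three transmitted bits per message bit, compared with 11 bits per 7-bit character for the Hamming construction quoted in the preceding paragraph.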