This invention relates to computer-readable character sets for encoding human-readable data characters in a data carrier such as an RFID tag, optical tag or touch memory device.
Data carriers store computer-readable data or information as character codes that represent human-readable "data characters." Data characters include not only human readable characters, but also include special function characters such as start, stop or shift characters that provide certain functional data. Each data character is represented by a unique character code. The character code is typically a number, the numbers being assigned from 0 through a largest number. The largest number is determined by the number of bits in a computer's data path or bus. The computer typically employs a binary representation of the character code, although humans find a decimal or hexadecimal representation more convenient. A set or mapping of data characters and their corresponding character codes is commonly referred to as a "character set." The computer industry uses its own character encoding standards, for example, the American Standard Code for Information Interchange (ASCII) character set. Full ASCII defines a character set containing 128 data characters. Each data character in full ASCII is represented by a unique 7-bit character code (2^7 = 128).
The computer industry has grown beyond the limits of the full ASCII character set. As the computer markets have grown, the need has also arisen to support additional languages not defined by the full ASCII character set. New character sets were developed to accommodate clusters of characters in related languages. The original 7-bit full ASCII character set was expanded to 8 bits, thus providing an additional 128 data characters. This additional set of 128 data characters (the "upper 128" or "extended ASCII") allowed for additional characters present in the related romance languages (e.g., French, German, Spanish) to be represented.
As the computer markets grew internationally, however, even more languages were required to be included in the character set. Particularly, the Asian markets demand a character set, usable on computers, that supports thousands of unique characters. To uniquely define each of these characters, a 16-bit encoding standard was required.
Several 16-bit (two byte) encoding standards such as Unicode, Big Five, GB, KOR, JISC-6226-1983, and others have recently been developed. The Unicode character encoding standard is a fixed-length, uniform text and character encoding standard. The Unicode standard can contain up to 65,536 characters, and currently contains over 28,000 characters mapping onto the world's scripts, including Greek, Hebrew, Latin, Japanese, Chinese, and Korean. The Unicode standard is modeled on the ASCII character set. Unicode character codes are consistently 16 bits long, regardless of language, so no escape sequence or control code is required to specify any character in any language. Unicode character encoding treats symbols, alphabetic characters, and ideographic characters identically, so that they can be used in various computer applications simultaneously and with equal facility. Computer programs using Unicode character encoding to represent characters, but which do not display or print text, can remain unaltered when new scripts or characters are introduced. New computer operating systems are beginning to support these comprehensive 16-bit code standards, e.g., WINDOWS NT, manufactured by Microsoft Corporation of Redmond, Washington.
Often, it will be desirable to encode and decode character strings consisting of a combination of romance language data characters, including Arabic numerals, and data characters, including numerals, from Asian languages. Since the 16-bit character sets such as Unicode, GB and JISC-6226-1983, usually contain the one byte character sets (8-bit, 7-bit), such as extended or full ASCII, as subsets thereof, each of the romance and Asian language data characters can be represented by a double byte data character code selected from one of the 16-bit character sets. The double byte data character codes can be directly encoded into a memory of a data carrier, such as an RFID, optical tag or touch memory device. It is also possible to decode data from the memory, producing a set of double byte data character codes representing the information stored in the data carrier.
Substituting double byte character codes for single byte data characters, and then encoding the double byte character codes into a data carrier, can use significantly more memory than directly encoding the single byte data characters as single byte character codes. Memory use is critical in many automatic data collection applications, especially when RFID or optical tags are employed. Therefore, there is a need to more efficiently represent and store a combination of data characters selected from single byte and double byte character sets in data carriers and other memory devices.
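The memory cost described above can be illustrated with a small Python sketch. The function names and the sample string are hypothetical and are not taken from this disclosure; the sketch assumes every character fits in a single 16-bit code, and it ignores any shift or escape overhead that a real mixed-width scheme would need.

```python
def double_byte_size(text):
    """Bytes needed when every character is stored as a 16-bit code.

    Assumes each character fits in one 16-bit code.
    """
    return 2 * len(text)


def mixed_size(text):
    """Bytes needed when 7-bit ASCII characters take one byte and all
    other characters take two bytes.

    Simplification: the overhead of signalling a width change is ignored.
    """
    return sum(1 if ord(ch) < 128 else 2 for ch in text)


# Hypothetical sample: ten ASCII characters followed by two Asian characters.
sample = "PART 4711 東京"
print(double_byte_size(sample), mixed_size(sample))  # prints "24 14"
```

For this sample, storing everything as double byte codes takes 24 bytes, while storing the ASCII portion as single bytes takes 14 bytes before any switching overhead is counted.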
Under one aspect, a 6-bit character set 93i is defined to efficiently encode data characters. The character set employs single character codes and combinations of character codes to represent the 65,535 data characters of the 16-bit character sets. For example, the 93i character set includes a number of single character codes corresponding to "native" data characters and functions. The 93i character set further includes multiple shift characters to encode up to 128 data characters using pairs of character codes. The 93i character set also includes Extended Channel Interpretation, Numeric Mode, Word Mode, and Byte Mode compression schemes that employ character code combinations to encode the remaining data characters (at least 65,535), and that can result in even more efficient packing of data in a data string. Such compression schemes employ a set of variable values as part of an equation or equality. Efficient data packing can provide a number of benefits such as reducing the amount of memory required to store a given amount of information, or reducing the time required to transmit the information.
Under another aspect, a string of n-bit character codes is converted into a string of m-bit character codes, where m is less than n. For example, the n-bit character codes can be from a 16-bit character set (e.g., Unicode, GB, JISC-6226-1983), while the m-bit character codes are from a smaller character set (e.g., the 6-bit character set 93i). Customized m-bit character sets can be created for different applications, the customized m-bit character sets assigning single m-bit character codes to the data characters that predominate in the character strings ("native" data characters) common to the application. Thus, a significant proportion of the data characters represented by n-bit character codes can be represented by single m-bit character codes, resulting in a more effective packing of data in the string. This can be particularly advantageous where an external computer system employs a character set such as a 16-bit or 8-bit character set and the data must be stored in a relatively small amount of memory such as an RFID tag.
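A minimal Python sketch of an n-bit to m-bit conversion (n = 16, m = 6) follows. The code table is hypothetical and does not reproduce the 93i code assignments: the "native" characters (digits, upper-case letters, space) and the escape code 63, which is followed by three 6-bit codes carrying a 16-bit value, are assumptions made only for illustration.

```python
# Hypothetical native characters; each gets a single 6-bit code (0..36).
NATIVE = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ "
TO_CODE = {ch: code for code, ch in enumerate(NATIVE)}

# Hypothetical escape code: the next three 6-bit codes hold a 16-bit value.
ESCAPE = 63


def encode_6bit(text):
    """Convert a string of 16-bit character codes into a list of 6-bit codes."""
    codes = []
    for ch in text:
        if ch in TO_CODE:
            # Native character: one 6-bit code instead of 16 bits.
            codes.append(TO_CODE[ch])
        else:
            value = ord(ch)
            if value > 0xFFFF:
                raise ValueError("only 16-bit character codes are handled")
            # Non-native character: escape plus 4 + 6 + 6 bits of the value.
            codes.extend([ESCAPE, value >> 12, (value >> 6) & 0x3F, value & 0x3F])
    return codes


codes = encode_6bit("PART 4711 東京")
print(len(codes) * 6, "bits instead of", len("PART 4711 東京") * 16)
```

In this sketch a non-native character costs 24 bits, which is more than its 16-bit original, so the approach only pays off when native characters predominate, as the paragraph above assumes.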
Under a further aspect, a string of m-bit character codes is converted into a string of n-bit character codes, where m is less than n. For example, a string of 6-bit character codes stored in a data carrier can be converted into a string of 16-bit character codes for use with a host computer operating with a 16-bit character set.
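The reverse conversion can be sketched in Python as well. The code table here is hypothetical and is not the 93i assignment: codes 0 through 36 are assumed to stand for digits, upper-case letters and space, and code 63 is assumed to be an escape followed by three 6-bit codes that carry one 16-bit value.

```python
# Hypothetical native characters; code i stands for NATIVE[i].
NATIVE = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ "

# Hypothetical escape code: the next three 6-bit codes hold a 16-bit value.
ESCAPE = 63


def decode_6bit(codes):
    """Convert a list of 6-bit codes into a list of 16-bit character codes."""
    out = []
    i = 0
    while i < len(codes):
        code = codes[i]
        if code == ESCAPE:
            if i + 3 >= len(codes):
                raise ValueError("truncated escape sequence")
            high, middle, low = codes[i + 1:i + 4]
            # Reassemble the 16-bit value from its 4 + 6 + 6 bit pieces.
            out.append((high << 12) | (middle << 6) | low)
            i += 4
        elif code < len(NATIVE):
            # Native character: look up its 16-bit code directly.
            out.append(ord(NATIVE[code]))
            i += 1
        else:
            raise ValueError("unassigned 6-bit code: %d" % code)
    return out


# Two native codes followed by one escaped 16-bit value.
print([hex(v) for v in decode_6bit([10, 1, 63, 6, 29, 49])])
# prints ['0x41', '0x31', '0x6771']
```

The host receives ordinary 16-bit character codes and needs no knowledge of how compactly the data carrier stored them.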
In yet an additional aspect, a writer such as an RF interrogator can encode the resulting m-bit character codes into a data carrier, such as an RFID tag. Similarly, a reader such as an RF interrogator can decode m-bit character codes from the data carrier, creating a string of n-bit character codes corresponding to the data encoded in the data carrier. Thus, both romance and Asian language characters can be stored, read and manipulated in an efficient manner and the data can be used across a variety of platforms employing character sets of different sizes.
Additionally, the present invention embodies a computer-readable character set including functional data characters. These data characters, embedded in the character set, can cause a processor to execute functions to control reading, writing, and/or manipulation of data. For example, the data characters can indicate a compression scheme for encoding or decoding a string of characters, such as the Byte Mode, Numeric Mode, and Word Mode compression schemes. Likewise, a Special Features Flag data character can indicate the existence of a related or companion data carrier, typically located adjacent or near to the current data carrier. The Special Features Flag can additionally indicate a function or operation for a reader to perform.