Bar code symbologies were first disclosed in U.S. Pat. No. 1,985,035 by Kermode and expanded shortly thereafter in the 1930's in U.S. Pat. No. 2,020,925 by Young, assigned to Westinghouse. These early symbologies were printed by generating a multiplicity of single width elements of lower reflectance, or "bars," which were separated by elements of higher reflectance, or "spaces." An "element" is a bar or space. These early symbologies, and many "bar code symbologies" used today can be referred to as "linear symbologies" because data in a given symbol is decoded along one axis or direction. Symbologies such as linear symbologies encode "data characters" (e.g., human readable characters) as "symbol characters," which are generally parallel arrangements of alternating bars and spaces that form unique groups of patterns to encode specific data characters. "Data characters" include not only human readable characters, but also include special function characters such as start, stop or shift characters that provide certain functional data. Each unique group or pattern of bars and spaces within a predetermined width defines a particular symbol character, and thus a particular data character or characters.
The known U.P.C. symbology can be described generically as a (7,2) "n,k code." An "n,k code" is a symbology in which each symbol character has "k" bars and "k" spaces and a total length of "n" modules. Therefore, the U.P.C. symbology encodes two bars and two spaces in each symbol character, and each symbol character is seven modules long. A "module" is the narrowest nominal width unit of measure in a bar code symbology (a one-wide bar or space). "Nominal" refers to the intended value of a specific parameter, regardless of printing errors, etc. Under common counting techniques, the number of possible symbol characters can be found by realizing that in seven modules there are six locations where a transition can occur, and that two bars and two spaces (four elements) have three internal transitions. Therefore, the number of unique symbol characters for the U.P.C. symbology is simply 6 choose 3, which equals 20. Similarly, under the Code 128 symbology, which is an (11,3) symbology, 252 unique symbol characters are available (10 choose 5).
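The counting argument above can be checked with a short sketch. The function name nk_symbol_count is a hypothetical helper introduced only for illustration; it assumes, as the text does, that a character with k bars and k spaces has 2k - 1 internal transitions to place among the n - 1 module boundaries inside the character.

```python
from math import comb


def nk_symbol_count(n: int, k: int) -> int:
    """Count the distinct symbol characters of an (n,k) code.

    A symbol character with k bars and k spaces has 2k elements and
    therefore 2k - 1 internal transitions. Those transitions are chosen
    from the n - 1 boundaries between adjacent modules.
    """
    if k < 1 or n < 2 * k:
        raise ValueError("need k >= 1 and at least one module per element")
    return comb(n - 1, 2 * k - 1)


print(nk_symbol_count(7, 2))   # U.P.C.: 6 choose 3
print(nk_symbol_count(11, 3))  # Code 128: 10 choose 5
```

The two calls reproduce the figures of 20 and 252 given above.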
The bar code symbologies known as U.P.C., EAN, Code 11 and Codabar are all standards that support only numeric data characters and a few special characters such as "+" and "-". The U.P.C. symbology is both a bar code standard and an industry standard, in that it has been adopted by industry in a standard application (consumer goods). The bar code standard Code 39 was the first alphanumeric bar code symbology standard developed. However, it was limited to 43 characters.
Code 93 is an improvement over Code 39. Code 93 is a continuous bar code symbology employing four element widths. Each Code 93 symbol has nine modules that may be either black or white (either a bar or a space). Each symbol in the Code 93 standard contains three bars and three spaces (six elements), with a total length of nine modules. Code 93, having nine modules and three bars per symbol, is thus a (9,3) symbology, which has 56 possible characters (8 choose 5). The Code 93 symbology standard defines only 48 unique symbols, and thus is able to define 47 characters in its character set plus a start/stop code. The 47 characters include the numeric characters 0-9, the alphabetic characters A-Z, some additional symbols and four shift codes.
The computer industry uses its own character encoding standard, namely, the American Standard Code for Information Interchange (ASCII). ASCII defines a character set containing 128 characters and symbols. Each character in ASCII is represented by a unique 7-bit code. Since Code 39 and Code 93 are limited to fewer than 50 characters, these standards are inadequate to uniquely represent each ASCII character. The four shift codes in Code 93, however, allow this standard to unambiguously represent all 128 ASCII characters. One drawback is that a series of two Code 93 symbols is required to represent a single ASCII character that falls outside the Code 93 character set. Thus, bar code labels representing such characters in the ASCII character set are twice as long as labels representing characters in the Code 93 character set.
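The length penalty described above can be illustrated with a minimal sketch. It assumes that the 43 non-shift data characters of Code 93 (digits, upper case letters, and the seven symbols listed in the code) are each encoded as one symbol and that every other ASCII character needs a shift symbol plus one further symbol. The name code93_data_symbols is hypothetical, and start/stop and check symbols are deliberately left out of the count.

```python
# Assumed basic Code 93 data characters: 10 digits, 26 upper case
# letters, and the symbols - . space $ / + % (43 in total). The four
# shift codes are not data characters and are not listed here.
BASIC_CODE93 = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%")


def code93_data_symbols(text: str) -> int:
    """Count data symbols needed for an ASCII string (sketch only).

    One symbol per basic character, two symbols (shift + character)
    for any other ASCII character. Start/stop and check symbols are
    not included.
    """
    total = 0
    for ch in text:
        if ord(ch) > 127:
            raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
        total += 1 if ch in BASIC_CODE93 else 2
    return total


print(code93_data_symbols("CODE 93"))  # all characters are basic
print(code93_data_symbols("code 93"))  # lower case letters need shifts
```

Under these assumptions the upper case string needs 7 data symbols and the lower case string needs 11, since each of its four letters costs two symbols.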
New bar code symbology standards, such as Code 128, were developed to encode the complete ASCII character set. However, these standards suffer from certain shortcomings, including requiring shift codes or other preceding symbols to represent certain characters. All of these symbologies require increased processing time and overhead to process the entire ASCII character set.
The computer industry has grown beyond the limits of the ASCII character set. As the computer markets have grown, the need has also arisen to support additional languages not defined by the ASCII character set. New character sets were developed to accommodate clusters of characters in related languages. The original 7-bit ASCII character set was expanded to 8 bits, thus providing an additional 128 characters or data values. This additional set of 128 data values (the "upper 128" or "extended ASCII") allowed additional characters present in the related romance languages (e.g., French, German, Spanish) to be represented. The only linear symbologies capable of encoding 8-bit data are Code 128 and "Code 53," which is described in the inventor's U.S. Pat. No. 5,619,027, entitled "Single Width Bar Code Symbology With Full Character Set Utilizing Robust Start/Stop Characters and Error Detection Scheme." Both Code 128 and Code 53 encode 8-bit data by using single or double function shift characters, and thus require increased processing time and overhead, since every byte value must be analyzed before a data character is encoded.
As the computer markets grew internationally, however, even more languages were required to be included in the character set. Particularly, the Asian markets demanded a character set, usable on computers, which supported thousands of unique characters. To uniquely define each of these characters, a 16-bit encoding standard was required.
Several 16-bit encoding standards such as Unicode, Big Five, GB, JISC-6226-1983, and others have recently been developed. The Unicode character encoding standard is a fixed-length, uniform text and character encoding standard. The Unicode standard may contain up to 65,536 characters, and currently contains over 28,000 characters mapping onto the world's scripts, including Greek, Hebrew, Latin, Japanese, Chinese, Korean, and Taiwanese. The Unicode standard is modeled on the ASCII character set. Unicode character codes are consistently 16 bits long, regardless of language, so no escape sequence or control code is required to specify any character in any language. Unicode character encoding treats symbols, alphabetic characters, and ideographic characters identically, so that they can be used in various computer applications simultaneously and with equal facility. Computer programs that use Unicode character encoding to represent characters, but that do not display or print text, can remain unaltered when new scripts or characters are introduced.
New computer operating systems are beginning to support these comprehensive 16-bit code standards, e.g., WINDOWS NT, manufactured by Microsoft Corporation of Redmond, Wash. The data collection industry, however, has generally failed to keep pace with the computer industry. Only a few systems have currently been proposed for encoding the 16-bit computer character codes into bar code symbols. One of the systems, named 93i, is the subject of U.S. Pat. No. 5,557,092, and the applicant's currently pending and commonly assigned patent applications Ser. No. 08/842,644 filed on Apr. 16, 1997, and Ser. No. 08/914,324, filed on Aug. 19, 1997, which are incorporated herein, by reference, in their entirety.
Often, it will be desirable to encode and decode symbols consisting of a combination of romance language data characters, including Arabic numerals, and data characters, including numerals, from Asian languages. Since two byte character sets such as Unicode, GB and JISC-6226-1983 usually contain the one byte character sets, such as ASCII, as subsets, each of the romance and Asian language data characters can be represented by a double byte data character code selected from one of the two byte character sets. The double byte data character codes may be directly encoded into a symbol in a symbology capable of supporting a double byte character set, such as 93i. It is also possible to decode the symbol, producing a set of double byte data character codes representing the information stored in the symbol.
Substituting double byte characters for single byte characters, and then encoding the substituted double byte characters into symbol characters, produces symbols that are longer than they would be if the single byte characters had been encoded directly into symbol characters. Symbol length is critical in many bar code applications, especially when linear symbologies are employed. Therefore, there is a need to allow bar code symbols to more efficiently represent a combination of data characters selected from single byte and double byte character sets.
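The overhead described above can be made concrete by comparing data lengths in bytes for a mixed string. The following is a minimal sketch and does not represent any particular symbology: the function names are hypothetical, the counts are bytes of character code rather than modules or symbol characters, and the mixed figure is an idealized lower bound that ignores whatever signalling a real scheme would need to switch between single byte and double byte data.

```python
def double_byte_length(text: str) -> int:
    """Bytes needed when every character is a 16-bit character code."""
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(f"{ch!r} does not fit in a 16-bit code")
    return 2 * len(text)


def mixed_length(text: str) -> int:
    """Idealized bytes needed when 7-bit ASCII characters take one byte
    and all other characters take two (mode-switch overhead ignored)."""
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(f"{ch!r} does not fit in a 16-bit code")
    return sum(1 if ord(ch) < 0x80 else 2 for ch in text)


# Hypothetical label text: seven ASCII characters followed by two
# ideographic characters (written as escapes to stay ASCII-only).
label = "LOT 42 \u6771\u4eac"

print(double_byte_length(label))        # all characters as double byte
print(mixed_length(label))              # ASCII characters as single byte
print(len(label.encode("utf-16-be")))   # cross-check of the 16-bit length
```

For this nine-character label the all-double-byte form needs 18 bytes, while the idealized mixed form needs 11, which is the kind of saving the stated need is directed at.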