1. Field of the Invention
The invention relates generally to a method, system, and apparatus for embedding hidden content within Unicode using non-printable Unicode characters and using the hidden content to perform a particular action.
2. Description of the Related Art
The Unicode Standard (“Unicode”) is an international coding standard intended to create uniformity across different platforms, programs, languages, and scripts. Prior to the implementation of Unicode, there were hundreds of different coding systems that assigned numbers to various letters and other characters. This led to conflicts between different coding systems. For example, if a message using one coding system was transmitted to another computer using a different coding system, the message would likely not be translated correctly because of inconsistencies between the coding systems. Unicode was intended to solve that problem by providing a unique code point for every character, regardless of the language or script. By providing a unique code point for every character, Unicode helps ensure uniformity for the transmission of messages between different computer systems.
Unicode presently contains more than 110,000 characters covering over 100 scripts and multiple symbol sets. These include, for example, a Basic Latin (ASCII) script, which covers much of the English alphabet and commonly used punctuation characters. In Unicode, each character is mapped to a specific code point. For example, the English uppercase letter “A” is code point U+0041 in Unicode. A comprehensive listing of the Unicode Code Charts may be found at the official Unicode website at www.unicode.org/charts. Multiple Unicode code points can be combined to form a Unicode string, which is then encoded using a standard character encoding format, such as Unicode Transformation Format-8-bit (“UTF-8”).
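By way of illustration only, the mapping from characters to code points and the subsequent UTF-8 encoding may be sketched in Python as follows. The function name code_points is hypothetical and is introduced solely for this example.

```python
# Illustrative sketch: list the code point of each character in a string,
# then encode the string as UTF-8 bytes.

def code_points(text: str) -> list[str]:
    """Return the U+XXXX notation for each character in text."""
    return [f"U+{ord(ch):04X}" for ch in text]

letter = "A"
print(code_points(letter))            # ['U+0041']
print(letter.encode("utf-8").hex())   # 41
```

For characters in the Basic Latin range, the UTF-8 byte value equals the code point value, which is why “A” encodes to the single byte 0x41.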
In reference to FIG. 1, an example of a system 100 for transmitting a basic Unicode message across a network using the UTF-8 encoding format is shown. The system includes a client computer system 101, which includes a keyboard 102, monitor 104, and desktop computer 106. The desktop computer 106 has memory 108, which stores, among other things, the Unicode Standard 110 and messages input into the computer using the keyboard 102. The computer system 101 is connected to a mobile communication device 112 via the internet 114. The mobile communication device 112 also has memory 116, which stores, among other things, the Unicode Standard 110 and messages received from other communication devices. The computer system 101 and mobile communication device 112 may communicate with each other via a third-party application, such as Facebook®. A user of the computer system 101 may send a message (e.g., “Hello”) 118 to the mobile communication device 112 using the keyboard 102. When the message 118 is entered, it is stored into memory 108 and encoded using the UTF-8 encoding format 120. The encoded message 122 is then transmitted to the mobile communication device 112 via the internet 114. When the encoded message 122 is received by the mobile communication device 112, it is stored into memory 116 and then decoded so that it appears as the message “Hello” in the Facebook® application of the mobile communication device 112. While the present example illustrates the transmission of an English message between communication devices, those of skill in the art would appreciate that Unicode can be used to transmit messages in numerous other languages or scripts.
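The encode, transmit, and decode sequence described above may be sketched as follows. This is a minimal sketch that assumes the network transfer simply delivers the encoded bytes unchanged; the function names encode_message and decode_message are hypothetical and do not correspond to any element of FIG. 1.

```python
# Illustrative sketch of the sender and recipient steps. The network hop is
# assumed to deliver the bytes unchanged and is therefore not modeled.

def encode_message(message: str) -> bytes:
    """Sender side: encode the entered message using UTF-8."""
    return message.encode("utf-8")

def decode_message(payload: bytes) -> str:
    """Recipient side: decode the received UTF-8 bytes back into text."""
    return payload.decode("utf-8")

encoded = encode_message("Hello")
print(encoded.hex())              # 48656c6c6f
print(decode_message(encoded))    # Hello
```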
While Unicode includes a large number of characters covering various scripts and symbol sets, Unicode also includes reserved code points labeled as “private-use characters” that may be defined by a user of the Unicode Standard. A user may define the private-use characters to be any character the user desires, including custom-made characters not already included within the Unicode Standard. For instance, the Unicode code point U+E000 is a private-use character, which a user may define to be a custom flower symbol not in the standard Unicode character set. However, in order for that code point to be properly viewed by the recipient of the Unicode message containing the private-use character, the recipient must also have the private-use character mapped in the Unicode Standard files on the recipient's device. If the private-use character is not mapped in the recipient's Unicode Standard files, then the recipient's device may ignore or disregard the U+E000 code point, rather than displaying it. In other words, the recipient may not even realize that the private-use character is embedded in a message if the private-use character is not mapped in the recipient's Unicode database. For instance, in reference to FIG. 2, a Unicode message 202 comprising character mappings 204 corresponding to the word “Hello” is shown. “H” corresponds to the code point U+0048 (208), “e” corresponds to the code point U+0065 (210), “l” corresponds to the code point U+006C (212), and “o” corresponds to the code point U+006F (214). Additionally, the private-use character U+E000 is included in the middle of the Unicode message (216), but that private-use character is unassigned (i.e., not mapped to any particular character in the recipient's Unicode database).
When the message 202 is received by a recipient device 206, the viewable characters are those comprising the word “Hello.” The private-use character U+E000 remains invisible because it is unassigned, and as such the recipient device 206 ignores or disregards it.
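The behavior described with reference to FIG. 2 may be sketched as follows. This is a simplified model, not a description of any particular device: it assumes the recipient silently drops every private-use character (Unicode general category “Co”), whereas actual devices may instead show a placeholder glyph. The function name visible_text and the exact position of U+E000 within the message are assumptions made for illustration.

```python
import unicodedata

PRIVATE_USE_CHAR = "\uE000"  # unassigned private-use code point from the example

def visible_text(message: str) -> str:
    """Model a recipient that disregards private-use characters (category 'Co')."""
    return "".join(ch for ch in message if unicodedata.category(ch) != "Co")

# Assumed placement: U+E000 inserted between the two "l" characters.
message = "Hel" + PRIVATE_USE_CHAR + "lo"

print(len(message))            # 6 code points are transmitted
print(visible_text(message))   # Hello
```

Although six code points are transmitted, only the five characters of “Hello” survive the modeled display step.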
In addition to the private-use characters, there are also control characters (e.g., U+0000 through U+001F), many of which will also not appear when embedded into a Unicode message. Unlike private-use characters, these control characters are predefined by the Unicode Standard. However, many of these control characters no longer have a purpose or use, and thus when they are transmitted they may also not appear as visible text to the user of a recipient device. Certain private-use and control characters are non-exclusive examples of non-printable characters. Any character that does not visually appear when transmitted as part of a Unicode message may be referred to as a “non-printable character.” For example, U+E000 (216) in FIG. 2 is an example of a non-printable character.
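One possible way to locate such characters in a message is sketched below. The sketch approximates “non-printable” by Unicode general category, treating control characters (“Cc”) and private-use characters (“Co”) as non-printable. That test is an assumption made for illustration: some control characters, such as tab and line feed, do have a visible effect, and whether a given character actually appears depends on the recipient device. The function name find_non_printable is hypothetical.

```python
import unicodedata

# Assumed approximation: control ("Cc") and private-use ("Co") categories.
NON_PRINTABLE_CATEGORIES = {"Cc", "Co"}

def find_non_printable(message: str) -> list[tuple[int, str]]:
    """Return (index, U+XXXX) for each character in an assumed non-printable category."""
    return [
        (index, f"U+{ord(ch):04X}")
        for index, ch in enumerate(message)
        if unicodedata.category(ch) in NON_PRINTABLE_CATEGORIES
    ]

sample = "Hel\uE000lo\u0001"
print(find_non_printable(sample))   # [(3, 'U+E000'), (6, 'U+0001')]
```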
While Unicode works well for its intended purpose, the need exists for the ability to convey additional content in a Unicode message by taking advantage of the manner in which non-printable characters are handled by most computer systems.