Characters such as letters, symbols, punctuation marks, emoji and emoticons are typically represented by numerical codes so that they can be processed by computers. Various encodings are available, although Unicode encodings are in widespread use and Unicode has become a dominant scheme for the digital processing and storage of text.
Unicode assigns a unique code to each character according to the Unicode standard developed by the Unicode Consortium. At the time of writing, the latest version of the Unicode standard is version 9.0.0, which comprises a core specification together with code charts, Unicode standard annexes and a Unicode character database. Rather than mapping characters directly to numbers in byte form, Unicode defines which characters are available (in the character database), the natural numbers corresponding to those characters (referred to as code points), how those numbers are encoded as a series of fixed-size natural numbers (code units), and how the code units are encoded as a stream of bytes.
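By way of illustration only, the layering of code points, code units and bytes may be sketched in Python as follows. The function name `describe` is hypothetical and is introduced purely for this example; the sketch uses UTF-16 code units, although other encoding forms such as UTF-8 follow the same layered pattern.

```python
def describe(ch: str) -> dict:
    """Show one character as a code point, UTF-16 code units and byte streams.

    `describe` is a hypothetical helper used only for illustration.
    """
    code_point = ord(ch)  # the natural number assigned to the character
    utf16_be = ch.encode("utf-16-be")
    # Group the big-endian byte stream into 16-bit code units.
    code_units = [
        int.from_bytes(utf16_be[i:i + 2], "big")
        for i in range(0, len(utf16_be), 2)
    ]
    return {
        "code_point": f"U+{code_point:04X}",
        "utf16_code_units": [f"0x{u:04X}" for u in code_units],
        # The same code units serialized with little-endian byte order.
        "utf16_le_bytes": ch.encode("utf-16-le").hex(),
        # A different encoding form of the same code point.
        "utf8_bytes": ch.encode("utf-8").hex(),
    }


letter = describe("a")
emoji = describe("\U0001F600")  # a code point outside the 16-bit range
```

In this sketch the letter “a” occupies a single 16-bit code unit, whereas the emoji code point U+1F600 requires two code units (a surrogate pair), which shows that the number of code units per character depends on the code point and on the encoding form chosen.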
Generally speaking, when a computing device interprets Unicode code points (or other character codes) in order to render one or more characters, it typically does so using a font available at the computing device. A font is a mapping from Unicode code points (or other character codes) to glyphs. A glyph is a graphical representation of a character code or of a sequence of character codes. For example, the letter “a” in one graphical style is a glyph, and the letter “a” in a different graphical style is a different glyph. An example of a glyph mapped from a sequence of character codes is “á”, where the sequence comprises a character code for the letter “a” and a character code for the accent character.
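The “á” example may be made concrete with a short Python sketch using the standard `unicodedata` module. The helper `code_points` is hypothetical, and the choice of U+0301 (the combining acute accent) as the accent character is an assumption made for this illustration.

```python
import unicodedata


def code_points(text: str) -> list:
    """List the code points of a string in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


# A sequence of two character codes: the letter "a" followed by an accent.
sequence = "a\u0301"
sequence_names = [unicodedata.name(ch) for ch in sequence]

# Unicode also defines a single precomposed code point for the same character;
# NFC normalization converts the two-code sequence into it.
precomposed = unicodedata.normalize("NFC", sequence)

sequence_points = code_points(sequence)
precomposed_points = code_points(precomposed)
```

Both forms are typically rendered with the same glyph, even though one is a sequence of two code points and the other is a single code point.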
Where a character code (such as a Unicode code point) is unsupported by an electronic device, the electronic device converts the code representation of the character to an incorrect glyph. An incorrect glyph is a graphical representation of a character which differs from the graphical representation intended by an encoder of the character code. The incorrect glyph may be a default glyph used by the electronic device when it cannot determine the correct glyph, or may be a glyph with a different form from the intended glyph.
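The default-glyph behavior described above may be illustrated with a toy model in Python. This is a sketch under stated assumptions and not a description of how any particular device or font format operates: the font is modeled as a plain dictionary, and the names `TOY_FONT`, `DEFAULT_GLYPH`, `render` and `unsupported_code_points` are hypothetical.

```python
# Hypothetical toy "font": a mapping from code points to glyph names.
TOY_FONT = {0x0061: "glyph_a", 0x0062: "glyph_b"}

# Assumed placeholder name for the glyph shown when no mapping exists.
DEFAULT_GLYPH = ".notdef"


def render(text: str, font: dict = TOY_FONT) -> list:
    """Map each code point to a glyph, falling back to the default glyph."""
    return [font.get(ord(ch), DEFAULT_GLYPH) for ch in text]


def unsupported_code_points(text: str, font: dict = TOY_FONT) -> list:
    """List the code points in the text for which the font has no glyph."""
    return [f"U+{ord(ch):04X}" for ch in text if ord(ch) not in font]


text = "ab\U0001F600"
glyphs = render(text)
missing = unsupported_code_points(text)
```

In this model the emoji code point has no entry in the toy font, so rendering silently substitutes the default glyph; the substitution is only visible to a caller that checks the mapping explicitly.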
Because unsupported character codes lead to errors in the storage, communication and processing of character data, there is an ongoing need for ways of accurately detecting unsupported character codes. In some cases a user may be unaware that a computing device has processed an unsupported character code, which leads to confusion because the user does not understand the state of the computing device.
The embodiments described below are not limited to implementations which solve any or all of the disadvantages of known unsupported character code detection mechanisms.