Web browsers and other software applications may rely upon character-encoding standards to ensure that most, if not all, characters, including characters formed from combining-character sequences such as ü, â, ǚ, and ỗ, are rendered predictably across different client devices. Those devices may run different versions of the browser or other software application on top of different operating systems, which in turn execute on different hardware configurations. One such character-encoding standard is Unicode. “The Unicode® Standard: A Technical Introduction” provides a concise introduction to Unicode, as shown in the following excerpt (see http://www.unicode.org/standard/principles.html):
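The combining-character sequences mentioned above can be inspected with Python's standard `unicodedata` module. The sketch below is illustrative only and is not part of any described system. It assumes the four example characters are supplied in their precomposed forms and prints the canonical (NFD) decomposition of each as U+ code points; the helper name `decompose` is hypothetical.

```python
import unicodedata

# The example characters, written as escapes of their precomposed code points:
# U+00FC (u with diaeresis), U+00E2 (a with circumflex),
# U+01DA (u with diaeresis and caron), U+1ED7 (o with circumflex and tilde).
EXAMPLES = ["\u00fc", "\u00e2", "\u01da", "\u1ed7"]

def decompose(ch: str) -> list[str]:
    """Return the canonical decomposition (NFD) of ch as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]

for ch in EXAMPLES:
    # Print only ASCII so the output is safe on any terminal encoding.
    print(f"U+{ord(ch):04X} -> {' '.join(decompose(ch))}")
```

Running it shows that ǚ and ỗ each decompose into a base letter followed by two combining marks, which is why a renderer must stack more than one diacritic on a single base character.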
The Unicode Standard is the universal character-encoding standard used for representation of text for computer processing. The Unicode Standard defines codes for characters used in all the major languages written today. Scripts include the European alphabetic scripts, Middle Eastern right-to-left scripts, and many scripts of Asia. The Unicode Standard further includes punctuation marks, diacritics, mathematical symbols, technical symbols, arrows, dingbats, emoji, etc. It provides codes for diacritics, which are modifying character marks such as the tilde (˜), that are used in conjunction with base characters to represent accented letters (ñ, for example). In all, the Unicode Standard, Version 6.0 provides codes for 109,449 characters from the world's alphabets, ideograph sets, and symbol collections.
The majority of common-use characters fit into the first 64K code points, an area of the codespace that is called the basic multilingual plane (“BMP”). There are sixteen other supplementary planes available for encoding other characters, with currently over 860,000 unused code points. More characters are under consideration for addition to future versions of the standard. The Unicode Standard also reserves code points for private use. Vendors or end users can assign these internally for their own characters and symbols, or use them with specialized fonts. There are 6,400 private use code points on the BMP and another 131,068 supplementary private use code points, should 6,400 be insufficient for particular applications.
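The plane and private-use figures in the excerpt can be checked with simple arithmetic. The following Python sketch assumes the code space U+0000 through U+10FFFF divided into 17 planes of 65,536 code points each, and uses the private use ranges published in the Unicode Standard; the helper `plane_of` and the constant names are illustrative, not taken from the source.

```python
# Unicode code space: U+0000..U+10FFFF, split into planes of 65,536 code points.
PLANE_SIZE = 0x10000
MAX_CODE_POINT = 0x10FFFF

def plane_of(cp: int) -> int:
    """Return the plane number (0 = BMP, 1-16 = supplementary planes)."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"{cp:#x} is outside the Unicode code space")
    return cp // PLANE_SIZE

# Private use ranges (inclusive bounds).
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]

total_planes = (MAX_CODE_POINT + 1) // PLANE_SIZE
bmp_pua = PRIVATE_USE_RANGES[0][1] - PRIVATE_USE_RANGES[0][0] + 1
supp_pua = sum(hi - lo + 1 for lo, hi in PRIVATE_USE_RANGES[1:])

print(total_planes, bmp_pua, supp_pua)                              # 17 6400 131068
print(plane_of(0x0041), plane_of(0x1F600), plane_of(0x10FFFF))      # 0 1 16
```

The last two code points of planes 15 and 16 (U+FFFFE, U+FFFFF, U+10FFFE, U+10FFFF) are noncharacters, which is why each supplementary private use area holds 65,534 rather than 65,536 code points.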
Text elements are encoded as sequences of one or more characters. Certain of these sequences are called combining character sequences, made up of a base letter and one or more combining marks, which are rendered around the base letter (above it, below it, etc.). For example, a sequence of “a” followed by a combining circumflex “^” would be rendered as “â”. Certain sequences of characters can also be represented as a single character, called a precomposed character (or composite or decomposable character). For example, the character “ü” can be encoded as the single code point U+00FC “ü” or as the base character U+0075 “u” followed by the non-spacing character U+0308 “¨”. The Unicode Standard encodes precomposed characters for compatibility with established standards such as Latin 1, which includes many precomposed characters such as “ü” and “ñ”.
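Because the same text element can be encoded in more than one way, software that compares or searches text typically normalizes it first. A minimal sketch in Python, assuming the standard library's `unicodedata.normalize` and Normalization Form C (NFC) as the default; the `canonically_equal` helper is hypothetical:

```python
import unicodedata

precomposed = "\u00fc"   # U+00FC LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "u\u0308"   # U+0075 followed by U+0308 COMBINING DIAERESIS

# Compared code point by code point, the two encodings differ.
print(precomposed == decomposed, len(precomposed), len(decomposed))  # False 1 2

def canonically_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after bringing both into the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(canonically_equal(precomposed, decomposed))   # True
# The "a" + combining circumflex example also composes to U+00E2.
print(canonically_equal("a\u0302", "\u00e2"))       # True
```

NFC is used here because it prefers the precomposed forms that legacy standards such as Latin 1 expect; NFD would yield the same equality result with decomposed sequences instead.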
A single number is assigned to each code element defined by the Unicode Standard. Each of these numbers is called a code point and, when referred to in text, is listed in hexadecimal form following the prefix “U+”. For example, the code point U+0041 is the hexadecimal number 0041 (equal to the decimal number 65). It represents the character “A” in the Unicode Standard. As discussed above, a range of code points on the BMP and two very large ranges in the supplementary planes are reserved as private use areas. These code points have no universal meaning, and may be used for characters specific to a program or by a group of users for their own purposes. For example, a group of choreographers may design a set of characters for dance notation and encode the characters using code points in user space. A set of page-layout programs may use the same code points as control codes to position text on the page. The main point of user space is that the Unicode Standard assigns no meaning to these code points, and reserves them as user space, promising never to assign them meaning in the future. (end of excerpt)
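The U+ notation and the private use designation described in the excerpt map onto a few lines of Python. This is a sketch under the assumption that private use code points are identified by their Unicode general category `Co`, as reported by `unicodedata.category`; the function names are hypothetical.

```python
import unicodedata

def to_u_notation(cp: int) -> str:
    """Format a code point as 'U+' followed by at least four uppercase hex digits."""
    return f"U+{cp:04X}"

def from_u_notation(s: str) -> int:
    """Parse 'U+0041'-style notation back into an integer code point."""
    if not s.upper().startswith("U+"):
        raise ValueError(f"not in U+ notation: {s!r}")
    return int(s[2:], 16)

def is_private_use(cp: int) -> bool:
    """True if the Unicode Standard reserves cp for private use (category 'Co')."""
    return unicodedata.category(chr(cp)) == "Co"

print(to_u_notation(65), from_u_notation("U+0041"), chr(0x41))  # U+0041 65 A
print(is_private_use(0xE000), is_private_use(0x41))             # True False
```

Since the standard promises never to assign meaning to these code points, a check like `is_private_use` gives an application a stable way to tell its own internally assigned characters apart from standard ones.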
A social-networking system, which may include a social-networking website, may enable its users (such as persons or organizations) to interact with it and with each other through it. The social-networking system may, with input from a user, create and store in the social-networking system a user profile associated with the user. The user profile may include demographic information, communication-channel information, and information on personal interests of the user. The social-networking system may also, with input from a user, create and store a record of relationships of the user with other users of the social-networking system, as well as provide services (e.g., wall posts, photo-sharing, event organization, messaging, games, or advertisements) to facilitate social interaction between or among users. The social-networking system may transmit content or messages related to its services over one or more networks to a mobile or other computing device of a user. A user may also install software applications on a mobile or other computing device of the user for accessing a user profile of the user and other data within the social-networking system.