Characters can be encoded in various ways for use by computers. The American Standard Code for Information Interchange (ASCII) is a character encoding standard widely used in database systems today. Another standard, Unicode, has broader applicability than ASCII because it can represent text in most of the world's writing systems. However, processing Unicode text is more costly than processing ASCII text. This extra cost is particularly problematic for operations on predominantly ASCII text, such as most operations in database systems (e.g., performing a hash join on text columns).
Today, UTF-8 is the dominant Unicode encoding because it is fully backward compatible with ASCII. Specifically, each ASCII character in the range 0x00 to 0x7F is encoded in UTF-8 as the same single byte. Every other character requires two, three, or four bytes in UTF-8.
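The byte-length rule can be made concrete with a minimal Python sketch. The function names `utf8_length` and `is_ascii_only` are hypothetical and only illustrate the idea. The range boundaries (U+007F, U+07FF, U+FFFF, U+10FFFF) follow the UTF-8 definition in RFC 3629, under which surrogate code points U+D800 to U+DFFF are not encodable.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 uses to encode one Unicode code point."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogates are reserved for UTF-16 and are not valid in UTF-8.
        raise ValueError("surrogate code points are not encoded in UTF-8")
    if code_point <= 0x7F:
        return 1  # ASCII range: a single byte identical to the ASCII byte
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    return 4  # supplementary planes, U+10000 to U+10FFFF


def is_ascii_only(data: bytes) -> bool:
    """True if every byte is below 0x80, i.e. the UTF-8 data is pure ASCII."""
    # Equivalent to data.isascii() in Python 3.7+; written out to show the rule.
    return all(b < 0x80 for b in data)


if __name__ == "__main__":
    for ch in "aé€😀":
        # Cross-check the rule against Python's built-in UTF-8 encoder.
        print(repr(ch), utf8_length(ord(ch)), len(ch.encode("utf-8")))
```

Because ASCII bytes never appear inside a multi-byte UTF-8 sequence, a check like `is_ascii_only` could, for example, let a system route predominantly ASCII data to a cheaper byte-oriented code path; this is an illustrative assumption, not a technique stated in the text above.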