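Applications such as text search and natural language processing may require tokenization of text documents, that is, splitting the text into units such as words. The tokenization of multilingual text documents may be particularly challenging, because languages differ in how they mark word boundaries: some, such as English, separate words with spaces, while others, such as Chinese and Japanese, are typically written without spaces between words.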
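As a rough illustration of why mixed-language text is awkward to tokenize, the following Python sketch compares plain whitespace splitting with a naive script-aware splitter. The function names (`whitespace_tokenize`, `script_aware_tokenize`, `is_cjk`) are hypothetical, and the per-character handling of Chinese is an assumption made only for illustration, not a recommended segmentation method.

```python
def whitespace_tokenize(text):
    """Split on runs of whitespace only."""
    return text.split()


def is_cjk(ch):
    """Rough check (assumption): only the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def script_aware_tokenize(text):
    """Naive sketch: each CJK ideograph becomes its own token; other
    alphanumeric runs are kept together; everything else is a separator."""
    tokens = []
    current = []  # characters of the non-CJK token being built

    def flush():
        if current:
            tokens.append("".join(current))
            current.clear()

    for ch in text:
        if is_cjk(ch):
            flush()
            tokens.append(ch)
        elif ch.isalnum():
            current.append(ch)
        else:
            flush()
    flush()
    return tokens


mixed = "search 搜索引擎 engine"
print(whitespace_tokenize(mixed))    # ['search', '搜索引擎', 'engine']
print(script_aware_tokenize(mixed))  # ['search', '搜', '索', '引', '擎', 'engine']
```

Whitespace splitting leaves the Chinese phrase as one opaque token, while the per-character fallback over-segments it; neither recovers the actual words, which is the difficulty the paragraph above points to.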