The state of the art for identifying the language of text documents relies on statistical analysis of the words and characters used in the entire document or in sizable portions of it. As such, the state of the art cannot identify the language of individual words in isolation. Nor is it effective for documents that contain multiple languages, such as dual-language documents (e.g., Canadian parliamentary proceedings, which are printed in both English and French on the same page), documents that contain short quotations in a foreign language, or documents that occasionally use an isolated foreign-language term.
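By way of illustration only, the following minimal sketch shows one common form of document-level statistical identification: comparing character trigram counts against per-language profiles. It is not taken from any particular prior-art system. The sample texts, the names `SAMPLES`, `PROFILES`, `trigrams`, `cosine`, and `identify`, and the choice of cosine similarity are all assumptions made for the example.

```python
from collections import Counter
import math

# Toy training samples (hypothetical; a real system would use large corpora).
SAMPLES = {
    "en": "the members of the house are meeting today to discuss "
          "the budget and the new law on trade",
    "fr": "les membres de la chambre se réunissent aujourd'hui pour "
          "discuter du budget et de la nouvelle loi sur le commerce",
}


def trigrams(text):
    """Count overlapping character trigrams, padding with one space per side."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


# One trigram-count profile per language, built from the toy samples.
PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(text):
    """Return (best_language, scores) for the text taken as a single unit."""
    counts = trigrams(text)
    scores = {lang: cosine(counts, prof) for lang, prof in PROFILES.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    print(identify("the house is meeting to discuss the new law"))
    print(identify("les membres de la chambre discutent du budget"))
    # A dual-language text still receives exactly one label.
    print(identify("the house is meeting today les membres de la chambre"))
    # An isolated word contributes only a handful of trigrams.
    print(trigrams("chat"))
```

Under these assumptions, `identify` assigns exactly one label to whatever text it is given, so a dual-language passage is forced into a single language. An isolated word such as "chat" yields only four trigrams, which is little evidence on which to base a statistical decision.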