It is well known to generate indexes of files so that the content of those files can be searched efficiently. However, when the files contain non-English text and/or symbols, indexing and searching become more complicated. Accordingly, an efficient system and method are desired for building a full text index of files containing English text as well as non-English text, and that take into account issues arising from different scripts. The full text index should be structured so as to allow efficient and effective searching of both the English and the non-English text, including text containing diacritic symbols, special characters, and the like.