Common techniques for improving the performance of natural language processing systems include tokenization and normalization during a processing phase prior to performing semantic analysis. Tokenization splits input text into various segments, and normalization brings the format of those segments into alignment with some standard forms. By reducing the variability of input text, the amount of data a natural language processing system requires to recognize some concept is thereby reduced. As an example, normalization reduces the number of cases a developer must consider when writing text processing rules.
Current text processors, however, are often generic processors that are developed without knowledge of or regard to the context in which the natural language processing system operates.