Many critical systems rely on the persistent accumulation of data. However, these systems often lack fail-safes that monitor incoming data streams to ensure the streams are not corrupted or otherwise perturbed. That is, an incoming data stream may change, whether due to a change in the input itself or a problem with the input source. The systems in place to review, analyze, and classify the text contained in the data stream will continue to operate, but their analysis will be wrong. The problem of detecting corrupted, perturbed, or changing text is further complicated by the unstructured nature of certain text files, such as log files. Likewise, various image capture and optical character recognition systems may alter text during digitization. Accordingly, existing systems have difficulty detecting when input text deviates from what is expected and notifying administrators of such deviations. Furthermore, it can be difficult to differentiate between a single anomalous input and a fundamental shift or change in the stream itself.
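The distinction drawn above, between an isolated anomalous input and a sustained shift in the stream, can be illustrated with a minimal sketch. The sketch below is an illustrative assumption, not a description of any particular claimed system: it counts anomaly flags within a sliding window, reporting a lone anomaly as such and escalating to a "shift" only when the anomalous fraction of recent inputs crosses a threshold. The class name, the `is_anomalous` callback, and the threshold values are all hypothetical.

```python
from collections import deque


class StreamShiftDetector:
    """Illustrative sketch: distinguish isolated anomalies from a stream shift.

    A record is "anomalous" if the caller-supplied check flags it. A single
    flagged record is reported as an isolated anomaly; when the fraction of
    flagged records within a sliding window reaches `shift_ratio`, the
    stream itself is reported as shifted. All names and thresholds here are
    assumptions for illustration only.
    """

    def __init__(self, is_anomalous, window_size=20, shift_ratio=0.5):
        self.is_anomalous = is_anomalous
        self.window = deque(maxlen=window_size)  # recent anomaly flags
        self.shift_ratio = shift_ratio

    def observe(self, record):
        flag = self.is_anomalous(record)
        self.window.append(flag)
        if not flag:
            return "ok"
        if sum(self.window) / len(self.window) >= self.shift_ratio:
            return "shift"    # sustained deviation: the stream changed
        return "anomaly"      # isolated deviation: one bad input


# Usage sketch: treat any non-ASCII log line as anomalous (e.g., text
# garbled by a faulty capture or recognition step).
detector = StreamShiftDetector(lambda line: not line.isascii(), window_size=4)
detector.observe("GET /index 200")          # "ok"
detector.observe("GET /about 200")          # "ok"
detector.observe("GET /help 200")           # "ok"
detector.observe("GET /h\xe9lp 200")        # "anomaly" (1 of 4 recent flagged)
detector.observe("GET /\xe9rror 500")       # "shift" (2 of 4 recent flagged)
```

Under this sketch, a one-off garbled line triggers only an administrator-level anomaly notice, while a run of garbled lines is surfaced as a change in the stream itself, matching the differentiation described above.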
Aspects described herein may address these and other problems, and generally improve the quality, efficiency, and speed with which systems detect deviations in text streams.