Devices, typically suitably programmed computing devices, that perform automated document analysis are well known in the art. Such devices are often capable of performing content recognition or matching analysis and provide enhanced man-machine user interfaces in which matches of specific types of content in document text are displayed and highlighted. Ideally, the processing performed to implement such content matching will produce few, if any, false positives and few false negatives (misses), either of which would otherwise lead to an inaccurate representation of the document text presented by such user interfaces.
Where multiple content matchers are executed against a given body of text in order to identify different content types, the possibility exists that one or more content matchers will attempt to identify the same or overlapping portions of the text as matching different content types. For example, a device may have a content matcher configured to identify instances of dates in the body of text, as well as a content matcher configured to identify instances of units of measurement. In this scenario, the phrase “On Jan. 1, 2000 mL of fluid was purchased,” would ideally result in the identification of a date (“Jan. 1”) and of a unit of measurement with its accompanying value (“2000 mL”). However, if the date content matcher analyzes this phrase first, it will identify “Jan. 1, 2000” as a match. Consequently, the measurement unit content matcher will fail to identify “2000 mL” as a unit of measurement and accompanying value, because “2000” was previously identified as part of a date.
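The conflict described above can be reproduced with a minimal sketch. The regular expressions, the `run_sequentially` helper, and the rule that a span claimed by an earlier matcher is unavailable to later matchers are hypothetical simplifications chosen for illustration; the text does not specify how either content matcher is implemented.

```python
import re

# Hypothetical, simplified content matchers (illustration only).
# The date pattern greedily absorbs an optional ", YYYY" year.
DATE_PATTERN = re.compile(
    r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.\s+\d{1,2}(?:,\s+\d{4})?"
)
# A value followed by a unit of measurement, e.g. "2000 mL".
UNIT_PATTERN = re.compile(r"\b\d+\s*(?:mL|L|g|kg)\b")

SENTENCE = "On Jan. 1, 2000 mL of fluid was purchased,"


def run_sequentially(text, matchers):
    """Run (label, pattern) matchers in order.

    Assumption: once a span of text is claimed by an earlier matcher,
    any later match that overlaps it is discarded.
    """
    claimed = []  # (start, end, label) tuples
    for label, pattern in matchers:
        for m in pattern.finditer(text):
            overlaps = any(
                m.start() < end and start < m.end() for start, end, _ in claimed
            )
            if not overlaps:
                claimed.append((m.start(), m.end(), label))
    # Report matches in document order as (label, matched text).
    return [(label, text[start:end]) for start, end, label in sorted(claimed)]


if __name__ == "__main__":
    # Date matcher first: "2000" is consumed by the date, so "2000 mL" is lost.
    print(run_sequentially(SENTENCE, [("date", DATE_PATTERN), ("unit", UNIT_PATTERN)]))
    # prints [('date', 'Jan. 1, 2000')]
```

In this naive sketch, simply reversing the order does not help either: the unit matcher claims “2000 mL”, and the greedy date pattern, which still tries to match “Jan. 1, 2000”, is then discarded as overlapping, so the date is missed instead.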
Thus, content matching techniques that overcome these shortcomings would represent a welcome advancement in the art.