1. Field of the Invention
The present invention relates in general to the field of computers and similar technologies, and in particular to software utilized in this field. Still more particularly, it relates to a method, system and computer-usable medium for identifying unchecked criteria in unstructured and semi-structured data within a form.
2. Description of the Related Art
Paper-based forms, and their electronic equivalents, are commonly used by government, commercial and private entities alike to collect a wide variety of information. While individual forms may be unique, they typically include a variety of questions that have associated checkboxes that can be marked in various ways, blank fields to be populated with input data, or a combination of both. As with the collection of any kind of information, certain types, formats, or ranges of information are expected for certain fields. For example, a form used for tracking a delivery may include fields for “arrival date” and “arrival time,” which would be respectively completed with a valid date and time of day.
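By way of illustration only, the expectation that a field holds a valid date or time of day can be sketched as a simple check. The field formats below (an ISO-style date and a 24-hour time) and the names `FIELD_FORMATS` and `validate_field` are assumptions made for this example; the description above names the fields but does not specify their formats.

```python
from datetime import datetime

# Hypothetical formats: the text names these fields but not how they are written.
FIELD_FORMATS = {
    "arrival date": "%Y-%m-%d",  # assumed ISO-style calendar date
    "arrival time": "%H:%M",     # assumed 24-hour time of day
}

def validate_field(name, value):
    """Return True if value parses under the format assumed for the named field."""
    fmt = FIELD_FORMATS.get(name)
    if fmt is None:
        return False  # field is not one this sketch knows about
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        # Raised for malformed text and for impossible dates or times.
        return False
```

Under these assumptions, a blank entry or an impossible value such as a thirtieth day of February would be rejected, while a well-formed date or time would be accepted.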
Likewise, certain rules or guidelines are expected to be followed when completing a form. If the rules are followed properly, then all pertinent checkboxes are marked, blank fields are appropriately populated, and complete and accurate information can be collected from the form. However, it is not uncommon for a person to inadvertently fail to complete a form, for any number of reasons. For example, it may be unclear that certain checkboxes or fields are associated with a particular question on the form. As another example, the person may simply not have understood that one or more checkboxes must be marked or that certain blank fields must be filled out. As yet another example, the form may even have sections of text that include questions with no obvious checkboxes or blank fields. It will be appreciated that many hours or even days may have passed by the time these omissions are discovered, making it difficult to collect all of the information needed to properly provide associated goods or services.
These issues are often exacerbated by the fact that checklist form data may be multi-dimensional. That is, some text may be checked and some may not. Furthermore, text criteria spans that are checked may need to be handled differently than those that are unchecked. Moreover, checked and unchecked spans typically need to be handled differently when processed by a knowledge-based system, such as Watson™, available from International Business Machines (IBM™). For example, it may not be desirable for text alignment and term or n-gram matchers to factor in unchecked text spans. Yet at the same time, the unchecked text cannot simply be ignored either, as it may signify a negation or otherwise contribute to identifying the correct or best answer to a question in the form.
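The distinction drawn above can be sketched, under stated assumptions, as keeping unchecked spans out of term matching while retaining them as a separate signal. The `CriterionSpan` type, the `split_terms` function, and the sample criteria are hypothetical and introduced only for illustration; they do not represent the interface of any particular knowledge-based system.

```python
from dataclasses import dataclass

@dataclass
class CriterionSpan:
    """A span of form text together with its checkbox state (hypothetical type)."""
    text: str
    checked: bool

def split_terms(spans):
    """Separate tokens from checked spans and tokens from unchecked spans.

    Checked tokens are candidates for term or n-gram matching. Unchecked
    tokens are kept in their own list rather than discarded, so that a later
    stage could treat them as a possible negation.
    """
    matchable, unchecked = [], []
    for span in spans:
        tokens = span.text.lower().split()
        if span.checked:
            matchable.extend(tokens)
        else:
            unchecked.extend(tokens)
    return matchable, unchecked

# Illustrative input only; these criteria are invented for the example.
spans = [
    CriterionSpan("Signature required", checked=True),
    CriterionSpan("Fragile contents", checked=False),
]
matchable, unchecked = split_terms(spans)
```

In this sketch a matcher would see only the tokens in `matchable`, while the tokens in `unchecked` remain available to indicate what was not selected on the form.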