In a variety of environments, digital documents have replaced printed paper documents as a mechanism for delivering information to individuals. Although providing information digitally is a convenience for many individuals and a cost-saving mechanism for many organizations, the large and increasing amount of digitally available information creates scaling problems for traditional data extraction and processing methods. Techniques such as optical character recognition (OCR) and intelligent character recognition (ICR) have been developed to convert image data to text data by analyzing an image of a document. However, OCR and ICR currently do not adequately provide semantic information regarding the extracted text. For example, OCR or ICR may determine that a particular word is present in a document, but traditional OCR and ICR may be unable to determine the meaning, context, and/or significance of the particular word within the document.