The present invention relates generally to the field of document processing, and more particularly to the analysis and ingestion of multi-formatted tables in a document.
In an unstructured information system, information sources are the main component from which analytical results are derived. In many domains, such as science, medicine, or finance, documents may contain complex tables with embedded textual content. A table in isolation may be less valuable than the same table in context. Tables with associated contextual content may be difficult to process because of multiple formatting styles or other errors typically associated with styling, Object Linking and Embedding (OLE) extraction, or Optical Character Recognition (OCR) extraction. As a result, ingesting tables into unstructured information systems may be inefficient in terms of both time and resources.