Current information retrieval techniques rely largely on word-based retrieval mechanisms, particularly for textual documents. While word-based retrieval mechanisms perform fairly well for highly focused searches that query a text document for a limited number of keywords, they may overlook diffuse sources of information embedded in a document, such as tables.
Besides being a source of information that word-based retrieval techniques find difficult to tap, tables embedded in a textual document often exhibit structural regularity that may be used to augment the capabilities of word-based retrieval systems. Information retrieval tools that are able to exploit the cues given by structure embedded within a document can provide users with additional power and flexibility in query specification.
Recent techniques for incorporating cues from structural regularities in a document focus primarily on extracting tables from documents. However, these techniques do not distinguish among the various components of a table and are therefore unable to support structured data queries on fields in the table.
Similar techniques exist in the automated document structuring art, especially for bit-mapped images. However, these techniques are primarily concerned with detecting structure in image documents and using the detected structure for traditional image processing and pattern recognition tasks.