In documents such as financial reports, product documents, and scientific articles, data is usually presented in the form of a table that includes multiple rows and columns so that it can be interpreted more easily. These table structures allow the owner of a document to present information in a structured manner and to summarize key results and main facts. Tables are also used by analysts for data mining, information retrieval, trend analysis, and other such tasks.
As the data included in such tables is central to understanding the document, a machine learning system must be able to read and understand the table data before the document can be used for further analysis. However, owing to the large variability of table layouts, table styles, and information types and formats, and to the lack of available information about document encodings and formats, it is a significant challenge to accurately identify and retrieve information from a table. Examples of variability in tables include rows and columns of different heights, merged cells, different numbers of columns, different numbers of rows in different columns, and different types of borders distinguishing the cells.