Tables that appear on web pages may be an important source of structured data. Several approaches may be used to extract such tables, harvesting raw HTML tables or lists from the Web and recovering their semantics. These approaches typically focus on semantically annotating the tables for other uses, such as data visualization, search, and enriching knowledge bases.
A table may be fragmented, for example, with a single table broken up across several web pages. Fragmentation may make the data in the table harder to use after extraction, because extraction may create an individual table for each table fragment. To run a query against the fragmented table, the query may need to be run against each of the individual tables created by extracting the table fragments.
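As a minimal sketch of the per-fragment querying described above, assume a hypothetical table split across three web pages, with each extracted fragment held as a list of row dictionaries. The names `fragments` and `query_fragments`, the column names, and the data are illustrative assumptions, not part of any particular extraction system.

```python
# Hypothetical extracted fragments: one list of rows per web page.
fragments = [
    [{"item": "A", "count": 516}, {"item": "B", "count": 342}],
    [{"item": "C", "count": 2161}],
    [{"item": "D", "count": 233}, {"item": "E", "count": 870}],
]


def query_fragments(fragments, predicate):
    """Run the same row filter against every fragment and merge the matches."""
    matches = []
    for fragment in fragments:  # one pass per individually extracted table
        for row in fragment:
            if predicate(row):
                matches.append(row)
    return matches


# The caller has to know about every fragment to get a complete answer.
large_items = query_fragments(fragments, lambda row: row["count"] > 500)
```

If any fragment is left out of `fragments`, the result is silently incomplete, which is one way the extra bookkeeping may make fragmented tables harder to work with.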
In some instances, tables spread across various web pages may be related. The tables may include similar data types, and may be parts of a single fragmented table. Simply joining such tables together, even when they are related, may lead to a table that contains confusing and inconsistent data. Without knowing the context of the tables, it may be difficult to determine whether related tables can be joined together to produce a consistent table.
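The inconsistency described above can be illustrated with a small sketch. It assumes two hypothetical tables with identical columns, taken from pages that cover different seasons, where that page context is not stored in the rows themselves. The helper names (`naive_join`, `conflicting_keys`, `join_with_context`) and the data are assumptions made for illustration only, and the context column is just one possible way to keep the joined rows distinguishable.

```python
# Hypothetical tables from two pages. Both share the same columns, but the
# page context (which season each page covers) is lost at extraction.
table_page_1 = [{"team": "A", "wins": 10}, {"team": "B", "wins": 7}]
table_page_2 = [{"team": "A", "wins": 4}, {"team": "C", "wins": 12}]


def naive_join(*tables):
    """Concatenate rows without considering where each table came from."""
    return [row for table in tables for row in table]


def conflicting_keys(rows, key, value):
    """Return the keys that appear with more than one distinct value."""
    seen = {}
    for row in rows:
        seen.setdefault(row[key], set()).add(row[value])
    return sorted(k for k, values in seen.items() if len(values) > 1)


def join_with_context(tables_by_context, context_column):
    """Concatenate rows, recording each table's context in an extra column."""
    return [
        {**row, context_column: context}
        for context, table in tables_by_context.items()
        for row in table
    ]


joined = naive_join(table_page_1, table_page_2)
conflicts = conflicting_keys(joined, "team", "wins")  # team "A" has two win counts

with_context = join_with_context(
    {"season 1": table_page_1, "season 2": table_page_2}, "season"
)
```

In the naive join, team "A" appears with two different win counts and nothing in the table explains why. Once the assumed `season` column is attached, the same rows no longer contradict each other.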