Today, information may be conveyed through various types of documents, such as text documents, images, portable document format (PDF) documents, web pages, scanned documents, and spreadsheets. Some documents may be arbitrarily formatted by an author, which may lead to erroneous results when data is mined from such documents. For example, an inspection report (e.g., an equipment inspection report) may comprise multiple regions that differ in how information is organized (e.g., a first region may list equipment temperatures along multiple rows, whereas a second region may list equipment locations down multiple columns). Thus, parsing the inspection report may not yield logical partitions of information, and may instead yield clusters of data that do not correspond to how the author organized information within the inspection report.
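By way of illustration only, the mismatch described above may be sketched in Python as follows. The region contents, the equipment names, and the helper functions `parse_rows` and `parse_columns` are hypothetical and are not taken from any particular inspection report or parsing system; the sketch merely shows how a parser that assumes one record per row may group cells incorrectly when a region is organized by column.

```python
# Hypothetical inspection-report regions (all values are made up for illustration).

# Region 1: one record per row (equipment name followed by temperature readings).
temperature_region = [
    ["Pump A", "71", "73"],
    ["Pump B", "64", "66"],
]

# Region 2: one record per column (equipment name on top, location beneath).
location_region = [
    ["Pump A", "Pump B", "Valve C"],
    ["Bay 1", "Bay 2", "Bay 3"],
]


def parse_rows(grid):
    """Treat each row as a record: the first cell is the key, the rest are values."""
    return [(row[0], row[1:]) for row in grid]


def parse_columns(grid):
    """Treat each column as a record by transposing the grid, then parsing rows."""
    transposed = [list(column) for column in zip(*grid)]
    return parse_rows(transposed)


# Row-wise parsing matches how region 1 is organized.
temperatures = parse_rows(temperature_region)

# Row-wise parsing of region 2 groups equipment names with other equipment
# names, which does not reflect the author's organization.
naive_locations = parse_rows(location_region)

# Column-wise parsing of region 2 pairs each piece of equipment with its location.
locations = parse_columns(location_region)
```

In this sketch, `naive_locations` associates "Pump A" with "Pump B" and "Valve C" rather than with "Bay 1", whereas `locations` pairs each piece of equipment with its location. Choosing the orientation per region is assumed here to be known in advance; determining it automatically is outside the scope of the sketch.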