Modern businesses run on data. However, despite the importance of data, and in particular of its suitability for various analytic and processing functions, information is often available only in forms that are neither optimized nor suitable for anything more than storage. For example, payroll, expense, or other types of records may be provided to a company for ingestion in the form of images, which would require significant processing before the information they depict could be used in any further processes. While the ubiquity of this situation has led to various approaches to extracting and organizing information from images or other types of unstructured documents, approaches currently in use have significant drawbacks. For example, document processing software that attempts to extract and organize information based on whitespace between items can fail for documents that include internal headers. Accordingly, there is a need for technology that is able to extract information from unstructured documents and organize it in a structured manner, and that does not fail for documents with internal headers or in other cases that have proven problematic for solutions currently in use.