The digital world has given rise to rapid growth in the data that is generated, stored, analyzed, and used by a variety of entities, including companies, organizations, universities, and individuals. Data is continuously generated and organized into documents by millions of users and their devices, such as mobile devices, computers, wearable devices, point of sale terminals, navigation devices, and the multitude of sensors provided thereon.
Often, data is compiled, aggregated, and/or stored in print-ready digital source documents of file types such as XPS, RTF, PDF, and the like. Print-ready digital source documents typically include a multitude of unstructured, semi-structured, and/or structured data that is distributed onto fixed locations of a rendered page, rather than into organized lines, rows, cells, or the like. In other words, the data items in a print-ready digital source document are not organized relative to one another, but are instead fixedly arranged in relation to coordinates of a rendered page.
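By way of illustration only, the following minimal Python sketch shows what coordinate-based arrangement means in practice. It assumes a hypothetical `Fragment` record holding a piece of text and its page coordinates, and a hypothetical `group_into_rows` helper with an assumed `y_tolerance` parameter; none of these names come from any particular file format or library, and the approach is only one simple way to recover row structure from positions.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical record: a piece of text pinned to page coordinates."""
    text: str
    x: float  # horizontal position on the rendered page
    y: float  # vertical position on the rendered page


def group_into_rows(fragments, y_tolerance=2.0):
    """Group fragments whose y positions are within y_tolerance of the
    first fragment of the current row, then order each row left to right.

    y_tolerance is an assumed value for illustration only.
    """
    rows = []
    for frag in sorted(fragments, key=lambda f: (f.y, f.x)):
        if rows and abs(frag.y - rows[-1][0].y) <= y_tolerance:
            rows[-1].append(frag)  # same visual line as the current row
        else:
            rows.append([frag])    # start a new row
    return [[f.text for f in sorted(row, key=lambda f: f.x)] for row in rows]


# Fragments appear in arbitrary order; only their coordinates relate them.
page = [
    Fragment("42.00", 300.0, 100.5),
    Fragment("Widget", 50.0, 100.0),
    Fragment("Price", 300.0, 80.0),
    Fragment("Item", 50.0, 80.0),
]
table = group_into_rows(page)
```

In this sketch, nothing in the input says that "Widget" and "42.00" belong to the same row; that relationship has to be inferred from their nearly equal vertical positions.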
Print-ready digital source documents are used (e.g., generated, transmitted, stored) in nearly every context or industry, including government, healthcare, education, retail, manufacturing, financial services, telecom, and the like. Print-ready digital source documents are used, for example, to store information, fix information onto rendered pages, print information, and send information without risking that the information will be displaced across the pages of the document.
The data in print-ready digital source documents is difficult to access because it is arranged in a non-tabular format, which prevents it from being easily selected, sorted, modified, charted, and the like. One common theme among entities and individuals generating and using print-ready digital source documents is the desire to make their data more easily accessible, for example, so that it can be analyzed, filtered, and used to efficiently and effectively generate tables. This, in turn, makes print-ready digital source document data easier and quicker to consume (e.g., to generate tables), less prone to errors, and more reliable.
There is a need, therefore, for systems and methods that allow print-ready digital source document files containing tabular data to be used to generate tables, spreadsheets, and the like. There is also a need for systems and methods that identify relationships between data, classify data, and aggregate portions of data based on perceived relationships between them. Moreover, there is a need for such systems and methods to be executed with minimal user interaction.