Many organizations routinely upgrade their computing systems and architecture. Over time, an organization may change computing platforms or infrastructures, causing certain computing systems and technologies to become outdated or obsolete. Computing systems and technologies that are considered outdated or obsolete are referred to as legacy systems; those that are not are referred to as non-legacy systems. When an organization replaces its legacy systems with newer, more efficient non-legacy computing systems and technologies, it may choose not to migrate its data from the legacy system into the newer computing system. Instead, the organization may archive or otherwise preserve its existing data in the legacy system and use the newer, more efficient computing systems for all future transactions and processing functions.
This hybrid approach of archiving the existing legacy system data may be adopted in part to satisfy legal record-retention requirements, such as for tax or auditing purposes. Organizations may also decide that archiving the existing legacy system data is more cost-effective than attempting to extract it and migrate it into the newer, more efficient systems.
While archiving its existing legacy system data may be more cost-effective overall for an organization than attempting to integrate that data into its newer computing systems, archiving also makes it more difficult to generate reports on the archived data.
For example, the existing legacy system data may be archived in files containing serialized objects, in which data records from different tables are assembled into a single object instance so that each file and its data object can be read as a standalone file, without accessing any other file. As a result, multiple files may contain copies of the same data records so that each file remains readable on its own. At the same time, a file does not necessarily store complete tables: only the records of a table that are relevant to the file's data object are stored in that file, so a single file may hold only some of a table's records.
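The archived file layout described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each archived object is serialized as JSON and that the source tables are named `customer` and `order_item`; the function `build_archive_object` and all field names are hypothetical and do not represent the storage format of any particular archiving product.

```python
import json
from typing import Any


def build_archive_object(order_id: int, customer: dict[str, Any],
                         items: list[dict[str, Any]]) -> str:
    """Serialize one order together with the records it depends on (hypothetical layout)."""
    obj = {
        "object_type": "order",
        "object_id": order_id,
        "tables": {
            # A full copy of the customer record is embedded in every order
            # that references it, so the same record is duplicated across files.
            "customer": [customer],
            # Only the line items belonging to this order are stored; the rest
            # of the order_item table lives in other files.
            "order_item": items,
        },
    }
    return json.dumps(obj, sort_keys=True)


customer_42 = {"customer_id": 42, "name": "Acme Corp"}
file_a = build_archive_object(1001, customer_42, [{"item_id": 1, "order_id": 1001, "qty": 3}])
file_b = build_archive_object(1002, customer_42, [{"item_id": 2, "order_id": 1002, "qty": 5}])

# Each file can be deserialized on its own, without opening any other file.
print(json.loads(file_a)["tables"]["customer"])
```

Both files carry their own copy of customer 42, and neither holds the whole `order_item` table, which is the trade-off that makes each file self-contained.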
While this data structure is efficient for long-term record retention, it is extremely inefficient for running queries that do not match the internal structure of the archived data object files. For example, a query that reads every record in a table may be very inefficient: because no single file contains the complete set of records in the table, the query must read every file, and because multiple files may contain copies of the same records, it may read the same record many times.
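The cost of a table-wide query over such files can be sketched as follows. The sketch assumes the archive files have already been deserialized into Python dictionaries and that each record carries a primary key; the sample data, the table name `customer`, and the helper `scan_table` are hypothetical. It shows only the naive full scan with de-duplication by key, not an improved retrieval method.

```python
from typing import Any, Iterable

# Hypothetical deserialized archive files: customer 42 is embedded in two
# files, customer 7 in only one, and no file holds the whole customer table.
ARCHIVE_FILES: list[dict[str, Any]] = [
    {"object_id": 1001, "tables": {"customer": [{"customer_id": 42, "name": "Acme Corp"}]}},
    {"object_id": 1002, "tables": {"customer": [{"customer_id": 42, "name": "Acme Corp"}]}},
    {"object_id": 1003, "tables": {"customer": [{"customer_id": 7, "name": "Globex"}]}},
]


def scan_table(files: Iterable[dict[str, Any]], table: str,
               key: str) -> tuple[list[dict[str, Any]], int]:
    """Return the distinct records of `table` and the number of records read."""
    distinct: dict[Any, dict[str, Any]] = {}
    records_read = 0
    # Because no single file is complete, every file must be opened.
    for archive_file in files:
        for record in archive_file["tables"].get(table, []):
            records_read += 1
            # Copies of the same record appear in several files; keep one per key.
            distinct.setdefault(record[key], record)
    return list(distinct.values()), records_read


customers, read = scan_table(ARCHIVE_FILES, "customer", "customer_id")
print(len(customers), read)  # prints "2 3"
```

Even in this tiny example the scan reads three records to return two; with many objects referencing the same records, the ratio of records read to records returned grows accordingly.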
Thus, aggregating, organizing, and/or generating reports on this archived data has been very inefficient and cumbersome. There is a need for more efficient retrieval of data from these archived data files in order to facilitate analytical reporting on archived data.