Systems and methods for importing data from multiple electronic files can be relatively straightforward in some situations. In one example scenario, a conventional importation system identifies common fields in a set of electronic files that include data in a similar format and layout. The fields can be isolated using filtering functions of the system's data importation software and the desired information retrieved. The isolated data can then be aggregated so as to provide a report including all the records that together constitute the desired information.
One problem arises when conventional data importation systems receive electronic files whose fields lack commonality or differ within a given electronic file set. For example, spreadsheets of wire transfer transaction data received from different banks may include data fields that are arranged or configured differently. As another example, the data included in common fields (e.g., transaction amount or transaction time) within a set of electronic files may be presented in different formats (e.g., dollars, thousands of dollars, Euros, or CAD for amounts; 12-hour or 24-hour time for timestamps). These problems intensify when large numbers of electronic files (e.g., millions of electronic files) are received by conventional data importation systems.
One solution to this shortcoming is to have an engineer write a new data importation software algorithm for each electronic file with a unique layout. This solution, however, is time-consuming and expensive because a data importation system may receive hundreds of unique file layouts from thousands of different organizations.
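The cost of the per-layout approach can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the two bank layouts, the field names, the sample rows, and the helper functions `import_bank_a`, `import_bank_b`, and `import_file` do not come from any particular system. The sketch shows that two files carrying the same transaction each need their own hand-written importer before their records can be aggregated.

```python
import csv
import io
from decimal import Decimal

# Hypothetical sample files: the same wire transfer reported by two banks
# using different column names, column order, delimiters, units, and dates.
BANK_A_CSV = "date,amount_usd,beneficiary\n2020-01-15,1250.00,ACME Corp\n"
BANK_B_CSV = "Payee;Amt (thousands USD);Posted\nACME Corp;1.25;15/01/2020\n"


def import_bank_a(text):
    """Importer written by hand for the (assumed) layout of bank A."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {
            "payee": row["beneficiary"],
            "amount_usd": Decimal(row["amount_usd"]),
            "date": row["date"],  # already in YYYY-MM-DD form
        }
        for row in rows
    ]


def import_bank_b(text):
    """Importer written by hand for the (assumed) layout of bank B."""
    rows = csv.DictReader(io.StringIO(text), delimiter=";")
    records = []
    for row in rows:
        day, month, year = row["Posted"].split("/")
        records.append(
            {
                "payee": row["Payee"],
                # Amount is reported in thousands of dollars; rescale it.
                "amount_usd": Decimal(row["Amt (thousands USD)"]) * 1000,
                "date": f"{year}-{month}-{day}",
            }
        )
    return records


# One registry entry per unique layout; every new layout needs new code.
IMPORTERS = {"bank_a": import_bank_a, "bank_b": import_bank_b}


def import_file(layout, text):
    """Dispatch to the importer for a known layout (KeyError if unknown)."""
    return IMPORTERS[layout](text)
```

In this sketch, a file with a third layout cannot be imported until an engineer adds a third function to the registry, which is the scaling cost described above.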
Another shortcoming arises when a user imports electronic data files into a data analysis system using a conventional data importation system. Data analysis systems allow users to explore and manipulate data that has been imported and integrated into a coherent data model by a data importation system. For example, a data analysis system may allow users to visualize relationships, test hypotheses, and discover connections from data imported from numerous (and disparate) data sources. Conventional data importation systems may not, however, provide access to original source electronic data files from which data has been imported to one or more data analysis systems. As a result, data analysis systems may be unable to identify original source electronic data files and provide access to, or the ability to download, original source electronic data files.
Conventional data importation systems may also have shortcomings in handling importation of electronic files into multiple data analysis systems. For example, a first data analysis system may allow users to modify, tag, and change electronic data files that have been imported into both the first data analysis system and a second data analysis system. Conventional data importation systems may be unable to track the changes made to the copies of the electronic data files in the first data analysis system and update the copies of the electronic data files in the second data analysis system with those changes.
Conventional data importation systems may also have scalability issues when handling importation of a large number of electronic files. One scalability issue involves tracking the status of each electronic data file. For example, the conventional data importation system may not have the capabilities to keep track of which electronic data files have been imported, which electronic data files have been modified (or have modified metadata), and which electronic data files have been deleted.
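The status bookkeeping described above can be made concrete with a minimal sketch. It assumes, purely for illustration, that each file's status is derived by comparing a content fingerprint recorded at the last import with a fingerprint of the file's current contents; the function names `fingerprint` and `classify` and the dictionary-based manifest are hypothetical and are not taken from any conventional system.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest used as the file's content fingerprint."""
    return hashlib.sha256(data).hexdigest()


def classify(previous: dict, current_files: dict) -> dict:
    """Compare a stored manifest against the files currently present.

    previous:      file name -> fingerprint recorded at the last import
    current_files: file name -> current file contents as bytes
    """
    current = {name: fingerprint(data) for name, data in current_files.items()}
    return {
        "new": sorted(n for n in current if n not in previous),
        "modified": sorted(
            n for n in current if n in previous and current[n] != previous[n]
        ),
        "deleted": sorted(n for n in previous if n not in current),
        "unchanged": sorted(
            n for n in current if n in previous and current[n] == previous[n]
        ),
    }
```

Note that this sketch detects changes to file contents only; tracking modified metadata, as mentioned above, would require recording that metadata separately.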
Another shortcoming of conventional data importation systems arises with managing customization of data importation systems. For example, an engineer may write a first data importation software algorithm for a first instance of a conventional data importation system and may want to deploy that algorithm for one or more additional instances of the conventional data importation system. Any customizations to the deployed instances of the conventional data importation system may cause incompatibilities with future updates applied across deployed instances of the conventional data importation system. As a result, the engineer may need to manually resolve issues with conflicting customizations each time an update is to be applied.
A further shortcoming arises when a user wants to delete an electronic data file and any data (or transformed electronic files) imported from it into a data analysis system by a conventional data importation system. A user who wants to delete certain data from the data analysis system may be unable to do so because the user cannot identify the original source electronic data file from which the data was imported. For the same reason, a user may be unable to delete electronic data files across multiple data analysis platforms.