Embodiments of the disclosure relate generally to information processing, and more particularly, to the merging of data files in a computer system.
In the field of information technology, a file or data set is a collection of logically related data and can be a source program, a library of macros, or a file of data records used by a processing program. Data records are the basic unit of information used by a processing program. Data in a file may be processed and saved into the same file or a different file. A file may be stored on a secondary storage device, such as a DASD volume or a magnetic tape volume, and its contents may be printed to a printer or displayed on a terminal. Data in a file may be maintained in the form of records, where each record may be, for example, 80 bytes in length.
In processing data files, an application program or a file processing program may need to combine or merge multiple files into a single file. In current practice, a program may need to first open the files from which data is to be merged, using a data access method such as VSAM. Assuming that the program is merging data from a first file into a second file, the program may perform a GET operation for each data record in the first file to obtain the data record, and then perform a PUT operation to add that data record to the end of the second file. The program would repeat the GET and PUT operations for each record in the first file until it reaches an end-of-file marker in the first file. Because every record requires its own GET and PUT operation, such a process would be very time-consuming, in particular for large files with millions of data records.
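By way of illustration only, the record-by-record merge described above may be sketched as follows. The sketch is written in Python against ordinary byte-stream files and does not use VSAM or any other mainframe access method. The function name merge_record_by_record, its parameters, and the use of fixed-length 80-byte records are assumptions made for this example: a fixed-length read stands in for the GET operation, an appending write stands in for the PUT operation, and an empty read stands in for the end-of-file marker.

```python
RECORD_LENGTH = 80  # assumed fixed record length, following the 80-byte example


def merge_record_by_record(source_path, target_path, record_length=RECORD_LENGTH):
    """Append every record of source_path to the end of target_path.

    Hypothetical helper for illustration; returns the number of records copied.
    """
    copied = 0
    # Open the first file for reading and the second file for appending.
    with open(source_path, "rb") as source, open(target_path, "ab") as target:
        while True:
            record = source.read(record_length)  # stands in for a GET
            if not record:
                # An empty read stands in for the end-of-file marker.
                break
            target.write(record)  # stands in for a PUT at the end of the file
            copied += 1
    return copied
```

Each record in this sketch costs one read and one write, so the number of operations grows in direct proportion to the number of records in the first file, which is the source of the delay noted above for files with millions of records.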
Further, current file merging approaches may include caching of the merged data, which may require setting up data structures in internal system storage and some type of locking to serialize access to the files being merged. These tasks all add to the overall processing time that a program needs in order to complete the merging of the files.
There is thus a need for a more efficient method and system for merging files or data sets.