The present disclosure relates generally to parallel processing using memory mapping. Specifically, techniques disclosed relate to performing parallel processing on a file in conjunction with a memory mapping of the file.
In file integration scenarios, an input file may contain a large number of homogeneous records. A user specifies logic for a record processor, which processes each record in the file and transforms each of the records into another form. Currently, a record processor goes through each record in the file one-by-one and applies the transformation on each record. That is, bulk file-based integration is performed for large files. When a file is large, this process can take a long time before all of the records are processed.
FIGS. 1-21 illustrate an example of processing records serially. FIGS. 1-21 show 20 records (e.g., dated from Mar. 1, 2016-Mar. 5, 2016). The record processor 120 can contain logic for reading a native record from a file, converting the native record into an XML format using an input schema, applying processing logic specified by a user, converting the output XML record back into the native format (e.g., comma separated value (CSV)) using an output schema, and writing the converted native record to an output file.
As shown in FIGS. 1-21, the record processor 120 serially processes each of the records 110. For example, in FIG. 2 the first record is processed, in FIG. 3 the second record is processed, and so on until all of the records are processed. Because only one record is handled at a time, serial processing can be time-consuming and inefficient, particularly for large files.
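For illustration only, the serial flow described above can be sketched as follows. This is a minimal sketch and not the disclosed implementation: the field names in `FIELDS`, the doubling transformation in `user_logic`, and all function names are hypothetical stand-ins for the input schema, the output schema, and the user-specified processing logic.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical schema: each native (CSV) record has a date and an amount.
FIELDS = ["date", "amount"]


def native_to_xml(line):
    """Convert one native CSV record into an XML element (input schema)."""
    values = next(csv.reader([line]))
    record = ET.Element("record")
    for name, value in zip(FIELDS, values):
        ET.SubElement(record, name).text = value
    return record


def user_logic(record):
    """Hypothetical user-specified transformation: double the amount."""
    amount = record.find("amount")
    amount.text = str(int(amount.text) * 2)
    return record


def xml_to_native(record):
    """Convert the output XML record back into a CSV line (output schema)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([record.find(name).text for name in FIELDS])
    return buf.getvalue()


def process_serially(src, dst):
    """Process every record one-by-one; returns the number of records."""
    count = 0
    for line in src:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        dst.write(xml_to_native(user_logic(native_to_xml(line))))
        count += 1
    return count
```

In this sketch the total running time grows linearly with the number of records, since no record is started until the previous one has been written.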
Alternatively, a large file can be physically divided into smaller files. However, the initial breakdown can be expensive since each smaller file would need its own disk space. Also, complicated synchronization would need to be performed. In addition, read and/or write system calls would be required, each of which involves switching from a user mode to a kernel mode, which can be expensive.
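The cost of physically dividing a file can be seen in a minimal sketch such as the following. It is an assumption-laden illustration rather than any particular system's approach: the function name `split_file`, the `lines_per_part` parameter, and the `part-NNNN.csv` naming scheme are all hypothetical.

```python
import os
import tempfile


def split_file(path, lines_per_part, out_dir):
    """Copy `path` into smaller part files of at most `lines_per_part` lines.

    Every byte of the input is read and written again, so the parts
    together occupy as much additional disk space as the original file.
    """
    part_paths = []
    out = None
    with open(path, "r", newline="") as src:
        for i, line in enumerate(src):
            if i % lines_per_part == 0:
                if out is not None:
                    out.close()
                # Hypothetical naming scheme for the part files.
                part_path = os.path.join(
                    out_dir, f"part-{len(part_paths):04d}.csv"
                )
                out = open(part_path, "w", newline="")
                part_paths.append(part_path)
            out.write(line)
    if out is not None:
        out.close()
    return part_paths
```

Even before any record is transformed, this step performs a full extra pass of reads and writes over the data, and the outputs of the separately processed parts would still need to be recombined in the correct order afterward.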