The present invention relates generally to the field of processing XML files, and more particularly to splitting large XML files and processing elements in parallel on multiple computing nodes.
Extensible markup language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. XML is defined in several free open standards and uses a textual data format primarily focused on documents; however, it is also frequently used in web services. A “well-formed” XML document or file is one that adheres to the syntax rules specified by a current XML specification, in that it satisfies the rules governing both its physical and logical structure. The simplicity and usability of XML files contribute to their preferred use in transferring data over the Internet.
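By way of illustration only, the notion of a well-formed document can be sketched with a short check. The following is a minimal sketch, assuming Python and its standard-library `xml.etree.ElementTree` parser; the function name `is_well_formed` is hypothetical and is not part of any XML specification or of the invention described herein.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a single well-formed XML document.

    Hypothetical helper for illustration. The parser checks syntax only
    (well-formedness); it does not validate against a DTD or schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for syntax violations such as mismatched tags
        # or more than one root element.
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<order><item>1</item></order>"))  # properly nested
    print(is_well_formed("<order><item>1</order>"))         # <item> never closed
    print(is_well_formed("<a></a><b></b>"))                 # two root elements
```

Note that this sketch loads the entire document into memory before returning, which is acceptable for small messages but becomes costly as file sizes grow.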
The number of XML-formatted files, or messages, has increased significantly over recent years, especially in enterprise organizations. The size of XML files has also increased significantly, straining the processing and memory capacity needed to handle large numbers of very large files. In some cases, XML has become a “de facto” standard for transferring data from one step to the next in business processes, or between applications. Although some solutions have been offered, challenges related to central processing unit (CPU) and memory usage remain.