Computer-based applications and services frequently rely on the receipt and transmission of electronic data to provide services to users. As computing has become increasingly ubiquitous in people's lives, the need for simple and efficient data transfer capabilities has likewise increased. In particular, this has led to increased use of eXtensible Markup Language, or “XML” for short. Because XML is self-describing, applications that support it can communicate more freely with each other, without requiring the two applications to share knowledge of file formats or data structures before communicating.
However, XML data must typically be parsed before the information it contains is available to an application. An XML document may be complex, containing numerous levels of hierarchically structured data and data descriptors. If XML parsing is performed in an inefficient manner, bottlenecks can occur, slowing the application or service that relies on the XML data.
Existing XML parsing techniques have attempted to speed up the parsing process by performing some parsing steps in parallel. However, even where existing systems take advantage of parallel processing, many introduce additional complications that limit the potential gains. For instance, in some existing parallel XML parsing techniques, parsers that operate on separate pieces of XML data must account for and check dependencies between the pieces. As a result, parallel parsing threads must communicate frequently, and threads must pause or slow down while waiting for replies. This communication overhead erodes much of the potential speed advantage of parallel XML parsing.
Similarly, existing parallel parsing techniques produce hierarchical output structures, such as those resembling a Document Object Model (“DOM”) structure. In such a structure, structural links are often required between parent and child nodes. When the results of parallel parsing threads are combined into such a structure, nodes produced by different threads must be linked together, which requires additional communication between the threads. Again, this reduces parsing efficiency.
Finally, existing parallel XML parsing techniques do not divide XML data into pieces in a way that is both efficient and conducive to fast, efficient subsequent parsing. In some techniques, XML data is rigorously checked before parallel parsing begins; while this prevents errors, it introduces yet another bottleneck. In other techniques, XML data is quickly divided into roughly even chunks for parsing, but this partitioning performs too little checking and can leave a parallel parsing process doing unnecessary work, such as distinguishing comment text from ordinary data. The parser must then operate speculatively and communicate with other parsing threads, once again introducing unwanted communication overhead.
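The partitioning problem can be illustrated with a small sketch. It assumes a naive splitter that cuts an XML string into roughly even chunks by character offset; the helper names `even_chunks` and `inside_comment` and the sample document are hypothetical and are not taken from any existing parser. A cut can land inside a comment, and a parser holding only that chunk cannot tell whether text such as `</b>` is markup or comment content without scanning from the start of the document.

```python
def even_chunks(text: str, n: int) -> list[str]:
    """Hypothetical naive partitioner: split text into n contiguous,
    roughly equal-length pieces without inspecting the markup."""
    size = -(-len(text) // n)  # ceiling division so no text is dropped
    return [text[i:i + size] for i in range(0, len(text), size)]


def inside_comment(text: str, offset: int) -> bool:
    """Hypothetical check: return True if offset falls inside an XML
    comment <!-- ... -->. Answering this requires scanning text from the
    beginning, i.e. it depends on content outside the chunk itself."""
    pos = 0
    while True:
        start = text.find("<!--", pos)
        if start == -1 or start >= offset:
            return False  # no comment opens before offset
        end = text.find("-->", start + 4)
        if end == -1 or offset < end + 3:
            return True  # offset lies between "<!--" and the end of "-->"
        pos = end + 3  # continue after this comment


# A small sample document whose middle element is commented out.
doc = "<a><b>1</b><!-- <b>2</b> is disabled --><b>3</b></a>"
chunks = even_chunks(doc, 3)
# The second chunk begins mid-comment; its "</b>" looks like an end tag
# but is actually comment text.
print(chunks[1])                      # >2</b> is disabled
print(inside_comment(doc, len(chunks[0])))  # True
```

A thread given only the second chunk would either have to guess (parse speculatively) or ask another thread what preceded its chunk, which is the communication overhead described above.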