The present disclosure relates generally to computer system file management, and, in particular, to automatic parsing of markup language documents.
A markup language (ML) provides a way to combine text and extra information about the text in a text file. The extra information can include data structure, layout, or other information, intermingled with the primary text. An ML can facilitate sharing of structured data across diverse information systems, such as the Internet. ML documents are typically files stored in a text-based format that define and describe information that can be interpreted by both humans and computers. Before an ML document can be consumed by an application, it must first be parsed into its semantic components. Once parsed, the consumer of the ML document knows the purpose and meaning of each item in the ML document. When an application or middleware needs to consume an ML document, the process requires two discrete steps: reading, or acquiring, the ML text into a buffer, and then calling an ML parser to process the ML text into useful binary objects that can be consumed. The binary objects created by the ML parser are typically dynamic in nature: they are created, held temporarily, and destroyed upon consumption. This means that an ML document must be parsed again every time a consumer reads it. This repeated parsing adds complexity to the processing that every ML document consumer performs, and drives up overall resource usage within a system as multiple consumers handle ML documents.
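The conventional two-step flow described above can be illustrated with a minimal sketch. It assumes XML as the markup language and uses Python's standard `xml.etree.ElementTree` parser; the `STORED_TEXT` dictionary, the `consume` function, and the sample document are hypothetical names introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a file system: path -> raw ML text.
STORED_TEXT = {"/cfg/app.xml": "<config><port>8080</port></config>"}

PARSE_COUNT = 0  # counts how many times the parser is invoked


def consume(path):
    """Conventional consumer: read the text into a buffer, then parse it."""
    global PARSE_COUNT
    buffer = STORED_TEXT[path]         # step 1: acquire the ML text
    root = ET.fromstring(buffer)       # step 2: call the ML parser
    PARSE_COUNT += 1
    port = int(root.findtext("port"))  # use the temporary parsed objects
    return port                        # the parsed tree is discarded on return


# Three reads of the same unchanged document trigger three full parses.
ports = [consume("/cfg/app.xml") for _ in range(3)]
```

In this sketch every call to `consume` repeats the parse even though the document has not changed, which is the repeated cost the passage describes.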
Thus, every time an ML document is read, parsing occurs, which consumes significantly more resources than if the parsed information were generated once and kept available for multiple consumers as a persistent version of the ML document. Moreover, ML consumer applications are charged with the task of locating and calling a compatible ML parser, leading to further complications for the ML consumer applications when the ML parser is moved to a different access path or its semantics are modified, such as upon a system reconfiguration or update. Accordingly, there is a need in the art for automatically parsing ML documents at write time and returning the stored parsed version at read time.
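The stated need, parsing once at write time and returning the stored parsed version at read time, can be sketched as follows. This is only an illustration under stated assumptions, not the disclosed implementation: the `ParsingStore` class and its methods are hypothetical, XML stands in for the markup language, and an in-memory dictionary stands in for persistent storage.

```python
import xml.etree.ElementTree as ET


class ParsingStore:
    """Hypothetical store that parses ML text once, when it is written."""

    def __init__(self):
        self._parsed = {}     # path -> parsed form kept for later reads
        self.parse_count = 0  # counts how many times the parser is invoked

    def write(self, path, ml_text):
        root = ET.fromstring(ml_text)  # the only place the parser is called
        self.parse_count += 1
        # Keep a simple parsed form: child element tag -> element text.
        self._parsed[path] = {child.tag: child.text for child in root}

    def read(self, path):
        # Return a copy of the stored parsed form; no parser is called here.
        return dict(self._parsed[path])


store = ParsingStore()
store.write("/cfg/app.xml", "<config><port>8080</port><host>db1</host></config>")
first = store.read("/cfg/app.xml")
second = store.read("/cfg/app.xml")
```

Here the consumers calling `read` never locate or invoke a parser themselves, so relocating or updating the parser would affect only the hypothetical `write` path.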