1. Field of the Invention
The present invention generally relates to data centric applications. Specifically, the present invention provides a system and method for bit streaming in data centric applications.
2. Related Art
The JCR (Java Content Repository, JSR 170: http://jcp.org/en/jsr/detail?id=170) is gaining momentum in Web-based content management systems (CMS). The Content Repository API for Java (JCR) is a specification for a Java platform API for accessing content repositories in a uniform manner. Content repositories are used in CMS to store both the content data and the metadata the CMS relies on, such as versioning metadata. The specification was developed under the Java Community Process as JSR-170 (Version 1) and JSR-283 (Version 2); its main Java package is javax.jcr. Applications based on the JCR are easily customized by modeling data with JCR node types, and the higher-level JCR API insulates them from the diverse back-end database systems that may hold the data.
All data stored in the JCR are represented with an XML node schema and can be serialized to the file system; each serialized node file contains the data properties and their path information. The Extensible Markup Language (XML) is a general-purpose markup language. It is classified as extensible because it allows its users to define their own tags. Its primary purpose is to facilitate the sharing of structured data across different information systems, particularly via the Internet. It is used both to encode documents and to serialize data; in the latter role it is comparable to other text-based serialization languages such as JSON and YAML. (JSON (JavaScript Object Notation) is a lightweight, text-based, human-readable data interchange format for representing simple data structures and associative arrays (called objects). More information may be found at json.org. YAML (“YAML Ain't Markup Language”) is a human-readable data serialization format that takes concepts from languages such as XML, C, Python, and Perl, as well as from the electronic mail format specified by RFC 2822. More information may be found at YAML.org.) High-performance processing of serialized XML node files is critical for enterprise data-centric applications such as archive, restore, and migration. Because the JCR can store actual binary content within the XML nodes, these files can be large: tens of megabytes (MB) of JCR XML node files are not uncommon for a typical repository.
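To illustrate why embedded binary content inflates XML node files, the following Java sketch assumes binary property values are written as Base64 text, as the JCR system-view export does. The class `BinaryInXmlSize`, its methods, and the `<value>` wrapper element are hypothetical and are not part of the JCR API. Standard padded Base64 emits 4 characters for every 3 input bytes, so the text form is roughly one third larger than the raw binary before any XML markup is counted.

```java
import java.util.Base64;

public class BinaryInXmlSize {

    // Hypothetical helper: wraps raw bytes in an illustrative <value> element as Base64 text.
    static String toXmlValue(byte[] binary) {
        return "<value>" + Base64.getEncoder().encodeToString(binary) + "</value>";
    }

    // Padded Base64 produces 4 characters per 3 input bytes, rounded up to a full group.
    static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] binary = new byte[3_000_000];              // ~3 MB of placeholder binary content
        String xml = toXmlValue(binary);
        int payload = xml.length() - "<value></value>".length();
        System.out.println("raw bytes:     " + binary.length);
        System.out.println("encoded chars: " + payload);
        System.out.println("predicted:     " + encodedLength(binary.length));
    }
}
```

Running `main` reports 4,000,000 encoded characters for 3,000,000 raw bytes, matching the predicted length.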
No existing XML processor, such as DOM or SAX, can efficiently handle large XML files while both parsing them and manipulating the states of their XML elements. A processor may be good at parsing alone but unable to maintain element state (SAX), or it may maintain state only at the cost of prohibitive memory consumption (DOM). The Document Object Model (DOM) is a platform- and language-independent standard object model for representing HTML, XML, and related formats. A DOM parser builds an in-memory tree of the entire document, so its memory use grows with the size of the file. A web browser is not obliged to use the DOM to render an HTML document; however, JavaScript scripts that inspect or modify a web page dynamically require it. In other words, the DOM is the way JavaScript sees its containing HTML page and the browser state.
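A minimal Java sketch of the DOM trade-off, using the JDK's built-in `javax.xml.parsers` and `org.w3c.dom` APIs: the whole document is loaded into a tree, which makes changing an element's state straightforward but keeps every element, including any Base64 payload, in memory. The `<node>`/`<property>` element names and the `name` attribute form a simplified, hypothetical layout rather than the actual JCR serialization schema, and `DomNodeEditor` is an illustrative name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomNodeEditor {

    // Parses the entire document into an in-memory tree, then renames matching <node> elements.
    // Every element and text payload stays in memory for as long as the Document is referenced.
    static Document renameNode(String xml, String oldName, String newName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("node");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                if (oldName.equals(e.getAttribute("name"))) {
                    e.setAttribute("name", newName);   // state change on a live tree element
                }
            }
            return doc;
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse XML", ex);
        }
    }

    // Lists the name attribute of every <node> element in document order.
    static List<String> nodeNames(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("node");
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(((Element) nodes.item(i)).getAttribute("name"));
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<node name='root'><node name='docs'>"
                + "<property name='data'>AQID</property></node></node>";
        System.out.println(nodeNames(renameNode(xml, "docs", "archive")));  // [root, archive]
    }
}
```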
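The Simple API for XML (SAX) is a serial access parser API for XML. SAX provides a mechanism for reading data from an XML document. It is event-driven: the parser reports each element to application callbacks as it reads it and does not build an in-memory tree, so any state about previously seen elements must be tracked by the application. It is a popular alternative to the Document Object Model (DOM).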
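By contrast, a SAX handler (here built on the JDK's `javax.xml.parsers.SAXParser` and `org.xml.sax.helpers.DefaultHandler`) sees each element only once, as an event, so any context the application needs, such as the path of the current node, must be maintained by the handler itself. This sketch reuses the same simplified, hypothetical `<node>`/`<property>` layout; `SaxPathTracker` is an illustrative name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxPathTracker extends DefaultHandler {

    private final Deque<String> path = new ArrayDeque<>();      // names of currently open <node> elements
    private final List<String> propertyPaths = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("node")) {
            path.addLast(atts.getValue("name"));                  // push: entering a node
        } else if (qName.equals("property")) {
            propertyPaths.add("/" + String.join("/", path) + "/" + atts.getValue("name"));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("node")) {
            path.removeLast();                                    // the parser keeps no tree; unwind our own state
        }
    }

    // Streams the document once and returns the full path of every <property>.
    static List<String> propertyPaths(String xml) {
        try {
            SaxPathTracker handler = new SaxPathTracker();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.propertyPaths;
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<node name='root'><node name='docs'><property name='data'>AQID</property></node>"
                + "<property name='title'>Report</property></node>";
        System.out.println(propertyPaths(xml));  // [/root/docs/data, /root/title]
    }
}
```

The parser itself retains no tree, so the only state in memory is what the handler keeps (here, a stack bounded by the nesting depth plus the collected output); any richer element state would have to be added by hand in the same way.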
Because no existing XML processor, such as DOM or SAX, can efficiently handle large XML files while both parsing them and manipulating the states of their XML elements, there is a need for a new technique to improve the performance of JCR binary streaming in data-centric applications.