The present invention relates to compression of JavaScript Object Notation (JSON) data, and more specifically to compression of JSON data using structure information.
JSON (JavaScript Object Notation) is an open standard format that uses human-readable text to transmit data objects consisting of attribute-value pairs. It is based on a subset of the JavaScript Programming Language.
JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others.
JSON is built on two structures:
A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.
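By way of a non-limiting illustration, the two structures can be observed by parsing a small document with the Python standard library. The sample text and the variable names are assumptions made for this sketch only.

```python
import json

# Hypothetical sample document, for illustration only.
text = '{"name": "Ada", "langs": ["C", "Python"]}'
doc = json.loads(text)

# A JSON object (collection of name/value pairs) becomes a Python dict.
is_object = isinstance(doc, dict)

# A JSON array (ordered list of values) becomes a Python list,
# and the order of its elements is preserved.
is_array = isinstance(doc["langs"], list)
first_lang = doc["langs"][0]
```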
A JSON Schema specifies a JSON-based format to define the structure of JSON data for validation, documentation, and interaction control. A JSON Schema provides a contract for the JSON data required by a given application, and how that data can be modified. The JSON Schema can be used to validate JSON data. The same serialization/deserialization tools can be used both for the schema and data. The schema is self-describing.
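As a minimal sketch of schema-based validation, the following checks data against a small subset of JSON Schema keywords ("type", "required", "properties", "items"). The helper name `validate`, the sample schema, and the sample data are hypothetical; this is not a complete JSON Schema implementation. Note that the schema and the data are both parsed with the same `json.loads` call.

```python
import json

# Mapping from a subset of JSON Schema type names to Python types.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "boolean": bool,
}

def validate(value, schema):
    """Return True if value conforms to the supported subset of schema."""
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it for "number".
        if expected == "number" and isinstance(value, bool):
            return False
        if not isinstance(value, _TYPES[expected]):
            return False
    if isinstance(value, dict):
        # Every required key must be present.
        if any(key not in value for key in schema.get("required", [])):
            return False
        # Each described property that is present must itself validate.
        for key, sub in schema.get("properties", {}).items():
            if key in value and not validate(value[key], sub):
                return False
    if isinstance(value, list) and "items" in schema:
        return all(validate(item, schema["items"]) for item in value)
    return True

# Hypothetical schema, itself written in JSON.
schema = json.loads('''{
  "type": "object",
  "required": ["id", "tags"],
  "properties": {
    "id": {"type": "number"},
    "tags": {"type": "array", "items": {"type": "string"}}
  }
}''')

ok = validate(json.loads('{"id": 7, "tags": ["a", "b"]}'), schema)
bad = validate(json.loads('{"id": "7", "tags": ["a"]}'), schema)
```

In this sketch, `ok` is true because the data matches the schema, and `bad` is false because `id` is a string rather than a number.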
The use of JSON to represent data has increased across domains ranging from databases to web applications, owing to the simplicity and ease of representing data in JSON. Client-side programming models rely on JSON transport between client and server for client-side display. However, JSON documents tend to be quite large compared to other forms of data representation.
JSON documents are large for several reasons: data must be converted to a text-based encoding; quotation marks are used heavily; and, when multiple objects are serialized in the same message, the key names for each property must be repeated even though they are the same for each object. In addition, values or properties common to multiple objects are serialized for each object.
A prior art solution used to overcome the size of the JSON documents is to transpose the JSON data and group together all the values for each instance of the specific key and list them in an array. Another solution is to represent the JSON data in binary form.
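The transposition approach can be sketched as follows. The sample records and the function names `transpose` and `untranspose` are hypothetical, and the sketch assumes that every object has the same keys in the same order.

```python
import json

# Hypothetical records; every object repeats the same key names.
records = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Alan", "city": "London"},
    {"id": 3, "name": "Grace", "city": "New York"},
]

def transpose(rows):
    """Group the values of each key into one array (row form to column form)."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns

def untranspose(columns):
    """Rebuild the original list of objects from the column form."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]

columns = transpose(records)

# Each key name is now serialized once instead of once per object.
row_size = len(json.dumps(records))
col_size = len(json.dumps(columns))
```

Here `columns["city"]` still holds "London" twice, which illustrates that transposition removes repeated key names but not repeated values.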
A disadvantage of the above solutions is that they do not use the inherently well-defined structure of the JSON document to provide optimal compression.
U.S. Pat. No. 7,886,223, assigned to International Business Machines Corporation, discusses using a statistical tree for encoding and decoding an extensible markup language (XML) document. XML represents the structure of data before it is transported from one system to another. The XML Schema used to create the statistical tree supports complex types, which allows a compression tree to be created for each complex type. Each compression tree is then used to compress the XML fragments pertaining to it.
The JSON Schema introduces complexities to the tree which are not present or representable in an XML Schema. Furthermore, JSON data does not support complex types, which XML does. Instead, JSON documents support lists or arrays which would have to be dealt with differently.