Cloud computing is the product of the development and integration of traditional computer and network technologies, such as grid computing, distributed computing, parallel computing, utility computing, network storage, virtualization, and load balancing. It aims to integrate multiple relatively low-cost computing entities, via networks, into a system with great computing capability. Distributed storage is one field within cloud computing; it provides a distributed storage service for mass data together with high-speed read and write access.
Data have schema structures. In a relational database, the schema structures of data are maintained by the database itself. A non-relational data storage system, however, is not aware of the schema structures of the data it stores. As a result, when data are stored, they are converted into binary data streams according to the schema structure and then written into the distributed storage system; when data are read, the binary data streams are retrieved from the distributed storage system and restored into usable data according to a given rule. This conversion and restoration are generally called data serialization and deserialization.
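As a minimal sketch of the serialization and deserialization described above, the following Python example converts one record into a binary data stream and restores it. The record layout (a user id, a score and a name) and the function names are hypothetical, chosen only for illustration; they are not taken from any particular storage system.

```python
import struct

# Hypothetical schema: user_id (uint32), score (float64), name (UTF-8 string).
# "<" selects little-endian with no padding; I = uint32, d = float64,
# H = uint16 holding the byte length of the name that follows.
HEADER = "<IdH"
HEADER_SIZE = struct.calcsize(HEADER)

def serialize(user_id, score, name):
    """Convert a record into a binary data stream according to the schema."""
    raw_name = name.encode("utf-8")
    return struct.pack(HEADER, user_id, score, len(raw_name)) + raw_name

def deserialize(blob):
    """Restore a record from a binary data stream using the same schema."""
    user_id, score, name_len = struct.unpack(HEADER, blob[:HEADER_SIZE])
    name = blob[HEADER_SIZE:HEADER_SIZE + name_len].decode("utf-8")
    return user_id, score, name

blob = serialize(7, 2.5, "ann")
record = deserialize(blob)
```

The binary stream itself carries no field names or types, so the reader can restore it only because it applies the same layout as the writer.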
In distributed applications, the writer and the reader of data are not necessarily the same program. If a data schema structure never changes, the reader can always restore the writer's data correctly according to the reader's own local data schema structure. However, data schema structures normally change as programs are upgraded, while massive amounts of data remain stored in the distributed storage, and it is hard to read out all the data and modify their schema structures in a short period of time. In addition, in certain scenarios service must not be interrupted during an upgrade, so writers and readers of different versions exist at the same time. In this situation, a key problem that is relatively difficult to solve is how a data reader can restore the data of any writer. The ability to restore the data of any writer is generally called schema-free.
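The version mismatch can be illustrated with a small Python sketch. It assumes a hypothetical old schema with two fields and a hypothetical new schema that adds a third; the layouts and function names are invented for this example.

```python
import struct

# Hypothetical layouts: the upgraded writer adds a "level" field.
SCHEMA_V1 = "<Id"    # user_id (uint32), score (float64)
SCHEMA_V2 = "<IdI"   # user_id, score, level (uint32)

def write_v1(user_id, score):
    return struct.pack(SCHEMA_V1, user_id, score)

def write_v2(user_id, score, level):
    return struct.pack(SCHEMA_V2, user_id, score, level)

def read_v1(blob):
    """A reader that only knows its own local (old) schema structure."""
    return struct.unpack(SCHEMA_V1, blob)

def can_read_v1(blob):
    """Report whether the old reader is able to restore the given data."""
    try:
        read_v1(blob)
        return True
    except struct.error:
        # struct.unpack requires the buffer size to match the format exactly.
        return False

old_ok = can_read_v1(write_v1(7, 2.5))      # same version on both sides
new_ok = can_read_v1(write_v2(7, 2.5, 3))   # upgraded writer, old reader
```

In this sketch the old reader fails outright on data from the upgraded writer. With other layout changes, such as a field inserted in the middle, a fixed-layout reader may instead return wrong values without raising any error.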
In related technical schemes, schema-free is generally realized by one of the following schemes.
Scheme 1: a fixed field in the data schema structure represents the version of the data. The drawbacks are that content added in a new version can only be appended to the end of the old version's structure, and that readers must implement restoration methods for the schema structures of all known versions.
Scheme 2: a traditional tag-length-value method, in which the tag, the length and the binary data stream (value) of each field in a data schema structure are recorded sequentially. The drawback is that complex data structures, especially nested structures, cannot be represented.
Scheme 3: a self-description notation, such as JavaScript Object Notation (JSON). The advantage is that any complex data schema structure can be represented; the drawback is that a self-description notation is an interpreted data description mode with low efficiency.
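To make Scheme 2 concrete, the following Python sketch shows one possible flat tag-length-value encoding. The field widths (a 1-byte tag and a 2-byte length), the tag numbers and the function names are assumptions made for this example, not a description of any specific product.

```python
import struct

# Assumed wire format per field: tag (uint8), length (uint16), then the value.
FIELD_HEADER = "<BH"
FIELD_HEADER_SIZE = struct.calcsize(FIELD_HEADER)

def tlv_encode(fields):
    """Record the tag, length and value of each field sequentially."""
    out = bytearray()
    for tag, value in fields.items():
        out += struct.pack(FIELD_HEADER, tag, len(value)) + value
    return bytes(out)

def tlv_decode(blob, known_tags):
    """Restore the fields the reader knows; skip any unknown tag by its length."""
    result = {}
    pos = 0
    while pos < len(blob):
        tag, length = struct.unpack_from(FIELD_HEADER, blob, pos)
        pos += FIELD_HEADER_SIZE
        if tag in known_tags:
            result[tag] = blob[pos:pos + length]
        pos += length
    return result

# A newer writer emits a hypothetical tag 3 that the older reader does not know.
new_data = tlv_encode({1: b"\x07", 2: b"ann", 3: b"extra"})
old_view = tlv_decode(new_data, known_tags={1, 2})
```

Because each field carries its own tag and length, the older reader can skip the field it does not recognize. Each value is still an opaque byte string, however, so this flat form says nothing about structure inside a field, which matches the limitation noted for nested data.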
Currently, no effective solution has been proposed for the problem that, in the related art, data schema structures in a distributed storage system cannot be described flexibly.