Unlike a conventional file system, which supports modifying a file in place, an append-only file system supports only read and append operations on a file. A file system of this type offers high write performance and makes it easy to keep multiple copies consistent, and is therefore widely used in large-scale distributed storage systems. Typical examples include the GOOGLE file system (GFS) and the HADOOP distributed file system (HDFS). Compared with conventional row-based storage, columnar storage in the append-only file system has distinct advantages. In the columnar storage, data records are split on a column basis and each column is independently stored, so that data in a same column, which is of a same type, is stored contiguously. This greatly increases a data compression rate and reduces data input/output (I/O) in a subsequent query operation. In addition, because data in columns is separately stored, a data query needs to scan only data in a related column and can directly ignore an unrelated column, which greatly improves performance of a query that reads only some of the columns.
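The difference in scanned data between the two layouts can be illustrated with a minimal sketch. The sample records, the comma-separated text encoding, and the function names `to_row_layout` and `to_column_layout` are assumptions made for illustration only and do not correspond to any particular file format.

```python
# Hypothetical sample records: (user_id, country, amount).
records = [(i, "US" if i % 2 else "DE", i * 10) for i in range(1000)]

def to_row_layout(rows):
    """Serialize record by record, so values of different columns interleave."""
    return "\n".join(",".join(str(v) for v in row) for row in rows).encode()

def to_column_layout(rows):
    """Split records by column; each column's values are stored contiguously."""
    columns = list(zip(*rows))
    return [",".join(str(v) for v in col).encode() for col in columns]

row_bytes = to_row_layout(records)
column_chunks = to_column_layout(records)

# A query that sums "amount" must read every record in the row layout,
# but only the third column chunk in the columnar layout.
bytes_scanned_row = len(row_bytes)
bytes_scanned_col = len(column_chunks[2])

amounts = [int(v) for v in column_chunks[2].decode().split(",")]
total_amount = sum(amounts)
```

In this sketch, the query on a single column touches only the bytes of that column, whereas the row layout forces a scan of all the serialized records.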
In the columnar storage in the append-only file system, newly added content is always appended to a tail of a file, and file content is not allowed to be updated in place. An update can therefore be implemented only by overwriting, that is, an existing file is deleted and a new file is generated for a to-be-updated file. A record columnar file (RCFile) is a data storage structure that can implement the columnar storage. The RCFile is designed and implemented on the basis of the HDFS. The RCFile organizes records per row group, where all row groups except the last row group have an equal size and each HDFS block can store multiple row groups. Inside each row group, data in columns is mutually independent and is stored contiguously, and metadata stores information, for example, location offsets of data of the columns and a byte length of each piece of data. Information about a schema of a data table is stored in a third-party database, for example, MYSQL or DERBY. Because the schema of the data table is stored in the MYSQL database, the schema can be conveniently modified using APACHE HIVE. However, the RCFile has a fixed file organization format, and the metadata in the RCFile stores only simple information, for example, a quantity of data records and a quantity of bytes of each column, and the RCFile does not support any operation of dynamically updating data.
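The row group organization described above can be sketched as follows. This is a simplified illustration and not the actual RCFile binary format: the JSON-encoded metadata, the 4-byte length prefix, and the function names `append_row_group` and `read_column` are all assumptions introduced for the example.

```python
import io
import json
import struct

def append_row_group(buf, rows):
    """Append one row group: [4-byte metadata length][metadata][column data]."""
    columns = list(zip(*rows))
    chunks = [",".join(str(v) for v in col).encode() for col in columns]
    offsets, pos = [], 0
    for chunk in chunks:
        offsets.append(pos)  # offset of this column inside the data section
        pos += len(chunk)
    meta = json.dumps({
        "num_records": len(rows),
        "column_offsets": offsets,
        "column_lengths": [len(c) for c in chunks],
    }).encode()
    buf.seek(0, io.SEEK_END)  # append-only: always write at the tail
    start = buf.tell()
    buf.write(struct.pack(">I", len(meta)) + meta + b"".join(chunks))
    return start  # position of the row group within the file

def read_column(buf, group_start, col_index):
    """Read one column of one row group using only the group's metadata."""
    buf.seek(group_start)
    (meta_len,) = struct.unpack(">I", buf.read(4))
    meta = json.loads(buf.read(meta_len))
    data_start = group_start + 4 + meta_len
    buf.seek(data_start + meta["column_offsets"][col_index])
    raw = buf.read(meta["column_lengths"][col_index])
    return raw.decode().split(",")

# Usage: an in-memory buffer stands in for a file in the append-only file system.
storage = io.BytesIO()
g0 = append_row_group(storage, [(1, "a"), (2, "b")])
g1 = append_row_group(storage, [(3, "c"), (4, "d")])
second_column_of_g1 = read_column(storage, g1, 1)
```

Because the metadata of each row group is written once together with the column data, the sketch offers no way to change that metadata later other than writing a new file, which mirrors the limitation described above.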
Therefore, an existing implementation manner of the columnar storage in the append-only file system cannot provide an effective method for dynamically modifying metadata. A metadata updating operation is highly costly because it requires regeneration and rewriting of all corresponding storage files, which, for large-scale data, results in very high computing resource overhead and time consumption.