The present invention relates to a method, apparatus, and computer program for performing type inference of serialization for each generation site and specializing a serializer for each generation site.
With the recent proliferation of Internet environments and improvements in the processing speed of computers, increasing attention has been paid to big data, that is, massive volumes of digital data. Big data is not merely massive in volume; it is also unstructured and demands high real-time performance. Conventional database management systems structure and store data first, and process and analyze it later. Big data, whose properties are incompatible with such conventional databases, is therefore regarded as difficult to handle with them.
When data is analyzed using a NoSQL query language (e.g., JAQL) designed to handle semi-structured data such as JavaScript Object Notation (JSON), an adequate response may not be obtained because the input and output costs of communicating a large amount of data become a bottleneck. Accordingly, various measures can be taken, such as unifying data processing by using a type (schema) and compressing the data to reduce its size before analysis.
For example, Japanese Patent Application Publication Nos. 2003-122730, 2003-122773, 2003-249961, 2005-056085, 2005-157718, and 2005-209048, Japanese Translation of PCT International Application Publication No. 2007-522558, and WO 2011/111532 disclose techniques for executing serialization and/or deserialization when processing a massive volume of digital data. The serialization and/or deserialization disclosed in these eight publications is, for the most part, executed dynamically while types and values are checked.
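For illustration only, dynamically executed serialization of this kind can be sketched as follows. The sketch is not taken from any of the cited publications; the function name `serialize_dynamic`, the one-byte type tags, and the little-endian binary layout are all assumptions chosen for the example. It shows the runtime type of every value being checked each time the serializer is called.

```python
import struct


def serialize_dynamic(value, out):
    """Append a tagged binary encoding of `value` to the bytearray `out`.

    Hypothetical format: one tag byte per value, followed by its payload.
    The type of every value is inspected on every call.
    """
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        out += b"B" + struct.pack("<?", value)
    elif isinstance(value, int):
        out += b"L" + struct.pack("<q", value)  # 64-bit signed integer
    elif isinstance(value, float):
        out += b"D" + struct.pack("<d", value)  # 64-bit float
    elif isinstance(value, str):
        data = value.encode("utf-8")
        out += b"S" + struct.pack("<I", len(data)) + data
    elif isinstance(value, dict):
        # A record: field count, then each name and value in turn.
        out += b"R" + struct.pack("<I", len(value))
        for name, field in value.items():
            serialize_dynamic(name, out)
            serialize_dynamic(field, out)
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    return out


encoded = serialize_dynamic({"id": 7}, bytearray())
```

Because the chain of type checks is repeated for every value, its cost grows with the number of values serialized, even when all records produced at a given place in a program share the same shape.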
Japanese Translation of PCT International Application Publication No. 2007-519078 and Japanese Patent Application Publication No. 2010-237867 disclose systems that use type inference to infer a type (schema). For example, by inferring the type of a record having values and names (identifiers), big data can be deserialized into data that can actually be handled.
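As a minimal sketch of what inferring the type of a record having values and names can look like, the following walks a decoded JSON value and builds a schema description. It is not the method of the cited publications; the function name `infer_type`, the type names (`long`, `double`, and so on), and the shape of the returned schema are assumptions made for the example.

```python
import json


def infer_type(value):
    """Return a simple schema description for a decoded JSON value."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        # Collect the distinct element types in first-seen order.
        element_types = []
        for item in value:
            item_type = infer_type(item)
            if item_type not in element_types:
                element_types.append(item_type)
        return {"array": element_types}
    if isinstance(value, dict):
        # A record maps each field name (identifier) to the type of its value.
        return {"record": {name: infer_type(v) for name, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


record = json.loads(
    '{"id": 7, "name": "sensor-a", "readings": [1.5, 2.0], "active": true}'
)
schema = infer_type(record)
```

Here `schema` maps `id` to `long`, `name` to `string`, `readings` to an array of `double`, and `active` to `boolean`.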