With the rapid development of the Internet of Things, social networks and cloud computing, a large number of business applications produce massive amounts of data, and data traffic has grown exponentially. Analyzing and mining this data to discover the patterns and information it contains has become a research focus in the technical field of big data processing. Data includes static data and dynamic data. Dynamic data, also known as a data stream, is characterized by instantaneity, unboundedness, temporality, velocity uncertainty, metadata infinity, and so on. A traffic data stream is a typical dynamic data stream. In order to analyze and control traffic flow and relieve traffic pressure, a traffic monitoring system often pays close attention to approximate summary information about the vehicles moving on specific road sections during a specific time period, for example, when monitoring and analyzing the traffic volume at Xinjiekou in Nanjing City during rush hour.
Traditional data query methods, such as the skyline query, cannot adapt to dynamically changing data and are ineffective for data stream queries. Aggregate query methods have therefore been developed in the technical field of data query; they obtain statistical and summary information by scanning a large number of data tuples. However, because a traffic data stream is characterized by instantaneity, unboundedness, temporality, velocity uncertainty and metadata infinity, it is difficult to perform an aggregate query over the entire data set and obtain an exact result in a short time, even with cloud computing technology that offers parallel computing ability. In practical applications, high-quality approximate aggregate query results are therefore usually used in place of exact query results. As the industry's requirements on query accuracy have gradually increased, a variety of approximate aggregate query methods have been developed, such as sliding window technology, random sampling technology, wavelet technology, sketch index structures, histogram technology, etc. However, all the existing approximate aggregate query methods evaluate the quality of an algorithm by its average query error, so the accuracy of the existing approximate aggregate query methods is not sufficient for them to effectively replace exact queries.
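As an illustration of the random sampling technology mentioned above, the following minimal Python sketch estimates an average over a data stream from a fixed-size uniform sample instead of scanning every tuple. It uses reservoir sampling, which is one common sampling scheme chosen here for illustration and is not attributed to any particular existing method. The function names, the sample size k, and the synthetic vehicle speeds are all assumptions.

```python
import random


def reservoir_sample(stream, k, rng):
    """Return a uniform random sample of at most k items from a stream.

    The stream is read once and its length need not be known in advance.
    """
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a stored item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


def approx_average(stream, k, seed=0):
    """Approximate the average of a numeric stream from a k-item sample."""
    sample = reservoir_sample(stream, k, random.Random(seed))
    if not sample:
        raise ValueError("cannot average an empty stream")
    return sum(sample) / len(sample)


if __name__ == "__main__":
    # Hypothetical data: synthetic vehicle speeds in km/h.
    data_rng = random.Random(42)
    speeds = [data_rng.uniform(20.0, 80.0) for _ in range(100_000)]

    exact = sum(speeds) / len(speeds)
    estimate = approx_average(iter(speeds), k=1_000)
    relative_error = abs(estimate - exact) / exact

    print(f"exact average:  {exact:.3f}")
    print(f"approx average: {estimate:.3f}")
    print(f"relative error: {relative_error:.4%}")
```

The relative error printed for a single query is the kind of quantity that is averaged over many queries when an approximate method is evaluated by its average query error.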
Therefore, it is necessary to improve the existing aggregate query methods and propose a new aggregate query method with higher query accuracy, so as to solve the technical problem that the existing approximate aggregate query methods cannot effectively replace the traditional exact query method.