This work was supported by Seoul Metropolitan City funded by the Korean Government (CI160025).
The present invention relates to a system and method for searching data, more particularly to a data search system and method that enable increased speed in searching large-volume time series data in a DBMS (database management system).
Generally, a time series DBMS employs techniques such as compressing data and applying indexes to a bitmap in real time, in order to provide high-speed input and real-time indexing. For searches of large-volume data, a DBMS may also perform operations such as, among others, index-based browsing, data decompression, and search target conditional clause checking.
Also, if several disks are used (e.g., RAID) or if a high-speed permanent storage device is used (e.g., an SSD) in operating a DBMS, the speed of input to and output from the disk is increased. Such an increase in speed improves the efficiency of reading compressed data, and consequently, the CPU processing loads associated with decompression and search target condition checks become relatively greater as compared to the time spent reading from the disk. That is, the time expended in processing the data search may become greater than the time for reading and writing from and to the disk.
Also, the process of searching large-volume time series data in a real-time time series DBMS may typically use a sequential data access method or an indexed search based on bitmap indexing. Here, time series data inherently entails frequent occurrences of repeated data, and an indexing method based on unique keys would result in low search efficiency and incur a limit on the performance of the overall system when searching large-volume time series data.
Indexing methods based on bitmaps may be used to resolve the problem above. A bitmap-based indexing method can generate indexes very quickly, enabling quick maintenance of indexes for data inputted at high speeds, albeit with slow updates. However, since update operations are not performed on time series data, such a method can be applied to time series data at a preliminary level.
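As an illustration only, bitmap-based indexing of repeated time series values might be sketched as follows. This is a minimal sketch and not the indexing structure of any particular DBMS; the function names `build_bitmap_index` and `rows_matching` and the sample `sensor_ids` column are hypothetical, and Python integers are assumed as a stand-in for bitmaps.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when values[i] equals it."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        # Append-only input: indexing a new row sets one bit and never
        # rewrites bits that belong to earlier rows.
        index[value] |= 1 << row
    return dict(index)


def rows_matching(bitmap):
    """Yield the row numbers whose bit is set in the given bitmap."""
    row = 0
    while bitmap:
        if bitmap & 1:
            yield row
        bitmap >>= 1
        row += 1


# Hypothetical column with heavily repeated keys, as is typical of time series data.
sensor_ids = ["s1", "s2", "s1", "s3", "s1", "s2"]
index = build_bitmap_index(sensor_ids)

# A condition such as "sensor is s1 OR s3" becomes a bitwise OR of two bitmaps.
matches = list(rows_matching(index["s1"] | index["s3"]))
```

In this sketch, a repeated key costs one bitmap per distinct value rather than one index entry per record, which is why such indexes can be maintained quickly for data that is only appended.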
Even when such a method is applied, however, it may not produce particularly useful results in cases where the amount of data actually inputted is very large or in cases where the number of search target records is very large. Another example known in the related art is the search method based on the B+ tree for an RDBMS, but this method is efficient only when there is a single search target record or when there is a small amount of data, and it is inevitably highly inefficient for time series data having large amounts of repeated keys.
Also, a real-time compression technique may be used in order to store real-time data on a disk despite limited disk output speed, and decompression of the compressed data may be required for transmitting the search results. In this case, the data can be transmitted at high speed while requiring less disk space and less data transmission capacity than the original data, but an additional operation for decompression may have to be performed. That is, due to recent improvements in hardware performance, a bottleneck occurs not during disk reading operations but rather during operations such as decompressing the data or evaluating conditional clauses after the reading.
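The two CPU-side operations described above, decompression followed by conditional clause checking, might be sketched as follows. This is a minimal sketch under stated assumptions: the record layout (`RECORD`, a 64-bit timestamp and a 64-bit floating point value), the helper names `pack_block` and `search_block`, and the use of zlib as the compression method are all hypothetical and are chosen only for illustration.

```python
import struct
import zlib

# Hypothetical fixed-width record: int64 timestamp followed by float64 value.
RECORD = struct.Struct("<qd")


def pack_block(records):
    """Serialize records and compress them into one block, as done at input time."""
    raw = b"".join(RECORD.pack(ts, value) for ts, value in records)
    return zlib.compress(raw)


def search_block(block, predicate):
    """Search one compressed block that has already been read from the disk."""
    raw = zlib.decompress(block)  # CPU-bound step 1: decompression
    # CPU-bound step 2: the conditional clause is checked for every record.
    return [rec for rec in RECORD.iter_unpack(raw) if predicate(*rec)]


# Hypothetical sample data with repeated values.
records = [(1_000 + i, float(i % 10)) for i in range(1000)]
block = pack_block(records)
hits = search_block(block, lambda ts, value: value >= 9.0)
```

Once the block is in memory, both steps in `search_block` run on the CPU, so a faster disk shortens only the read that precedes them.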
Existing technology is limited in searching large-volume real-time time series data. In particular, although improvements in search performance are needed in step with the improvements in disk performance, no such solutions are currently being offered. Thus, there is a need for technological developments for effectively resolving the problems discussed above.