The present disclosure relates to the field of relational databases, and in particular, to a method and device for performing predicate evaluation on compressed variable length character strings.
Recent studies have shown that, when processing a query in a relational database, performance can be greatly improved by performing predicate evaluation directly on the compressed data stored in the database, rather than decompressing the compressed data and then performing the predicate evaluation on the decompressed data. A predicate that consists, in sequence, of an argument containing no function, expression or clause, a relational operator or logical operator, and a constant is usually called a simple predicate in the art, and is referred to herein as a predicate for short. In database languages, the relational operators or logical operators can include “=”, “>”, “>=”, “<”, “<=”, “NOT”, “IN”, “LIKE”, “BETWEEN” or the like. Performing the predicate evaluation on data means determining whether the data satisfies the predicate: if it does, the result of the predicate evaluation is true; otherwise the result is false.
The key to performing the predicate evaluation directly on the compressed data stored in the database is to preserve order in the compression-encoding, that is, the order relationship of the compressed data should be consistent with that of the decompressed (original) data, so as to ensure that the result of the predicate evaluation is correct. For example, assuming the predicate is ‘age<10’, the compression-encoded value ‘encoded(age)’ for any ‘age’ which satisfies this predicate (for example, 1, 2, . . . , 9) should be smaller than the compression-encoded value ‘encoded(10)’ for the constant ‘10’ in this predicate. Compression-encoding schemes that preserve order well for data types such as integer, decimal, double, fixed length character string and so on have been proposed in the art. As for the variable length character string, however, there is no favorable solution yet, because there are two comparison semantics for the variable length character string in the relational database: the trailing blank insensitive semantic and the trailing blank sensitive semantic.
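The order-preservation requirement for the ‘age<10’ example can be illustrated with a minimal sketch. It assumes a sorted-dictionary encoding in which each value is replaced by its rank; this is only one possible order-preserving scheme and is not taken from the disclosure, and the names build_dictionary and encode are hypothetical.

```python
from bisect import bisect_left

def build_dictionary(values):
    # Hypothetical helper: the sorted distinct values; a value's rank is its code.
    return sorted(set(values))

def encode(dictionary, value):
    # Rank of the value, i.e. the number of dictionary entries smaller than it.
    # This also works for a constant that does not occur in the data.
    return bisect_left(dictionary, value)

ages = [3, 42, 9, 10, 27, 1]
dictionary = build_dictionary(ages)
codes = [encode(dictionary, a) for a in ages]

# Evaluate 'age < 10' on the codes only, without decoding any value.
bound = encode(dictionary, 10)
matches_encoded = [c < bound for c in codes]

# Reference result computed on the original values.
matches_plain = [a < 10 for a in ages]
```

Because a smaller value always receives a smaller code, the comparison on the codes gives the same result as the comparison on the original values.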
When variable length character strings are compared in the trailing blank insensitive semantic, character strings of different lengths are first padded with trailing blank characters to the same length and then compared, while in the trailing blank sensitive semantic, character strings of different lengths are compared directly in accordance with an alphabetical order. For example, consider the following five variable length character strings represented by hexadecimal ASCII codes: {x′31′, x′31 18′, x′31 20′, x′31 20 20′, x′31 32′}. In the trailing blank insensitive semantic, they respectively correspond to {x′31 20 20′, x′31 18 20′, x′31 20 20′, x′31 20 20′, x′31 32 20′} after being padded with trailing blank characters (the ASCII code of the blank character is hexadecimal 20), and thus are ordered as x′31 18′<x′31′=x′31 20′=x′31 20 20′<x′31 32′. In the trailing blank sensitive semantic, these five variable length character strings are ordered as x′31′<x′31 18′<x′31 20′<x′31 20 20′<x′31 32′. It can be seen that the variable length character strings are ordered differently in the two semantics, and it is difficult to find a suitable compression-encoding manner capable of preserving order for both.
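The two orderings of the five example strings can be reproduced with a short sketch. It assumes that Python's byte-wise comparison of bytes objects stands in for the trailing blank sensitive semantic and that padding with x′20′ to a common length stands in for the trailing blank insensitive semantic; the function name insensitive_key is hypothetical.

```python
# The five example strings, written as raw bytes.
strings = [b"\x31", b"\x31\x18", b"\x31\x20", b"\x31\x20\x20", b"\x31\x32"]

def insensitive_key(s, width):
    # Pad with trailing blanks (0x20) to a common length before comparing.
    return s.ljust(width, b"\x20")

width = max(len(s) for s in strings)

# Trailing blank sensitive: compare the raw bytes directly.
sensitive_order = sorted(strings)

# Trailing blank insensitive: compare the padded forms (the sort is stable,
# so strings that compare equal keep their input order).
insensitive_order = sorted(strings, key=lambda s: insensitive_key(s, width))

# Strings that differ only in trailing blanks are equal after padding.
equal_after_padding = (
    insensitive_key(b"\x31", width)
    == insensitive_key(b"\x31\x20", width)
    == insensitive_key(b"\x31\x20\x20", width)
)
```

In this sketch x′31 18′ sorts after x′31′ under the sensitive comparison but before it under the insensitive comparison, which is the conflict described above.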
A commonly used existing solution for this situation is to employ, when performing compression-encoding on the original data, a compression-encoding manner in which the order of the compression-encoded data is consistent with the order in the trailing blank sensitive semantic. In this solution, the predicate can be evaluated directly on the compression-encoded data when the predicate evaluation is performed in accordance with the trailing blank sensitive semantic. However, if it is desired to perform the predicate evaluation in accordance with the trailing blank insensitive semantic, the compression-encoded data must first be decompressed, and the predicate evaluation is then performed on the decompressed data. Such a solution therefore does not support evaluating the predicate on the compression-encoded data in accordance with the trailing blank insensitive semantic, and thus fails to take full advantage of the superior performance of performing the predicate evaluation directly on the compression-encoded data.
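The limitation of this existing solution can be sketched as follows. The sketch assumes a rank-based dictionary encoding ordered by the trailing blank sensitive semantic; the encoding and the names less_than_sensitive and less_than_insensitive are hypothetical and serve only as an illustration.

```python
from bisect import bisect_left

values = [b"\x31", b"\x31\x18", b"\x31\x20", b"\x31\x20\x20", b"\x31\x32"]

# Dictionary sorted byte-wise, i.e. in trailing blank sensitive order.
dictionary = sorted(set(values))
codes = [bisect_left(dictionary, v) for v in values]

def less_than_sensitive(code, constant):
    # Evaluated on the code alone; no decompression is needed.
    return code < bisect_left(dictionary, constant)

def less_than_insensitive(code, constant):
    # The code order does not match this semantic, so the value is
    # decompressed first and compared after blank padding.
    value = dictionary[code]
    width = max(len(value), len(constant))
    return value.ljust(width, b"\x20") < constant.ljust(width, b"\x20")

constant = b"\x31\x20"
sensitive_result = [less_than_sensitive(c, constant) for c in codes]
insensitive_result = [less_than_insensitive(c, constant) for c in codes]
```

For the constant x′31 20′, the two result lists differ for the string x′31′: it is smaller than the constant in the sensitive semantic but equal to it in the insensitive semantic, so comparing the codes alone would give a wrong answer for the latter.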