Instant Messaging (IM) tools have a low technical threshold for users, who can instantly express their feelings in a few sentences and share information with other users and friends. As a result, information transmission by IM is relatively efficient, and IM applications have become increasingly widespread.
In theory, the contact or friend list of an existing IM user has no size limit. Using various client-side software, the user can receive, in real time, messages published or broadcast by members of the contact list. The amount of data corresponding to the received messages can therefore be very large. How to store the received message data effectively and query the stored data quickly is among the technical problems to be solved in the development of IM technology.
In existing technologies, an IM client often uses a hash algorithm together with a hash table to achieve fast data storage. Commonly used hash algorithms include Cyclic Redundancy Check 32 (CRC32), Message-Digest Algorithm 5 (MD5), and Secure Hash Algorithm (SHA). A hash algorithm converts a data string into a number, from which a key value is generated. The key value is then reduced modulo a large prime number M, so that the data can be evenly distributed over a hash table of size M.
FIG. 1 shows an existing hash algorithm-based data storage method. As shown in FIG. 1, a hash algorithm, e.g., MD5, is used to convert the data into a number (i.e., to numberize the data), and the first 8 bytes of the numberized data are taken as the key value. The key value is then reduced modulo a predetermined large prime number M, and the data and its corresponding key value are stored at the hash table position given by the remainder of the modulo operation. The prime number M can be determined according to actual needs and the key values obtained by hashing the data.
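
The key generation and position calculation described above can be sketched as follows. This is a minimal illustration only, not the method of any particular IM client: the function names `key_value` and `bucket_index`, the big-endian byte order, and the value chosen for the prime M are all assumptions introduced for the example.

```python
import hashlib

# Assumed table size: 1000003 is a prime, picked only for illustration.
M = 1_000_003


def key_value(data: bytes) -> int:
    """Numberize data with MD5 and take the first 8 bytes as the key value.

    The byte order used to turn the 8 bytes into an integer is an
    assumption; the description above does not specify one.
    """
    digest = hashlib.md5(data).digest()  # 16-byte MD5 digest
    return int.from_bytes(digest[:8], byteorder="big")


def bucket_index(data: bytes, m: int = M) -> int:
    """Return the hash table position: the key value modulo the prime m."""
    return key_value(data) % m


if __name__ == "__main__":
    sample = b"data 100"
    print(key_value(sample), bucket_index(sample))
```

Because the key value is derived only from the data, the same data always maps to the same position, which is what later allows a query to locate it.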
When hashing different data generates the same key value, a key collision occurs, and all the data with the same key value are organized as a linked list in the hash table. As shown in FIG. 1, the key values of data 100 to data 102 all have the same remainder, 10, modulo M, and the key values of data 110 to data 113 all have the same remainder, 11, modulo M. Each of these groups of data items is stored in a respective linked list.
When a user needs to query the stored data, the data to be queried is first processed by the hash algorithm to generate a key value, and the position of the data in the hash table is then determined by taking the key value modulo the prime number M. Afterwards, the linked list at that position is searched. If data stored in the linked list matches the data to be queried, that data is obtained as the data corresponding to the key value, and the relevant information is returned to the user.
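
The storage and query procedure described above can be sketched as a hash table with one linked list per position. This is a hedged illustration under stated assumptions: the names `Node`, `ChainedHashTable`, `store`, and `query` are hypothetical, the small prime 13 is chosen only so that collisions are easy to observe, and the big-endian interpretation of the first 8 MD5 bytes is an assumption.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One linked-list entry holding a data item and its key value."""
    key: int
    data: bytes
    next: Optional["Node"] = None


class ChainedHashTable:
    def __init__(self, m: int = 13):
        self.m = m                # assumed prime table size
        self.slots = [None] * m   # one linked-list head per position

    @staticmethod
    def key_value(data: bytes) -> int:
        # First 8 bytes of the MD5 digest, read as a big-endian integer.
        return int.from_bytes(hashlib.md5(data).digest()[:8], "big")

    def store(self, data: bytes) -> bool:
        """Store data; return False if the same data is already present."""
        key = self.key_value(data)
        pos = key % self.m
        node = self.slots[pos]
        while node is not None:   # check the chain for existing data
            if node.data == data:
                return False
            node = node.next
        self.slots[pos] = Node(key, data, self.slots[pos])  # prepend
        return True

    def query(self, data: bytes) -> Optional[bytes]:
        """Walk the linked list at key % m until matching data is found."""
        key = self.key_value(data)
        node = self.slots[key % self.m]
        while node is not None:
            if node.key == key and node.data == data:
                return node.data
            node = node.next
        return None


table = ChainedHashTable(m=13)
for i in range(100, 114):
    table.store(f"data {i}".encode())
print(table.query(b"data 111"))  # b'data 111'
print(table.query(b"data 999"))  # None
```

With 14 items and only 13 positions, at least one position must hold a linked list of two or more items, so the query has to compare against stored entries rather than read a single slot.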
Therefore, because all data are stored in a single hash table, the required storage space is large and the requirement on storage equipment is high. At the same time, data whose key values collide are stored in the hash table as a linked list, each data item in the list having the same key value, which makes subsequent query and read operations difficult. On the other hand, even for data without a colliding key value, storage space for collision handling still needs to be reserved in the hash table, and the hash table based on the prime number M needs to be traversed before the data is stored to ensure that the same data does not already exist, which reduces the storage efficiency. Further, for non-colliding data stored in the hash table, the complexity (efficiency) of a query is related to the prime number M used in the hash table, while for colliding data stored in the hash table, the complexity of a query is O(n), where n is the number of data items stored in the linked list.
For example, to query data 111 in FIG. 1, the key value of data 111 is first obtained, and Key % M (i.e., the key value modulo M) is then calculated to obtain the remainder, i.e., the index position shown as location 11 in FIG. 1. The linked list at that index position is then traversed until a stored item matching data 111 is found, and the stored data 111 is returned to the user. Thus, the speed of such data access is low, the complexity of the data access is high, and the query efficiency is low.
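
The O(n) cost of querying colliding data can be made concrete by counting the comparisons needed to find an item in the list at one index position. In this small sketch, the function name `comparisons_to_find` is hypothetical, a Python list stands in for the linked list, and the order of the items at position 11 is an assumption, since the order is not stated in the text.

```python
def comparisons_to_find(chain, target):
    """Count the equality checks made while walking chain to find target.

    Returns None if target is not stored in the chain.
    """
    for checked, item in enumerate(chain, start=1):
        if item == target:
            return checked
    return None


# Assumed order of the colliding items at index position 11.
chain_11 = ["data 110", "data 111", "data 112", "data 113"]

print(comparisons_to_find(chain_11, "data 111"))  # prints 2
print(comparisons_to_find(chain_11, "data 113"))  # prints 4
```

In the worst case the item sought is the last one visited, so the number of comparisons equals the number n of items in the list.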
The disclosed methods and systems are directed to solving one or more of the problems set forth above and other problems.