In many cases, it is desirable to identify duplicate records. For example, an incoming stream of records to be stored in a database may contain duplicates. Duplicate records generally need to be identified so that they may be treated differently from other records (e.g., by ignoring them or by taking the duplication into account). Unfortunately, conventional techniques for identifying duplicate records have exhibited various limitations.
For example, a standard check for duplication is to attempt an INSERT operation into a database table that enforces a unique key over the relevant fields of each entry, and to check the SQL result. However, this operation requires numerous input/output operations and considerably slows the system performing it. Moreover, the INSERT operation is typically performed as part of the roundtrip of the event in which the records are stored in the database, before a response is sent, and thus adds to the latency of the event (which is problematic, e.g., where real-time constraints apply).
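By way of illustration only, the conventional INSERT-based check described above may be sketched as follows. The sketch assumes an in-memory SQLite database; the table name `records`, the column names, and the helper names `make_store` and `insert_or_detect_duplicate` are hypothetical and do not appear in any particular system.

```python
import sqlite3


def make_store():
    """Create a hypothetical table whose primary key rejects duplicate keys."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (record_key TEXT PRIMARY KEY, payload TEXT)"
    )
    return conn


def insert_or_detect_duplicate(conn, record_key, payload):
    """Attempt the INSERT and use the SQL result as the duplicate check.

    Returns True if the record was stored, False if the key already existed.
    """
    try:
        # The connection context manager commits on success and rolls back
        # on failure, so every call is a full database round trip.
        with conn:
            conn.execute(
                "INSERT INTO records (record_key, payload) VALUES (?, ?)",
                (record_key, payload),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised when the uniqueness constraint on record_key is violated.
        return False
```

Every call in this sketch reaches the database before the caller learns whether the record is a duplicate, which is the source of the input/output cost and added latency noted above.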
In another example, involving a real-time environment, it is crucial to keep data available in computer memory for as long as a computer program is running, while ensuring that an interrupted execution of the computer program does not break data consistency. Many existing solutions handle this issue by keeping a recovery log, which includes a list of past actions executed prior to stabilizing the data structure. Unfortunately, this incurs overhead in terms of both time and space.
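By way of illustration only, a recovery log of the kind described above may be sketched as follows. The sketch assumes a simple key-value structure and a line-oriented JSON log file; the class name `LoggedStore` and the function `demo` are hypothetical, and actual systems may differ substantially.

```python
import json
import os
import tempfile


class LoggedStore:
    """In-memory key-value data backed by an append-only recovery log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        """Rebuild the in-memory data by re-applying every logged action."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                line = line.strip()
                if not line:
                    continue
                try:
                    action = json.loads(line)
                except json.JSONDecodeError:
                    break  # a partially written final entry is discarded
                self.data[action["key"]] = action["value"]

    def put(self, key, value):
        """Log the action durably first, then apply it in memory."""
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())  # time overhead: one disk sync per action
        self.data[key] = value


def demo():
    """Simulate an interruption by rebuilding a new store from the same log."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "recovery.log")
        store = LoggedStore(path)
        store.put("k1", 1)
        store.put("k1", 2)
        store.put("k2", 3)
        return LoggedStore(path).data
```

In this sketch the log grows with every action, including overwrites of the same key, and each action waits on a disk synchronization, which reflects the space and time overhead noted above.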
Moreover, computer systems handle a very large number of transactions, each involving a record, and large amounts of memory are used to store data such as the key of the record (for various reasons, including duplicate checking) and/or a timestamp. There is a need to utilize computer memory in an efficient and economical manner.
There is thus a need for addressing these and/or other issues associated with the prior art.