1. Technical Field
The inventive arrangements relate generally to recording systems and more particularly to multimedia recording systems that record digitally encoded signals onto disc media such as hard drives and recordable optical discs.
2. Description of Related Art
Currently, many forms of data can be recorded onto many different types of storage media. As an example, many consumers record television programs or music onto an optical disc medium or a hard disc drive (HDD). As technology has improved, the storage capacity of optical disc media and HDDs has significantly increased. In fact, some HDDs can store well over 50 gigabytes of data. As such, a consumer can record a large number of programs or songs on this type of storage medium.
When data is recorded onto a recordable storage medium, the recordable storage medium device typically permits the user to enter a title for purposes of identifying the recorded work. These titles may be useful when the user wishes to locate a particular piece of recorded data to determine whether the user has previously recorded such data. Significantly, however, this process of searching may be laborious, inefficient and prone to errors, as the storage medium may contain hundreds or even thousands of titles. This problem may be particularly acute if the storage medium is a large HDD or if certain data segments were given default titles.
Even assuming a data segment on a storage medium could be located relatively easily by searching for the title, a particular title may be the same for different data segments. For example, if a song is recorded on a storage medium and is given a title based on the name of the song, a second song can be recorded later that has a name that is identical to the name of the first song. This confusion may occur, for example, if two separate artists record different versions of the same song. When recording the second song, the user may check the titles of the songs previously recorded and may mistakenly assume that the second song has already been recorded. Thus, a need exists for a system and method for searching for duplicate data that does not increase system costs or complexity and that further reduces the possibility of errors when searching for and considering deletion of duplicate data.
The present invention concerns a method of searching for duplicate data. The method includes the steps of: generating at least one identifier from at least one portion of a first segment of data using a unique identifier function; generating at least one identifier from at least one corresponding portion of a second segment of data using the unique identifier function; and comparing the at least one identifier associated with the first segment of data with the at least one identifier associated with the second segment of data to determine whether the first segment of data is substantially identical to the second segment of data.
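The three steps above can be illustrated with a minimal sketch. It is not the claimed implementation: SHA-256 merely stands in for the unspecified unique identifier function, and the names `make_identifier` and `is_probable_duplicate`, as well as the default `offset` and `length` values, are hypothetical.

```python
import hashlib


def make_identifier(portion: bytes) -> str:
    """Generate an identifier from one portion of a segment (SHA-256 is an assumed choice)."""
    return hashlib.sha256(portion).hexdigest()


def is_probable_duplicate(first: bytes, second: bytes,
                          offset: int = 0, length: int = 4096) -> bool:
    """Compare identifiers generated from corresponding portions of two segments."""
    first_id = make_identifier(first[offset:offset + length])
    second_id = make_identifier(second[offset:offset + length])
    return first_id == second_id
```

Because only one portion of each segment is examined in this sketch, a match suggests, but does not prove, that the two segments are identical as a whole.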
In one arrangement, the step of generating at least one identifier from at least one portion of a first segment of data can include the step of generating at least one identifier from the at least one portion of the first segment of data using a unique identifier function as the first segment of data is recorded onto a storage medium or after the first segment of data is recorded onto the storage medium. In addition, the step of generating at least one identifier from at least one portion of a second segment of data can include the step of generating the at least one identifier from the at least one portion of the second segment of data using the unique identifier function as the second segment of data is recorded onto the storage medium. Moreover, the step of generating at least one identifier from at least one portion of a second segment of data can occur as the second segment of data is recorded onto a different storage medium.
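Generating an identifier as a segment is recorded could be sketched as follows, under stated assumptions: the storage medium is modeled as any writable binary stream, SHA-256 stands in for the unique identifier function, the whole segment is treated as the portion, and `record_with_identifier` is a hypothetical name.

```python
import hashlib
import io
from typing import BinaryIO, Iterable


def record_with_identifier(chunks: Iterable[bytes], sink: BinaryIO) -> str:
    """Write incoming chunks to the sink and update the identifier in the same pass."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        sink.write(chunk)     # record the data onto the (modeled) storage medium
        hasher.update(chunk)  # feed the same bytes to the identifier function
    return hasher.hexdigest()


# Usage: an in-memory stream stands in for the storage medium.
medium = io.BytesIO()
identifier = record_with_identifier([b"first ", b"segment"], medium)
```

Updating the identifier incrementally avoids a second read of the medium, which is one plausible reason to generate it during recording rather than afterwards.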
In one aspect, the first segment of data and the second segment of data can be segments of multimedia data. The method can also include the steps of: storing in a table the at least one identifier associated with the first segment of data; and retrieving from the table the at least one identifier associated with the first segment of data prior to the comparing step. In addition, the method can include the step of presenting an indication that the first segment of data is substantially identical to the second segment of data when the at least one identifier associated with the first segment of data matches the at least one identifier associated with the second segment of data.
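The storing, retrieving and indicating steps might look like the sketch below. The table layout (a list of title and identifier pairs), the function names and the use of SHA-256 are all assumptions made for illustration; the text does not specify them.

```python
import hashlib
from typing import List, Tuple

# Each table entry pairs a user-visible title with a stored identifier.
Table = List[Tuple[str, str]]


def store_identifier(table: Table, title: str, data: bytes) -> None:
    """Store the identifier generated from a recorded segment."""
    table.append((title, hashlib.sha256(data).hexdigest()))


def find_matches(table: Table, data: bytes) -> List[int]:
    """Retrieve stored identifiers and return indices of entries that match the new segment."""
    new_id = hashlib.sha256(data).hexdigest()
    return [i for i, (_, stored_id) in enumerate(table) if stored_id == new_id]


def indication(table: Table, data: bytes) -> str:
    """Build a message indicating whether the new segment appears to be already recorded."""
    matches = find_matches(table, data)
    if not matches:
        return "No duplicate found."
    titles = ", ".join(table[i][0] for i in matches)
    return "Possible duplicate of: " + titles


# Usage: two different recordings share a title, yet only one matches.
table: Table = []
store_identifier(table, "Song", b"version by artist A")
store_identifier(table, "Song", b"version by artist B")
```

Keying the comparison on identifiers rather than titles means that two recordings with the same title but different content are not mistaken for duplicates.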
In another arrangement, the size of the at least one portion of the first segment of data and the at least one portion of the second segment of data can be based on a temporal measurement or a bit measurement. The at least one portion of the first segment of data can correspond temporally or correspond bit by bit with the at least one portion of the second segment of data. In another aspect, the at least one identifier associated with the first segment of data and the at least one identifier associated with the second segment of data can be hash values, and the unique identifier function can be a hash function in which the hash value associated with the first segment of data will equal a hash value associated with the second segment of data when the first segment of data and the second segment of data are identical.
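Selecting corresponding portions by a bit measurement or by a temporal measurement can be sketched as follows. Several simplifications are assumed here: sizes are expressed in whole bytes, the temporal variant presumes a constant data rate (`bytes_per_second`), SHA-256 stands in for the hash function, and all function names are hypothetical.

```python
import hashlib


def portion_by_size(data: bytes, start_byte: int, size_bytes: int) -> bytes:
    """Select a portion by position and size (a bit measurement, rounded to bytes)."""
    return data[start_byte:start_byte + size_bytes]


def portion_by_time(data: bytes, start_seconds: float, duration_seconds: float,
                    bytes_per_second: int) -> bytes:
    """Select a portion by playback time, assuming a constant data rate."""
    start = int(start_seconds * bytes_per_second)
    end = int((start_seconds + duration_seconds) * bytes_per_second)
    return data[start:end]


def hash_value(portion: bytes) -> str:
    """Identical portions always yield identical hash values."""
    return hashlib.sha256(portion).hexdigest()
```

The property relied on is one-directional: identical inputs always produce equal hash values, whereas equal hash values indicate identical inputs only with high probability for a well-chosen hash function.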
Also, the comparing step can include the step of comparing a plurality of identifiers associated with the first segment of data with a plurality of identifiers associated with the second segment of data to determine whether the first segment of data is substantially identical to the second segment of data. Further, the comparing step can include the step of comparing a plurality of identifiers associated with a first set of segments of data with a plurality of identifiers associated with a second set of segments of data to determine whether the first set of segments of data is substantially identical to the second set of segments of data.
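Comparing a plurality of identifiers could be sketched as below. The text does not define "substantially identical" numerically, so the `threshold` parameter, the fixed `portion_size` and the position-by-position matching rule are assumptions introduced purely for illustration, as are the function names.

```python
import hashlib
from typing import List


def portion_identifiers(data: bytes, portion_size: int = 1024) -> List[str]:
    """Generate one identifier per fixed-size portion of a segment."""
    return [hashlib.sha256(data[i:i + portion_size]).hexdigest()
            for i in range(0, len(data), portion_size)]


def match_ratio(first_ids: List[str], second_ids: List[str]) -> float:
    """Fraction of corresponding identifiers that match, relative to the longer list."""
    longest = max(len(first_ids), len(second_ids))
    if longest == 0:
        return 1.0  # two empty segments are trivially identical
    matches = sum(a == b for a, b in zip(first_ids, second_ids))
    return matches / longest


def substantially_identical(first: bytes, second: bytes,
                            threshold: float = 0.8) -> bool:
    """Treat segments as substantially identical when enough identifiers match (assumed rule)."""
    ratio = match_ratio(portion_identifiers(first), portion_identifiers(second))
    return ratio >= threshold
```

The same comparison extends to sets of segments by concatenating, or separately comparing, the identifier lists of each segment in the set.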
The present invention also concerns a system for searching for duplicate data. The system includes: a controller for reading data from and writing data to a storage medium; and a processor, wherein the processor is programmed to: generate at least one identifier from at least one portion of a first segment of data using a unique identifier function; generate at least one identifier from at least one corresponding portion of a second segment of data using the unique identifier function; and compare the at least one identifier associated with the first segment of data with the at least one identifier associated with the second segment of data to determine whether the first segment of data is substantially identical to the second segment of data. The system also includes suitable software and circuitry to implement the methods as described above.