Standard audio compact discs (“CD”) may not, and normally do not, contain any information describing their content, such as artist, track, and title. The only information that is guaranteed to appear on any standard CD is the table of contents (“TOC”), which is a “header” at the beginning of each disc. The TOC marks the beginning of each track in frames, each of which is 1/75th of a second. As such, the CD player can use this information to precisely locate the beginning of each track and to determine the precise track length. For illustrative purposes, a four-track CD may contain a TOC composed as follows: [150 26570 49757 72545 94105]. “Track 1” begins at frame 150 (i.e. 2 seconds) and ends at frame 26570 (i.e. approximately 354.27 seconds). “Track 2” begins at frame 26570 and ends at frame 49757. The last frame, 94105, corresponds to the end of “Track 4” and the end of the CD program area.
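The frame arithmetic in this example can be illustrated with a minimal sketch. The function name `track_spans` and the list representation of the TOC are assumptions made for illustration only; the only facts taken from the description are the 75 frames-per-second rate and the example TOC values.

```python
FRAMES_PER_SECOND = 75  # one frame is 1/75th of a second


def track_spans(toc):
    """Return (start_frame, end_frame, length_seconds) for each track.

    `toc` is assumed to be a list of frame offsets in which each entry
    marks the start of a track and the final entry marks the end of the
    program area.
    """
    return [
        (start, end, (end - start) / FRAMES_PER_SECOND)
        for start, end in zip(toc, toc[1:])
    ]


toc = [150, 26570, 49757, 72545, 94105]
for number, (start, end, seconds) in enumerate(track_spans(toc), start=1):
    print(f"Track {number}: frames {start}-{end}, {seconds:.2f} s long")
```

Under these assumptions, “Track 1” spans 26420 frames, or roughly 352.27 seconds, which is the difference between its end position (about 354.27 seconds) and its start position (2 seconds).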
Due to the precise nature of TOC frames, the likelihood of two CDs sharing the same TOC is extremely low. As such, a TOC can normally be used to uniquely identify the CD currently being played. There are two methods of performing this comparison: exact matching and fuzzy matching. Exact matching requires that all frames of the input TOC match the frames of a reference TOC in a database. Fuzzy matching compares the input TOC to a subset of reference TOCs in a database and, using an algorithm, determines a correct, or closest, match. Fuzzy matching is particularly useful when exact matches cannot be found. For example, when an album has been reprinted, the TOC of the reprinted album often does not precisely match the TOC of the previously printed album.
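The difference between the two methods can be sketched as follows. The description does not specify a particular fuzzy matching algorithm, so the distance measure used here (the sum of absolute frame differences between TOCs with the same number of entries) and the tolerance `max_total_diff` are purely hypothetical choices for illustration, as are the function names and the in-memory list standing in for the database.

```python
def exact_match(toc, references):
    """Return the reference TOC whose frames all equal the input, or None."""
    for ref in references:
        if ref == toc:
            return ref
    return None


def fuzzy_match(toc, references, max_total_diff=300):
    """Return the closest reference TOC, or None if none is close enough.

    Assumption: closeness is the sum of absolute frame differences, and
    only references with the same number of entries are considered.
    """
    best, best_diff = None, None
    for ref in references:
        if len(ref) != len(toc):
            continue
        diff = sum(abs(a - b) for a, b in zip(toc, ref))
        if best_diff is None or diff < best_diff:
            best, best_diff = ref, diff
    if best is not None and best_diff <= max_total_diff:
        return best
    return None


references = [
    [150, 26570, 49757, 72545, 94105],
    [150, 20000, 40000, 60000],
]
reprint = [150, 26572, 49760, 72545, 94110]  # hypothetical reprinted album

print(exact_match(reprint, references))  # no exact match exists
print(fuzzy_match(reprint, references))  # closest reference TOC
```

In this sketch the hypothetical reprint differs from the first reference by only a few frames, so exact matching fails while fuzzy matching still identifies the original album.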
Currently, the most common implementation of TOC lookups uses a general-purpose database engine. In many cases, high-end devices utilize a standard B-tree database. This type of database is able to meet the needs of TOC lookups (including fuzzy matches), with the principal advantage being that a general-purpose engine can be dynamically updated. However, there are many disadvantages to using such a database structure due to its fixed overhead with regard to code size and performance.
As a consequence of the general-purpose indexing mechanism, a general-purpose database normally requires several disk seeks for each TOC lookup (up to thousands in some cases of fuzzy matching). This is because of the non-linear organization of database information (e.g. TOCs). A standard database normally contains separate “buckets” of information. Both exact and fuzzy matching require sifting through one or more of these buckets, and accessing each bucket requires a database access. Each database access may require a number of disk seeks and significant CPU time to traverse the index. For example, for each bucket the system must navigate through a complex indexing system to locate the address of the bucket, seek to the bucket, and finally scan through the bucket. To search a second bucket, the system must perform the same operation again. This can require a substantial number of seeks, which necessitates the use of high-end hardware. On a low-end platform, a fuzzy matching operation with a general-purpose indexing scheme could take up to several minutes. As such, these common databases require fast hard disk speeds, extra RAM for caching of data, and significant amounts of CPU processing time.
While this may be acceptable for a high-end hardware platform, implementing such a system and method on a low-end hardware platform would result in extremely poor performance due to limited resources (e.g. RAM and storage space) and low processing power. In many instances, the poor performance renders the system unusable. Further, dynamic updates may not be required as part of a low-end solution, which suggests that a general-purpose database engine need not be utilized for TOC lookups in such cases.
In traditional indices, 20-40% of the space consumed by a B-tree index is devoted to the indexing information, with the remainder being used to store the actual data itself. The overhead of a B-tree index is variable and increases as more records are added to the index. Therefore, a variable and substantial portion of storage space is “wasted” on the indexing information rather than on the actual data. In some cases the wasted space can amount to many megabytes.
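The scale of this overhead can be illustrated with a small worked calculation. The 50 MB figure for the raw TOC data is a hypothetical value chosen for illustration, and the function name `index_overhead_mb` is likewise an assumption; only the 20-40% range comes from the description above.

```python
def index_overhead_mb(data_mb, overhead_fraction):
    """Space taken by indexing information, in megabytes.

    Assumes `overhead_fraction` of the total index size is indexing
    information and the remainder (`data_mb`) is the actual data.
    """
    total_mb = data_mb / (1.0 - overhead_fraction)
    return total_mb - data_mb


data_mb = 50.0  # hypothetical size of the raw TOC data
for fraction in (0.20, 0.40):
    overhead = index_overhead_mb(data_mb, fraction)
    print(f"{fraction:.0%} overhead: {overhead:.1f} MB of indexing information")
```

Under these assumptions, the indexing information alone would occupy roughly 12.5 MB at 20% overhead and roughly 33.3 MB at 40% overhead.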