1. Technical Field
The present invention relates to key-value storage and indexing, and more particularly to tiered key-value storage and indexing machines comprising a cluster of databases, file systems, or other storage-stack software.
2. Discussion of Related Art
The volume of index data generated by network-connected devices is outpacing the speed, capacity, and capabilities of data storage technologies. Examples of such sources include systems that automatically generate tags, systems that index continuously captured video, social-networking services that index ever-growing databases, and other systems that generate large volumes of index data.
Applications that create index data include data-deduplication and provenance systems. Data deduplication is one technology used to manage these large data volumes by eliminating redundant data, and it relies on indexing to maintain performance. Automated provenance collection and indexing are further examples of growing applications. Automatic provenance collection refers to systems that observe processes and data transformations and infer, collect, and maintain provenance about them.
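To illustrate why deduplication depends on an index, the following is a minimal sketch under simplifying assumptions: data is split into fixed-size chunks (real systems often use content-defined chunking), each chunk is fingerprinted with SHA-256, and an in-memory dictionary serves as the fingerprint index. The function names `deduplicate` and `restore` and the `chunk_size` parameter are hypothetical and are not part of the described invention.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4):
    """Store each unique fixed-size chunk once (hypothetical helper).

    Returns (index, recipe): index maps a SHA-256 fingerprint to the chunk
    bytes; recipe is the ordered list of fingerprints that rebuilds data.
    """
    index = {}   # fingerprint -> chunk bytes (the deduplication index)
    recipe = []  # ordered fingerprints describing the original stream
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        # Index lookup: only chunks whose fingerprint is unseen are stored.
        if fp not in index:
            index[fp] = chunk
        recipe.append(fp)
    return index, recipe


def restore(index, recipe) -> bytes:
    """Reassemble the original bytes by looking up each fingerprint."""
    return b"".join(index[fp] for fp in recipe)


if __name__ == "__main__":
    data = b"ABCDABCDABCDWXYZ"
    index, recipe = deduplicate(data)
    print(len(index), len(recipe))  # 2 unique chunks, 4 references
    assert restore(index, recipe) == data
```

Every incoming chunk requires an index lookup, so as the stored data grows, the speed of the fingerprint index largely determines deduplication throughput.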
Individual machines that form a larger database cluster, such as those used by Google's BigTable, Yahoo's Hadoop, and HBase, also perform indexing tasks. These machines are referred to as 'Tablet Servers' in the literature. Even database engines such as MySQL's InnoDB, ISAM (Indexed Sequential Access Method), Berkeley DB, and other key-value stores must perform indexing for traditional RDBMS (relational database management system) workloads. Indexing is being applied to system logs, file metadata, databases, database clusters, media tagging, and more.
In these and other contexts, indexing is an important component of a wide variety of platforms and applications.