The present invention relates to the field of computer systems, and in particular, to high-performance indexing for data-intensive systems.
Data-intensive systems, that is, computer systems that involve substantial amounts of data storage and recovery, are being employed in a wide variety of applications today. Efficient data storage and access normally uses an index structure, such as a key-value index where the address of storage is determined by applying a key (representative of the stored data) to the index to obtain the storage address. Key-value storage systems are employed in cloud-based applications as diverse as ecommerce and business analytics systems and picture stores. Large object stores having key-value indexes are used in a variety of content-based systems such as network de-duplication engines, storage de-duplication, logging systems and content similarity detection engines.
An index may be a simple association list linking pairs of keys and address values, like the index of a book. Finding a particular index entry could conceivably be done by ordering the keys (like alphabetizing entries in an index) and searching for a key using a search algorithm such as a binary search. Preferably, however, to ensure high application performance, index systems often rely on random hashing-based indexes, whose specific design may depend on the particular system. Generally, a hash index includes keys and values at locations within the index that may be determined by applying a hash-type function to the key. A benefit of hash indexes is that the hash function immediately directs the user to the necessary key-value pair. For example, wide-area network (“WAN”) optimizers, Web caches and video caches may employ large streaming hash tables. De-duplication systems may employ bloom filters to summarize the underlying object stores. Content similarity engines and certain video proxies may employ locality sensitive hash (“LSH”) tables. Given the volume of the underlying data, the indexes typically span several tens of gigabytes, and indexes continue to grow in size. The information in indexes of this type is held not only in the key-value pairs of the index but also in the particular topology of the index, that is, the location and not simply the order of the key-value pairs in the index. Compressing or reordering the entries in a hash-type index, for example for space savings, would render the hash index inoperable.
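The contrast between an ordered association list and a hash-type index can be illustrated with a minimal sketch. The names `slot_for`, `HashIndex` and `sorted_lookup`, the use of SHA-256 as the hash function, and the bucket layout are all assumptions made for illustration and are not taken from any of the systems mentioned above.

```python
import bisect
import hashlib


def slot_for(key: bytes, num_slots: int) -> int:
    """Map a key to a slot number (hypothetical hash function choice)."""
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_slots


class HashIndex:
    """Toy hash index: the slot a pair lives in is part of the information."""

    def __init__(self, num_slots: int = 64):
        self.slots = [[] for _ in range(num_slots)]

    def put(self, key: bytes, address: int) -> None:
        bucket = self.slots[slot_for(key, len(self.slots))]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, address)  # overwrite an existing entry
                return
        bucket.append((key, address))

    def get(self, key: bytes):
        # The hash function leads directly to the one bucket that can
        # hold the key; no other slot is examined.
        bucket = self.slots[slot_for(key, len(self.slots))]
        for existing_key, address in bucket:
            if existing_key == key:
                return address
        return None


def sorted_lookup(sorted_keys, addresses, key):
    """Ordered association list: only the order matters, found by binary search."""
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return addresses[i]
    return None
```

In this sketch, `get` relies on every pair remaining in the slot chosen by `slot_for`. Removing empty slots or reordering `slots` to save space would change where lookups land, which is the sense in which a hash-type index cannot simply be compacted.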
Across such systems, the index may be quite intricate in design. Significant engineering is often devoted to ensuring high index performance, particularly with respect to achieving low latency and high throughput, at low cost, particularly with respect to the value of each component used to store the index, as well as the amount of energy the components consume. Many state-of-the-art systems advocate using solid-state drive (“SSD”) implementations comprised of flash memory to store indexes, given flash memory's superior density, lower cost and energy efficiency over conventional memory, such as DRAM, and superior density, energy efficiency and high random read performance over conventional disk storage. As used herein, SSD will be understood to be non-volatile solid-state memory commonly known as flash memory.
In SSDs, a flash memory page, which may be between 2048 and 4096 bytes in size, is typically the smallest unit of read or write operations. Accordingly, reading a single entry in an index stored in the SSD, such as a 16 Byte key-value pair entry, may be as costly as reading a page. In addition, pages are typically organized into blocks with each block spanning 32 or 64 pages. While the performance of random page reads may be comparable to that of sequential page reads, random page writes are typically much slower.
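The cost of page-granularity access can be made concrete with a short calculation. The sketch below takes the page to be 2048 or 4096 bytes, as is typical of flash devices, and uses the 16 Byte entry and the 32 or 64 pages per block mentioned above. The function names are hypothetical and the figures are illustrative arithmetic, not measurements.

```python
def entries_per_page(entry_bytes: int, page_bytes: int) -> int:
    """Number of whole index entries that fit in one flash page."""
    return page_bytes // entry_bytes


def read_amplification(entry_bytes: int, page_bytes: int) -> float:
    """Bytes transferred per byte actually wanted when one entry is read."""
    return page_bytes / entry_bytes


def block_bytes(page_bytes: int, pages_per_block: int) -> int:
    """Size of one block, given the page size and pages per block."""
    return page_bytes * pages_per_block


if __name__ == "__main__":
    for page in (2048, 4096):
        print(
            f"page={page} B: {entries_per_page(16, page)} entries/page, "
            f"{read_amplification(16, page):.0f}x read amplification"
        )
    for page, pages in ((2048, 32), (4096, 64)):
        print(f"block of {pages} x {page} B pages = {block_bytes(page, pages)} B")
```

Under these assumptions, fetching one 16 Byte entry transfers 128 to 256 times more data than the entry itself, which is why index designs try to keep the number of page reads per lookup close to one.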
Some ability to provide increased throughput in SSD implementations via leveraging certain parallelisms currently exists. Certain SSD implementations have begun to support native command queuing (“NCQ”), in which multiple I/O operations may execute concurrently.
Several recent research efforts have proposed SSD-based indexes for large key-value stores.
One such proposal, “Cheap and Large CAMs for High Performance Data-Intensive Networked Systems,” NSDI 2010, Ashok Anand, Chitra Muthukrishnan, Steven Kappes, Aditya Akella and Suman Nath, referred to as “BufferHash,” the contents of which are hereby incorporated by reference, buffers all insertions in memory and writes them in a batch to flash. BufferHash maintains in-memory bloom filters to avoid spurious lookups to any batch on flash, and requires less than one page read per lookup on average. However, BufferHash often scans multiple pages in the worst case due to false positives produced by the bloom filters and typically requires greater than 4 bytes/key.
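The general pattern of buffering insertions in memory, writing them out in batches, and consulting a bloom filter per batch before touching flash can be sketched as follows. This is a heavily simplified illustration and not the published BufferHash design: the class names, the SHA-256-based bloom filter, the use of Python dictionaries to stand in for batches on flash, and the `flash_reads` counter are all assumptions.

```python
import hashlib


class BloomFilter:
    """Small bloom filter; may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key: bytes):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class BufferedIndex:
    """Toy index: in-memory buffer plus batches on simulated flash."""

    def __init__(self, buffer_limit: int = 4):
        self.buffer_limit = buffer_limit
        self.buffer = {}       # recent insertions, held in memory
        self.batches = []      # (bloom filter, batch) pairs; batch simulates flash
        self.flash_reads = 0   # simulated batch reads performed by lookups

    def insert(self, key: bytes, value: int) -> None:
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_limit:
            self._flush()

    def _flush(self) -> None:
        # Write the whole buffer as one batch and summarize it in memory.
        bloom = BloomFilter()
        for key in self.buffer:
            bloom.add(key)
        self.batches.append((bloom, dict(self.buffer)))
        self.buffer.clear()

    def lookup(self, key: bytes):
        if key in self.buffer:
            return self.buffer[key]
        # Newest batch first, so the most recent value for a key wins.
        for bloom, batch in reversed(self.batches):
            if bloom.might_contain(key):
                self.flash_reads += 1  # a false positive still costs a read
                if key in batch:
                    return batch[key]
        return None
```

In this sketch a lookup reads a batch only when that batch's bloom filter reports a possible match, so most lookups cost at most one read, while a false positive adds a wasted read. This mirrors the worst-case behavior described above, in which several pages may be scanned for a single lookup.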
Another proposal, “SILT: A Memory-Efficient, High-Performance Key-Value Store,” SOSP, pages 1-13, 2011, H. Lim, B. Fan, D. G. Andersen, and M. Kaminsky, referred to as “SILT,” the contents of which are hereby incorporated by reference, comes close to meeting the design requirements outlined above by achieving a low memory footprint (0.7 bytes/entry) and requiring a single page lookup on average. However, SILT uses a much more complex design than other systems in that it employs a plurality of data structures where one is highly optimized for a low memory footprint and others are write-optimized but require more memory. SILT continuously moves data from the write-optimized data structures to the memory-efficient data structure. In doing so, SILT has to continuously sort new data written and merge it with old data, thereby increasing the computation overhead. These background operations also affect the performance of SILT under continuous inserts and lookups. For example, the lookup performance drops by 21% for a 50% lookup-50% insert workload on 64 B key-value pairs. The authors of SILT also acknowledge that sorting becomes a performance bottleneck.
The conventional wisdom with respect to index design is that domain and operations-specific SSD optimizations are necessary to meet appropriate cost-performance trade-offs. This poses two problems: (a) SSD implementations having poor flexibility, and (b) SSD implementations having poor generality.
Poor Flexibility:
Index designs often target a specific point in the cost-performance spectrum, severely limiting the range of applications that can use them. This also makes indexes difficult to tune, for example, using extra memory for improved performance. In addition, indexes are often designed to work best under specific workloads. As a result, even minor deviations often cause performance to be quite variable.
Poor Generality:
The design patterns often employed typically apply only to the specific data structure on hand. As a result, it is often difficult to employ different indexes in tandem, such as hash tables for cache lookups alongside LSH tables for content similarity detection over the same underlying content, as they may employ conflicting techniques that result in poor SSD input/output (“I/O”) performance.