The development of modern computing technology has led to vast amounts of stored data. Some modern systems must maintain hundreds of millions of data values, or even more. Retrieving the desired data may require searching for a key among the data, and a large volume of operations may be needed to determine the associated value. Typical operations include searching among the data values and determining associations between keys and related values. In this regard, a key-value pair may include a key by which a requesting service or method may request an associated value. Depending on the data structure, the data is searched and processed such that the associated value is returned to the requesting service or method.
Java Multimap is an example data structure used to maintain large numbers of key-value pairs. The Multimap maps each key ki from the set (k1, . . . , kn) to a sequence of values [v1i, . . . , vmi] of varying size. However, the Java Multimap, as well as many other data structures, may experience decreased performance as the size or number of values stored in the data structure increases. Searching, as well as adding and removing key-value pairs, may have a detrimental effect on performance, particularly as the amount of data increases.
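The key-to-sequence mapping described above can be sketched with standard JDK collections. This is a minimal illustration only, not the implementation of any of the libraries named in this document; the class name SimpleMultimap, its method names, and the choice of String keys and Integer values are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not an API of the JDK, Guava, or Apache Commons.
final class SimpleMultimap {
    // Each key ki maps to its own growable sequence of values [v1i, ..., vmi].
    private final Map<String, List<Integer>> entries = new HashMap<>();

    // Appends a value to the key's sequence, creating the sequence on first use.
    public void put(String key, int value) {
        entries.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Returns a read-only view of the key's values, or an empty list if the key is absent.
    public List<Integer> get(String key) {
        List<Integer> values = entries.get(key);
        return values == null
                ? Collections.<Integer>emptyList()
                : Collections.unmodifiableList(values);
    }

    // Removes one occurrence of the value; drops the key once its sequence is empty.
    public boolean remove(String key, int value) {
        List<Integer> values = entries.get(key);
        if (values == null || !values.remove(Integer.valueOf(value))) {
            return false;
        }
        if (values.isEmpty()) {
            entries.remove(key);
        }
        return true;
    }

    // Number of distinct keys currently stored.
    public int keyCount() {
        return entries.size();
    }
}
```

In this sketch every key owns a separate list object and every value is a boxed Integer, so the number of heap objects grows with both the number of keys and the number of values.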
FIG. 1 is a plot illustrating insertion times of various data structures including JDK (Java Development Kit) 1.7 HashMap, Guava 18.0 Multimap, Apache Commons 4.0 Multimap, Trove 3.0.3, CERN (European Organization for Nuclear Research) Colt 1.2.0, HighScale Lib 1.1.2, HPPC (High Performance Primitive Collections for Java) 0.6.0, Javolution 6.1.0 and Primitive Collections 1.2 for Java or PCJ (Pluggable Java Collections). Results are shown for 1 gigabyte of free memory provided to each data structure. The measurements were performed on a four-core Intel Core i7 950 processor running at 3.0 gigahertz (GHz) with 6 gigabytes of random access memory (RAM). As illustrated in the plot, insertion times and performance are impacted significantly when the data structures store large numbers of key-value pairs. In these examples in particular, many implementations experience significant performance degradation as the number of key-value pairs approaches or exceeds 5 million per gigabyte. Many modern computing systems must process an even higher volume of data with less memory.
Furthermore, many implementations of data structures are based on mappings of keys to dynamic arrays of values, which require dynamic allocations of objects during the addition and removal of elements. The final sizes of the sets of values associated with keys in such data structures may be unknown at the time of creation or instantiation, so the arrays may need to be reallocated as more values are added, which may cause further performance degradation.
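The reallocation cost of a dynamic array whose final size is unknown can be illustrated with a small sketch. The class GrowableIntArray below is hypothetical, and its capacity-doubling growth policy is an assumption chosen for simplicity; actual libraries may use different growth factors. The sketch only counts how often the backing array has to be replaced by a larger copy.

```java
import java.util.Arrays;

// Hypothetical illustration class; the doubling policy is an assumption of this sketch.
final class GrowableIntArray {
    private int[] data;
    private int size = 0;
    private int reallocations = 0;

    public GrowableIntArray(int initialCapacity) {
        // Use at least one slot so that doubling always increases the capacity.
        data = new int[Math.max(1, initialCapacity)];
    }

    // Appends a value; when the backing array is full, it is copied into one twice as large.
    public void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2);
            reallocations++;
        }
        data[size++] = value;
    }

    public int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    public int size() {
        return size;
    }

    // Number of times the backing array was replaced by a larger copy.
    public int reallocations() {
        return reallocations;
    }
}
```

Under these assumptions, adding 1,000 values to an array created with capacity 1 triggers 10 reallocations, each of which copies all existing elements and leaves the old array for the garbage collector, whereas an array created with capacity 1,000 triggers none. Pre-sizing is only possible when the final size is known in advance, which is often not the case for the per-key value sets discussed above.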
In Java, allocations in a Multimap are managed by a garbage collector (GC), and when the number of allocations becomes large, the GC must track many objects. For example, each time an element is added or removed, numerous objects may be instantiated and must then be tracked by the GC, resulting in performance degradation. Furthermore, memory fragmentation may cause reallocations of objects within the heap, which may add even more time and consume additional computing resources before operations complete.