1. Field of the Invention
This invention relates generally to data access and, more particularly, to a sortable hash table.
2. Description of the Related Art
In an increasingly competitive world, enterprises constantly need business intelligence that empowers decision makers in the organization to act on information, and thus imparts an extra competitive edge to the organization's products and services. Businesses succeed or fail based on their ability to, for example, accurately quantify how many leads become orders, identify their most profitable customers, forecast manufacturing capabilities, manage reliable supply chains, and create sales projections.
However, obtaining information on which decision makers can act presents several practical challenges. One such challenge is the massive amount of data available to the enterprise in today's Information Age. Converting that data into information which can be readily understood is a further obstacle. Additionally, enterprises today have data spread over multiple data sources ranging from legacy systems to relational databases and text files. Even if these problems are surmounted, publishing information in a secure and reliable manner remains another concern for enterprises.
Reporting systems with data visualization functionalities can provide users with the capability to convert diverse data into information that can be easily visualized and deciphered to exploit the information and learn more about the business. Visualization components can emphasize high-level patterns and trends in large and complex datasets.
For many applications, including data visualizations, it is useful to be able to provide access to data through names, categories, strings, or other symbolic properties, while also allowing the data to be ordered by some arbitrary function of the data. For example, symbolic access to data can be useful when building a table of data associated with categories, such as a histogram. A histogram is a distributive representation of attributes of data records. In other words, a histogram is a data visualization that tallies the frequency of occurrence of symbols in an input sequence, such as names in a log file. One reason to build such a histogram is to identify the symbols that occur with the highest frequency in the input sequence, as in a familiar top-ten list. To build a histogram, it is desirable to provide access to these symbols in the input data using symbolic properties. Symbolic property access can be accomplished using a hash table.
Thus, histograms are typically created by using a data structure such as a hash table or an associative array, both of which associate a symbol (or string or name or other symbolic property) with a value. Since histograms are a tally of the frequency of symbols, in order to identify the symbols (i.e., symbolic properties) that occur with the highest frequency in an input sequence, the elements of the hash table need to be sorted by the value (e.g., the frequency). However, hash tables and other data structures that associate symbolic properties with values are designed to support symbolic access, not ordering. Thus, although hash tables allow fast data access through symbolic properties, hash table entries are typically not re-orderable or sortable. Other data structures, such as arrays, support sorting but do not support symbolic property access.
The typical solution for providing sorting in a hash table, to be used for example for generating a sorted histogram, includes: first, counting the frequency of each symbol (i.e., symbolic property); second, copying the unique (hashed) symbols and their frequencies to a sortable array; and third, sorting the array.
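The three steps above can be sketched in Java as follows. This is only an illustration under stated assumptions, not an implementation taken from any particular prior system: the class name SortedHistogramSketch, the method names countSymbols and sortByFrequency, and the sample input are all hypothetical.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the count / copy / sort approach.
public class SortedHistogramSketch {

    // Step 1: count the frequency of each symbol in a hash table.
    static Map<String, Integer> countSymbols(List<String> input) {
        Map<String, Integer> counts = new HashMap<>();
        for (String symbol : input) {
            counts.merge(symbol, 1, Integer::sum);
        }
        return counts;
    }

    // Steps 2 and 3: copy each (symbol, frequency) pair into a separate
    // sortable list, then sort that list by descending frequency.
    static List<Map.Entry<String, Integer>> sortByFrequency(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> copy = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            copy.add(new SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        copy.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return copy;
    }

    public static void main(String[] args) {
        // Sample input standing in for names read from a log file.
        List<String> log = Arrays.asList("alice", "bob", "alice", "carol", "alice", "bob");
        for (Map.Entry<String, Integer> e : sortByFrequency(countSymbols(log))) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Note that the sorted list is a second structure holding the same symbols and frequencies as the hash table, and that lookup by symbol is only available on the hash table, not on the sorted list.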
One disadvantage of the prior solution is that it requires an explicit copy of the hash table contents, which occupies additional memory. A further disadvantage is that, by separating the symbolic data (i.e., the hash table) from the ordered data (i.e., the sortable array), the process does not easily support the incremental addition or deletion of data values. Moreover, prior solutions do not provide access both by symbolic name and by numeric value using a single methodology.
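The difficulty with incremental updates can be illustrated with a short Java sketch. It assumes the copy-then-sort approach described above; the class name StaleCopySketch, the method name sortedSnapshot, and the sample values are hypothetical.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: a sorted copy goes stale when the hash table changes.
public class StaleCopySketch {

    // Copy each (symbol, count) pair out of the hash table and sort by descending count.
    static List<Map.Entry<String, Integer>> sortedSnapshot(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> copy = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            copy.add(new SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        copy.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("alice", 2);
        counts.put("bob", 1);

        List<Map.Entry<String, Integer>> sorted = sortedSnapshot(counts);

        // More input arrives after the sorted copy was made.
        counts.merge("bob", 5, Integer::sum);

        // The earlier copy still ranks "alice" first; it does not see the update.
        System.out.println(sorted.get(0).getKey());
        // Only a full re-copy and re-sort reflects the new counts ("bob" is now first).
        System.out.println(sortedSnapshot(counts).get(0).getKey());
    }
}
```

In this sketch every change to the hash table requires the whole array to be copied and sorted again before the ordering can be trusted.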
The Java™ TreeMap class provides a red-black tree implementation of the SortedMap interface. Elements in the map are sorted either by a user-provided Comparator object or by the natural ordering of the keys. Since the ordering is determined by the Comparator when the TreeMap is created, any new object that is added is positioned according to the ordering set out by the Comparator. This is an inflexible solution which does not permit sorting in alternative ways and/or sorting according to other properties of the object. Moreover, using the Java™ TreeMap class, new objects are sorted as they are added. This requires the values of the new objects to be known before they are inserted. Such a requirement is problematic, for example, when property values for objects are combined for generating data visualizations such as histograms.
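A brief Java sketch may help illustrate this limitation. It uses only the standard java.util.TreeMap and java.util.Comparator classes; the class name TreeMapOrderingSketch, the method name keysInOrder, and the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration: a TreeMap histogram is ordered by key, not by count.
public class TreeMapOrderingSketch {

    // Tally symbols in a TreeMap and return the keys in the map's iteration order.
    static List<String> keysInOrder(List<String> input) {
        // The ordering is fixed here, at construction, and compares keys only.
        TreeMap<String, Integer> counts = new TreeMap<>(Comparator.<String>naturalOrder());
        for (String symbol : input) {
            counts.merge(symbol, 1, Integer::sum);
        }
        return new ArrayList<>(counts.keySet());
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("carol", "alice", "carol", "bob", "carol");
        // Prints [alice, bob, carol]: "carol" occurs most often but is listed last,
        // because entries are positioned by key as they are inserted.
        System.out.println(keysInOrder(log));
    }
}
```

Because the Comparator sees only keys, the accumulated frequencies play no part in the ordering, so ranking the symbols by frequency still requires a separate copy and sort.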