Many businesses and organizations use computing systems comprising servers that serve numerous client devices and services. Organizations that operate the computing systems typically collect and analyze various statistics related to resource performance and utilization. Examples of such statistics include disk space usage and processor utilization.
Typically, server performance data is collected on a periodic basis, such as once per day or week. Such data is then compiled and analyzed, and the daily statistics are stored and kept available for a period of time depending on the organization's policies. In many cases, a one-year rolling history of high-volume performance data is desired. Each server typically provides large quantities of performance data on a daily basis. For a large organization with hundreds or thousands of servers, the compilation and computation of performance data can be computationally intensive. For example, in a typical organization, the window of time between when new daily data arrives and when daily reports are due is short, and a significant amount of computation must be completed within that window.
The reporting and analysis of this history data can comprise either selecting a particular data element or range of elements, or reading the entire file in order of element. The raw input data received from servers is typically in unsorted form. When the data is sorted, data objects that are related or fall under the same category must be retrieved. However, the related data objects are typically stored in the order in which they are received, and the related objects are generally not in contiguous storage locations. Thus, the disk drive readers must scan large storage areas to retrieve the various related data objects, further adding to the time required to compute the performance data.
Many companies desire to perform data retrieval and sorting on a regular basis. In a typical scenario, server performance programs must analyze and report one year of performance data for each server. Furthermore, the company IT department's capacity planners must be able to quickly select data for a particular server. Thus, in a typical large company that may have as many as 2000 servers, there may not be sufficient time or computational resources to carry out this task in a timely manner.
A typical approach to this problem is to store the data in a file or database table. Each day, new data is inserted into or appended to the end of the table. An index by object is maintained to allow a program to access or select the data by object. While this method may produce the desired result, read access is typically slow if one uses an index to read all objects of a very large table, because the data for each object is widely scattered across the database. This method also requires a periodic purge to limit the data to the current rolling year. The result is that while write speeds are fast, read speeds are slow, and there must be an extra purge step. A variation on this approach is to keep each day's data physically separate. This removes the need for an extra purge step, but it makes reading the data even slower and requires more complex code to do so.
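The append-and-index approach described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the class name `AppendOnlyStore`, its methods, the in-memory list standing in for the table, and the sample server names are all hypothetical and do not come from any particular product. It is meant to show why writes are cheap, why the rows for one object end up scattered, and why a separate purge step is needed.

```python
from collections import defaultdict
from datetime import date, timedelta


class AppendOnlyStore:
    """Hypothetical append-only table with an index by object (server)."""

    def __init__(self, retention_days=365):
        self.rows = []                   # (day, server, value) in arrival order
        self.index = defaultdict(list)   # server -> positions of its rows
        self.retention_days = retention_days

    def append_day(self, day, samples):
        """Fast write: new rows simply go on the end of the table."""
        for server, value in samples:
            self.index[server].append(len(self.rows))
            self.rows.append((day, server, value))

    def history(self, server):
        """Indexed read: consecutive positions may be far apart in the table."""
        return [self.rows[pos] for pos in self.index[server]]

    def purge(self, today):
        """Extra step: drop rows outside the rolling window, then rebuild the index."""
        cutoff = today - timedelta(days=self.retention_days)
        self.rows = [row for row in self.rows if row[0] > cutoff]
        self.index = defaultdict(list)
        for pos, (_, server, _) in enumerate(self.rows):
            self.index[server].append(pos)


# Illustrative usage with made-up data: three days, two servers.
store = AppendOnlyStore(retention_days=2)
first_day = date(2024, 1, 1)
for offset in range(3):
    day = first_day + timedelta(days=offset)
    store.append_day(day, [("srv-a", offset), ("srv-b", offset * 10)])

# The rows for "srv-a" are interleaved with those of "srv-b".
print(store.index["srv-a"])
```

With thousands of servers rather than two, the gap between consecutive positions for one server grows accordingly, which is the scattering that makes indexed reads of the full table slow.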
Another approach is to sort the new and old data together by object each day. This solution also requires that each index be rebuilt each day. As files grow, however, the time it takes to sort the data for each day's updates will grow and become unwieldy. While this method has the advantages of not requiring a separate purge step and of providing fast read speeds, the daily update itself is very slow. In many cases, very rapid update times are critical because users often want to use new data to obtain fresh analysis as soon as possible after the data is available.
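The daily re-sort approach can be sketched in the same spirit. The function name `resort_update`, the `(server, day, value)` tuple layout, and the use of integer day numbers are assumptions made for illustration only. The sketch shows that expired rows can be dropped as part of the same pass, and that the whole history is re-sorted and re-indexed on every update, so the update cost grows with the size of the history rather than with the size of the day's new data.

```python
def resort_update(history, new_rows, cutoff_day):
    """Combine old and new rows, drop expired rows, and re-sort by object.

    Rows are hypothetical (server, day, value) tuples; `day` is an integer
    day number. Returns the sorted rows and a rebuilt index that maps each
    server to the (start, end) slice holding its contiguous rows.
    """
    # Purging happens here, so no separate purge step is needed.
    combined = [row for row in history if row[1] > cutoff_day]
    combined.extend(new_rows)

    # The full re-sort touches every retained row on every daily update.
    combined.sort(key=lambda row: (row[0], row[1]))

    # The index must be rebuilt because every row may have moved.
    index = {}
    for pos, (server, _, _) in enumerate(combined):
        start, _ = index.get(server, (pos, pos))
        index[server] = (start, pos + 1)
    return combined, index


# Illustrative usage with made-up data.
old_rows = [("srv-a", 1, 10), ("srv-b", 1, 20)]
todays_rows = [("srv-b", 2, 21), ("srv-a", 2, 11)]
rows, index = resort_update(old_rows, todays_rows, cutoff_day=0)

# Reads are fast: one contiguous slice per server.
start, end = index["srv-a"]
print(rows[start:end])
```

Reading one server's history is now a single contiguous slice, which is the read-speed advantage noted above, but that advantage is paid for by the full sort and index rebuild at update time.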
The problems noted above are experienced in various applications beyond server performance data. Similar problems exist in situations where efficient storing and retrieval of data is desired, wherein (1) the data is organized as a collection of items, (2) periodic updates are provided for each item, (3) access to the history for at least one item is desired, and (4) the amount of data is of such a size that rapid updating and retrieval is important.
What are needed are systems and methods that address the shortcomings described above.