Networks and distributed storage allow data and storage space to be shared between devices located anywhere a connection is available. These implementations may range from a single machine offering a shared drive over a home network to an enterprise-class cloud storage array with multiple copies of data distributed throughout the world. Larger implementations may incorporate Network Attached Storage (NAS) devices, Storage Area Network (SAN) devices, and other configurations of storage elements and controllers in order to provide data and manage its flow. Improvements in distributed storage have given rise to a cycle where applications demand increasing amounts of data delivered with reduced latency, greater reliability, and greater throughput. Building out the storage architecture to meet these expectations enables the next generation of applications, which is expected to bring even greater demand.
One method of managing these large amounts of data is to host a relational database on a central system or system cluster and allow clients to interact with the database. In this way, the clients can read, write, and manipulate the data remotely. This conventional server/client architecture has been proven effective and reliable. However, while relational databases can scale and have been successfully deployed in the petabyte (10^15 bytes) range, they do not scale easily and require a considerable investment in hardware and infrastructure. In particular, the overhead required to query and update databases of this size is non-trivial, and the burden of database management falls almost entirely on the serving system or cluster.
An example of a system that manages very large data sets is a customer support system. For instance, AutoSupport™ (“ASUP”) is the “call home” technology available to NetApp, Inc. customers that subscribe to NetApp's AutoSupport™ service. ASUP enables products to automatically send configuration, log, and performance data through SMTP, HTTP, or HTTPS protocols to backend data centers. Technicians can use the ASUP data reactively to speed the diagnosis and resolution of customer issues and proactively to detect and avoid potential issues. Some implementations utilize a Network File System (NFS) source, a Relational Database Management System (RDBMS) target, and decoupled Java and Perl processes to process the data. However, the conventional implementation using NFS, RDBMS, and Java and Perl is not easily scalable to support future products or to accommodate increasing data load.
In this example and others, a need exists for improvements to the conventional relational database paradigm. Improvements that increase access speed, reduce network transactions, shift a portion of the data management burden to the requesting device, and reduce database complexity have the potential to pay dividends in both the short and long term.