When working with large volumes of data, there are often situations in which only a portion of the data is relevant to a given task. In those situations, one or more queries are run on a database server to generate the relevant result set from data storage devices. Many diverse retrieval and data storage systems are available today from many different providers. However, these retrieval and data storage systems do not support persistent result sets, nor do they efficiently handle large result sets (e.g., more than a few hundred megabytes), because each result set is streamed from the database server to a client. Further, these retrieval and data storage systems cannot disconnect from the database server until a client has consumed the result set at the end of the query; otherwise, there is no way to return to or retrieve the result set.
While some retrieval and data storage systems attempt to handle a persistent or large result set, these systems have drawbacks. For example, querying and retrieving the result set is often slow because the query is sent to, and the entire result set is sent back from, the database server over a single computer network pipe. In addition, existing retrieval and data storage systems produce the result set sequentially.
Another potential drawback of existing retrieval and data storage systems is that a database server has limited resources. Such server resources cannot be freed to run new queries until the client has fully consumed the result set, so the process depends at least partially on how fast the client can consume the result set. Thus, because query execution on the database server side is tightly coupled with consumption of the result on the client side, queries can only be run as fast as a client can consume result sets. This process may be slowed further when queries are run over a network. Further, existing retrieval and data storage systems require that a significant amount of resources be held on the database server to support delivery of a large result set until that delivery to the client is complete. Since this process may be very slow, it is inefficient to produce and consume such large result sets. Thus, existing database servers and systems are significantly limited.
Additionally, existing retrieval and data storage systems do not allow fast scrolling through a large result set (e.g., via a cursor mechanism). Consuming a large result set is therefore expensive for a client, because the client cannot skip over data in which a user is not interested. This further reduces the efficiency of producing and consuming large result sets.
Another potential drawback of existing retrieval and data storage systems is that result consumption is synchronous. The user has to consume the result set on the same machine and client from which the query was submitted or initiated. Additionally, the user has to consume the result set at the time the query produces it; otherwise, the result set is lost and the query must be rerun. Thus, the user cannot return to a result set after it has been produced.
Because result consumption is synchronous, further potential drawbacks of existing retrieval and data storage systems are that query results can be neither audited nor shared among different clients. Thus, existing retrieval and data storage systems have significant drawbacks.
The systems and methods described herein provide an improved approach to handling persistent or large result sets in retrieval and data storage systems, which alleviates one or more of the above-identified limitations of the existing systems.