The present invention relates to database operations and more particularly to an apparatus and method for providing access to prior rows of data stored in a table.
Database systems typically allow users to store, update, and retrieve information by submitting commands to a database server. To be correctly processed, the commands must comply with the database language that is supported by the database server. One popular database language is known as the Structured Query Language (SQL). SQL provides a user with the ability to create tables that can be used to store various types of information. The tables are generally organized in the form of rows and columns. SQL also provides a user with the ability to generate complex queries that can be used to retrieve specific information from the tables. The information retrieved may be further organized into meaningful form for subsequent presentation and/or analysis.
Queries are often created by users through a client station and then submitted to a database server. The database server processes each query and generates a query result that satisfies the criteria defined by the query. The query result is then transferred back to the client station from which the query originated.
One type of query that is often generated by a user involves the calculation of reporting functions that generate values for a current row based on values in previous rows. Reporting functions include cumulative sums, moving sums, moving averages, etc. In order to efficiently compute the required values, reporting functions require simultaneous access to multiple rows of a table. However, the standard SQL implementation does not provide an efficient method for accessing more than one row of a table at a time.
Consider a table having stored therein xe2x80x9csales dataxe2x80x9d that records the sales of a company. The sales data is entered for each day in a different row of the table. Assume that a user desires to know the cumulative sum to date for every day (or a predetermined number of days) in the table. One way to calculate the cumulative sum for a particular day is to add the sales data of that particular day to the cumulative sum that was calculated for the previous day.
Unfortunately, SQL does not provide an efficient mechanism to perform this type of operation, since it requires access to a value calculated for a prior row. Rather, SQL only allows access to columns in the current row. Therefore, it is difficult to efficiently perform functions such as cumulative sums, moving averages, etc.
A typical approach for overcoming the problem of accessing data stored in previous rows of a table is to perform a self-join of the table. For example, consider a table T having R rows. If an operation or computation requires simultaneous access to values stored in previous adjacent rows, such as rows one through four (1xcx9c4), then four self-join operations must be performed before the user is able to execute the query that will result in, for example, a cumulative sum. This approach becomes increasingly inefficient as the size of the table increases, because the join operations become increasingly time consuming. Additional inefficiencies arise based on the number of desired rows that must be accessed at one time. In other words, an operation that uses data from n adjacent rows requires n copies of the table and nxe2x88x921 self-join operations.
Another approach for overcoming the aforementioned deficiency is to program the required logic for accessing multiple rows at the application level. However, this approach may not be efficient, as it requires bringing rows from the database to the application. Also, this approach does not provide language primitives to operate on previous rows and thus decreases the generality of SQL-based solutions.
Based on the foregoing, a primary disadvantage associated with current methods of executing queries that must operate on series of ordered rows such as, for example, reporting functions is the number of operations that must be performed on the table before the reporting function can be computed.
Accordingly, there is a need for methods and apparatus for reducing the number of join operations necessary in order to calculate values derived from multiple rows of a table.
These and other needs are addressed by the present invention, wherein a database server accesses one or more previous rows of data (relative to a current row of data). When calculating a desired time series representation, the techniques described herein eliminate the need to perform multiple self-joins.
In accordance with one aspect of the present invention, a server receives a statement that requires generation of one or more rows of data. The server delivers the one or more rows of data to the client station in a particular generation sequence. The row currently being generated is designated to be the xe2x80x9ccurrent row of dataxe2x80x9d. Rows that precede the current row of data in the generation sequence are xe2x80x9cpriorxe2x80x9d or xe2x80x9cpreviousxe2x80x9d rows relative to the current row. As the current row of data is being generated, the server accesses a previously generated row of data. Thus, a current row can contain a function of previous rows and the current row itself.
According to one embodiment of the present invention, the server stores each previously generated row into a buffer. When access to a previously generated row is required, the server finds the row in that buffer.
In accordance with another aspect of the present invention, the statement may include various parameters that specify particular information regarding the previously generated row. One such parameter is an offset parameter that specifies a location, in the generation sequence, of the previously generated row relative to the current row. According to a specific embodiment of the invention, the statement also includes a default parameter that specifies a default value which will be returned if the offset parameter is determined to be outside a predetermined range of addressable rows in the generation sequence. The statement will also include a column parameter that specifies the column from which values will be retrieved. Another aspect of the invention is the ordering of the generation sequence.
According to another aspect of the present invention, a computer readable medium is specifically configured to carry one or more sequences of instructions for performing the techniques described herein. In addition, a system is provided for implementing the techniques.
The present invention advantageously provides an ability to simultaneously access multiple rows of data from one table. This ability eliminates the number of operations, such as self-joins, that are required in order to create a table that includes current column values derived from rows that have been previously processed. This is achieved by utilizing the offset included in the received statement to indicate where the previous row is located, without changing the value or designation of the current row. The present invention may also be configured to efficiently manage dynamic memory in the server by removing unnecessary rows from the buffer contained in dynamic memory, or storing only those rows that are needed.