The present invention is related to data storage systems. More particularly, the present invention is directed to a method and system of reclaiming storage space in data storage systems.
Data storage systems utilize various mechanisms to identify available storage space. One approach is to maintain a “high water mark” that indicates the boundary between the portion of a data container that is unavailable, i.e., has been allocated for storage, and the portion that is available, i.e., has not been allocated for storage. The “high water mark” may be a pointer to the most recently allocated space in the data container. This mechanism makes allocation efficient because available storage space is easy to locate.
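By way of illustration only, the allocation behavior described above might be sketched as follows. The class name `DataContainer`, the block-granular layout, and the choice to store the mark as the index of the first unallocated block (rather than a pointer to the most recently allocated space) are assumptions made for this sketch and are not taken from any particular system.

```python
class DataContainer:
    """Hypothetical fixed-size container whose blocks are handed out in order."""

    def __init__(self, total_blocks):
        self.total_blocks = total_blocks
        # Assumption: blocks [0, high_water_mark) have been allocated,
        # blocks [high_water_mark, total_blocks) have not.
        self.high_water_mark = 0

    def allocate(self, count):
        """Allocate `count` blocks directly above the high water mark."""
        if self.high_water_mark + count > self.total_blocks:
            raise MemoryError("no space above the high water mark")
        first = self.high_water_mark
        self.high_water_mark += count  # locating free space needs no search
        return range(first, first + count)


container = DataContainer(total_blocks=100)
extent = container.allocate(10)
print(list(extent)[:3], container.high_water_mark)
```

In this sketch, allocation is a single bounds check and an addition, which is the efficiency the mechanism is meant to provide.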
The “high water mark” mechanism, however, may result in less efficient use of storage space in data storage systems because data objects that are initially very large frequently shrink in size. For example, a table that starts out with 1 million rows may end up with only a few hundred rows after several transactions. As a result, much of the storage space allocated for the table will be left unused. Since the “high water mark” only moves in one direction, the newly freed storage space below the “high water mark” will not be available to store other data objects as the system assumes that available storage space can only be found above the “high water mark.”
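The effect described above may be illustrated with a minimal sketch in which deleting data leaves the mark unchanged. The class `Table`, its method names, and the block-level bookkeeping are hypothetical and chosen only for this example.

```python
class Table:
    """Hypothetical table stored in a container that tracks a high water mark."""

    def __init__(self, total_blocks):
        self.total_blocks = total_blocks
        self.high_water_mark = 0   # blocks below this index have been allocated
        self.used = set()          # blocks that currently hold rows

    def insert_blocks(self, count):
        if self.high_water_mark + count > self.total_blocks:
            raise MemoryError("no space above the high water mark")
        new_blocks = range(self.high_water_mark, self.high_water_mark + count)
        self.used.update(new_blocks)
        self.high_water_mark += count

    def delete_blocks(self, blocks):
        # Deleting rows empties blocks but does not lower the mark.
        self.used.difference_update(blocks)

    def available_blocks(self):
        # Only space above the mark is treated as available.
        return self.total_blocks - self.high_water_mark

    def unused_below_mark(self):
        return self.high_water_mark - len(self.used)


table = Table(total_blocks=1000)
table.insert_blocks(1000)             # the table starts out large
table.delete_blocks(range(1, 1000))   # almost all of its data is later deleted
print(table.available_blocks(), table.unused_below_mark())
```

Here 999 blocks are empty, yet none of them is reported as available, because all of them lie below the mark.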
Data containers with unused storage space scattered throughout can degrade the performance of scans and DML (Data Manipulation Language) operations. Scans may be affected because the amount of storage space read may not be proportional to the amount of data retrieved. In addition, operations that are scan-based may be slowed by the length of time it takes for a scan to complete.
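As a rough illustration of why the space read may not track the data retrieved, the following sketch assumes that a full scan visits every block below the “high water mark,” whether or not the block holds any rows. That assumption, the function name `full_scan`, and the list-of-lists layout are hypothetical.

```python
def full_scan(blocks, high_water_mark):
    """Read every block below the mark; return the rows found and blocks read."""
    rows = []
    blocks_read = 0
    for block in blocks[:high_water_mark]:
        blocks_read += 1          # an empty block still costs a read
        rows.extend(block)
    return rows, blocks_read


# Five blocks in the container, four below the mark, only two holding a row.
blocks = [["r1"], [], [], ["r2"], []]
rows, blocks_read = full_scan(blocks, high_water_mark=4)
print(rows, blocks_read)
```

In this example four blocks are read in order to retrieve two rows.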
In OLTP (Online Transaction Processing) systems, large tables called staging tables are often used for staging data. For example, data may be inserted into the staging tables for pickling. Pickling is a process of transforming data from a source representation to a uniform target representation. A large amount of the data in the staging tables is frequently deleted after pickling. The space that has been allocated for the staging tables, however, may remain unavailable to other objects for a considerable period of time after the data has been deleted.
One method of making available, i.e., reclaiming, unused storage space below the “high water mark” of an existing data container is to create a new data container, allocate storage space in the new data container for the objects in the existing data container, move those objects to the new data container, and delete the existing data container. This solution, however, requires extra storage space. Hence, additional equipment, e.g., data storage devices, disk drives, etc., may need to be purchased. In addition, the objects may be offline, i.e., inaccessible, during the reclamation process, which may not be acceptable to end-users. Furthermore, dependent objects may have to be recreated as a result of the reclamation process.
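The copy-based method described above may be sketched as follows to show where the extra storage space is needed. The function name `reclaim_by_copy` and the representation of a container as a list of blocks are assumptions for illustration only. Accessibility and dependent objects are not modeled.

```python
def reclaim_by_copy(existing):
    """Move non-empty blocks to a new container, then delete the old one.

    Returns the new container and the peak number of blocks held at once.
    """
    # Allocate space in the new container and move the objects into it.
    new_container = [list(block) for block in existing if block]
    # Both containers exist until the old one is deleted.
    peak_blocks = len(existing) + len(new_container)
    existing.clear()  # delete the existing container
    return new_container, peak_blocks


existing = [["a"], [], ["b"], []]
new_container, peak_blocks = reclaim_by_copy(existing)
print(new_container, peak_blocks)
```

In this example a four-block container is reduced to two blocks, but six blocks are held at the peak of the operation.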
Thus, it is desirable to provide a method and system where unused storage space in data containers can be reclaimed in place, i.e., without requiring extra storage space, where concurrency is preserved (i.e., objects in the data containers remain accessible during the reclamation process), and where data dependencies are maintained.
The present invention provides a method and system of reclaiming storage space in data storage systems. In one embodiment, data in a data container is compacted. A high water mark of the data container is then adjusted and unused space in the data container is reclaimed.
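For illustration only, the three steps named above (compacting the data, adjusting the high water mark, and reclaiming the unused space) might be sketched as shown below. The sketch assumes a container modeled as a list of slots in which `None` marks an empty slot, and it uses a simple two-pointer compaction that does not preserve row order. These choices, and the function name `compact_and_reclaim`, are assumptions of this example and are not asserted to be the claimed method. Concurrency and data dependencies are not modeled.

```python
def compact_and_reclaim(slots, high_water_mark):
    """Compact rows in place, then return (new_mark, slots_reclaimed)."""
    low, high = 0, high_water_mark - 1
    while low < high:
        if slots[low] is not None:
            low += 1                      # already occupied, nothing to fill
        elif slots[high] is None:
            high -= 1                     # nothing to move from the top
        else:
            # Move a row from near the mark into a free slot lower down.
            slots[low], slots[high] = slots[high], None
            low += 1
            high -= 1
    # After compaction the occupied slots are contiguous from index 0,
    # so the adjusted mark equals the number of occupied slots.
    new_mark = sum(1 for slot in slots[:high_water_mark] if slot is not None)
    return new_mark, high_water_mark - new_mark


slots = ["a", None, None, "b", None, "c", None, None]
new_mark, reclaimed = compact_and_reclaim(slots, high_water_mark=6)
print(slots[:new_mark], new_mark, reclaimed)
```

No second container is created in this sketch: rows are moved within the existing slots, and the space between the new mark and the old mark becomes available again.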
Further details of aspects, objects, and advantages of the invention are described below in the detailed description, drawings, and claims. Both the foregoing general description and the following detailed description are exemplary and explanatory, and are not intended to be limiting as to the scope of the invention.