Data is generated and stored at ever-increasing rates in organizations both governmental and corporate. While some business data almost never loses its value, the usefulness of most data tends to decline over time until it has no further value for almost any purpose. Generally, business information is most valuable soon after it is created, and it remains active for only a short period of time, at most a few years, after which the data's importance to the business, and thus the data's general usage, begins to decline. However, many businesses are constrained by factors, such as legal mandates, that require data to be retained long after the data's usefulness has expired.
As a result, it has been commonly observed that users at times access as little as 10%-20% of the data stored within a database. The other 80%-90% of the data is rarely, if ever, accessed by users and yet accounts for the bulk of the storage costs required to maintain the database. To make matters worse, as the volume of data stored by the database increases, performance degrades due to slower full table scans and longer application upgrades.
Thus, faced with rising storage costs and deteriorating system performance, businesses have sought ways to efficiently manage the inactive data in their databases. At present, many businesses try to achieve this goal by resorting to third-party archiving solutions that offload inactive data out of the database and into archival storage. While these solutions help to ameliorate the effects of accelerated data production, businesses employing third-party archiving solutions tend to replace one problem with a host of others.
As one issue, third-party archiving solutions tend to lack intimate knowledge of the format and contents of the database. For instance, many third-party archiving solutions work only at the level of the file system and therefore can archive only at the granularity of a file. Thus, while a third-party archiving solution may be able to offload the individual files that make up the database, it may not be able to selectively offload individual database elements.
Furthermore, the third-party archiving solution may detect inactive data based on file-level metrics, such as the frequency or recency with which files are accessed, but may be unable to detect inactivity at the granularity of an individual database element. Consequently, when a database file contains both active and inactive data, third-party archiving solutions may be unable to identify and archive only the inactive data. This issue can be exacerbated by database implementations that store data as flat files, where an entire table, or even the entire database, may be contained within a single large file on the file system.
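The difference between file-level and element-level inactivity detection can be illustrated with a minimal sketch. The `Row` type, the per-row `last_access` timestamp, the one-year threshold, and the sample dates are all assumptions made for illustration; they do not describe any particular archiving product or database. The sketch treats a file's recency as that of its most recently accessed row, which is roughly what a file-system access time would reflect.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Row:
    key: int
    last_access: datetime  # hypothetical per-row access timestamp


def file_is_inactive(rows, now, threshold):
    """File-level view: the file looks as recent as its most recently used row."""
    return now - max(r.last_access for r in rows) > threshold


def inactive_rows(rows, now, threshold):
    """Element-level view: judge each row on its own last access time."""
    return [r.key for r in rows if now - r.last_access > threshold]


# Illustrative data: one recently accessed row stored alongside nine stale rows.
now = datetime(2024, 1, 1)
threshold = timedelta(days=365)
rows = [Row(1, now - timedelta(days=2))] + [
    Row(k, now - timedelta(days=1000)) for k in range(2, 11)
]

file_level = file_is_inactive(rows, now, threshold)
row_level = inactive_rows(rows, now, threshold)
```

Under these assumptions the file as a whole is never classified as inactive, even though nine of its ten rows are, so a file-granularity archiver would leave all of the stale rows in place.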
To work around the inability of third-party archiving solutions to separate inactive data beyond the granularity of a file, a database administrator may instead manually separate active and inactive data objects into different database files. For example, the database administrator may explicitly move inactive data objects to a separate tablespace, which can then be offloaded to archival storage. However, this workaround introduces heavy database administration burdens, as implementing and validating the required application schema modifications is not a trivial task. For example, the database administrator may be required to discern data access patterns from redo logs and develop customized scripts to separate the inactive data.
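A customized separation script of the kind mentioned above might look roughly like the following sketch. It uses Python's built-in `sqlite3` module purely as a stand-in: SQLite has no tablespaces, so an attached second database plays the role of the separate tablespace. The `orders` table, the `last_access` column, and the cutoff date are hypothetical, and the sketch assumes the administrator has already worked out each row's last access time, which in practice is the hard part.

```python
import sqlite3

# Main database plus an attached database standing in for the archive tablespace.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS archive")

# Hypothetical schema; last_access holds ISO dates, which sort correctly as text.
conn.execute("CREATE TABLE main.orders (id INTEGER PRIMARY KEY, last_access TEXT)")
conn.execute("CREATE TABLE archive.orders (id INTEGER PRIMARY KEY, last_access TEXT)")
conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?)",
    [(1, "2023-12-30"), (2, "2019-05-01"), (3, "2018-01-15")],
)

cutoff = "2023-01-01"  # assumed inactivity cutoff

# Copy and delete in one transaction so a failure cannot lose or duplicate rows.
with conn:
    conn.execute(
        "INSERT INTO archive.orders SELECT * FROM main.orders WHERE last_access < ?",
        (cutoff,),
    )
    conn.execute("DELETE FROM main.orders WHERE last_access < ?", (cutoff,))

active_ids = [r[0] for r in conn.execute("SELECT id FROM main.orders ORDER BY id")]
archived_ids = [r[0] for r in conn.execute("SELECT id FROM archive.orders ORDER BY id")]
```

Even in this toy form, the script hard-codes the schema and the cutoff and must be revalidated whenever either changes, which hints at the administration burden described above.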
Based on the foregoing, there is a need for a method of identifying database activity at a fine level of granularity while maintaining database manageability and performance.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.