1. Field of the Invention
The present invention is directed to the field of data storage. In particular, the present invention is directed to the collection of garbage data on a data storage system.
2. Description of Related Art
The quantity of fixed data content, such as text files and image files, is rising rapidly. For example, the Internet Archive WayBack Machine (www.archive.org) currently archives 30 billion web pages.
Further, compliance with stricter government regulations requires storing large amounts of selected data, such as securities and medical data, together with procedures for timely and verifiable retrieval of this data from data storage systems.
Due to rapidly increasing processor performance and disk storage capacity, data is increasingly stored on computer-based data storage systems, particularly disk drives. However, while the storage capacity of disk drives has grown rapidly, the ability to locate, access, and retrieve selected data has not progressed at the same rate. In particular, once selected data is located in a data storage system, its retrieval is still limited by the speed at which the disk head can write or read the data to or from the disk. It is also limited by the bandwidth of the communication channels used to transfer the data into or out of the data storage system.
Prior art data storage systems based their reliability primarily on the hardware they utilized. Thus, many prior art storage systems were highly configured, with costly hardware and inflexible architectures, in an attempt to manage the storage and retrieval of data in large data storage systems. If a component failed, a system administrator was often notified immediately to repair or replace the component to prevent failure of the system. Consequently, one or more system administrators were sometimes needed to maintain the hardware, and thus the reliability, of the data storage system.
Additionally, most prior art data storage systems permitted modification of data stored on the data storage system. Thus, to maintain coherent data, these prior art data storage systems often utilized lock managers that prevented concurrent modification of stored data. Disadvantageously, the lock managers often became a bottleneck in the data storage system.
Further, if a user desired to execute an application using data stored on a prior art data storage system, the data first had to be located on the data storage system. It then had to be transferred to the user's system before the application could be executed there using the transferred data. When large amounts of data were requested, this transfer was often lengthy due to bandwidth limitations of the communication channels used. Additionally, once the user received the data, the user was limited to the processing capabilities of their own computer system.