1. Field
The present disclosure relates generally to the management of data and, more particularly, to managing data in a network data processing system. Still more particularly, the present disclosure relates to a method and apparatus for data de-duplication in a network data processing system.
2. Description of the Related Art
Network data processing systems provide access to information and applications for users. Network data processing systems come in a number of different forms. For example, a network data processing system may include a local area network, a wide area network, the Internet, or some other suitable form of network. When a network data processing system takes the form of a “cloud”, the cloud also may provide applications for use by a user. For example, a user may work on a document at one client computer. The user may then move to a second client computer and continue to work on the document there. The user does not need to select the client computer or make changes to the second client computer to access the document. The application used to work on the document is supplied by the cloud to the client computer at which the user is located.
With a cloud, client computers and other data processing systems access information stored in the cloud. As a result, a user may access information from different computers without having to carry the information in a portable storage device or send the information from one computer to another computer. For example, a user may work on documents and spreadsheets at one computer. The user may then travel to another location and access the same documents and spreadsheets at a second computer.
With clouds, data is often stored in different locations. For example, a document may have a hundred copies. If the document is backed up or archived, all one hundred of these copies are saved by the cloud. As a result, storing all of the copies may make inefficient use of storage and other resources.
One manner in which duplicate copies of data are managed is through the use of data de-duplication. Data de-duplication is a process for eliminating redundant data and reducing the amount of storage needed.
With data de-duplication, the number of copies of data is reduced. Even redundant portions of files and other data can be removed. With this process, each extra copy of data that is removed is replaced with a reference to the copy of the data that is not removed.
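The replacement of extra copies with references may be illustrated with a minimal sketch. The sketch is an illustration only and makes several assumptions that are not part of the description above: data is split into fixed-size blocks, a SHA-256 digest serves as the reference to a retained block, and the names `DedupStore`, `put`, `get`, and `stored_bytes` are hypothetical.

```python
import hashlib


class DedupStore:
    """Toy store that keeps one copy of each distinct block of data."""

    def __init__(self):
        self.blocks = {}  # digest -> bytes: the single retained copy of a block
        self.files = {}   # name -> list of digests: references to retained blocks

    def put(self, name, data, block_size=4096):
        """Store data under name, keeping only blocks not already stored."""
        refs = []
        for start in range(0, len(data), block_size):
            block = data[start:start + block_size]
            digest = hashlib.sha256(block).hexdigest()
            # A block seen before is not stored again; only a reference is kept.
            self.blocks.setdefault(digest, block)
            refs.append(digest)
        self.files[name] = refs

    def get(self, name):
        """Rebuild the original data by following the stored references."""
        return b"".join(self.blocks[digest] for digest in self.files[name])

    def stored_bytes(self):
        """Total size of the blocks actually retained."""
        return sum(len(block) for block in self.blocks.values())


# One hundred identical copies of a two-block document.
store = DedupStore()
document = b"A" * 4096 + b"B" * 4096
for n in range(100):
    store.put(f"copy-{n}", document)

assert store.get("copy-42") == document
assert store.stored_bytes() == len(document)
```

In this sketch, the one hundred copies occupy the storage of a single copy, because each additional copy contributes only references. A file that shares only some blocks with another file likewise stores only the blocks that differ, which corresponds to removing redundant portions of files.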