Computer users, including individuals as well as organizations, often require the capability to back up their confidential and/or proprietary data. There are many business needs for these backup capabilities. For example, backups are needed to ensure that data is not lost if computers that hold the data become infected with a computer virus or a worm that destroys data, if hard drives that hold the data fail, or if computers that hold the data are destroyed or stolen. Thus, there is a significant need for computer users to perform frequent backups of important, confidential data. Some examples of important, confidential data include medical case histories, information associated with legal representations, and financial information.
In recent years, backup services have been offered to clients by storage providers over the Internet. These services are typically based on the following model: encryption of data locally on the client's computer, then transfer of the encrypted data over the Internet to the provider for backup. The provider may have a large server farm to take advantage of economies of scale that may result from the reduced cost of storage on a large scale, and from amortized operational costs (such as costs of security, electricity, rent, and maintenance). The provider may therefore be able to offer backup services to clients at significant savings compared to the cost that clients would incur were they to perform their own data backup.
To attain these cost savings, clients provide their confidential data to third-party providers, which raises the issue of the safety of the client data while in the provider's possession. Providers may claim that due to their internal policies and procedures, client data uploaded to their server farms cannot be accessed by other clients and cannot be used by provider employees for any illegitimate purpose. Providers may also point to the encryption of client data (such as filenames and file contents), where the provider does not know the decryption key, as making the client data undecipherable to the provider, and to the use of approaches such as message authentication codes as protecting against tampering with the client data.
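By way of illustration only, the tamper protection that providers point to can be sketched as follows. The sketch assumes a client-held MAC key and HMAC-SHA256 over the already encrypted data; the key, the sample bytes, and the function names `make_tag` and `is_untampered` are hypothetical and are not taken from any particular provider's service.

```python
import hashlib
import hmac


def make_tag(key: bytes, ciphertext: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the encrypted data."""
    return hmac.new(key, ciphertext, hashlib.sha256).digest()


def is_untampered(key: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, ciphertext), tag)


# Hypothetical values: the key never leaves the client, and the
# ciphertext stands in for an encrypted block held by the provider.
mac_key = b"client-only-secret-key"
ciphertext = bytes.fromhex("8f1c2a77d0")
tag = make_tag(mac_key, ciphertext)

# Flip one bit to simulate modification while in the provider's possession.
tampered = ciphertext[:-1] + bytes([ciphertext[-1] ^ 0x01])

ok_original = is_untampered(mac_key, ciphertext, tag)
ok_tampered = is_untampered(mac_key, tampered, tag)
```

In this sketch the client would detect the modified block on retrieval because the recomputed tag no longer matches. The tag addresses tampering only; it does nothing to hide when or how often a block is retrieved.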
However, even if client filenames and file contents are encrypted, important information can still be revealed to any party with access to a provider's servers in the form of statistics associated with a client's data access pattern. For example, an employee of the provider could observe variations in the number of encrypted data blocks (such as files or database records) retrieved by a client per time interval (such as per day or per hour). The employee of the provider could also observe the frequency with which any individual encrypted data block is accessed by the client, and learn over time how different frequencies of use correlate with this client's externally observable behavior. The employee of the provider could provide this information to another party that could use this information to predict real-world actions to be taken by the client in the future. For example, the data access pattern of a small company could signal whether a product under development is nearing final release, and the data access pattern of a small law firm could signal when documents in a case are being reviewed just prior to filing.
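The kind of observation described above can be illustrated with a minimal sketch. It assumes that a party with access to the provider's servers can see a log of retrievals, each consisting of a time interval and an opaque identifier for an encrypted data block; the log contents and the function names are hypothetical. No decryption key is used anywhere in the sketch.

```python
from collections import Counter

# Hypothetical retrieval log visible on the provider's servers:
# (day, opaque identifier of the encrypted data block).
access_log = [
    (1, "blk-a"), (1, "blk-b"),
    (2, "blk-a"),
    (3, "blk-a"), (3, "blk-c"), (3, "blk-c"), (3, "blk-d"),
]


def accesses_per_day(log):
    """Count how many encrypted blocks were retrieved in each interval."""
    return Counter(day for day, _ in log)


def accesses_per_block(log):
    """Count how often each individual encrypted block was retrieved."""
    return Counter(block for _, block in log)


per_day = accesses_per_day(access_log)
per_block = accesses_per_block(access_log)

# The busiest interval and the most frequently retrieved block are both
# visible even though the block contents remain encrypted.
busiest_day = per_day.most_common(1)[0][0]
hottest_block = per_block.most_common(1)[0][0]
```

With the sample log, the observer would see a spike in retrievals on day 3 and would identify `blk-a` as the most frequently accessed block. Statistics of this kind, gathered over time, are what could be correlated with the client's externally observable behavior.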
It is against this background that a need arose to develop the apparatus, system, and method described herein.