1. Technical Field
The present invention relates generally to maintaining the compliance of data in a network and more particularly to encrypting data in a network memory architecture.
2. Description of Related Art
To allow remote employees access to an enterprise's information systems, organizations typically choose between two networking approaches: centralized servers or distributed servers. Centralized server implementations have the advantage of simplicity since an information technology (IT) professional centrally manages, maintains, and enforces policies for the organization's data.
An issue that arises in allowing remote access to information is that unauthorized users may also gain access to the organization's data. Additionally, federal and state legislation in the United States requires that certain information be encrypted and/or makes the organization civilly liable for injuries resulting from data breaches. Two examples of federal legislation requiring compliance include the Health Insurance Portability and Accountability Act (HIPAA) and the Sarbanes-Oxley Act. To secure the data and memory against theft, viruses, and hackers, the data is encrypted using an algorithm such as the Advanced Encryption Standard (AES), the Data Encryption Standard (DES), or Triple DES. However, two issues arise when encrypting data on a network. First, encryption can negatively affect performance. Second, when not encrypted, data is still vulnerable to unauthorized use.
Many organizations select the distributed server implementation to mitigate some of the problems with the centralized server implementation. FIG. 1 illustrates a distributed server system 100 in the prior art. The distributed server system 100 includes a branch office 110, a central office 120, and a communication network 130. The communication network 130 forms a wide area network (WAN) between the branch office 110 and the central office 120.
In the distributed server system 100, the branch servers 140 (e.g., email servers, file servers and databases) are placed locally in the branch office 110, rather than solely in the central office 120. The branch servers 140 typically store all or part of the organization's data. The branch servers 140 generally provide improved application performance and data access for the computers 160. The branch servers 140 respond to a request for the organization's data from the local data. For each request for the data, the central servers 170 potentially do not need to transfer the data over the communication network 130 (i.e., the WAN), via router 180 and router 150. Synchronization and backup procedures are implemented to maintain the coherency between the local data in the branch office 110 and the data in the central office 120.
Unfortunately, managing the distributed server system 100 is complex and costly. From a physical point of view, the distributed server system 100 with one hundred branch offices requires on the order of one hundred times more equipment than a centralized server approach. Each piece of the equipment not only needs to be purchased, but also installed, managed, and repaired, driving significant life cycle costs. The branch office 110 may need additional local IT personnel to perform operations because of this “Server Sprawl”. Furthermore, the multiplication of managed devices means additional license costs, security vulnerabilities, and patching activities.
In distributed server implementations (e.g., the distributed server system 100), the data, including the “golden copy” or most up-to-date version of mission critical data, is often stored (at least temporarily) only on the branch servers 140 in the branch office 110. Organizations implement complex protocols and procedures for replication and synchronization to ensure that the mission critical data is backed up and kept in-sync across the WAN with the central servers 170.
Security vulnerabilities are a particular problem in maintaining compliance in the distributed server system 100. As the “golden copy” is stored on a local server and backed up locally, this computer or storage may be stolen, infected with viruses, or otherwise compromised. Having multiple servers also increases the overall exposure of the system to security breaches. Additionally, locally encrypting the data or the system further complicates replication and synchronization with the central servers 170 and decreases performance. Therefore, data in a distributed server implementation is vulnerable and maintaining compliance can be difficult.
FIG. 2 illustrates a centralized server system 200 in the prior art. The centralized server system 200 includes a branch office 210 and a central office 220 coupled by a communication network 230. The communication network 230 forms a WAN between the branch office 210 and the central office 220.
Typically, the central servers 260 in the central office 220 store the organization's data. Computers 240 make requests for the data from the central servers 260 over the communication network 230. The central servers 260 then return the data to the computers 240 over the communication network 230. Typically, the central servers 260 are not encrypted. The central servers 260 are usually maintained in a secure location such as a locked building requiring a hand scan or an iris scan for entry to prevent theft of the hard disks on which data is stored. This is a more secure system because the computers 240 contain only a small amount of unencrypted data that can be breached if, for example, the computer is stolen, resold, or infected by a virus.
The communication network 230 typically comprises a private network (e.g., a leased line network) or a public network (e.g., the Internet). The connections to the communication network 230 from the branch office 210 and the central office 220 typically cause a bandwidth bottleneck for exchanging the data over the communication network 230. The exchange of the data between the branch office 210 and the central office 220, in the aggregate, will usually be limited to the bandwidth of the slowest link in the communication network 230.
For example, the router 250 connects to the communication network 230 by a T1 line, which provides a bandwidth of approximately 1.544 Megabits/second (Mbps). The router 270 connects to the communication network 230 by a T3 line, which provides a bandwidth of approximately 45 Megabits/second (Mbps). Even though the communication network 230 may provide an internal bandwidth greater than 1.544 Mbps or 45 Mbps, the available bandwidth between the branch office 210 and the central office 220 is limited to the bandwidth of 1.544 Mbps (i.e., the T1 connection). Connections with higher bandwidth to relieve the bandwidth bottleneck across the communication network 230 are available, but are generally expensive and have limited availability.
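By way of illustration only, the bottleneck described above can be sketched as taking the minimum rate along the path. The function names and the 100 Mbps internal network rate are assumptions introduced for this sketch; only the T1 and T3 rates come from the example above, and the transfer time ignores latency and protocol overhead.

```python
# Illustrative sketch: rates mirror the T1/T3 example in the text.
T1_MBPS = 1.544   # branch office connection (router 250)
T3_MBPS = 45.0    # central office connection (router 270)
ASSUMED_INTERNAL_MBPS = 100.0  # hypothetical internal rate of the network


def end_to_end_bandwidth(link_rates_mbps):
    """Return the usable rate of a path: the rate of its slowest link."""
    return min(link_rates_mbps)


def transfer_seconds(size_megabytes, rate_mbps):
    """Idealized transfer time, ignoring latency and protocol overhead."""
    return size_megabytes * 8 / rate_mbps


path = [T1_MBPS, ASSUMED_INTERNAL_MBPS, T3_MBPS]
bottleneck = end_to_end_bandwidth(path)
print(f"available bandwidth: {bottleneck} Mbps")
print(f"10 MB transfer: {transfer_seconds(10, bottleneck):.1f} s")
```

Under these assumptions, raising the internal rate or the T3 rate leaves the result unchanged; only a faster branch office connection would shorten the transfer.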
Moreover, many applications do not perform well over the communication network 230 due to the limited available bandwidth. Developers generally optimize the applications for performance over a local area network (LAN), which typically provides a bandwidth from 10 Mbps to Gigabit/second (Gbps) speeds. The developers of the applications assume small latency and high bandwidth across the LAN between the applications and the data. However, the latency across the communication network 230 typically will be 100 times that across the LAN, and the bandwidth of the communication network 230 will be 1/100th of the LAN.
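The effect of these ratios on an application can be estimated with a simple model in which each operation costs a number of round trips plus the time to move its payload. The following is a minimal sketch, not a measurement: the LAN figures (1 ms round trip, 100 Mbps), the 50 round trips, and the 1 MB payload are assumed values chosen for illustration, and the WAN figures are derived from them using the 100 times and 1/100th ratios stated above.

```python
def operation_seconds(round_trips, payload_bits, latency_s, bandwidth_bps):
    """Simple model: round-trip delays plus payload serialization time."""
    return round_trips * latency_s + payload_bits / bandwidth_bps


# Assumed LAN characteristics (hypothetical values for illustration).
LAN_LATENCY_S = 0.001
LAN_BANDWIDTH_BPS = 100e6

# WAN derived from the ratios in the text: 100x latency, 1/100th bandwidth.
WAN_LATENCY_S = LAN_LATENCY_S * 100
WAN_BANDWIDTH_BPS = LAN_BANDWIDTH_BPS / 100

ROUND_TRIPS = 50              # assumed protocol exchanges per operation
PAYLOAD_BITS = 8_000_000      # assumed 1 MB of data

lan_time = operation_seconds(ROUND_TRIPS, PAYLOAD_BITS,
                             LAN_LATENCY_S, LAN_BANDWIDTH_BPS)
wan_time = operation_seconds(ROUND_TRIPS, PAYLOAD_BITS,
                             WAN_LATENCY_S, WAN_BANDWIDTH_BPS)
print(f"LAN: {lan_time:.2f} s, WAN: {wan_time:.2f} s")
```

In this model an operation that completes in a fraction of a second on the LAN takes on the order of one hundred times longer over the WAN, because both the latency term and the bandwidth term scale by the same factor.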
Furthermore, although FIG. 1 and FIG. 2 illustrate a single branch office and a single central office, multiple branch offices and multiple central offices exacerbate the previously discussed problems. For example, in a centralized server implementation having multiple branches, computers in each of the multiple branch offices make requests over the WAN to central servers for the organization's data. The data transmitted by the central servers in response to the requests quickly saturates the available bandwidth of the central office's connection to the communication network, further decreasing application performance and data access at the multiple branch offices. In a distributed server implementation having multiple branches, the cost to provide branch servers in each of the multiple branch offices increases, as do the problems of licensing, security vulnerabilities, patching activities, and data replication and synchronization. Moreover, different branches may simultaneously attempt to modify the same piece of information. Maintaining coherency in a distributed implementation requires complex and error-prone protocols.
In addition to implementing centralized servers or distributed servers, organizations also implement mechanisms for caching to improve application performance and data access. A cache is generally used to reduce the latency of the communication network (e.g., communication network 230) forming the WAN (i.e., because the request is satisfied from the local cache) and to reduce network traffic over the WAN (i.e., because responses are local, the amount of bandwidth used is reduced).
Web caching, for example, is the caching of web documents (i.e., HTML pages, images, etc.) in order to reduce web site access times and bandwidth usage. Web caching typically stores unencrypted local copies of the requested web documents. The web cache satisfies subsequent requests for the web documents if the requests meet certain predetermined conditions.
One problem with web caching is that the web cache is typically only effective for rarely modified static web documents. For dynamic documents, there is a difficult tradeoff between minimizing network traffic and the risk of the web cache serving up stale data. The web cache may serve stale data because the web cache responds to requests without consulting the server.
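The stale data tradeoff can be illustrated with a minimal sketch of a cache that answers from its local copy for a fixed lifetime. The time-to-live policy, the class name `TtlWebCache`, and the `/price` document are assumptions made for this example; the text above does not specify which predetermined conditions a given web cache applies.

```python
class TtlWebCache:
    """Hypothetical cache that serves a local copy until a lifetime expires."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}  # url -> (body, time stored)

    def get(self, url, now, origin):
        entry = self._entries.get(url)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]  # served locally; the server is not consulted
        body = origin(url)   # miss or expired: fetch across the network
        self._entries[url] = (body, now)
        return body


server = {"/price": "100"}  # stand-in for the origin server's documents
origin_calls = []


def origin(url):
    origin_calls.append(url)
    return server[url]


cache = TtlWebCache(ttl_seconds=60)
first = cache.get("/price", now=0, origin=origin)    # fetched from the server
server["/price"] = "105"                             # the document changes
stale = cache.get("/price", now=30, origin=origin)   # lifetime not yet expired
fresh = cache.get("/price", now=61, origin=origin)   # expired, fetched again
print(first, stale, fresh, len(origin_calls))
```

A longer lifetime in this sketch saves more requests to the server but widens the window in which the outdated value is returned, which is the tradeoff described above for dynamic documents.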
Another problem is that the web cache does not recognize that two otherwise identical documents are the same if they have a different Uniform Resource Locator (URL). The web cache does not consider the content or context of the documents. Thus, the web cache caches the documents by URL or filename without a determination of the content or context of the document. Moreover, the web cache stores entire objects (such as documents) and cache-hits are binary: either a perfect match or a miss. Even where only small changes are made to the documents, the web cache does not use the cached copy of the documents to reduce network traffic.
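The URL-keyed behavior described above can be sketched as follows. The class `UrlKeyedCache`, the example URLs, and the document bodies are hypothetical and serve only to show how identical or nearly identical content is treated when the lookup key is the URL alone.

```python
class UrlKeyedCache:
    """Hypothetical cache that stores whole objects keyed only by URL."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0
        self.bytes_fetched = 0

    def fetch(self, url, origin):
        if url in self._store:
            self.hits += 1           # exact URL match: whole object reused
            return self._store[url]
        self.misses += 1             # any other URL: whole object refetched
        body = origin(url)
        self.bytes_fetched += len(body)
        self._store[url] = body
        return body


# Stand-in for the origin server: two URLs with identical bytes, and a
# third whose content differs from the first by a single character.
DOCS = {
    "http://example.com/a/report.html": b"<html>Q3 report v1</html>",
    "http://example.com/b/report.html": b"<html>Q3 report v1</html>",
    "http://example.com/a/report2.html": b"<html>Q3 report v2</html>",
}


def origin(url):
    return DOCS[url]


cache = UrlKeyedCache()
cache.fetch("http://example.com/a/report.html", origin)   # miss
cache.fetch("http://example.com/b/report.html", origin)   # miss, same bytes
cache.fetch("http://example.com/a/report2.html", origin)  # miss, one byte differs
cache.fetch("http://example.com/a/report.html", origin)   # hit
print(cache.hits, cache.misses, cache.bytes_fetched)
```

In this sketch the second and third requests each transfer the full document even though the cache already holds the same or almost the same bytes, since a lookup either matches the URL exactly or misses entirely.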