Data deduplication has become an important storage technology as organizations look to improve storage utilization and reduce costs. Deduplication works by looking for duplicate chunks (regions) of data: if two regions are identical, one can be replaced with a pointer to the other. This reduces the amount of storage needed for files that contain identical regions. Examples of where deduplication is useful include virtual machine images, online gaming applications that store game context for multiple users, audio and video clips with overlapping regions that are served from media servers, desktops served from a common server to users in an enterprise, and medical imaging.
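The pointer-replacement idea can be illustrated with a minimal sketch. It assumes fixed-size chunks and SHA-256 content hashes, neither of which is specified in the text above; the names `DedupStore` and `CHUNK_SIZE` are hypothetical and chosen only for illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real systems may use variable-size chunks


class DedupStore:
    """Toy chunk store: each unique chunk is kept once, files hold references to chunks."""

    def __init__(self):
        self.chunks = {}  # content digest -> chunk bytes (one copy per unique chunk)
        self.files = {}   # file name -> ordered list of digests ("pointers")

    def write(self, name, data):
        refs = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if no identical chunk is already present.
            self.chunks.setdefault(digest, chunk)
            refs.append(digest)
        self.files[name] = refs

    def read(self, name):
        # Rebuild the file by following its chunk references in order.
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self):
        # Bytes physically held by the store, after deduplication.
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore()
    image_a = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
    image_b = b"A" * CHUNK_SIZE + b"C" * CHUNK_SIZE  # shares its first chunk with image_a
    store.write("image_a", image_a)
    store.write("image_b", image_b)
    print(len(image_a) + len(image_b), "logical bytes,", store.stored_bytes(), "stored bytes")
```

In this sketch the two files share their first chunk, so three chunks are stored instead of four. Production systems would also need reference counting and a policy for hash collisions, which are omitted here.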
Conventionally, the focus of deduplication has been limited to storage features. Hence, deduplication does not address the problem of reducing memory requirements when identical regions are used by multiple applications, which results in duplicates appearing in the page cache. When a shared region is used by an application, process, or other type of user, the file system can check for duplicates and use the data already in the page cache to populate the user buffer. The problem with this approach is that the same region is still copied repeatedly into user buffers, so memory is not saved. Alternatively, many applications may attempt to memory map files to reduce memory requirements, but memory mapping of files is closely tied to the page cache mechanism, and hence the page cache needs to be made aware of duplicate pages.
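To make the page cache problem concrete, the following sketch contrasts a cache keyed only by file and page offset with one that also indexes pages by content. It is a simplified model under stated assumptions, not an actual kernel page cache and not the specific mechanism of any particular system; `NaivePageCache`, `ContentAwarePageCache`, `PAGE_SIZE`, and the use of SHA-256 are all hypothetical choices for illustration.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size


class NaivePageCache:
    """Models a cache keyed by (file_id, page_index): one frame per cached entry."""

    def __init__(self):
        self.pages = {}  # (file_id, page_index) -> page bytes

    def insert(self, file_id, page_index, data):
        self.pages[(file_id, page_index)] = data

    def resident_bytes(self):
        # Every entry is counted as its own frame, even if contents are identical.
        return sum(len(p) for p in self.pages.values())


class ContentAwarePageCache:
    """Models a cache that maps identical page contents to a single shared frame."""

    def __init__(self):
        self.frames = {}  # content digest -> page bytes (one frame per unique content)
        self.index = {}   # (file_id, page_index) -> content digest

    def insert(self, file_id, page_index, data):
        digest = hashlib.sha256(data).digest()
        self.frames.setdefault(digest, data)  # reuse the frame if the content is cached
        self.index[(file_id, page_index)] = digest

    def lookup(self, file_id, page_index):
        return self.frames[self.index[(file_id, page_index)]]

    def resident_bytes(self):
        return sum(len(f) for f in self.frames.values())


if __name__ == "__main__":
    shared = b"S" * PAGE_SIZE  # a region that two files have in common
    naive, aware = NaivePageCache(), ContentAwarePageCache()
    for cache in (naive, aware):
        cache.insert("vm1.img", 0, shared)
        cache.insert("vm2.img", 0, shared)
    print("naive:", naive.resident_bytes(), "content-aware:", aware.resident_bytes())
```

The naive model accounts for two frames for the shared region, while the content-aware model accounts for one. A real design would additionally have to handle writes to a shared page (for example by copy-on-write), which this sketch leaves out.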