1. Field of the Invention
The present invention relates to a processing system and method for converting contents in a memory from one format to another; more particularly, to a system and method for converting compressed contents to uncompressed format (morphing), or vice versa, while concurrently running the operating system and regular applications.
2. Discussion of Related Art
In a paged operating system, the virtual address space, namely, the collection of addresses addressable by a program, is divided into pages, which are collections of contiguous virtual addresses having a fixed length. Typically, a page contains 4 KB. The virtual address space of a program is in general much larger than the available physical memory. The operating system provides a set of functionalities supporting this feature, collectively referred to as the virtual memory manager. To support virtual address spaces larger than the physical memory, the virtual memory manager stores virtual pages both in memory and on secondary storage, usually hard disks. When a virtual page is accessed and is not in main memory, it is read from disk (page-in operation). If there is no available physical space for the page being read from disk, another virtual page is written to disk (page-out operation) and its space is released. When a virtual page is read from disk, it is assigned a starting real address, namely, an address as seen from the processor. The real memory (the address space of the processor) is divided into a collection of contiguous and pairwise disjoint real address ranges, each having the same size as a logical page. These are called page frames. Hence, when a logical page is read from disk, it is stored within a page frame. The translation between logical and real pages relies on a directory structure divided into pages, called page tables. Each logical page has a unique entry in a page table, called a page table entry, which contains the starting real address of the page frame containing the page, or the position on disk if the logical page is on secondary storage. Free page frames are managed using a separate data structure.
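The translation described above can be illustrated with a minimal sketch. It assumes 4 KB pages and uses a hypothetical dictionary in place of a real page table; the names PAGE_SIZE, page_table, and translate are illustrative only and do not come from any particular operating system.

```python
PAGE_SIZE = 4096  # assumed page size of 4 KB

# Hypothetical page table: virtual page number -> ("frame", frame_number)
# when the page is resident, or ("disk", disk_slot) when it is paged out.
page_table = {
    0: ("frame", 7),
    1: ("disk", 42),
}

def translate(virtual_address):
    """Return the real address, or raise LookupError to signal a page fault."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    location, value = page_table[page_number]
    if location == "disk":
        # A real system would now perform a page-in operation.
        raise LookupError(f"page fault: page {page_number} is in disk slot {value}")
    # The frame number gives the starting real address of the page frame.
    return value * PAGE_SIZE + offset
```

In this sketch a page fault is reported to the caller rather than handled; a page-in routine would read the page from disk, update the entry, and retry the access.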
The set of page frames used by processes (including those of the OS) is managed by appropriate modules of the operating system. Most operating systems provide virtual memory management, namely, they offer each process an address space that is commonly significantly larger than the available physical memory. To accomplish this, the operating system maintains only a fraction of the pages of each process in memory, and stores the others on mass storage, such as hard disks. Hence a physical page, which is a set of contiguous physical addresses, can contain a virtual page of a process, or can be temporarily unused. A physical page is commonly called a page frame. When a process issues an operation on a page that is not in memory, the page is copied from disk into an unused page frame (similarly, if the page is a new one, that is, it is not stored on disk, an unused page frame is allocated to it). A page frame can be unused for at least three reasons: (1) because it has never been used since the machine was last started; (2) because the process that last used it has terminated; and (3) because the operating system frees it. In the last case, the operating system is also responsible for ensuring that a copy of the content of the page frame to be freed is present on disk. Usually, mechanisms exist to detect whether the content of the page frame has been modified since it was allocated or copied from disk. If the page frame content has been modified, it must be copied to disk; otherwise, there is no need to copy it back.
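The modified-page check described above can be sketched as follows. This is an illustration under stated assumptions, not the mechanism of any specific operating system: the PageFrame class, its dirty attribute, and the dictionary used as a disk are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PageFrame:
    content: bytes
    dirty: bool = False  # assumed flag, set when the frame is modified

def free_frame(frame, disk, slot):
    """Free a page frame, copying it to disk only if it was modified.

    Returns True when a write to disk was needed.
    """
    wrote = False
    if frame.dirty:
        disk[slot] = frame.content  # page-out of the modified content
        wrote = True
    # The frame is now unused and can be allocated to another page.
    frame.content = b""
    frame.dirty = False
    return wrote
```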
An emerging development in computer organization is the use of data compression in the main memory of a computer system. The data in the main memory is stored in a compressed format.
FIG. 1 depicts an exemplary processing system having compressed contents in memory. In FIG. 1, a central processing unit (CPU) 102 reads data from and writes data to a cache 104. Cache misses and stores result in reads from and writes to a compressed main memory 10 by means of a compression controller 106. The real memory, namely, the set of processor addresses that correspond to data stored in memory, is typically divided into a number of pairwise disjoint segments corresponding to a fixed number of contiguous processor addresses. Pairwise disjoint means that each real address belongs to one and only one such segment. These segments are referred to as memory lines. Memory lines are the unit of compression. A memory line stored in the compressed memory is compressed and stored in a variable number of memory locations, which depends on how well its content compresses.
U.S. Pat. Nos. 5,761,536 and 5,729,228 disclose computer systems where the contents of main memory are compressed.
Referring again to FIG. 1, the compressed memory is divided into two parts: a data portion 108 and a directory 107. The data portion is divided into pairwise disjoint sectors, which are fixed-size intervals of physical memory locations. For example, a sector may consist of 256 physical bytes having contiguous physical addresses. The content of a compressed memory line is stored in the minimum possible number of physical sectors. The physical sectors containing a compressed line need not have contiguous physical addresses, and can be located anywhere within the data portion of the compressed main memory. The translation between the real address of a byte and the address of the physical sector containing it is performed via the directory 107.
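As a small worked example of storing a line in the minimum possible number of sectors, the sketch below computes that number by ceiling division, using the 256-byte sector size given above. The function name sectors_needed is hypothetical, and the sketch ignores sector sharing between lines.

```python
SECTOR_SIZE = 256  # bytes, the example sector size from the text

def sectors_needed(compressed_length):
    """Minimum number of fixed-size sectors that hold a compressed line."""
    return -(-compressed_length // SECTOR_SIZE)  # ceiling division
```

Under these assumptions, a line that compresses to 257 bytes needs two sectors, and most of the second sector is left unused.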
FIG. 2 shows further details to better understand the operation of the compressed memory. The processor cache 240 contains uncompressed cache lines 241 and a cache directory 242, which stores the real address of each cache line. In the following discussion, an assumption is made that a cache line has the same size as a memory line (the unit of compression). Upon a cache miss, the cache requests the corresponding line from memory by providing the real address 270 that caused the miss. The real address is divided into two parts: the log2(line length) least significant bits are the offset of the address within the line, where log2( ) is the logarithm in base 2. The other bits are used as an index in the compressed memory directory 220, which contains a line entry for each line in the supported real address range. Address A1 (271) corresponds to line entry 1 (221), address A2 (272) corresponds to line entry 2 (222), address A3 (273) corresponds to line entry 3 (223), address A4 (274) corresponds to line entry 4 (224), and so on. Different addresses are used in the example to show different ways of storing compressed data in the compressed main memory. In this illustration, the line having address A1 compresses very well (for example, a line consisting of all zeros). Such a line is stored entirely in the directory entry 221, and does not require memory sectors. The line at address A2 compresses less well, and requires two memory sectors 231 and 232, which are stored in the data section 230. Line entry 222 contains pointers to the memory sectors 231 and 232. Note that the last part of memory sector 232 is unused. The line having address A3 requires three memory sectors, 233, 234 and 235. The space left unused in sector 235 is large enough to store part of the compressed line having real address A4, which in turn uses sector 236 and part of 235. The lines at addresses A4 and A3 are called roommates.
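The division of the real address into a directory index and a line offset can be sketched as follows. The 1024-byte line length is an assumption chosen for illustration, and the function name split_real_address is hypothetical.

```python
LINE_LENGTH = 1024  # bytes; assumed for illustration, must be a power of two
OFFSET_BITS = LINE_LENGTH.bit_length() - 1  # log2(LINE_LENGTH), here 10

def split_real_address(real_address):
    """Split a real address into (directory index, offset within the line)."""
    index = real_address >> OFFSET_BITS        # the remaining high-order bits
    offset = real_address & (LINE_LENGTH - 1)  # the log2(line length) low bits
    return index, offset
```

For instance, under these assumptions real address 4660 falls in the line with directory index 4, at offset 564 within that line.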
The compressor is used when so-called dirty lines (i.e., lines modified while in the cache) are written back into memory. Upon a cache writeback, a dirty line is compressed. If it fits in the same amount of memory it used before the writeback, it is stored in place. Otherwise, it is written in the appropriate number of sectors. If the number of required sectors decreases, the unused sectors are added to a free-sector list. If the number of required sectors increases, the additional sectors are retrieved from the free-sector list.
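The sector bookkeeping performed on a writeback can be sketched as below. This is a simplified illustration, not the controller's actual logic: it assumes 256-byte sectors, represents the line's sectors and the free-sector list as plain Python lists, and ignores roommates and lines stored entirely in the directory. The name write_back is hypothetical.

```python
SECTOR_SIZE = 256  # bytes, assumed sector size

def write_back(entry_sectors, compressed_length, free_sectors):
    """Resize a line's sector list after a cache writeback.

    entry_sectors: sector addresses currently used by the line (mutated).
    free_sectors:  the free-sector list (mutated).
    """
    needed = -(-compressed_length // SECTOR_SIZE)  # ceiling division
    # The line shrank: return surplus sectors to the free-sector list.
    while len(entry_sectors) > needed:
        free_sectors.append(entry_sectors.pop())
    # The line grew: take additional sectors from the free-sector list.
    while len(entry_sectors) < needed:
        entry_sectors.append(free_sectors.pop())
    return entry_sectors
```

If the free-sector list is exhausted, the call to pop raises IndexError in this sketch; how a real system reacts to running out of free sectors is outside what the text above describes.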
FIG. 3 shows possible organizations of the entries in the compression directory 220. The figure illustrates three different line organizations. Entry 1 (306) contains a set of flags (301) and the addresses of 4 sectors. If the line size is 1024 bytes and the memory sector size is 256 bytes, the line requires at most 4 sectors. Entry 2 (307) contains a set of flags, the address of the first sector used by the line, the beginning of the compressed line, and the address of the last sector used by the line. If the line requires more than 2 memory sectors, the sectors are connected by a linked list of pointers (namely, each memory sector contains the address of the subsequent one). Entry 3 contains a set of flags and a highly compressed line, which compresses to 120 bits or fewer. The flags in the example are flag 302, indicating whether the line is stored in compressed or uncompressed format; flag 303, indicating whether the line is highly compressible and is stored entirely in the directory entry; flag 304 (2 bits), indicating how many sectors the line uses; and flag 305 (4 bits), containing the fragment information, namely, what portion of the last used sector is occupied by the line (this information is used for roommating).
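The flag fields listed above total 8 bits (1 + 1 + 2 + 4). The sketch below packs them into a single byte. The bit positions are an assumption made for illustration, since the text does not specify the ordering, and the text also does not say how the 2-bit value of flag 304 maps onto a sector count, so the field is stored as a raw value from 0 to 3. The names pack_flags and unpack_flags are hypothetical.

```python
def pack_flags(compressed, in_directory, sector_field, fragment):
    """Pack flags 302, 303, 304 and 305 into one byte (assumed bit order)."""
    if not 0 <= sector_field < 4:
        raise ValueError("sector_field must fit in 2 bits")
    if not 0 <= fragment < 16:
        raise ValueError("fragment must fit in 4 bits")
    return (
        (int(compressed) << 7)      # flag 302: compressed or uncompressed
        | (int(in_directory) << 6)  # flag 303: stored entirely in the entry
        | (sector_field << 4)       # flag 304: 2-bit sector count field
        | fragment                  # flag 305: 4-bit fragment information
    )

def unpack_flags(byte):
    """Inverse of pack_flags; returns the four fields as a tuple."""
    return (
        bool((byte >> 7) & 1),
        bool((byte >> 6) & 1),
        (byte >> 4) & 0b11,
        byte & 0b1111,
    )
```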
The maximum compression ratio achievable in a system with memory compression that relies on the described compressed-memory organization depends on the size of the directory: the maximum number of real memory lines that can be addressed is equal to the number of entries in the directory. Limiting the size of the directory to yield, say, a 2:1 compression is suboptimal for most computer applications, where higher compression ratios are usually observed. On the other hand, a large directory occupies a substantial amount of physical memory, which can impair the system performance if the content of memory happens to be poorly compressible. The memory compression schemes described in the art have a directory size which is fixed when the machine is booted, and cannot be changed while the machine operates.
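The trade-off between directory size and maximum compression ratio can be made concrete with a short calculation. The sketch assumes 1024-byte lines and 16-byte directory entries (8 flag bits plus 120 bits, as in the third entry format of FIG. 3); both values and the function name directory_cost are illustrative assumptions rather than figures stated for any particular system.

```python
def directory_cost(physical_bytes, ratio, line_bytes=1024, entry_bytes=16):
    """Return (directory entries, directory size in bytes) needed to
    support a real address space of ratio * physical_bytes."""
    real_bytes = physical_bytes * ratio
    entries = real_bytes // line_bytes      # one entry per real memory line
    directory_bytes = entries * entry_bytes
    return entries, directory_bytes
```

Under these assumptions, a machine with 1 GiB of physical memory needs a 32 MiB directory to support 2:1 compression and a 64 MiB directory to support 4:1, and that space is consumed regardless of how well the data actually compresses.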
The cost of compressing and decompressing (i.e., the latency) is partially hidden by the cache. A large cache almost entirely hides these latencies for most typical workloads. However, for non-cache-friendly workloads, which do not have a strong locality of memory references, the cache cannot hide the latencies, and the performance of a system with memory compression is significantly worse than that of an analogous system without memory compression. If the characteristics of the workload are known a priori, the memory compression schemes described in the art allow the computer system to be started and operated in uncompressed mode (as a standard computer, where real addresses correspond to physical addresses). However, if the machine is started in uncompressed mode, it cannot be converted to compressed mode without restarting it, and vice versa.
When memory compression is used in a paged memory system, the number of page frames that can be used by processes varies dynamically. The page frames that can be used by processes are referred to herein as usable page frames. If the compressibility of the data increases, the number of usable page frames can be increased. Similarly, if the compressibility drops, more page frames can be made unavailable.
In a computer system where the content of main memory is kept in compressed format, the translation between a real address as produced by the processor and the physical address of the memory cells containing the compressed data is performed using a directory, referred to herein as the compressed-translation table (CTT). Data is compressed and stored into memory upon cache write-backs. Upon cache misses, the content of memory is decompressed. The latency of the decompression process is hidden by using a large cache memory.
When the memory contains poorly compressible data, the number of different page frames in memory (the size of the real memory) can be smaller than the number of physical pages, and the performance of the compressed-memory system might be lower than that of a traditional system having the same amount of physical memory, due to an increase in page faults. When the workload is cache-unfriendly, namely, when it causes a large number of cache misses, the cache does not hide the decompression latency quite as well, and the performance of the system supporting memory compression suffers. If cache-unfriendly workloads are run for long periods of time, the reduced performance of the system becomes visible to the user.
The above are examples of cases where running the system in traditional mode, with the content of memory uncompressed and without the additional cost of real-to-physical translation, can be beneficial. The hardware of systems supporting memory compression therefore can also operate in traditional uncompressed mode. Typically, the decision of whether to run the system in compressed-memory mode or in traditional mode is based on knowledge of the intended workload or of the data. Once the decision is taken, the system runs in compressed-memory or uncompressed-memory mode until the next time it is rebooted: the mode of operation cannot be changed while the system is computing. A need therefore exists for a system and method for switching the mode of operation from compressed-memory mode to uncompressed-memory mode (without a CTT) or vice versa, that does not require either rebooting the system or halting operation of applications, or that is capable of dynamically changing the size of the compressed-memory directory.