The present invention generally relates to computer systems and methods and, more particularly, to computer systems and methods for memory state checking.
Distributed computer systems typically comprise multiple computers connected to each other by a communications network. In some distributed computer systems, the networked computers can concurrently access shared information, such as data and instructions. Such systems are sometimes known as parallel computers. If a large number of computers are networked, the distributed system is considered to be “massively” parallel. As an advantage, massively parallel computers can solve complex computational problems in a reasonable amount of time.
In such systems, the memories of the computers are collectively known as a distributed shared memory. A problem is ensuring that the information stored in the distributed shared memory is accessed in a coherent manner. Coherency, in part, means that only one computer can modify any part of the data at any one time; otherwise, the state of the information would be nondeterministic.
FIG. 1 shows a typical distributed shared memory system 100 including a plurality of computers 110. Each computer 110 includes a uniprocessor 101, a memory 102, and input/output (I/O) interfaces 103 connected to each other by a bus 104. The computers are connected to each other by a network 120. Network 120 may be a local area network, a wide area network, or a nationwide or international data transmission network, or the like, such as the Internet. Here, the memories 102 of the computers 110 constitute the shared memory.
Some distributed computer systems maintain data coherency using specialized control hardware. The control hardware may require modifications to the components of the system such as the processors, their caches, memories, buses, and the network. In many cases, the individual computers may need to be identical or similar in design, e.g., homogeneous.
As a result, hardware controlled shared memories are generally costly to implement. In addition, such systems may be difficult to scale. Scaling means that the same design can be used to conveniently build smaller or larger systems.
More recently, shared memory distributed systems have been configured using conventional workstations or PCs (personal computers) connected by a conventional network as a heterogeneous distributed system. Shared memory distributed systems have also been configured as a cluster of symmetric multiprocessors (SMP).
In most existing distributed shared memory systems, logic of the virtual memory (paging) hardware typically signals if a process is attempting to access shared information which is not stored in the memory of the local SMP or local computer on which the process is executing. In the case where the information is not available locally, the functions of the page fault handlers are replaced by software routines which communicate messages with processes on remote processors.
With this approach, the main problem is that data coherency can be provided only at a coarse granularity, because typical virtual memory page units are 4K or 8K bytes. This size may be inconsistent with the much smaller data units accessed by many processes, for example, 32, 64 or 128 bytes. Coarse, page sized granularity increases network traffic and can degrade system performance.
In existing software distributed shared memory systems, fine grain information access and coherency control are typically provided by software-implemented message passing protocols. The protocols define how fixed size information blocks and coherency control information are communicated over the network. Procedures which activate the protocols can be called by “miss check code.” The miss check code is added to the programs by an automated process.
States of the shared data can be maintained in state tables stored in memories of each processor or workstation. Prior to executing an access instruction, e.g., a load or a store instruction, the state table is examined by the miss check code to determine if the access is valid. If the access is valid, then the access instruction can execute, otherwise the protocols define the actions to be taken before the access instruction is executed. The actions can be performed by protocol functions called by the miss handling code.
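The validity test applied to a state table entry before an access can be sketched in C as follows. This is a minimal illustration only: the state names, the access_kind type, and the rule that a load is valid for any locally held copy are assumptions, while the rule that a store requires exclusive ownership follows the description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state names for one block of shared data. */
typedef enum { BLOCK_INVALID, BLOCK_SHARED, BLOCK_EXCLUSIVE } block_state;

/* Hypothetical tag for the kind of access instruction being checked. */
typedef enum { ACCESS_LOAD, ACCESS_STORE } access_kind;

/* Returns true when the access may execute without invoking the protocol. */
static bool access_is_valid(block_state state, access_kind kind)
{
    assert(kind == ACCESS_LOAD || kind == ACCESS_STORE);
    if (kind == ACCESS_STORE)
        return state == BLOCK_EXCLUSIVE;  /* a store needs exclusive ownership */
    return state != BLOCK_INVALID;        /* assumption: any local copy is readable */
}
```

When the function returns false, the miss handling code would be called so that the protocol can obtain a valid copy before the access instruction executes.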
The calls to the miss handling code can be inserted into the programs before every access instruction by an automated process known as instrumentation. Instrumentation can be performed on executable images of the programs.
FIG. 2 shows an example miss check code 200 for a program which is to execute on a RISC (Reduced Instruction Set Computer) type of computer. In this implementation, all of the memories of the distributed computers are partitioned so that the addresses of the shared memory are always higher than the addresses of the non-shared memory. In addition, the implementation maintains coherency state information for fixed size quantities of information, for example, “blocks” or “lines.” The fixed size, or granularity, of the blocks used by any particular application can be set to be smaller or larger than 128 bytes. Partitioning the addresses of shared memory and using fixed size blocks simplify the miss check code, thereby reducing overhead. First, in step 201, the contents of any registers that are going to be used by the miss check code 200 are saved on a stack. In step 202, the target address of the access instruction is determined using the offset and base specified in the operands of the instruction. The access instruction in this example is a store. A store access is valid if the processor modifying the data stored at the target address has exclusive ownership of the data.
In steps 203-204, a determination as to whether the target address is in non-shared memory is made. If the target address is in the non-shared memory, the rest of miss check code 200 is skipped, the registers are restored in step 231, and the memory access instruction is executed in step 232. In this case, the overhead is about seven instructions.
Otherwise, if the target address is in shared memory, then in step 205, the index of the block (line) including the target address is determined. If the size of the block is an integer power of two, for example 128 bytes, the block index can be computed using a simple shift instruction.
As shown in step 206, the block index can be used to reference the corresponding entry of the state table. In the exemplary implementation, each entry in the state table is a byte. If the number of different states is small, for example, if the states can be indicated with two bits, then the size of the state table can be reduced. However, making the entries smaller makes it more difficult to extract state information, since most computers do not conveniently deal with addressing schemes and data operations on units of less than eight bits.
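The trade-off between a byte-per-entry table and a smaller packed table can be illustrated with the following C sketch. The function names and the layout of four two-bit entries per byte are assumptions chosen for illustration; the point is only that the packed form needs an extra shift and mask on every lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATES_PER_BYTE 4u    /* four 2-bit entries fit in one byte */
#define STATE_BITS      2u
#define STATE_MASK      0x3u

/* Byte-per-entry table: the state is obtained with a single load. */
static unsigned state_from_bytes(const uint8_t *table, size_t block)
{
    return table[block];
}

/* Packed table: the load is followed by a shift and a mask. */
static unsigned state_from_packed(const uint8_t *table, size_t block)
{
    unsigned shift = (unsigned)(block % STATES_PER_BYTE) * STATE_BITS;
    return (table[block / STATES_PER_BYTE] >> shift) & STATE_MASK;
}

/* Updating a packed entry must preserve the three neighboring entries. */
static void set_packed_state(uint8_t *table, size_t block, unsigned state)
{
    assert(state <= STATE_MASK);
    unsigned shift = (unsigned)(block % STATES_PER_BYTE) * STATE_BITS;
    uint8_t *byte = &table[block / STATES_PER_BYTE];
    *byte = (uint8_t)((*byte & ~(STATE_MASK << shift)) | (state << shift));
}
```

Under these assumptions the packed table is one quarter the size of the byte table, at the cost of the additional instructions in every check.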
In steps 207-208, the table entry is loaded, and in step 209, a determination is made as to whether the state of the block containing the target address is, for example, EXCLUSIVE. If the state of the block is EXCLUSIVE, the method skips step 220, and the registers are restored from the stack in step 231. In this case, the overhead is about 13 instructions. Otherwise, the miss handling code is called to gain exclusive control over the data in step 220.
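The flow of steps 202 through 220 for a store can be approximated in C as shown below. This is a sketch under stated assumptions, not the code of FIG. 2: SHARED_BASE, NUM_BLOCKS, the EXCLUSIVE encoding, the subtraction of the shared base before shifting, and the stand-in miss_handler (which simply grants ownership and counts its calls) are all hypothetical. The register save and restore of steps 201 and 231 are left to the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHARED_BASE ((uintptr_t)0x10000u) /* hypothetical start of shared addresses */
#define BLOCK_SHIFT 7                      /* 128-byte blocks: 2^7 = 128 */
#define NUM_BLOCKS  16u                    /* hypothetical size of the shared region */
#define EXCLUSIVE   2u                     /* hypothetical state encoding */

static uint8_t state_table[NUM_BLOCKS];    /* one byte per block */
static unsigned miss_calls;                /* for illustration only */

/* Stand-in for the protocol functions: here ownership is simply granted. */
static void miss_handler(size_t block)
{
    miss_calls++;
    state_table[block] = EXCLUSIVE;
}

/* Check executed before a store to addr. */
static void store_miss_check(uintptr_t addr)
{
    if (addr < SHARED_BASE)                 /* steps 203-204: non-shared memory */
        return;
    size_t block = (size_t)((addr - SHARED_BASE) >> BLOCK_SHIFT); /* step 205 */
    assert(block < NUM_BLOCKS);
    if (state_table[block] != EXCLUSIVE)    /* steps 206-209: load and test entry */
        miss_handler(block);                /* step 220: gain exclusive control */
}
```

A store to a non-shared address returns after the first comparison, which corresponds to the short path of about seven instructions; a store to a block already held EXCLUSIVE takes the longer path without calling the handler.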
As indicated above, code is inserted in the application executable to intercept each load and store of information to see if the information is available locally, is read only or is readable and writable. Numerous techniques have been employed to reduce the runtime overhead of these software checks as much as possible. Nonetheless, such systems still experience a 10-40% overhead due to these software state checks.
Thus, what is needed is simple hardware support in a software distributed shared memory system that effectively eliminates the need for software state checks and thereby reduces the associated overhead.
Software distributed shared memory systems have been previously proposed which provide hardware state checks at the granularity of a physical page, such as 8K bytes, by using one or more state bits associated with the page in a translation lookaside buffer (TLB). However, such systems do not provide state checks at a granularity less than a page, such as blocks or lines, nor do they provide one or more state bits associated with the blocks or lines.
Non-distributed shared memory computer systems have also been previously proposed which provide finer grain protection or state checks by extending a TLB entry to have a single bit per 128-byte line in each page (i.e., 32 bits for a 4K byte page). However, these bits are primarily used for locking data at a finer granularity in database applications. Also, a single state bit per 128 bytes is not sufficient to efficiently provide read/write protection in applications such as software distributed shared memory systems. An example of such a computer system is described in U.S. Pat. No. 4,638,426, Virtual Memory Address Translation Mechanism with Controlled Data Persistence, by Albert Chang et al.
This patent, along with U.S. Pat. No. 4,589,092, Data Buffer Having Separate Lock Bit Storage Array, by Richard E. Matick, and U.S. Pat. No. 4,937,736, Memory Controller for Protected Memory with Automatic Access Granting Capability, by Albert Chang et al., describes a set of lock bits that control access to sub-regions of a page for locking in transaction-based programs.
U.S. Pat. No. 5,440,710, Emulation of Segment Bounds Checking Using Paging with Sub-page Validity, by David E. Richter et al., discloses the use of valid bits within a page. While the above mentioned patents primarily discuss the use of the bits as lock bits, this patent uses bits in the TLB to determine whether a region of data is valid or not to emulate segment bounds on hardware that does not directly support segments.
Accordingly, one object of the present invention is to reduce application executable runtime overhead associated with software state checks in a software distributed shared memory system.
It is another object of the invention to eliminate software state checks in software distributed shared memory systems.
It is yet another object of the invention to provide a method for checking block states for each block within a page of information.
It is yet another object of the invention to provide a method for checking block states for each block within a page of information when a request to access information in such blocks in physical memory is made.
It is yet another object of the invention to provide multi-granularity state checking such as fine grain block checking and coarse grain page checking.
Accordingly, the present invention provides a memory apparatus comprising memory for storing at least one page of information. For example, memory may include random access memory (RAM), dynamic random access memory (DRAM), or the like, and the information may include a fixed or variable number of bytes of data or instructions. The page of information is divided into at least two blocks of information. At least two block-state bits corresponding to each block are operably associated with the memory for providing at least three block states for each block. The block-state bits may be used to check or monitor the state of a block of memory or control access to a block of memory. The block states may, for example, indicate whether a block of information is present in memory, whether a block is only readable, whether a block is readable and writable, or whether a block has been previously accessed. A table is associated with the memory for containing addresses corresponding to the information stored in the memory. For example, the table may be a second memory such as a translation lookaside buffer. The block-state bits may be associated with or included in the table.
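For illustration only, the association of two block-state bits with each block of a page can be modeled in software as follows. The tlb_entry layout, the 4K byte page with 128-byte blocks, the state encodings, and the function names are assumptions made for this sketch; they are not taken from the description, and an actual implementation would perform the check in hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE       4096u
#define BLOCK_SIZE      128u
#define BLOCKS_PER_PAGE (PAGE_SIZE / BLOCK_SIZE)  /* 32 blocks per page */

/* Hypothetical encodings; two bits give up to four states, three are used. */
#define ST_INVALID    0u
#define ST_READ_ONLY  1u
#define ST_READ_WRITE 2u

/* Hypothetical table entry: address translation plus 32 x 2 block-state bits. */
typedef struct {
    uint64_t virt_page;
    uint64_t phys_page;
    uint64_t block_state;
} tlb_entry;

static unsigned get_block_state(const tlb_entry *e, uint32_t page_offset)
{
    assert(page_offset < PAGE_SIZE);
    unsigned block = page_offset / BLOCK_SIZE;
    return (unsigned)((e->block_state >> (2u * block)) & 0x3u);
}

static void set_block_state(tlb_entry *e, unsigned block, unsigned state)
{
    assert(block < BLOCKS_PER_PAGE && state <= 0x3u);
    e->block_state &= ~((uint64_t)0x3u << (2u * block));
    e->block_state |= (uint64_t)state << (2u * block);
}

/* True when the access may proceed; false models raising a fault or trap. */
static bool access_allowed(const tlb_entry *e, uint32_t page_offset, bool is_store)
{
    unsigned s = get_block_state(e, page_offset);
    if (is_store)
        return s == ST_READ_WRITE;
    return s == ST_READ_ONLY || s == ST_READ_WRITE;
}
```

With these sizes, the block-state bits for one page occupy exactly 64 bits, so a single additional word per table entry suffices in this model.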
In accordance with one aspect of the invention, memory states are checked by maintaining (for example, storing, formatting or re-storing) in a memory at least one page of information which is divided into at least two blocks of information, and maintaining at least two block-state bits corresponding to each block for providing at least three block states for each block. The block-state bits are read to check the states for each block.
According to another aspect of the invention, an information processing system having at least one processor and memory for storing at least one page of information is provided. The page of information is divided into at least two blocks of information. A communication link, such as a bus, operably couples the processor and the memory. At least two block-state bits corresponding to each block are operably associated with the memory for providing at least three block states for each block.
In yet another aspect of the invention, the above described inventive aspects are implemented in a distributed shared memory system comprising a first processor associated with a first memory and a second processor associated with a second memory. In such a system, the first and second processors share access to information stored in the first and second memories. The first processor and first memory may be included within a first workstation, and the second processor and second memory may be included in a second workstation. Or the first processor may be included in a first symmetric multiprocessor, and the second processor may be included in a second symmetric multiprocessor.
The present invention provides the advantage of simplifying complicated optimization protocols used to reduce overhead caused by inserted software state checks in software distributed shared memory systems by providing hardware bits which eliminate the need for the software checks.
The present invention also provides the advantage of leveraging exception and fast trap mechanisms already present for handling TLB faults in most computer systems.