This invention relates generally to memory management on computers, and more specifically to a method and apparatus for ensuring the integrity of data movement operations from virtual memory. The invention is operable in an environment in which data movement is performed largely by hardware rather than software, and is enabled responsive to monitoring and detection of Translation Lookaside Buffer ("TLB") purges.
Data movement is an important "core" function of systems, incorporated into many standard, regularly-used system operations such as messaging, data copying or clearing memory to zero. Data movement in systems typically involves three general steps. First, memory is allocated to enable the movement of the data. Second, the data movement itself is performed. Third, the system notifies appropriate components, such as processors or processor agents, that the data movement has completed successfully and processing can continue based on the new location of the data.
In systems of the current art, the first and third steps (memory allocation and notification) are typically performed by software, while the second step (data movement) is performed by hardware. The data movement hardware typically includes a message/copy state machine, an expensive hardware component whose operations are pivotal to enabling the data movement.
The software operations for the first and third steps of data movement (memory allocation and notification) inevitably require several machine cycles to complete. The software is typically found in microkernels loaded onto memory nodes that are local to the processor issuing the request requiring a data movement operation. It would thus be highly advantageous to be able to perform these first and third steps on hardware, obviating the need to refer to microkernel software, thereby speeding up the processing time to execute a data movement operation.
It will be appreciated, however, that hardware-driven data movement operations must also ensure the integrity of virtual-to-physical memory mapping while the operation is in progress. In a more software-driven data movement environment, this function would normally be performed by the processor hardware. There is therefore a need for a non-processor hardware-oriented mechanism to ensure the integrity of such mapping as part of hardware-driven data movement operations.
As used herein, "architecture" means the way in which computer design, hardware and software interact in order to provide a planned level of capability and performance. As used herein, "architecture configuration" means the topological layout of the physical structure of a computer's internal operations, including its processors, registers, memory, instruction set and input/output resources, as designed to enable a particular predetermined architecture.
The claimed invention operates in an architecture in which data movement in systems is optimized by performing operations integral to data movement, such as memory allocation and notification, with hardware rather than software.
As a result, many system operations involving data movement are correspondingly also optimized. Internodal messaging is a good example. It is common in systems having globally shared memory to allow a microkernel resident on one memory node to send messages to microkernels resident on other memory nodes. Where data movement involves memory allocation and notification steps performed by software, however, processor efficiency usually dictates that these messages be restricted in length to a single cache line length. Four common cache line lengths used in the art today are 16 bytes, 32 bytes, 64 bytes, and 128 bytes. Messages from one microkernel to another microkernel typically need to be significantly longer than these fixed single cache line lengths, however. A restriction holding messages to 32 bytes in length, for example, therefore places significant overhead burden on the operating system to limit messages to multiple 32-byte "containers." This overhead burden inevitably causes performance degradation.
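By way of illustration only, the container overhead described above can be sketched as follows. The function name `split_into_containers`, the default 32-byte cache line length, and the zero padding of the final container are assumptions made for this sketch and are not taken from the disclosure.

```python
from typing import List


def split_into_containers(message: bytes, cache_line_len: int = 32) -> List[bytes]:
    """Split a message into fixed-size containers of one cache line each.

    Hypothetical helper: zero-padding the last container is an assumption.
    """
    containers = []
    for offset in range(0, len(message), cache_line_len):
        chunk = message[offset:offset + cache_line_len]
        # Pad the final, possibly short, chunk up to a full cache line.
        containers.append(chunk.ljust(cache_line_len, b"\x00"))
    return containers
```

Under these assumptions, a 100-byte message occupies four 32-byte containers, each of which would be sent and acknowledged separately when messages are held to a single cache line.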
Data movement under architecture as disclosed herein, however, allows contiguous messages of unrestricted length to be sent from one node to another in multiple cache lines. The overall message length is specified by a completion status that is posted by the sending node when the operation is complete. This type of unrestricted messaging is enabled by empowering the message/copy state machine to perform memory allocation and notification operations as well as data movement operations. With the restriction on internodal messaging lifted, the system is freed of the overhead burden. In freeing the system of this overhead, therefore, system processing efficiency may be leveraged far in excess of the actual efficiency achieved at the physical data movement level.
Data copying is an example of a system operation involving data movement that is optimized by the claimed invention. Optimization is particularly enhanced in operations involving data copying from a virtual page in memory. Virtual pages must first be translated to physical pages. While the data copy operation is being issued and executed, however, other components of the system, running concurrently, may change the physical mapping relied upon to translate the virtual page to the physical page. In systems of the current art, monitoring of this mapping to maintain translation accuracy is performed by processor hardware. In a preferred embodiment of the claimed invention, this monitoring is additionally performed by non-processor hardware. Changes to the mapping generate a translation lookaside buffer ("TLB") purge, the occurrence of which is monitored and detected by a mechanism disclosed herein. When a TLB purge is detected, the mechanism stops the data copy operation and enqueues status information regarding the point at which data transfer stopped, thereby ensuring forward progress from that point once accurate mapping is re-established and data transfer re-starts.
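The stop-and-resume behavior described above may be modeled in software, purely as a sketch. The class `CopyEngine`, its method names, the 32-byte transfer unit, and the status tuples are hypothetical and stand in for hardware whose actual interfaces are described in the detailed description.

```python
from collections import deque

CACHE_LINE = 32  # assumed transfer unit, in bytes


class CopyEngine:
    """Software model of a copy that halts when a TLB purge is detected."""

    def __init__(self):
        self.status_queue = deque()  # enqueued (state, offset) status entries
        self.purge_pending = False

    def signal_tlb_purge(self):
        # Called when a change to the virtual-to-physical mapping is detected.
        self.purge_pending = True

    def acknowledge_purge(self):
        # Called once accurate mapping has been re-established.
        self.purge_pending = False

    def copy(self, src, dst, start=0, on_line_copied=None):
        """Copy src into dst from `start`; return the offset reached."""
        offset = start
        while offset < len(src):
            if self.purge_pending:
                # Stop and record where the transfer halted.
                self.status_queue.append(("stopped", offset))
                return offset
            end = min(offset + CACHE_LINE, len(src))
            dst[offset:end] = src[offset:end]
            offset = end
            if on_line_copied is not None:
                on_line_copied(offset)  # hook used only to simulate events
        self.status_queue.append(("complete", offset))
        return offset
```

In this model, a caller that receives an offset short of the source length would call `acknowledge_purge()` after re-translating the page and then call `copy` again with `start` set to the returned offset, so that lines already transferred are not copied a second time.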
Pipelining is an example of a design optimization in which utilization of the pipelined functions is increased. System operation involving data movement may be optimized by utilizing pipelining as disclosed herein. In creating a "pipeline," a series of functions or operations is set up to be executed concurrently, consecutively, or overlapping, as predefined. Individual cycles or instructions of different pipelined operations are executed together to give the overall effect of simultaneous processing of all pipelined operations. Pipelining such as disclosed herein enhances an architecture by being available to concurrently execute individual data movement instructions as they come down the pipeline. It will thus be appreciated that the pipeline may potentially present a stream of data movement instructions (e.g. allocate memory, move data, notify) each taken from various concurrently pipelined data movement operations. Data movement hardware (such as the message/copy state machine) may thus be put to almost continuous use, and to capacity, "picking and choosing" among data movement instructions from various pipelined operations as they come down the pipeline. It will be seen that the overall effect is to optimize the pipeline by speeding up pipelined data movement instructions on an almost continual basis.
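The interleaving of instructions from several data movement operations may be illustrated with the following simplified sketch. The round-robin selection policy and the names `data_movement` and `pipeline` are assumptions made for illustration; the disclosure does not prescribe a particular order in which the hardware picks among pending instructions.

```python
from collections import deque
from typing import Iterable, Iterator, List


def data_movement(name: str) -> Iterator[str]:
    """Yield the three instructions of one data movement operation in order."""
    for step in ("allocate memory", "move data", "notify"):
        yield f"{name}: {step}"


def pipeline(operations: Iterable[Iterator[str]]) -> List[str]:
    """Issue one instruction at a time, rotating among pending operations."""
    ready = deque(operations)
    issued = []
    while ready:
        op = ready.popleft()
        try:
            issued.append(next(op))
        except StopIteration:
            continue  # this operation has no instructions left; drop it
        ready.append(op)  # requeue so the next operation gets a turn
    return issued
```

With two operations A and B, this sketch issues the allocate instruction of A, then that of B, then the move instruction of each, and so on, so that the modeled hardware always has an instruction available as long as any operation is pending.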
In a preferred embodiment, the architecture configuration in which the claimed invention operates comprises a processor agent having (1) first input registers receiving data from a first processor and second input registers receiving input from a second processor; (2) a message/copy state machine receiving control information from the input registers; (3) a data mover; (4) a means for pipelining discrete data movement operations in an overlapping environment; and (5) a first status queue receiving control information from the message/copy state machine to be dispensed to the first processor and a second status queue receiving control information from the message/copy state machine to be dispensed to the second processor. The architecture configuration of a preferred embodiment further includes a memory access controller in communication with the processor agent, the memory access controller having (1) a means, responsive to control information from the message/copy state machine, for allocating memory to enable an operation; (2) a message allocation state machine also operating responsive to control information from the message/copy state machine; and (3) a message completion status queue also operating responsive to control information from the message/copy state machine.
It is therefore a technical advantage of the claimed invention to optimize data movement operations by enabling a hardware-based alarm system for detecting and responding to TLB purges.
It is a further technical advantage of the claimed invention to ensure the integrity of virtual-to-physical memory mapping during operations such as data movement when such operations are performed largely by hardware. This integrity is ensured by enabling a hardware-based alarm system for detecting and responding to TLB purges which may affect data movement operations.
The foregoing has outlined rather broadly the features and technical advantages of the claimed invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and the specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the claimed invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.