Field of the Invention
The present invention generally relates to memory copy operations in a computing environment, and, more particularly, to replicated stateless copy engines.
Description of the Related Art
In some compute environments, memory copy operations are efficiently performed by dedicated units known as copy engines (CEs). A host central processing unit (CPU) offloads memory copy operations to one or more CEs by sending instructions to the CEs, where each instruction includes the address of a source block of memory and a destination address to which the memory block is copied. In computing applications that involve extensive memory copy operations, performance is increased by delegating memory copy operations to the CEs, leaving the host CPU available to perform other tasks. If a system includes multiple CEs, then the host CPU may program the multiple CEs to enable several memory copy operations to be processed concurrently, further improving performance.
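By way of illustration only, the offload described above may be modeled in software as follows. This is a minimal sketch and not a description of any particular hardware: memory is modeled as a byte array, each CE as a worker thread, and the names CopyInstruction, copy_engine, and offload are hypothetical. The length field is also an assumption, since the instruction is described above only as carrying a source address and a destination address.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyInstruction:
    """Hypothetical copy instruction sent from the host CPU to a CE."""
    src: int     # address of the source block of memory
    dst: int     # destination address to which the block is copied
    length: int  # block size in bytes (assumed field, not stated in the text)


def copy_engine(memory: bytearray, instr: CopyInstruction) -> None:
    """Model of one CE carrying out a single copy instruction."""
    block = memory[instr.src:instr.src + instr.length]
    memory[instr.dst:instr.dst + instr.length] = block


def offload(memory: bytearray, instructions, num_engines: int = 2) -> None:
    """Model of the host handing instructions to several CEs at once.

    Each worker thread stands in for one CE. The sketch assumes that the
    source and destination blocks of different instructions do not overlap.
    """
    with ThreadPoolExecutor(max_workers=num_engines) as pool:
        # list() forces completion and surfaces any exception from a worker.
        list(pool.map(lambda instr: copy_engine(memory, instr), instructions))
```

In this model the host would, for example, build a list of CopyInstruction objects and call offload(memory, instructions), after which it is free to perform other tasks while the workers complete the copies.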
Typically, multiple application programs may be executing in the host CPU simultaneously. In such cases, the multiple application programs may be sending memory copy operations to the same set of CEs. In order to transparently share one or more CEs among multiple application programs, CEs may employ a technique known as context switching. During a context switch, the current context, or state, of the CEs, associated with the current application program, is stored so that the context may be restored at a later time when the current application program resumes execution. A new context, associated with a new application program, is then restored, allowing the new application program to begin execution. Context switching enables multiple application programs to share one or more CEs. With context switching, the host CPU informs a CE when a new application program is using the CE and provides the CE with information regarding where the context for the new application program is stored. The CE then finishes any work currently in process, saves the context for the application program currently using the CE, and fetches from memory the context associated with the new application program. Once the context switch is complete, the CE is ready to accept memory copy operations associated with the new application context.
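The context switch sequence described above (finish work in process, save the current context, fetch the new context) may be sketched as follows. This is an illustrative model only. The class name ContextSwitchingCE, the dictionary standing in for the memory that holds saved contexts, and the single copies_done counter standing in for CE state are all hypothetical, since the contents of a real CE context are not specified here.

```python
class ContextSwitchingCE:
    """Hypothetical software model of a CE that supports context switching."""

    def __init__(self, memory: bytearray, context_store: dict):
        self.memory = memory
        # Stands in for the memory subsystem where contexts are saved.
        self.context_store = context_store
        self.current_app = None
        # Assumed per-application state; real CE state is not specified.
        self.state = {"copies_done": 0}
        self.pending = []

    def submit(self, src: int, dst: int, length: int) -> None:
        """Queue a memory copy operation for the current application."""
        self.pending.append((src, dst, length))

    def _finish_pending(self) -> None:
        """Finish any work currently in process."""
        for src, dst, length in self.pending:
            self.memory[dst:dst + length] = self.memory[src:src + length]
            self.state["copies_done"] += 1
        self.pending.clear()

    def switch_context(self, new_app: str) -> None:
        """Switch to new_app, whose context is located by its key."""
        # Step 1: finish any work currently in process.
        self._finish_pending()
        # Step 2: save the context of the application now using the CE.
        if self.current_app is not None:
            self.context_store[self.current_app] = dict(self.state)
        # Step 3: fetch the new application's context, or start a fresh one.
        saved = self.context_store.get(new_app, {"copies_done": 0})
        self.state = dict(saved)
        self.current_app = new_app
```

In this model, steps 2 and 3 each access the context store, and no copy for the new application can start until switch_context returns.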
One problem with the above approach is that the logic needed to perform context switching is typically large and complex. Further, this context switching logic is not related to the CE's primary function, but rather is dedicated to context switching. CEs that support context switching often have a significant amount of surface area devoted to logic that supports the context switching function. Another problem with the above approach is that the steps to save a current context and load a new context involve reading from and writing to a memory subsystem, which may have long access times. Typically, saving the current context and loading the new context are performed after the memory copy operation for the new context is ready to begin, resulting in a delay of the memory copy operation. The time spent on context switching leaves less time available for the CEs to accomplish the actual memory copy operations, reducing overall performance.
As the foregoing illustrates, what is needed in the art is an improved approach for performing memory copy operations in a compute environment.