The present invention relates to a bandwidth-efficient technique for performing data transfers between coherent and non-coherent memory spaces.
FIG. 1 is a block diagram of a modern computer system employing a multi-agent architecture. There, a plurality of agents 110-160 communicate over an external bus 170 according to a predetermined bus protocol. “Agents” refers to any circuit that communicates over an external bus and may include general purpose processors, chipsets for memory and/or input/output devices, or other integrated circuits that process data requests. The agents 110-160 initiate bus transactions on the bus 170 to transfer data among one another.
The agents 110-160 may include internal caches (not shown) for the temporary storage of data. It is possible that two or more agents may store copies of the same data simultaneously. The agents 110-160 operate according to cache coherency rules to ensure that each agent (say, 110) uses the most current copy of the data available to the system. According to many cache coherency systems, each time an agent 110 stores a copy of data, it assigns to the copy a state indicating the agent's rights to the data.
For example, the Pentium Pro® processor, commercially available from Intel Corporation, operates according to the “MESI” cache coherency scheme, identifying data as being in the Modified, Exclusive, Shared, or Invalid state. Each copy of data stored in an agent 110 is assigned one of these four states:
Invalid: Although an agent 110 may have cached a copy of the data, the copy is unavailable to the agent. The agent 110 may neither read nor modify an invalid copy of data.
Shared: The agent 110 stores a copy of data that is valid and possesses the same value as is stored in external memory. Copies of the data may be stored with other agents also in shared state. An agent 110 may only read data in shared state. An agent 110 may not modify data in shared state without first performing an external bus transaction to gain exclusive ownership of the data.
Exclusive: The agent 110 stores a copy of data that is valid and may possess the same value as is stored in external memory. When an agent 110 caches data in exclusive state, it may read and modify the data without cache coherency check via the external bus 170.
Modified: The agent 110 stores a copy of data that is valid and “dirty.” A copy cached by the agent 110 is more current than the copy stored in external memory. When an agent 110 stores data in modified state, no other agents possess a valid copy of the data.
Before an agent 110 may operate on a copy of data, it must possess a copy of the data with a coherency state that is appropriate for the operation that it will perform. For example, to modify data, an agent 110 must possess a copy of data in either exclusive or modified state. Even if the agent possesses the data in shared state, the agent must issue a bus transaction that is observed by the other agents before it can advance the state of the data to the exclusive state. Agents 110-160 exchange cache coherency messages, called “snoop responses,” during the external bus transactions. Once an agent receives snoop responses from the other agents, the transaction has been “globally observed” and the agent may advance the state of the data.
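The access rules described above can be summarized in a short Python sketch. The names `State`, `can_read`, `can_modify_locally`, and `local_write` are hypothetical and chosen only for illustration. The sketch also assumes that a write to data held in invalid state requires a bus transaction, just as a write to data held in shared state does. It is a simplified model of a single agent's view, not a description of any particular processor's implementation.

```python
from enum import Enum


class State(Enum):
    """The four MESI coherency states (illustrative model only)."""
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def can_read(state):
    # An agent may read any copy that is not in invalid state.
    return state is not State.INVALID


def can_modify_locally(state):
    # Only exclusive or modified copies may be written without
    # first issuing a transaction on the external bus.
    return state in (State.MODIFIED, State.EXCLUSIVE)


def local_write(state):
    """Return (new_state, needs_bus_transaction) for a write by this agent.

    Assumption: from shared or invalid state the agent must first issue
    a bus transaction and have it globally observed before it may write.
    """
    needs_bus = not can_modify_locally(state)
    return State.MODIFIED, needs_bus


for s in State:
    print(s.name, can_read(s), local_write(s))
```

In this model, every successful write leaves the copy in modified state; the only difference between the starting states is whether the external bus must be used first.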
The Pentium Pro® processor has a linear 32-bit address space that permits direct addressing to 4 GB of memory. Data coherency techniques may extend to some or all data within this memory space (which, accordingly, may be called the “coherent memory space”). The Pentium Pro® also supports a “page size extension” mode that provides an effective 36-bit address space and extends memory access to up to 64 GB. Data coherency protection typically does not extend to this extended space, called the “non-coherent memory space” for purposes of this discussion. The Pentium Pro® processor typically “works” in the coherent memory space: most of the reading and writing of data to external memory is directed to the coherent memory space. When the processor requires access to data in the non-coherent memory space, it typically causes a page transfer to be made, swapping the data from the non-coherent space with some portion of data in the coherent memory space.
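The relationship between address width and addressable memory can be checked with a few lines of Python. The function name `addressable_gib` is hypothetical, and the sketch assumes byte addressing and treats a gigabyte as 2^30 bytes.

```python
GIB = 2 ** 30  # bytes per gigabyte, assuming binary units


def addressable_gib(address_bits):
    # With byte addressing, an n-bit address reaches 2**n bytes.
    return (2 ** address_bits) // GIB


coherent_gib = addressable_gib(32)   # 32-bit linear address space
extended_gib = addressable_gib(36)   # 36-bit extended address space
print(coherent_gib, extended_gib)
```

Under these assumptions the four extra address bits multiply the reachable memory by sixteen, which is the step from 4 GB to 64 GB.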
The page swap typically is managed by the processor and is done on a cache line by cache line basis. Conventionally, four bus transactions were required to swap each cache line in each of the pages: a read/write pair to move a cache line from the coherent space to the non-coherent space and a second read/write pair to move the cache line from the non-coherent space to the coherent space. Because the processor itself manages the transfer, each of the bus transactions occurs on the external bus 170.
There are 128 cache lines in a page in a Pentium Pro® system. At four bus transactions per cache line, swapping a single page between coherent space and non-coherent space therefore requires 512 transactions on the external bus. Page swapping thus contributes to bus congestion, prevents the bus from fulfilling data requests from other agents and, accordingly, slows the system's performance.
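The cost of a conventional page swap can be worked through as a small calculation. The sketch below is illustrative only: the function names are hypothetical, and the 4 KB page size and 32-byte cache line size are assumptions used to arrive at the 128 cache lines per page stated above.

```python
def lines_per_page(page_bytes=4096, line_bytes=32):
    # Assumed sizes: 4 KB pages and 32-byte cache lines.
    return page_bytes // line_bytes


def swap_transactions(lines, transactions_per_line=4):
    # Conventional swap: one read/write pair in each direction per line.
    return lines * transactions_per_line


lines = lines_per_page()
print(lines, swap_transactions(lines))
```

Changing `transactions_per_line` shows how directly the total bus traffic scales with the number of external transactions needed for each cache line.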
Accordingly, there is a need in the art for a page swapping technique that reduces use of the external bus, thereby reducing bus congestion and improving system performance.
Embodiments of the present invention provide a computer system in which an agent, a Direct Memory Access (“DMA”) controller and a memory controller are provided each in communication with a bus. The DMA controller and the memory controller also can communicate with each other via a second communication path. The computer system may include a memory provided in communication with the memory controller having a coherent memory space and a non-coherent memory space. The DMA controller swaps a portion of data from the coherent memory space with a portion of data from the non-coherent memory space with a single transaction on the external bus.