1. Field of the Invention
This invention relates to a computer and, more particularly, to a bus interface unit which concurrently dispatches memory and input/output (“I/O”) request cycles to respective target devices and maintains proper ordering of data sent to and returned from the memory and I/O target devices.
2. Description of the Related Art
Modern computers are called upon to execute instructions and transfer data at increasingly higher rates. Many computers employ CPUs which operate at clocking rates exceeding several hundred MHz, and further have multiple buses connected between the CPUs and numerous input/output devices. The buses may have dissimilar protocols depending on which devices they link. For example, a CPU local bus connected directly to the CPU preferably transfers data at a faster rate than a peripheral bus connected to slower input/output devices. A mezzanine bus may be used to connect devices arranged between the CPU local bus and the peripheral bus. The peripheral bus can be classified as, for example, an industry standard architecture (“ISA”) bus, an enhanced ISA (“EISA”) bus or a microchannel bus. The mezzanine bus can be classified as, for example, a peripheral component interconnect (“PCI”) bus to which higher speed input/output devices can be connected.
Coupled between the various buses are bus interface units. According to somewhat known terminology, the bus interface unit coupled between the CPU bus and the PCI bus is often termed the “north bridge”. Similarly, the bus interface unit between the PCI bus and the peripheral bus is often termed the “south bridge”.
The north bridge, henceforth termed a bus interface unit, serves to link specific buses within the hierarchical bus architecture. Preferably, the bus interface unit couples data, address and control signals forwarded between the CPU local bus, the PCI bus and the memory bus. Accordingly, the bus interface unit may include various buffers and/or controllers situated at the interface of each bus linked by the interface unit. In addition, the bus interface unit may receive data from a dedicated graphics bus, and therefore may include an advanced graphics port (“AGP”). As a host device, the bus interface unit may be called upon to support both the PCI portion of the AGP (or graphics-dedicated transfers associated with PCI, henceforth referred to as a graphics controller interface, or “GCI”), as well as AGP extensions to the PCI protocol.
There are numerous tasks performed by the bus interface unit. For example, the bus interface unit must orchestrate timing differences between a faster CPU (processor) local bus and a slower mezzanine bus, such as a PCI bus or a graphics-dedicated bus (e.g., an AGP bus). In addition, the bus interface unit may be called upon to maintain time-sensitive relationships established within the pipelined architecture of a processor bus. If data attributable to a request forwarded across the processor bus is dependent on data of a previous request, then the timing relationship between those requests must be maintained. In other words, timing of requests which occur during a request phase of the pipeline must be maintained when data is transferred during a later, data transfer phase of the pipeline in order to ensure coherency of the pipelined information.
A stalling mechanism is sometimes employed to account for timing differences between a slower peripheral bus and a faster processor or memory bus. Stall cycles can therefore occur within a particular phase of the processor bus pipeline, and particularly in the snoop phase. Modern processor buses, such as the Pentium® Pro bus, employ numerous phases: arbitration, request, error, snoop, response, and data transfer.
Stalling, however, does not by itself draw one transaction ahead of another in the pipeline of the processor bus. A deferral mechanism is therefore used for the purpose of allowing a more critical transaction to proceed to completion through the various phases ahead of an earlier-placed transaction (i.e., a transaction placed into the pipeline ahead of the more critical transaction). The transaction being deferred is therefore said to be set aside in favor of a transaction which needs to be serviced quickly.
For example, in an attempt to immediately service requests to faster local memory (i.e., system memory of substantially contiguous semiconductor memory space), modern processor bus architectures allow memory request cycles to be completed upon the processor bus ahead of cycles to the peripheral bus. This means that peripheral-destined cycles which may be snoop stalled are deferred to allow faster, memory-destined cycles to be drawn from the in-order queue of the pipeline ahead of the slower, deferred peripheral-destined cycles. The deferred cycle must, however, be re-initiated at a later time beginning at the first phase (i.e., arbitration phase) of the processor pipeline. Many clock cycles are then needed to again place the deferred transaction back into the snoop phase. A processor bus clocking penalty must therefore be paid for each deferral operation.
An advantage arises if the number of snoop stall cycles and deferred cycles can be minimized. A bus interface unit which can possibly forward memory request cycles without having to snoop stall immediately preceding peripheral request cycles would be a significant improvement to the conventional snoop stall routine. The benefit of dispatching memory requests as soon as possible, and dispatching peripheral requests whenever the peripheral bus or peripheral data is available, proves advantageous as a tool for optimizing the processor bus bandwidth and memory accesses. A bus interface unit which can minimize snoop stall without necessarily having to pay the burdensome penalty of cycle deferral would pose an important advancement over conventional bus interface unit architecture.
The problems outlined above are in large part solved by an improved bus interface unit hereof. The present bus interface unit can dispatch memory-destined request cycles (memory request cycles) concurrent with peripheral-destined request cycles (peripheral request cycles). In this manner, peripheral request cycles can be immediately sent if the peripheral bus is clear or peripheral data is available. Also important is the benefit of transferring a memory request cycle to system memory so that the processor optimally receives instructions or data stored therein.
The memory bus which receives memory requests or data from the bus interface unit is one which is compatible with high speed semiconductor memory. Examples of suitable memory include DRAM and synchronous DRAM (SDRAM). A graphics-dedicated bus may also be coupled to the bus interface unit. If the graphics bus is an AGP-PCI bus, then it may be linked to the bus interface unit by an AGP interface to effectuate, e.g., 66 MHz 1×AGP transfers or 133 MHz 2×AGP data transfers. The bus interface unit maintains a PCI interface which is synchronous to the processor interface and supports PCI burst cycles. The graphics bus or mezzanine bus coupled to the bus interface unit may interchangeably be termed a “peripheral bus”. The term peripheral bus is generic in its application to any bus on which a peripheral device such as an electronic display, disk drive, printer, network interface card, SCSI, etc. can be coupled. Thus, a peripheral device generically involves an input/output device which is accessed within the input/output address space.
The present bus interface unit is configured as a north bridge between a processor local bus, a peripheral bus, and a memory bus. The processor bus can link one or more processors and the associated cache storage locations within those processors. Additionally, the memory bus links a memory controller within the bus interface unit to system memory denoted as semiconductor memory. To expedite transfers between the various buses, the bus interface unit includes a processor controller, a memory controller, and a peripheral controller. The processor controller is coupled to the processor bus, the memory controller is coupled to the memory bus, and the peripheral controller is coupled to the peripheral bus (i.e., PCI bus or AGP bus). Coupled between the various controllers within the bus interface unit are address and data queues. Depending on where the address or data originates, and the point of destination, a specific address or data queue is optimally present within that respective transfer path.
Attributed to the processor controller is a peripheral request queue and a memory request queue. The peripheral request queue stores certain information relating to a peripheral request (i.e., a request to the peripheral bus and specifically a peripheral device connected to the peripheral bus). Likewise, the memory request queue stores information specific to memory requests destined for the memory bus or memory device. Requests within the memory request queue are stored in the order in which they are received. Likewise, requests to the peripheral request queue are stored in the order in which they are received. According to one embodiment, each request queue may be a circular first-in-first-out (“FIFO”) buffer, or may have input and output pointers which indicate the input location or “head” of a series of filled locations and an output location or “tail” which tags the culmination of the series of filled locations. Among information stored within the peripheral and memory request queues are addresses as well as the type of request being solicited (i.e., whether the request is to memory or a peripheral device, or is a read request or a write request). An entry number may be used within the memory request queue to resolve coherency with a snoop result to cache. According to another embodiment, entry numbers may also be associated with the peripheral request queue to note the relative order in which requests are placed within each queue and among both queues if, for example, the requests are placed in the respective queues out-of-order. The entry numbers need not be employed in the peripheral request queue (and memory request queue) if the requests are issued to the respective queues in-order and maintained in-order within respective memory and peripheral data queues containing the responses to the respective requests.
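The circular FIFO arrangement of a request queue can be illustrated with a minimal sketch. The Python class below is illustrative only: the class name RequestQueue, its method names, and the depth parameter are assumptions not drawn from the description, and the two pointers are named by function (input and output) rather than as head and tail.

```python
class RequestQueue:
    """Fixed-depth circular FIFO holding (address, kind) request entries.

    Hypothetical model: names and depth are assumptions for illustration.
    """

    def __init__(self, depth):
        self.slots = [None] * depth
        self.input_ptr = 0    # next location to be filled
        self.output_ptr = 0   # oldest filled location
        self.count = 0        # number of filled locations

    def enqueue(self, address, kind):
        if self.count == len(self.slots):
            raise OverflowError("request queue full")
        self.slots[self.input_ptr] = (address, kind)
        self.input_ptr = (self.input_ptr + 1) % len(self.slots)  # wrap around
        self.count += 1

    def dequeue(self):
        if self.count == 0:
            raise IndexError("request queue empty")
        entry = self.slots[self.output_ptr]
        self.output_ptr = (self.output_ptr + 1) % len(self.slots)  # wrap around
        self.count -= 1
        return entry
```

Because both pointers advance modulo the queue depth, requests leave the queue in the order in which they were received, which is the property the request queues rely upon.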
Given the example where the requests are issued out-of-order, the first request forwarded from the processor bus may be destined for the peripheral bus. This means that request will be routed to the peripheral request queue and given an entry number 0. The next request may be destined for memory and will be placed in the memory request queue along with an entry number 1. The entry number, or tag, associated with each request is sent along with the address as well as the type of request being sent (e.g., whether the request is a read request or a write request). That information is presented to the respective address and data queues of the bus interface unit based on its entry number. This implies that the earliest entry number within the memory request queue will be de-queued before later entry numbers within that queue. Concurrently, the earliest entry number within the peripheral request queue will be de-queued before later entry numbers within the peripheral request queue.
Given an example where the requests are issued in-order, the first request is maintained in order within the peripheral request queue, while the second, third and fourth requests issued to the memory request queue are maintained in order therein. Furthermore, the second, third and fourth data transfer results (i.e., read or write data) are maintained in the same order within the memory data queue. The output pointer within an in-order queue ensures the peripheral data will be drawn from the peripheral data queue before data is drawn from the memory data queue. The output pointer is incremented to allow the next (i.e., third and fourth) request results to be drawn in order from the memory data queue. If the requests and corresponding results within respective memory and peripheral queues are forwarded and maintained in order, then simpler logic associated with the input and output pointers of an in-order queue can be beneficially employed to resolve order of read data returned to the processor or write data to the memory or peripheral device. However, if out-of-order requests are sent, possibly due to multiple requesters being used, then the more elaborate entry number and tagging scheme may be used.
The in-order queue maintains either an input/output pointer system or entry numbers depending on whether the requests and corresponding read/write data are sent in-order or out-of-order. If a pointer system is used, the output pointer keeps track of which data queue location is to forward data next (i.e., whether data will be pulled from the output pointer location or head of memory data queue M2P or P2M, or whether data is pulled from the output pointer location or head of peripheral data queue I2P or P2I). If entry numbers are used, the entry numbers are identical to the entry numbers which are present in the peripheral and memory request queues. The entry numbers stored in the in-order queue serve to memorialize the order in which the requests are forwarded from the processor bus to either the peripheral request queue or the memory request queue. In this fashion, the in-order queue makes note of the request order so that when data is to be forwarded either from the memory or peripheral device (or to the memory or peripheral device), that data will be presented across the processor bus in a specific sequential fashion. The in-order queue thereby beneficially maintains the data order across the processor bus based on the previous request order. In this manner, the critical timing of data transfers relative to earlier requests is properly maintained within the processor pipeline to ensure coherency.
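The tagging example above may be sketched as follows. This is a hedged illustration, not the described hardware: the function name route_requests, the tuple layout, and the sample addresses are all hypothetical.

```python
from collections import deque

def route_requests(requests):
    """Tag requests with entry numbers and route each to its request queue.

    `requests` is a list of (target, address, kind) tuples in the order the
    requests leave the processor bus; target is "memory" or "peripheral".
    All names here are assumptions made for illustration.
    """
    memory_q, peripheral_q = deque(), deque()
    for entry_number, (target, address, kind) in enumerate(requests):
        tagged = (entry_number, address, kind)  # the tag travels with the request
        if target == "peripheral":
            peripheral_q.append(tagged)
        else:
            memory_q.append(tagged)
    return memory_q, peripheral_q

memory_q, peripheral_q = route_requests([
    ("peripheral", 0x3F8, "read"),   # first request: entry number 0
    ("memory", 0x1000, "read"),      # next request: entry number 1
    ("memory", 0x2000, "write"),     # entry number 2
])
```

Each queue can then be drained independently from its own front, so the earliest entry number in either queue is always de-queued before later entry numbers in that same queue.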
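For the in-order case, the simpler pointer logic might be modeled as below. The sketch assumes that the in-order queue records only which data queue each request was routed to ("P" or "M") and that the corresponding data has already arrived; the function name and data labels are hypothetical.

```python
from collections import deque

def drain_in_order(in_order, memory_data, peripheral_data):
    """Return data in request order using only the in-order queue contents.

    `in_order` holds "P" or "M" per request, in the order requests were sent.
    Assumes every data queue already holds its results in order.
    """
    forwarded = []
    for source in in_order:  # models the output pointer advancing one location
        queue = peripheral_data if source == "P" else memory_data
        forwarded.append(queue.popleft())
    return forwarded

# First request to the peripheral bus; second, third and fourth to memory.
result = drain_in_order(
    ["P", "M", "M", "M"],
    memory_data=deque(["mem2", "mem3", "mem4"]),
    peripheral_data=deque(["io1"]),
)
```

No tags are compared in this arrangement; ordering follows solely from each queue being first-in-first-out and from the sequence kept in the in-order queue.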
According to one embodiment, a computer is provided. The computer includes a processor controller having both a memory request queue and a peripheral request queue. The memory request queue stores a sequence of memory requests and the peripheral request queue stores a sequence of peripheral requests, both of which are eventually sent to either a memory or peripheral target. The peripheral device is therefore coupled to receive the peripheral request. Depending on its use or type, the peripheral device can be arranged on a printed circuit board outside of, or exclusive of, a board on which the processor controller is configured.
According to another embodiment, the processor controller may include a decoder which decodes a series of bits within each of the memory and peripheral requests to identify the memory request as destined exclusively for the memory request queue and to identify the peripheral request as destined exclusively for the peripheral request queue. Thus, the decoded series of bits relates to bits either within the peripheral address space or the memory address space. Another set of bits denotes the entry order at which the peripheral and memory requests enter their respective queues. The entry order is noted as a tag which follows along with its respective address to define each request (peripheral or memory request) relative to one another in the sequence at which they are dispatched from the processor bus. The in-order queue also stores the entry number to ensure subsequent data is sent across the processor bus in an order defined by the order in which the requests were earlier sent across the processor bus.
According to another embodiment, the use of entry order bits or tags is avoided. As such, the requests and corresponding data within each of the peripheral or memory queues are maintained in order. Resolution between data from the peripheral or memory data queues is achieved by simply implementing a FIFO output, or output pointers, indicating whether data is removed from the peripheral data queue or the memory data queue corresponding to the ordering of previously issued requests.
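The decoding step can be pictured with a small sketch. The description does not state which bits distinguish the two address spaces, so the request word layout below (a 32-bit address with a single flag bit above it) is purely an assumption, as are the names IO_SPACE_BIT, ADDRESS_MASK and decode.

```python
IO_SPACE_BIT = 1 << 32         # hypothetical flag: set for I/O-space requests
ADDRESS_MASK = (1 << 32) - 1   # hypothetical 32-bit address field

def decode(request_word):
    """Return (queue_name, address) for a raw request word.

    The bit layout is an assumption for illustration only.
    """
    address = request_word & ADDRESS_MASK
    if request_word & IO_SPACE_BIT:
        return "peripheral", address   # routed only to the peripheral request queue
    return "memory", address           # routed only to the memory request queue
```

For instance, decode(IO_SPACE_BIT | 0x3F8) selects the peripheral request queue, while decode(0x1000) selects the memory request queue.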
A bus interface unit is preferably provided within the computer. The bus interface unit is configured between a processor bus, a peripheral bus, and a memory bus. The bus interface unit includes an in-order queue coupled to store an order in which a plurality of requests are dispatched from the processor bus to either the peripheral bus or the memory bus. A peripheral request queue is coupled to store peripheral addresses associated with a first set of the plurality of requests destined exclusively for the peripheral bus. A memory request queue is coupled to store memory addresses associated with a second set of the plurality of requests destined exclusively for the memory bus. A comparator may be included and coupled between a pointer associated with the in-order queue and a pointer associated with data queues. The comparator is configured to dispatch the peripheral data and the memory data across the processor bus commensurate with the order in which the plurality of earlier-dispatched requests were stored in the in-order queue. More specifically, the comparator determines the relative position of the pointer attributed to the in-order queue. Based on that position, the comparator determines the next data to be sent from a queue having data resulting from that request. Once a match to data is ascertained, based on where the pointer resides in the in-order queue, that data is then forwarded across the processor bus (either as read data to the processor or as write data from the processor). In this manner, the current status of the pointer and the entry numbers stored within the pointer establish proper ordering of data subsequently forwarded across the processor bus even though requests may be sent to target devices out-of-order from requests earlier sent across the processor bus. 
Instances in which the requests are sent out-of-order occur due to peripheral requests and memory requests being sent concurrently, where one type of request is not delayed based on the other. As an alternative to the comparator, more simplistic logic can be implemented merely to pull data from the respective memory or peripheral data queues based on the order of requests maintained within the in-order queue. Avoidance of the comparator assumes requests are issued in-order and maintained in-order within respective data queues.
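The comparator's role may be sketched as follows, under the assumption that each data queue entry carries the entry number of the request that produced it. The function name forward_next and the (entry_number, data) tuple layout are hypothetical.

```python
from collections import deque

def forward_next(in_order, memory_data, peripheral_data):
    """Forward one datum if a data-queue head matches the in-order head tag.

    `in_order` holds entry numbers in request order; each data queue holds
    (entry_number, data) tuples. Returns the data forwarded, or None if the
    data for the expected entry has not yet returned.
    """
    if not in_order:
        return None
    expected = in_order[0]  # entry number at the in-order queue output pointer
    for queue in (memory_data, peripheral_data):
        if queue and queue[0][0] == expected:  # comparator match
            in_order.popleft()
            return queue.popleft()[1]
    return None

in_order = deque([0, 1])               # request 0 to peripheral, request 1 to memory
memory_data = deque([(1, "mem")])      # memory data returns first
peripheral_data = deque()
first_try = forward_next(in_order, memory_data, peripheral_data)
peripheral_data.append((0, "io"))      # peripheral data returns later
second_try = forward_next(in_order, memory_data, peripheral_data)
third_try = forward_next(in_order, memory_data, peripheral_data)
```

In this sketch the early memory data is held until the peripheral data for entry number 0 has been forwarded, so data crosses the processor bus in the order of the earlier requests even though the targets responded out of order.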
A method is also presented, according to another embodiment. The method includes steps for sending a plurality of requests across the processor bus and subsequently sending data across the processor bus according to the order in which the requests were previously sent. The steps involve loading memory requests of the plurality of requests destined for a memory device into a memory request queue and possibly assigning a first tag identifying the order in which the memory requests are sent across the processor bus. Peripheral requests of the plurality of requests destined for a peripheral device are loaded into a peripheral request queue and assigned a second tag identifying the order in which the peripheral requests are sent across the processor bus. While the memory requests and peripheral requests are loaded, the first and second tags are also loaded into an in-order queue to identify the order in which the memory requests are loaded relative to one another as well as the order in which the memory requests are loaded relative to the peripheral requests. Memory data and peripheral data can then be accessed corresponding to respective memory requests and peripheral requests. The first tag is assigned to corresponding memory data and the second tag is assigned to corresponding peripheral data. The first tag within the memory data can be compared to the previously sent first tag within the memory requests, while the second tag within the peripheral data can be compared to the previously sent second tag within the peripheral requests. The comparison yields an arrangement or sequence at which the memory and peripheral data can then be sent across the processor bus. In this fashion, the sequence of peripheral and memory data sent across the processor bus is ordered relative to peripheral and memory requests previously sent across the processor bus. 
Thus, if memory address 1 attributed to memory request 1 occurs before peripheral address 2 associated with peripheral request 2, then the memory data attributed to memory request 1 is sent across the processor bus before the peripheral data corresponding to the peripheral request.