1. Field of the Invention
The present invention generally relates to processing of PCI interrupt signals in logically partitioned systems.
2. Description of the Related Art
In logically partitioned computer systems, multiple distinct individual operating systems generally share a single computer hardware structure. Although the various operating systems share a single supporting hardware structure, the individual operating systems may run simultaneously and autonomously within their respective partitioned environments. In order to control and/or manage the multitude of operating systems in the various partitioned environments, a single global software system and/or firmware component, which may be termed a hypervisor, is generally utilized. The hypervisor is generally configured to manage and/or control the allocation/use of the resources available on the single computer hardware system by each of the respective operating systems. For example, the hypervisor may control resource access and allocation for the overall computer system data storage media, access to the available system CPUs, and/or any of the system input/output (IO) device adapters, along with other known features of computer systems. The hypervisor may be further configured to ensure that the respective individual partitions are unaware of the existence of each other and do not interfere with each other's operations. In this type of logically partitioned system using a hypervisor, it is desirable to allow different partitions to share the respective IO buses, while maintaining exclusive ownership/control over the individual IO device adapters in communication with the IO buses.
A peripheral component interconnect (PCI) bus is an industry standard IO bus that is widely used in computer systems ranging from typical desktop systems running Microsoft Windows-type operating systems to open-source Linux based systems and large mainframe class computers, such as the IBM AS400/iSeries and RS6000/pSeries computers running OS400 and AIX, respectively. The PCI IO bus implemented in these types of systems is generally in communication with a computer system central processing unit (CPU) and/or memory complex via a PCI host bridge (PHB). Additionally, the PCI IO bus is generally in communication with a plurality of PCI-type devices. The specific types of IO devices may include IO device adapters that are configured to connect a given IO device type, i.e., a disk-type storage medium, a tape drive, or a network connection, for example, to the PCI bus. Further, a PCI bus may be in communication with additional PCI and/or other IO buses via a PCI bridge device in a structured tree-type hierarchy configuration, which may allow for substantial IO device density expansion. The additional PCI buses are generally positioned below the PHB and may be in communication with the root PCI bus. Therefore, in view of this configuration, the PHB may generally operate as an originating or root PCI bus of a tree of PCI devices and/or a combination of PCI and/or other IO buses. In smaller computer systems there may be only one or two PHBs connected to the memory bus, whereas larger more complex computer systems, such as those that are generally capable of logical partitioning-type functions, may have upwards of several hundred PHBs connected to the computer memory bus. The management of this quantity of PHBs and the accompanying IO devices in communication therewith presents substantial obstacles for the hypervisor.
In many PCI system implementations, such as personal computers (PCs), RS6000 configurations, and/or other workstation class computer systems, a firmware component of the system determines the configuration of the PCI buses and accompanying devices. The firmware may determine the configuration from the system boot programming and/or the initial program load (IPL) process and supporting code. In communications within the system after initialization, the firmware-determined system configuration may be transmitted to the operating system and various software/hardware applications subsequently loaded by the firmware so that these applications can utilize the system's determined configuration in their respective initializations. Firmware may be a limited set of hardware configuration programs that reside in a read only-type memory and operate to initialize the computer hardware. The hardware initialization process is generally configured to facilitate the subsequent loading of the complete operating system software, which is generally stored on a readable/writable storage medium. Once the complete operating system is loaded, system control is generally transferred to the complete operating system from the firmware. For example, in a typical PC workstation or server configuration the firmware is referred to as the basic input/output system (BIOS) and is considered to be a component of the computer hardware that the computer manufacturer provides. Alternatively, operating system software may be provided by a third party, such as Microsoft Windows or DOS, IBM OS/2, or Linux, for example.
In a logically partitioned computer system, a hypervisor program is generally an extension of the basic hardware control firmware of the computer. The hypervisor may operate to manage the partitioning of hardware resources between different operating systems that may be executing within the individual logical partitions. Therefore, in a logically partitioned system the hypervisor program may be thought of as the analog to the PC BIOS or firmware of non-partitioned-type computer systems. The hypervisor may also be configured to initialize only the hardware structure and/or elements that are assigned to particular partitions for those partitions. The computer hardware generally includes programs in a read only-type memory that encapsulate the details of the underlying computer hardware interconnections and control mechanisms. The combination of the hardware interconnections and control mechanisms enables the operating system to view the respective hardware elements as a collection of more abstract hardware elements that then appear similar for various underlying computer hardware implementations. The operating system also commonly includes PCI bus initialization programs that call these programs in the read only memory in order to activate specific hardware controls without direct knowledge of the specific underlying computer hardware interconnections and control mechanisms of the particular computer.
In a computer system using PCI buses, a device driver program executing within a logical partition generally controls the IO devices connected to the PCI bus. The PCI bus implementation generally includes the capability to generate a hardware interrupt signal from a PCI device in order to allow that device to alert the device driver program of a change in the state of the device. For example, an interrupt may be generated upon the completion of a command received from the device driver or at the occurrence of an event that is within the supervisory functions of the device driver. Generally accepted PCI standards specify interrupt signals that are output pins from a PCI device to its PCI bus connection. However, the accepted standards do not specify how these device interrupt outputs are connected to the other hardware elements of the computer system that effect the device driver program interrupt. For example, an interrupt controller that receives a PCI interrupt output signal from a device and causes a CPU interrupt may not be specified in current accepted standards. Generally, the computer hardware provides some form of interrupt controller that receives hardware interrupt outputs from other hardware components, including PCI devices, and generates a CPU interrupt indicating the particular hardware component that signaled the interrupt condition. The Intel Open Programmable Interrupt Controller is an example of an interrupt controller that is in wide use in PC, Workstation, and PC server class computers and that commonly provides 8 or 16 such interrupt inputs for receiving interrupts from PCI devices.
In smaller computer systems having only a few PCI devices implemented, a relatively small number of interrupt controller input connections is generally adequate to connect essentially every PCI device interrupt output to a unique interrupt controller input. In cases where the number of interrupt outputs from PCI devices exceeds the number of interrupt controller inputs, multiple interrupt outputs may be connected to a single interrupt controller input. Therefore, in this configuration, when any one PCI device signals an interrupt, the device drivers for each of the devices connected to that interrupt controller input are generally interrupted in order to determine which device generated the interrupt signal.
Inasmuch as the particular hardware mechanisms generating program interrupts may differ from system to system, it is normally a function of the computer BIOS, firmware, and/or hypervisor to determine the specific association of an interrupt output signal from a PCI device with the respective CPU interrupt mechanism. It is also the function of the firmware, i.e., either the BIOS, firmware, or the hypervisor, to determine if more than one device interrupt output is connected to or shares a given interrupt controller input. Further, these firmware entities generally provide the ability to associate multiple device drivers with a shared interrupt controller input when the interrupt controller signals an interrupt condition on that particular interrupt input. Additionally, the firmware entities will generally ensure that once an interrupt condition has been received and device drivers associated with that particular interrupt signal have been invoked, any additional interrupt conditions for that same interrupt controller input are not signaled to the CPU until the first interrupt is cleared. As such, the CPU will not generally receive information about a subsequent interrupt on the same controller input until all device drivers have completed the processing of the current interrupt signal.
This type of interrupt sharing has become common in PCs, workstations, and PC servers as the number of PCI devices connected to a single computer system has grown beyond the capabilities of conventional interrupt controllers. Further, operating systems for these computers, particularly Microsoft Windows and Linux, have commonly implemented a software list or queue of device driver interrupt programs, generally termed an Interrupt Request Queue (IRQ). Each of the entries in the IRQ is generally associated with a given interrupt controller input. Therefore, the computer firmware may operate to provide a program call interface that the operating system may utilize to inhibit or enable interrupt signals to the CPU. This program call interface may be implemented within the firmware in a manner specific to the hardware interrupt blocking mechanisms of that particular computer system.
However, prior to any processing of interrupts or assignment of IRQs, the software or firmware must generally initialize. This initialization process generally includes a device detection process that is designed to determine specific parameters of each device in communication with the respective partitions and associate specific software programs and/or routines with the respective devices. Therefore, when an interrupt signal is generated by a particular device, the software has the information necessary to route the generated interrupt to the appropriate software program and/or routine.
In order to facilitate the software's detection and initialization of PCI buses and devices in a logically partitioned system, for example, each PCI device and/or PCI bridge generally provides a standardized configuration space. This configuration space may be composed of one or more register facilities, either internal or external, that are generally accessible to the software through PCI configuration read and write operations. The software may conduct configuration read and write operations using a mechanism in the PCI host bridge hardware that effects a PCI configuration read or write operation on that particular PCI bus and any subsequent PCI buses positioned lower in the tree structure. The PHB generally includes a process for the software to specify the intended configuration address, which generally consists of a bus number and device number, in addition to the configuration space register address. The PCI bus number of the configuration address is generally specified in the PCI standard to be 8 bits and to identify a PCI bus within the PCI bus tree formed under a particular PHB. Therefore, the PCI bus number in a PCI configuration address generally operates to uniquely select one PCI bus in that PCI bus tree. Further, generally no more than 256 individual PCI buses may be connected within the scope of a particular PHB bus domain. The PCI device number may be formed by a 5-bit identification select parameter (ID select or IDSEL) that identifies and/or selects a device connection on a PCI bus and a 3-bit function number that identifies one of up to 8 independent PCI adapter functions in certain multifunction device (MFD) classes of PCI devices.
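The field widths given above (8-bit bus number, 5-bit IDSEL device number, 3-bit function number) can be illustrated with a minimal Python sketch. The names `ConfigAddress`, `pack`, and `unpack`, and the particular 16-bit layout chosen here, are assumptions made for illustration only and are not taken from the description.

```python
from typing import NamedTuple


class ConfigAddress(NamedTuple):
    """Hypothetical container for the three fields of a configuration address."""
    bus: int       # 8-bit PCI bus number, unique only within one PHB's tree
    device: int    # 5-bit device (IDSEL) number
    function: int  # 3-bit function number of a multifunction device


def pack(addr: ConfigAddress) -> int:
    """Pack the fields into 16 bits laid out as bbbbbbbb dddddfff (assumed layout)."""
    if not 0 <= addr.bus <= 0xFF:
        raise ValueError("bus number must fit in 8 bits")
    if not 0 <= addr.device <= 0x1F:
        raise ValueError("device number must fit in 5 bits")
    if not 0 <= addr.function <= 0x7:
        raise ValueError("function number must fit in 3 bits")
    return (addr.bus << 8) | (addr.device << 3) | addr.function


def unpack(value: int) -> ConfigAddress:
    """Recover the three fields from a packed 16-bit value."""
    return ConfigAddress((value >> 8) & 0xFF, (value >> 3) & 0x1F, value & 0x7)
```

Because the bus field is only 8 bits wide, such a packed value can distinguish at most 256 buses, which is the per-PHB limit noted above.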
Therefore, in order to detect the presence of a particular device, the computer software may perform a PCI read operation to the PCI vendor ID register at a particular configuration address on the root bus. If a device exists at that specific address, the read operation may return data from the register specified in the read operation. If no device exists at the specified address, then the configuration read operation may terminate. Using this technique, in order to completely determine which configuration addresses are connected to PCI devices, the software may iteratively read the configuration vendor ID register at each possible configuration address on the root bus. This type of process is generally referred to as a "bus walk" type operation. Furthermore, in the situation where additional bridges are detected on a particular PCI bus, the bus walk may be repeated on the PCI bus created by each such bridge for all possible configuration addresses on that subordinate PCI bus. Therefore, in sum, when an additional branch in the tree is discovered, the software iterates through each sub-branch of the newly discovered branch looking for devices.
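The bus walk described above may be sketched in Python against a simulated topology. Everything named here is hypothetical: the `TOPOLOGY` table, the `read_vendor_id` and `secondary_bus` helpers, and the use of `0xFFFF` as the value signaling that no device answered are assumptions of this sketch rather than details of the description.

```python
NO_DEVICE = 0xFFFF  # assumed marker for "no device responded to the read"

# Hypothetical topology: (bus, device, function) -> (vendor_id, secondary_bus).
# secondary_bus is None for ordinary devices and a bus number for bridges.
TOPOLOGY = {
    (0, 1, 0): (0x1014, None),
    (0, 2, 0): (0x1014, 1),     # bridge creating bus 1
    (1, 0, 0): (0x8086, None),
    (1, 4, 0): (0x1014, 2),     # bridge creating bus 2
    (2, 3, 0): (0x10EC, None),
}


def read_vendor_id(bus, device, function):
    """Simulated configuration read of the vendor ID register."""
    entry = TOPOLOGY.get((bus, device, function))
    return entry[0] if entry else NO_DEVICE


def secondary_bus(bus, device, function):
    """Simulated lookup of the bus created by a bridge, or None."""
    return TOPOLOGY[(bus, device, function)][1]


def bus_walk(bus, found=None):
    """Probe every configuration address on `bus`, descending into bridges."""
    if found is None:
        found = []
    for device in range(32):        # 5-bit device number
        for function in range(8):   # 3-bit function number
            if read_vendor_id(bus, device, function) == NO_DEVICE:
                continue
            found.append((bus, device, function))
            child = secondary_bus(bus, device, function)
            if child is not None:
                bus_walk(child, found)  # repeat the walk on the subordinate bus
    return found
```

Calling `bus_walk(0)` on this table visits the root bus first and descends into each bridge as soon as it is found, so devices are reported in depth-first order.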
The PCI configuration space of a device may further include registers that allow the system software to establish certain PCI bus operating parameters for that particular device. In particular, the PCI bus number of PCI buses created by PCI to PCI bridges, as well as that of the root bus created by the PHB, may be contained in configuration space registers that the system software may set with PCI configuration write operations. Therefore, the system software may determine the bus number of each PCI bus in a PCI bus tree, including that of the PHB root bus. The primary requirement is that the bus numbers of PCI buses at lower branches in the tree be numerically higher than that of the bus to which its PCI bridge is connected, but otherwise each PCI bus number, including that of the root bus, may be any value in the range 0 to 255 (although, 0 is useful only for the root bus created by a PHB, as any PCI buses created below that generally have a PCI bus number greater than that of the root bus).
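One simple way to satisfy the ordering requirement above is to number buses in depth-first order, so that every bus created by a bridge receives a number higher than the bus above it. The following Python sketch assumes a hypothetical tree representation (a dictionary mapping each bus label to the labels of the buses created beneath it); the function name and labels are illustrative only.

```python
def assign_bus_numbers(tree, root, first_number=0):
    """Assign 8-bit bus numbers depth-first so each child exceeds its parent.

    `tree` maps a bus label to the list of bus labels created by bridges on
    that bus (a hypothetical representation chosen for this sketch).
    """
    numbers = {}
    next_number = first_number

    def visit(bus):
        nonlocal next_number
        if next_number > 255:
            raise ValueError("more than 256 buses under one PHB")
        numbers[bus] = next_number
        next_number += 1
        for child in tree.get(bus, []):
            visit(child)

    visit(root)
    return numbers
```

For a tree such as `{"root": ["a", "b"], "a": ["a1"]}`, this numbering yields `root` = 0, `a` = 1, `a1` = 2, and `b` = 3.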
Inasmuch as it simplifies the computer manufacturer firmware or BIOS and standardizes the manner in which an operating system detects PCI bus configuration on varying computer hardware designs, it is preferred within the industry that the system firmware or operating system software perform some form of PCI bus walk, and that this software "configure" the operating parameters of PCI devices on that bus. Using BIOS or firmware calls, an operating system can determine the presence and identity of individual PHBs and perform the PCI configuration read and write operations necessary to detect and configure these devices for operating system and device driver use. In some computer implementations, such as the IBM RS6000/pSeries, for example, the system firmware or BIOS performs PCI bus configuration and communicates the results to the operating system with a main storage data structure (commonly called a "device tree"). However, in many implementations, such as most versions of Linux, the operating system itself configures the PCI buses using configuration read and write operations directly to those devices.
In a logically partitioned system, generally only some of the PCI devices in the system are available to the operating system in a particular logical partition. However, it is desired to configure logical partitioning to allow the PCI devices assigned to individual partitions to be connected to the same PCI bus, so that only the devices themselves are owned exclusively by one particular partition but that any of the PCI buses connecting them are effectively shared by other logical partitions. To facilitate this sharing, the hypervisor may determine the configuration of all possible PCI buses and devices in the overall computer system and establish the operating parameters of those PCI buses that are shared by multiple partitions. Additionally, it may also be the function of the hypervisor to ensure that the operating system and other programs in one logical partition do not perform PCI bus operations of any type to PCI devices that are not assigned to that partition. These functions generally require that the hypervisor set the PCI bus numbers of all PCI buses within the overall computer system in order to ensure that the actions of an operating system in one logical partition are not visible or disruptive to the operations of another logical partition using PCI devices on the same PCI bus or buses.
Smaller computers using the PCI bus generally provide only one or two PHBs and a limited number of PCI device connections. In these types of systems, the number of PCI buses that may be created across all possible PCI bus trees is sufficiently small that the 8-bit PCI bus number for any one PCI bus may also uniquely identify that bus within the overall computer system and not just within the PCI bus tree formed under a particular PHB. Consequently, it is common practice for the operating systems of small computers, such as Linux, Microsoft Windows, and 32-bit versions of AIX, to uniquely number every PCI bus system wide using only the 8-bit PCI bus number applied in configuration read or write operations on particular PCI bus trees, even though that bus number needs to be unique only within that particular tree. Thus, operating systems designed for smaller computers generally do not themselves implement methods to uniquely identify more than 256 PCI buses within a system, nor do they generally specify more than 256 unique PCI buses for purposes of configuration read or write operations.
In contrast, larger computer systems may connect several hundred PHBs, and therefore, may configure many more than the 256 PCI buses that can be represented with a single 8-bit PCI bus number. In particular, for purposes of electrical and bus protocol isolation between individual PCI devices, the IBM iSeries and pSeries computers, for example, provide each PCI device connection to a PCI adapter card through a PCI-PCI bridge on the PHB bus. Therefore, in these systems each PHB bus is capable of connecting as many as ten PCI adapter cards, and the system may include upwards of 128 PHBs. As a result thereof, the base hardware configuration may include more than a thousand individual PCI buses. Since it is the practice of some PCI device manufacturers to use a PCI-PCI bridge within a PCI adapter card as a device to integrate multiple PCI devices behind a single PCI connection to the computer's PCI buses, additional PCI buses are created with the use of these classes of PCI adapter cards.
Additionally, the hypervisor and the partition operating systems may use different forms of physical identification for PCI device and function numbers within the overall computer system. In particular, the hypervisor may modify the configuration device number of a PCI bus device as it is known to an operating system in a logical partition, as a means to protect the identity of and access to other devices on that PCI bus that are assigned to other logical partitions. As a consequence of this, the partition operating system may specify a configuration address device number that is not the same as or equivalent to the hypervisor or physical device number used during configuration operations to a particular device.
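The idea of a partition seeing device numbers that differ from the physical ones can be illustrated with a small lookup table. The Python sketch below is an assumption-laden illustration only: the class `PartitionDeviceMap`, its methods, and the policy of handing out partition-visible numbers sequentially are hypothetical and are not described in the text.

```python
class PartitionDeviceMap:
    """Hypothetical per-partition table of partition-visible device numbers."""

    def __init__(self):
        self._physical_by_virtual = {}

    def assign(self, physical_device: int) -> int:
        """Give the partition a device and return the number it will see."""
        virtual = len(self._physical_by_virtual)
        if virtual > 0x1F:
            raise ValueError("device numbers are limited to 5 bits")
        self._physical_by_virtual[virtual] = physical_device
        return virtual

    def to_physical(self, virtual_device: int) -> int:
        """Translate a partition-visible number to the physical number.

        Numbers that were never assigned are rejected, so the partition
        cannot address devices that belong to other partitions.
        """
        try:
            return self._physical_by_virtual[virtual_device]
        except KeyError:
            raise PermissionError("device not assigned to this partition") from None
```

In this sketch the partition never learns which physical device numbers exist on the shared bus; it can only name the devices it was given.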
The operating system in a particular partition generally determines the configuration of PCI buses and devices available to it from within the possibly larger configuration of PCI buses and devices forming the overall computer system and used among different logical partitions. It is desired in logical partitioning to permit operating systems in a particular logical partition to require little, if any, change in design to operate within a logical partition, so that the hypervisor provides program call interfaces similar to or duplicating the program call interfaces of the system firmware or BIOS of a non-partitioned computer. Thus, the operating system of a logical partition can perform its normal PCI bus walk functions unmodified from its original form and still operate to control the substantially larger number of devices in the tree.
With the initialization process aside, when the computer CPU accepts an interrupt signal from the interrupt controller, it generally enters into a state in which it temporarily inhibits or ignores other interrupt conditions and delivers program control to the operating system. The operating system may then call the firmware to inhibit additional interrupts from the interrupt controller input generating the interrupt signal. The operating system then invokes each device driver program that is associated through the interrupt request queue with that interrupt controller input. When all device driver interrupt programs have completed, the operating system again calls the firmware to re-enable the interrupt controller to transmit interrupts from the various interrupt sources.
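The inhibit, dispatch, and re-enable sequence described above may be sketched as follows. The classes `FirmwareStub` and `IrqDispatcher` are hypothetical stand-ins for the firmware call interface and the operating system's interrupt request queues, and the convention that a handler returns `True` when its own device raised the interrupt is likewise an assumption of this sketch.

```python
from collections import defaultdict


class FirmwareStub:
    """Hypothetical stand-in for the firmware inhibit/enable call interface."""

    def __init__(self):
        self.inhibited = set()

    def inhibit(self, irq):
        self.inhibited.add(irq)

    def enable(self, irq):
        self.inhibited.discard(irq)


class IrqDispatcher:
    """Keeps one queue of device driver handlers per interrupt controller input."""

    def __init__(self, firmware):
        self.firmware = firmware
        self.queues = defaultdict(list)

    def register(self, irq, handler):
        self.queues[irq].append(handler)

    def dispatch(self, irq):
        # Block further interrupts from this input while handlers run.
        self.firmware.inhibit(irq)
        try:
            # Every driver sharing the input is invoked; each reports whether
            # its own device was the source.
            return [handler(irq) for handler in self.queues[irq]]
        finally:
            # Re-enable the input only after all handlers have completed.
            self.firmware.enable(irq)
```

The `try`/`finally` form is a design choice of the sketch: it ensures that the input is re-enabled even if one of the handlers raises an exception.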
Logically partitioned computer systems commonly include several hundred PCI bus trees and several thousand PCI devices. In this type of configuration, it is impractical and limiting to system performance to provide interrupt controllers for all of the devices. Rather, such computer systems generally provide a mechanism for receiving 8, 16, or 32 interrupt inputs per PCI bus tree, such that the total number of interrupt controller sources across all PCI buses in the overall computer system may also be on the order of thousands.
The operating systems for small computers, such as Windows and Linux, expect that the underlying computer hardware has a limited number of interrupt controller inputs. Therefore, these smaller computer-based operating systems are designed with a relatively small number of interrupt request queues to associate with each interrupt controller input. Linux, for example, provides for up to 256 interrupt request queues. Although these types of small computer operating systems are also useful as an operating system of one partition of a logically partitioned system, they are not designed for the very large number of interrupt sources generated by the hardware of larger computer systems and networks.
Furthermore, with regard to the internal facilities of the PCI devices themselves, such as, for example, control or status registers and data buffers, these registers and buffers are generally associated with a range of contiguous PCI memory and IO addresses on a particular PCI bus. A PHB or PCI bridge generally includes facilities to define the range of addresses that encompass each of these physical spaces. Basic communications between the computer system CPU and PCI devices, between PCI devices connected to a PCI bus, or within the same PCI bus tree generally include read and/or write commands that then specify PCI memory or IO addresses correlating to these PCI device internal facilities. Each PCI device that requires PCI memory or IO space to map its internal facilities generally has up to six registers that can be read or written by other configuration-type operations of the PCI bus to indicate the size and address alignment required for that space, and to allow software in the computer system to dynamically, at each device boot, assign the specific range of PCI memory or IO space associated with that space.
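The description does not spell out how the size of a device's space is read from these registers. As a hedged illustration, the conventional procedure for 32-bit base address registers is to write all ones to the register, read it back, mask off the low-order flag bits, and take the two's complement of what remains. The Python sketch below applies that arithmetic to a read-back value; the function name `bar_size` and its parameters are hypothetical.

```python
def bar_size(readback: int, is_io: bool = False) -> int:
    """Decode the size requested by a 32-bit base address register.

    `readback` is the value read from the register after all ones were
    written to it. The low 4 bits (memory) or low 2 bits (IO) are flag
    bits and are masked off before decoding.
    """
    mask = 0xFFFFFFFC if is_io else 0xFFFFFFF0
    base_bits = readback & mask
    if base_bits == 0:
        return 0  # register not implemented; no space requested
    # The lowest writable address bit gives both the size and the alignment.
    return ((~base_bits) & 0xFFFFFFFF) + 1
```

Because the decoded size is always a power of two, the same value also expresses the address alignment the device requires.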
In order to further enable computer programs to control or monitor a PCI IO device adapter and/or to transfer data to or from this adapter, the PCI memory or IO space addresses of the PCI IO device adapter internal facilities are generally in turn associated with particular address ranges on the computer memory bus. This allows the computer CPU instructions to read and/or write these IO device adapter facilities as if they were a segment of the computer memory. The collective PCI memory and IO addresses of all PCI devices beneath a PHB are generally correlated one-to-one with contiguous address ranges on the computer memory bus, with the general requirement that each PHB address range be unique from others on the memory bus and that they are unique relative to real memory address ranges on the memory bus. Within a PCI bus tree the memory and IO space address ranges of each device, or each bus created by a PCI bridge are generally contiguous ranges within a larger memory or IO space address range than the PCI bus above it. Thus, each branch in a PCI bus tree may use a segment of the larger PCI memory and IO address ranges assigned to the branch above it.
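The nesting rule described above, in which each branch uses a segment of the range assigned to the branch above it, can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: ranges are represented as hypothetical half-open `(start, end)` tuples, and the function names are illustrative.

```python
def contains(outer, inner):
    """True if the half-open range `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def overlaps(a, b):
    """True if the half-open ranges `a` and `b` share at least one address."""
    return a[0] < b[1] and b[0] < a[1]


def check_windows(parent, children):
    """Verify that every child range nests in `parent` and none overlap."""
    for i, child in enumerate(children):
        if not contains(parent, child):
            return False
        for other in children[i + 1:]:
            if overlaps(child, other):
                return False
    return True
```

The same check can be applied at each level of the tree, first to the PHB ranges on the memory bus and then to the ranges of the bridges and devices beneath each PHB.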
On a PCI bus, generally PCI IO addresses may be either 16 or 20 bits in length, correlating to a 64 KB or 1 MB space, respectively. Because the PHB generally defines the overall IO space for all IO addresses in the tree beneath it, all IO addresses are generally limited to the range defined or implemented by the PHB. Similarly, on a PCI bus, PCI memory addresses may be either 32 or 64 bits, and all PCI memory addresses in the tree beneath a PHB are generally limited to the range defined for that PHB. These ranges may be determined in part by the capabilities of the PHB or PCI bridge that creates a secondary PCI bus in the tree from a PCI bus closer to the root bus, and in part by the software or firmware that allocates memory bus addresses to PHBs so that they do not overlap each other or the real memory implemented in that computer.
In order to determine the size of the PCI memory and IO space required in each PCI bus device tree, along with which memory bus address ranges should then be assigned to each PHB and PCI IO device adapter in those trees, system software generally must first determine the configuration of each PCI bus tree. In many computer systems "system firmware" or "boot software" determines the configuration of PCI buses and devices included in the computer system at system boot or IPL time.
The hardware structure of computer systems capable of logical partitioning commonly has multiple CPUs, very large main storage, and many IO buses and IO device adapters, such as the AS400/iSeries computer, which is capable of configuring upwards of 128 to 256 IO buses. Such systems typically have large memory bus addresses, which may be on the order of 40 to 64 bits in length. The hypervisor of a logically partitioned system generally assigns memory bus address ranges to each root PCI bus (PHB) from within the overall memory bus addressing capabilities and such that they do not duplicate address ranges occupied by real memory. In a computer system with a large memory bus address, the hypervisor may assign a full 32-bit, or even a somewhat larger, address range to each PHB, and the PCI bus memory or IO space address ranges assigned to IO device adapters on one PCI bus may then duplicate the memory or IO addresses for other PCI buses.
In contrast, small computers such as PCs and workstations typically implement only one or two IO buses connected to the CPU/Memory complex and often implement only 32-bit memory bus addresses. Computer operating systems designed for 32-bit computer systems, such as Windows, OS/2, Linux, and AIX, however, can represent only 32-bit PCI memory or IO space addresses, both of which generally must be defined uniquely within the overall 32-bit address space defined for real memory and PCI addresses. These operating systems assign PCI bus and IO device adapter memory and IO space ranges such that the addresses assigned to IO device adapters on one PCI bus do not duplicate those of IO device adapters on other PCI buses within the same computer.
Therefore, in view of conventional systems, there is a need for a method for determining the connection of interrupts for PCI devices available to a particular operating system in a logically partitioned system. Additionally, there is a need for a method to enable and disable the signaling of interrupts from a PCI device using a hypervisor program to assure that the operating system of a logically partitioned computer system correctly receives the interrupt signals from devices assigned thereto. Further, there is a need for a hypervisor capable of enabling and disabling the signaling of interrupts associated with the PCI devices assigned to one logical partition without affecting the signaling of interrupts for PCI devices assigned to other logical partitions.
Further still there is a need for a method for conducting PCI bus and device detection in a logically partitioned system. In particular, there is a need for a method in which a hypervisor of a logically partitioned system may represent the PCI device number of a device, as this number is known to the operating system of the logical partition, differently than the actual PCI device number used to perform configuration read and write operations to that device and/or differently than it is known to the hypervisor. Further still, there is a need for a method through which PCI initialization functions of an operating system in a logically partitioned system may build a translation between the PCI bus and device numbers known to the operating system and those that may be known to the hypervisor or those used to perform configuration read and/or write operations.
Further still, there is a need for a method that allows a hypervisor of a logically partitioned computer system to determine the PCI device configuration of all PCI bus trees in the overall computer system and to choose PCI memory and IO space addresses for the PCI devices in these trees without regard to the number or types of operating systems in the computer system.
Embodiments of the present invention generally provide a method for PCI interrupt routing in a logically partitioned guest operating system, wherein the method may include the steps of locating primary PCI buses, initializing IRQ tables, locating hardware interrupt sources, and activating an interrupt handling process.
Embodiments of the present invention further provide a computer readable medium storing a software program that, when executed by a processor, may cause the processor to perform a method for PCI interrupt routing in a logically partitioned guest operating system. The method for conducting PCI interrupt routing in a logically partitioned guest operating system may include locating primary PCI buses, initializing IRQ tables, locating hardware interrupt sources, and activating an interrupt handling process.
Embodiments of the present invention further provide a method for routing PCI interrupts in a logically partitioned guest operating system. The method for routing PCI interrupts may include determining the primary PCI buses available to the guest operating system, generating an IRQ table configured to map an interrupt controller structure for each system IRQ, executing a hardware interrupt source location routine, and activating an interrupt handling process.
Embodiments of the present invention may further provide a computer readable medium storing a software program that, when executed by a processor, may cause the processor to perform a method for routing PCI interrupts in a logically partitioned guest operating system. The method for routing PCI interrupts may include determining the primary PCI buses available to the guest operating system, generating an IRQ table configured to map an interrupt controller structure for each system IRQ, executing a hardware interrupt source location routine, and activating an interrupt handling process.