The present invention relates to a method and apparatus for establishing and controlling disk drive access by processors in a computer system and, in particular, to an improved method for establishing and controlling access to multiple disk devices by multiple processors in a multi-processor/multi-disk drive system.
Many current computer systems employ a multi-processor configuration comprised of two or more processor units interconnected by a bus system and capable of independent or cooperative operation, thereby increasing the total system processing capability and allowing the concurrent execution of multiple related or separate tasks by assigning each task to one or more processors. Such systems typically also include a plurality of mass storage units, such as disk drive devices, to provide adequate storage capacity for the number of tasks executing on the system, to reduce the average access latency by spreading the information across multiple disk devices, and to minimize interference between tasks in accessing the mass storage units by assigning, insofar as possible, portions of a disk device or a set of disk devices to each currently active task or processor on the system.
A recurring problem with multi-processor/multi-disk device systems, however, is in reducing communication interference between the processors and the mass storage units. This problem becomes particularly acute when, for example, the tasks concurrently executing on the system must share data, as when one task must provide data to another task, or when two or more concurrent tasks must share the storage space of a disk device unit because of storage space limitations or access latency limitations.
For example, in current NUMA multi-processor/multi-disk device systems each disk device is associated with and connected from a specific subset of processors, or processor complex, through an adapter associated with the processor complex, and the processor complexes communicate through a relatively slow inter-processor bus. As such, a read or write operation by a task to any disk device other than a disk device connected to the task's own processor is performed through the inter-processor bus and handled as an interrupt by the processor of the target disk device unit, resulting in a significant loss of speed.
This problem was partially solved in multi-processor/multi-disk device systems wherein each processor complex and its associated adapter was connected to all the disk devices, often but not necessarily through a switch. Each processor complex could therefore communicate directly with each disk device, so that processor/disk device communications were not required to pass through the inter-processor bus. The processor operating system programs, however, typically identified each disk device associated with a processor by a “name” and, while the names were unique within the context of each processor complex, each disk device would have multiple names in the processor's operating system programs. This would lead the operating system programs to believe there were more disk devices than actually existed, and could result in corruption of the disk data as the operating system programs would treat the different “names” as separate disk devices. As a consequence, an additional operating system program was used to translate local disk device names into “global” disk device names such that all higher level programs would see one and only one “name” for each actual disk device.
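The local-to-global name translation described above can be illustrated by the following minimal sketch. It is not taken from any particular operating system; the class name `GlobalNameTable`, its methods, and the complex and disk names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set, Tuple


class GlobalNameTable:
    """Maps (processor complex, local name) pairs to one global name per disk.

    Hypothetical illustration only; not an actual operating system interface.
    """

    def __init__(self) -> None:
        self._to_global: Dict[Tuple[str, str], str] = {}

    def register(self, complex_id: str, local_name: str, global_name: str) -> None:
        # Local names need to be unique only within one processor complex.
        self._to_global[(complex_id, local_name)] = global_name

    def resolve(self, complex_id: str, local_name: str) -> str:
        # Higher level programs see only the global name returned here.
        return self._to_global[(complex_id, local_name)]

    def local_name_count(self) -> int:
        # Number of names the untranslated operating system programs would see.
        return len(self._to_global)

    def physical_devices(self) -> Set[str]:
        # Number of distinct global names equals the number of actual disks.
        return set(self._to_global.values())


def build_example_table() -> GlobalNameTable:
    table = GlobalNameTable()
    # Two complexes, each naming the same two physical disks differently.
    table.register("complex0", "disk_a", "global_disk_0")
    table.register("complex0", "disk_b", "global_disk_1")
    table.register("complex1", "disk_a", "global_disk_1")
    table.register("complex1", "disk_b", "global_disk_0")
    return table
```

In this sketch four local names exist but only two physical disk devices, which is the mismatch that, without translation, could lead the operating system programs to treat one disk device as several.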
In current multi-processor/multi-disk device systems, such as the quad processor Intel™ SMP system, multiple processors are interconnected through a high speed bus and each disk device adapter is connected between the associated disk device and the bus, rather than to the processors or to an individual processor. These systems eliminate the lower speed interprocessor bus problem and allow each processor to directly address each adapter and its associated disk device through the interprocessor bus without requiring connections from all adapters to all disk devices. The disadvantage of such systems, however, is that each processor typically includes a cache and, because each processor can communicate with each disk device, each cache is required to contain information pertaining to all of the adapters/disk devices. In normal operation, therefore, a significant proportion of the processor and inter-processor bus capacity is consumed with cache update operations as the adapter/disk device information is accessed and updated by many different processors.
Many systems, however, such as the Intel quad processor SMP server system, can reduce the number of cache updates by means of a system utility that allows each processor to be bound to and service interrupts for only a single disk device adapter. Such utilities, while reducing the cache update traffic on the completion of the disk operation, that is, the interrupt processing, do not restrict the initiation of disk operations. The initiation of disk operations can, therefore, result in interprocessor cache traffic. Additional utilities that allow the construction of dedicated processor/disk device pairs for the initiation of disk operations would significantly reduce the volume of cache update operations, because each processor would be required to maintain cache information with respect to only one disk device or set of disk devices. This approach has the obvious disadvantage, however, of constraining each processor to accessing only a single disk device or set of disk devices, so that the sharing of data or of disk device space among processors requires complex operations among the processors.
While a switch may be incorporated into such a system to allow each disk device adapter to be connected to a plurality of disk devices and a corresponding plurality of processor/disk device adapter pair assignments to be made for each processor, this method of overcoming the single processor/single disk device limitation is unsatisfactory because of the resulting problems in disk device identification and, consequently, in managing the contents of each disk device and processor cache. In a system having four processors, four disk device adapters, and four disk devices, for example, where each disk device is directly accessible to each of the four disk device adapters, and hence to all processors, each disk device would be identified by four names, each of which would be optimal only to a single processor. Additional steps must be taken to ensure that data written to one disk device, through any of its paths or names, is consistent. It is therefore apparent that the inclusion of a switch to enable each disk device adapter to be associated with a plurality of disk devices significantly increases the complexity of managing the contents of the disk devices and the processor caches and greatly increases the possibilities for error.
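The naming problem in the four processor, four adapter, four disk device example can be made concrete with a short sketch. The naming scheme `adapterN/diskM` and the assumption that processor i is bound to adapter i are hypothetical and are used only to count the resulting names.

```python
from typing import Dict, List

NUM_UNITS = 4  # four processors, four adapters, and four disk devices


def path_name(adapter: int, disk: int) -> str:
    # Hypothetical naming scheme: one name per (adapter, disk) path.
    return f"adapter{adapter}/disk{disk}"


def names_for_disk(disk: int) -> List[str]:
    # With a switch, every adapter reaches every disk, so each disk
    # acquires one name per adapter.
    return [path_name(adapter, disk) for adapter in range(NUM_UNITS)]


def optimal_name(processor: int, disk: int) -> str:
    # Assumption for illustration: processor i is bound to adapter i,
    # so only the name through adapter i is optimal for processor i.
    return path_name(processor, disk)


def all_names() -> Dict[int, List[str]]:
    return {disk: names_for_disk(disk) for disk in range(NUM_UNITS)}
```

Under these assumptions the system carries sixteen names for four disk devices, and for any given processor only one of the four names of each disk device follows that processor's bound adapter.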
The present invention provides a solution to these and other problems of the prior art.
The present invention is directed to an improved method and apparatus for providing access between the processors and the mass storage devices of a computer system having a plurality of processors and a plurality of mass storage devices, an interprocessor bus interconnecting the processors, and a plurality of adapters connected from the interprocessor bus for providing communication between the processors and the mass storage devices. The system also includes a binding utility for communicating with the processors and the adapters to generate pairings between the processors and the adapters wherein each processor/adapter pairing is an association of a processor with an adapter.
According to the present invention, a switch is connected between the adapters and the mass storage devices for connecting each adapter to each mass storage device. A binding mapper communicates and operates with the binding utility at each binding of the adapters and the processors and, at each binding of a processor/adapter pair, enumerates the connected mass storage devices with which the processor of the processor/adapter pair is to communicate and determines, for each such mass storage device, a mass storage identifier by which the processor identifies the mass storage device.
An address mapper is incorporated into the operating system device driver stack and references the binding mapper to construct and store an address map. The address map contains a processor set for each mass storage device wherein each processor set includes an address map entry for each processor in the system. Each processor set is indexed by processor number and contains the mass storage device identifier corresponding to the optimal path of access to the corresponding mass storage device.
Thereafter, the address mapper responds to each request for access to a mass storage device by a processor, wherein the request includes a processor name, by providing the corresponding address map entry from the processor set corresponding to the requesting processor. The processor then completes the access to the mass storage device by directing the request through its paired disk device adapter, as determined by the returned address map entry.
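The construction and use of the address map may be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function names `build_address_map` and `lookup`, the identifier format `adapterN:disk`, and the numbering of processors from zero are all hypothetical.

```python
from typing import Dict, List


def build_address_map(bindings: Dict[int, int], disks: List[str]) -> Dict[str, List[str]]:
    """Build one processor set per mass storage device.

    bindings maps processor number -> bound adapter number, standing in for
    the pairings produced by the binding utility (hypothetical representation).
    Processors are assumed to be numbered 0..n-1.
    """
    address_map: Dict[str, List[str]] = {}
    for disk in disks:
        # The processor set is indexed by processor number.
        processor_set = [""] * len(bindings)
        for processor, adapter in bindings.items():
            # Hypothetical identifier naming the path through the bound adapter.
            processor_set[processor] = f"adapter{adapter}:{disk}"
        address_map[disk] = processor_set
    return address_map


def lookup(address_map: Dict[str, List[str]], processor: int, disk: str) -> str:
    # Return the identifier the requesting processor should use for this disk.
    return address_map[disk][processor]
```

For example, with the bindings `{0: 2, 1: 0, 2: 1, 3: 3}`, a request by processor 0 for `"diskB"` resolves to the identifier for the path through adapter 2, so the access is directed through that processor's paired adapter.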