Manageability is a key requirement for a broad spectrum of information technology (IT) systems, ranging from laptops to blade servers to clusters to large-scale data centers. With the rising complexity and scale of enterprise IT, systems management has become a dominant cost. As referred to herein, manageability includes management and maintenance tasks or operations that deal with bringing up, maintaining, tuning, and retiring a system. Also as referred to herein, and as understood in the art, information technology, or IT, encompasses all forms of technology used to create, store, exchange, and utilize information in its various forms, including but not limited to business data, conversations, still images, motion pictures, and multimedia presentations, as well as the design, development, installation, and implementation of hardware and software information or computing systems and software tasks. Thus, examples of IT management and maintenance tasks or operations include diagnostics and recovery, security protection, backups, resource provisioning, and asset management of IT systems.
At a broader level, the scope of IT manageability may be associated with the lifecycle phases for servers and data centers, including bring up, operation, failures/changes, and retire/shutdown phases. Various manageability tasks are performed at each of these life cycle stages. Examples include provisioning and installation of servers, monitoring performance and health of systems, security protection against viruses and spyware, backup protection against disasters, disk maintenance to improve performance, fault diagnostics and recovery, and asset management to track resources. Particularly, there is a class of manageability tasks that routinely runs during the operation lifecycle phase of an IT system and requires extensive access to the storage space in the IT system. These manageability tasks are storage-centric because they require constant access to the storage space for a period of time (or all the time). Examples of storage-centric manageability tasks include virus scanning, disk backups, disk (memory) integrity checking, and system fault diagnosis. The storage-centric manageability tasks have several common characteristics. First, most of these tasks are predominantly “read-only” and often process large amounts of data in the storage space (thus, requiring constant access to the storage space for a period of time) to provide a summary status report (e.g., virus scanning, disk auditing). Second, in most cases, the storage-centric manageability tasks run as background processes and are fairly insensitive to changes in their execution times as long as they make reasonable forward progress.
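The common characteristics described above (read-only access, a pass over a large amount of stored data, and a summary report at the end) can be illustrated with a minimal sketch of a disk-auditing task. The function name `audit_tree`, the report layout, and the choice of SHA-256 are assumptions made for illustration only; they are not taken from any particular manageability product.

```python
import hashlib
import os


def audit_tree(root, chunk_size=1 << 20):
    """Read every regular file under root and return a summary report.

    Hypothetical example of a storage-centric task: it only reads data,
    touches every block of every file, and reduces it all to a summary.
    """
    report = {"files": 0, "bytes": 0, "digests": {}}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(path, "rb") as handle:  # opened read-only
                # Stream the file in fixed-size chunks to bound memory use.
                for chunk in iter(lambda: handle.read(chunk_size), b""):
                    digest.update(chunk)
                    report["bytes"] += len(chunk)
            report["files"] += 1
            report["digests"][os.path.relpath(path, root)] = digest.hexdigest()
    return report
```

Because such a task only needs to make reasonable forward progress, it could be run at low priority or paused between files without affecting the correctness of its report.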
Traditionally, storage-centric manageability tasks have been executed on the host processor, sharing hardware and software resources with host system tasks in an IT system. This sharing leads to resource interference and hence degradation in performance. FIG. 1 illustrates such a traditional system architecture 100, wherein there is provided a host processor, such as a central processing unit (CPU) 101 (or any computer processor from, for example, Intel, AMD, and Cyrix), for executing both manageability tasks and host system tasks. The CPU 101 is connected to a memory controller hub, such as one typically implemented by a northbridge 102, via a system bus 130 or point-to-point links (not shown). As known in the art, the northbridge 102 is a chipset that generally integrates a number of high-bandwidth input/output (I/O) buses of the kind typically used for high-performance graphics, cache memory, or network adapters, for devices such as a peripheral component interconnect express (PCIe or PCI Express) video graphics card 104 and a memory device 107 such as a computer random access memory (RAM) module. The northbridge 102 also includes a connection, such as a PCIe connection 140, to an I/O controller hub, such as one typically implemented by a southbridge 103. As known in the art, the southbridge 103 generally integrates several I/O devices such as network interface cards (NICs), disk controllers for one or more storage devices, audio devices (not shown), an interface bus for device connection, and other devices in lower-performance expansion slots through an I/O bus separate from the system bus 130. An example of a NIC includes an Ethernet local area network (LAN) card 105. Examples of a disk controller include a serial advanced technology attachment (SATA) controller 108 for a SATA disk drive, an integrated drive electronics (IDE) controller 109 for an IDE disk drive, and a small computer system interface (SCSI) controller 110 for a SCSI disk drive.
An example of an audio device includes an audio or sound card. Examples of an interface bus include a universal serial bus (USB) for connection of a USB device thereto and a FireWire or IEEE 1394 bus for connection of a FireWire device thereto. An example of a lower-performance expansion slot includes a PCI slot for connection of a PCI card or device.
Additionally, some storage-centric manageability tasks impose constraints on a system architecture that go beyond performance. For example, given the critical nature of many of the storage-centric manageability tasks, system administrators typically desire strong control over the execution of these tasks and do not want system users to disable them or change configurations in ways that may undermine protection levels. Also, some storage-centric manageability tasks are routine tasks that may need to run periodically, sometimes even when the system is not being used or is powered off. Thus, such tasks require more sophisticated optimizations for power efficiency. Additionally, certain storage-centric manageability tasks such as virus scanning may benefit from higher levels of privilege and isolation from the host tasks in order to better enforce security.
To address at least some of the above concerns, there exist a number of enhanced system architectures, as illustrated in FIGS. 2 and 3, that may be employed to separately run host and manageability tasks. FIG. 2 illustrates an enhanced system architecture 200, which is identical to the traditional system architecture 100, except for the use of a dual-core processor or CPU having a first core 212 and a second core 213. In the enhanced system architecture 200, the manageability tasks may be executed by the second core 213 while the host tasks may be executed by the first core 212. FIG. 3 illustrates another enhanced system architecture 300, which is identical to the traditional system architecture 100, except for the use of multiple distinct processors such as CPUs 312 and 313 instead of just one processor such as the CPU 101 in FIG. 1. In the enhanced system architecture 300, the manageability tasks may be executed only by the second CPU 313 while the host tasks may be executed by both the first CPU 312 and the second CPU 313.
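As a rough software-level illustration of dedicating a core to manageability work, the sketch below restricts the calling process to a single CPU. It assumes a Linux host, where Python exposes `os.sched_getaffinity` and `os.sched_setaffinity`; the helper name `pin_to_last_allowed_cpu` and the choice of the highest-numbered allowed CPU are hypothetical and are not drawn from the architectures of FIGS. 2 and 3.

```python
import os


def pin_to_last_allowed_cpu():
    """Restrict the calling process to one CPU; return its index, or None.

    Hypothetical helper: a manageability process would call this at startup
    so that the scheduler keeps it off the cores used by host tasks.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # affinity calls are not available on this platform
    allowed = os.sched_getaffinity(0)  # 0 refers to the calling process
    target = max(allowed)  # assumption: reserve the highest-numbered CPU
    try:
        os.sched_setaffinity(0, {target})
    except OSError:
        return None  # the environment refused the affinity change
    return target
```

Pinning of this kind separates only the execution cores; it does not by itself give the manageability process its own caches, memory bus, or operating system.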
However, the software model for both the enhanced system architectures 200 and 300 is similar to the software model for the traditional architecture 100. FIG. 4 illustrates such a software model 400 with a single operating system (OS) image 403 across a single-core processor, multiple cores of a single processor, or multiple processors of a host system 410, and a coherent view of the caches and memory. From an I/O perspective, the manageability tasks 401 and the traditional host tasks 402 both submit requests for I/O access to the same OS image 403, which directs I/O drivers 404 to access the desired I/O devices, such as the hard disk memory 406, via the I/O controller (for example, southbridge) 405 in the data storage area or subsystem 420. The buffer caches and other scheduling decisions are all handled by a single OS instance, namely, the OS image 403. If concurrent accesses to disk blocks are made, the OS 403 serializes them and submits the I/O requests in an order determined by an I/O scheduler (not shown). Thus, as in the traditional system architecture 100, the enhanced system architectures 200 and 300 continue to share some of the resources, such as hardware caches, memory bus, and OS buffer caches, and therefore continue to exhibit resource interference and hence degradation in performance. Additionally, the enhanced system architectures 200 and 300 may have power inefficiencies from having an entire core or processor powered on for manageability tasks.
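The interference that results from a shared OS buffer cache can be illustrated with a toy model. The sketch below assumes a least-recently-used (LRU) replacement policy, a cache of four blocks, and a strictly alternating request order; none of these parameters comes from the architectures described above, and the class and function names are hypothetical.

```python
from collections import OrderedDict


class BufferCache:
    """Tiny LRU cache of disk blocks shared by all tasks under one OS image."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, block):
        """Return True on a hit; on a miss, load the block and evict the LRU one."""
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            return True
        self.blocks[block] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return False


def host_hit_rate(requests, capacity=4):
    """Serialize (task, block) requests through one cache; return the host hit rate."""
    cache = BufferCache(capacity)
    hits = total = 0
    for task, block in requests:
        hit = cache.access(block)
        if task == "host":
            total += 1
            hits += hit
    return hits / total


# Host task alone: it cycles over a working set of four blocks.
host_only = [("host", i % 4) for i in range(400)]

# Same host stream, interleaved with a scan that reads a new block each time.
shared = []
for i in range(400):
    shared.append(("host", i % 4))
    shared.append(("scan", 1000 + i))

print(host_hit_rate(host_only), host_hit_rate(shared))
```

In this toy setup the host task's hit rate falls from 0.99 when it runs alone to 0.0 when its requests are interleaved with the scan, because each newly scanned block evicts a block from the host's working set before the host returns to it.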