Computer Operating System
Broadly, a computer operating system controls the activities of a computer. An operating system can be viewed as a very large computer software program that controls resource allocation, scheduling, data management and input/output control. One aspect of an operating system is that it provides localized support for software/hardware integration so that individual applications need not be concerned with controlling hardware. One such operating system is HP-UX, which is manufactured by and made available from Hewlett-Packard Company, California, USA.
The structure of HP-UX can be broken down into two levels. At the core of HP-UX is the kernel. The kernel controls the hardware, schedules tasks and manages data storage. The kernel is "on" (i.e., running) at all times the computer system is on. The kernel is typically shielded from the end-user.
The next level of the HP-UX operating system is known as the user level. Located within this user level is a shell which acts as an interface between the end-user and the kernel. Typically, the shell interprets commands entered by an end-user and translates these commands into a machine language which the kernel understands. The primary task of the shell is to launch other user programs (i.e., call and execute programs from memory). Tools and applications are also located in this second level of the HP-UX operating system. Examples of applications include word processors, data management programs, computer graphic packages and financial spreadsheets.
One of the features of HP-UX is its ability to interact with hardware and hardware interface systems. Typical computer systems comprise a processing unit, memory units, a system bus, and a number of input/output (I/O) interface units. The I/O interfaces control data communications between the processing unit and the outside world, including the user and various peripheral devices.
The kernel of HP-UX contains kernel device drivers which, when configured, permit communication with and control of hardware devices, I/O cards located in the backplane of the computer, and other device drivers (e.g., network layers). Included among these device drivers are drivers for an HP-IB disk controller, magnetic tape drives, mass storage devices, and SCSI direct access storage devices. Typically, the structure of a device driver corresponds to the structure of the I/O hardware the driver is controlling. For example, a small computer system interface (SCSI) disk and tape device connected to a SCSI interface card installed on an EISA computer bus requires the following drivers and/or modules: general I/O services; EISA I/O services; general SCSI services; EISA SCSI interface card driver; SCSI disk driver; and a SCSI tape driver.
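The correspondence between hardware structure and driver structure in the SCSI example above can be sketched as a simple mapping. This is an illustration only: the function name `required_modules` and its parameters are hypothetical, the module names merely mirror the wording of the example, and nothing here represents an actual HP-UX configuration format.

```python
def required_modules(bus, interface, devices):
    """Return the drivers/modules needed for `devices` attached to an
    `interface` card installed on `bus` (hypothetical naming scheme)."""
    modules = [
        "general I/O services",                      # common to all I/O
        f"{bus} I/O services",                       # bus-level layer
        f"general {interface} services",             # protocol-level layer
        f"{bus} {interface} interface card driver",  # card on that bus
    ]
    # One device-class driver per attached device type.
    modules += [f"{interface} {device} driver" for device in devices]
    return modules


stack = required_modules("EISA", "SCSI", ["disk", "tape"])
for name in stack:
    print(name)
```

Each layer of the hardware (bus, interface card, attached device) contributes one layer of software, which is why the list grows with the depth of the hardware topology.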
Analyzing Hardware Integration Problems
The complexity of the HP-UX operating system, particularly that portion which supports software/hardware integration, increases the probability of integration errors. This is compounded when the inherently peculiar problems associated with controlling the behavior of hardware are considered. One of the primary tasks of the kernel integration engineer is to analyze computer system operation to determine whether the operating system is performing as desired, and if an error is encountered, conduct a debugging process in order to correlate source code execution with the particular error. In the case of device drivers, the process of discovering and diagnosing problems is difficult because drivers are designed to "hide" the problems (e.g., auto-retry functionality). When a problem occurs in a driver, an important first step is to determine the software control flow and to correlate hardware behavior with software behavior, or hardware state to software state.
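The way auto-retry functionality can "hide" a problem may be illustrated with a minimal sketch. All names here (`TransientError`, `with_retry`, `make_flaky_read`) are hypothetical and the hardware fault is simulated; the sketch is not taken from any actual driver.

```python
class TransientError(Exception):
    """Hypothetical error standing in for a recoverable hardware fault."""


def with_retry(operation, max_attempts=3):
    """Call `operation` until it succeeds or `max_attempts` is exhausted.

    Returns (result, attempts). A driver that returned only the result
    would conceal the failed attempts from its caller entirely.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(), attempt
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted: the fault finally becomes visible


def make_flaky_read(failures):
    """Build a simulated read that fails `failures` times, then succeeds."""
    state = {"remaining": failures}

    def read():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise TransientError("bus timeout")
        return b"sector data"

    return read


data, attempts = with_retry(make_flaky_read(failures=2))
print(data, attempts)
```

From the caller's point of view the read simply succeeds; only the attempt count reveals that the hardware misbehaved twice, which is the kind of state an engineer must recover in order to correlate hardware behavior with software behavior.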
The primary tools used by kernel integration engineers in debugging integration problems are hardware logic analyzers, software debuggers and rudimentary event logging. Hardware logic analyzers provide the capability to monitor logical activity at the external terminals of a digital system (i.e., I/O buses). The logic analyzer is an oscilloscope-type instrument having dozens of channels, large memory and the ability to disassemble/decode bus states and phases. Typically, a logic analyzer is electrically coupled to a particular hardware interface and, while stepping through a process, the analyzer captures data. One problem with present hardware logic analyzers is that they capture data at a very low level of detail. Thus, it is often difficult to collect enough of this low-level analyzer data, and to correlate it with source code execution, to pinpoint the root cause of a problem. This is particularly true for complex input/output protocols such as SCSI. Another disadvantage of the hardware logic analyzer is its high cost.
Software debuggers are programs which aid the kernel integration engineer by providing breakpoints in code execution and dump routines. Typically, software debuggers disassemble, decode and/or interpret code. They also allow variables to be examined so that the values of these variables and the like may be checked against expected values. While debuggers provide a good correlation between analysis data and source code, they are extremely invasive, resulting in decreased real-time execution timing and speed. Thus, hardware integration problems often will not manifest themselves while the code is being debugged.
A third tool available to kernel integration engineers is simple event logging. This process involves adding explicit event log points to the source code in order to store useful software state and path flow information. During code execution, event data is accumulated in an event log which holds data for the last N events. One problem with this scheme is that the appropriate size of the event log is difficult to determine. Typically, the log is a circular buffer; hence as the (N+1)th event is logged, the oldest event is pushed out. Increasing the size of the event log consumes memory. Decreasing the size of the event log may result in critical data being pushed out of the log. Another problem with simple event logging is determining the number of log points. Selecting too many log points is invasive and will slow down the software. This also leads to the possibility of pushing important data out of the log. Selecting too few log points may result in missing critical data. Multi-tasking systems with re-entrant drivers introduce another problem: events resulting from the control of different hardware will often push out critical data previously logged.
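The push-out behavior of a circular event log can be shown with a minimal sketch. The class name `EventLog`, its methods, and the device labels are hypothetical and chosen for illustration; Python is used for brevity, whereas an actual kernel event log would be written in the kernel's own implementation language.

```python
from collections import deque


class EventLog:
    """Fixed-size circular log holding the last N events (illustrative)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry when full.
        self._events = deque(maxlen=capacity)

    def log(self, device, message):
        self._events.append((device, message))

    def dump(self):
        return list(self._events)


log = EventLog(capacity=4)
log.log("disk", "select timeout")          # the critical event
for i in range(4):
    log.log("tape", f"write block {i}")    # unrelated traffic, other device

remaining = log.dump()
print(remaining)
```

With a capacity of four, the four tape events that follow are enough to displace the single critical disk event, so a dump taken afterwards contains no trace of it. Enlarging the capacity delays the loss at the cost of memory, which is the trade-off described above.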