Computer Systems and Program Flow
As is well known in the art, modern computers are normally controlled by computer programs, commonly referred to as "software." FIG. 1 shows a greatly simplified illustration of a hypothetical computer system 100 controlled by software.
The computer system 100 is a hypothetical one intended for illustrative purposes. Those of ordinary skill having the benefit of this disclosure will recognize that the computer system 100 is representative of numerous conventional computer systems such as, e.g., the HP 9000 system.
The software that controls the computer system 100 typically takes the form of one or more series of instructions (and data) executed by a processor system 110. The processor system 110 may comprise one or more processors commonly referred to as central processing units (CPU). A well-known example of a processor is the Intel Pentium microprocessor.
The instructions, often referred to as "code" (related groups of which are often referred to as "routines"), are stored in the internal memory system 120 of the computer system 100. The memory system 120 may comprise read-write random access memory (RAM) and/or read-only memory (ROM).
The processor system 110 initially loads the instructions into the memory system 120 by copying them from the storage system 130. It does so because in operation the memory system 120 is usually very much faster than the storage system 130; the memory system 120 thus serves as a kind of working scratch pad for the processor system 110.
The storage system 130 may take a variety of forms. In a typical computer the storage system may include floppy disks, a so-called hard disk, and perhaps a CD-ROM. In mainframe computers the storage system may include a Direct Access Storage Device (DASD) system, which is a special kind of hard disk system.
At some point after loading the instructions into the memory system 120, the processor system 110 reads the instructions from the memory system and performs the operations specified in the instructions. (The instructions are sometimes read one by one but often in a batch for greater speed.) The instructions at locations 0001, 0002, etc. in the memory system 120 are commonly executed one after another.
The locations of the various instructions within the memory system 120 are commonly referred to as the "addresses" of the instructions. An instruction that is located at a given address is sometimes said to "reside" at that address.
In the example shown in FIG. 1, the processor system 110 performs a memory access operation to read an instruction from a specific address in the memory system 120 (e.g., address 0001). The processor system then performs the operation specified by that instruction (in this case, reading data from a disk). It then continues with the instruction in the next address (in this case, address 0002).
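The sequential fetch-and-execute behavior described above can be sketched as a toy model. This is only an illustration under stated assumptions: the operation names (READ_DISK, ADD, HALT) and the function run are hypothetical and do not appear in FIG. 1.

```python
# Toy model of sequential execution. The operation names are hypothetical.
def run(memory, start=1):
    """Fetch and execute instructions in address order; return the trace."""
    trace = []
    address = start
    while address in memory:
        operation = memory[address]        # memory access: fetch the instruction
        trace.append((address, operation)) # "perform" it by recording it
        if operation == "HALT":
            break
        address += 1                       # continue with the next address
    return trace


memory = {1: "READ_DISK", 2: "ADD", 3: "HALT"}
trace = run(memory)
```

Here the trace visits addresses 1, 2, and 3 in order, mirroring the progression from address 0001 to address 0002 in the example.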
Branching Problems in Multiprocessor Computer Systems
Sometimes instructions are not executed entirely in sequence. In FIG. 1, the hypothetical instructions at addresses 0003 and 0004 form what is referred to as a "branch" or sometimes as a "jump." Instruction 0003 causes the processor system 110 to store the number 0007 in a register X. (In this context a "register" is a temporary data-storage area.) Instruction 0004 causes the processor to determine whether the day of the week is Tuesday and, if so, to jump to the address specified in register X--which in this case is instruction 0007.
The branch shown in the simplified illustration of FIG. 1 is a two-instruction operation. Instruction 0003 is a function entry point set-up instruction which calls for loading of data into a temporary storage area, i.e., register X, while instruction 0004 is the actual jump instruction. In many implementations, branching may involve even more set-up instructions.
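The two-instruction branch can be sketched as follows. This is a minimal illustration, not the instruction set of any actual processor: the operation names LOAD_X, JUMP_X_IF_TUESDAY, NOP, and HALT, and the contents of addresses 0005 through 0007, are assumptions made for the example.

```python
# Toy interpreter for the set-up/jump pair. All operation names are hypothetical.
def run(program, is_tuesday, start=3):
    """Execute the toy program and return the addresses executed, in order."""
    register_x = None
    executed = []
    pc = start
    while pc in program:
        operation, operand = program[pc]
        executed.append(pc)
        if operation == "LOAD_X":               # set-up instruction (0003)
            register_x = operand
            pc += 1
        elif operation == "JUMP_X_IF_TUESDAY":  # actual jump instruction (0004)
            pc = register_x if is_tuesday else pc + 1
        elif operation == "HALT":
            break
        else:                                   # NOP: fall through to next address
            pc += 1
    return executed


program = {
    3: ("LOAD_X", 7),
    4: ("JUMP_X_IF_TUESDAY", None),
    5: ("NOP", None),
    6: ("HALT", None),
    7: ("HALT", None),
}
tuesday_path = run(program, is_tuesday=True)
other_day_path = run(program, is_tuesday=False)
```

On a Tuesday the model executes addresses 3, 4, and 7; on any other day it falls through to addresses 5 and 6. The jump depends entirely on what the earlier set-up instruction left in register X.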
Multiple-instruction branching can cause problems in a multiprocessor computer system. As the name suggests, in a multiprocessor computer system the processor system 110 has multiple processors, e.g., multiple CPUs. A simplified example is shown in FIGS. 1 and 2, where the processor system 110 has two processors 111 and 112.
In multiprocessor computer systems, the various processors typically fetch common program instructions from the memory system 120. An analogy would be to have multiple workers utilizing a single copy of an instruction manual for performing a complex task on a time-shared basis. That means that in a multiple-instruction branching operation, the instructions might have changed (e.g., through action by another processor) in between the time that a processor 111 or 112 did its initial set-up processing and the time it reached the actual jump instruction. This is especially true when, as is often the case, the processors use "pipelining" techniques to read ahead in the list of instructions stored in the memory system 120. As a result, when a processor actually executes the jump instruction, it might not have executed the newly-modified set-up instructions and thus might execute the jump instruction incorrectly.
Suppose that in the hypothetical example of FIG. 1, the processor 111 executes the set-up instruction 0003 and stores the value "0007" in its register X. Then suppose that before the processor 111 actually executes the jump instruction 0004, the set-up instruction is changed (e.g., by the processor 112) to specify a jump address other than 0007. That can cause serious problems for the computer system, even causing the computer system to crash.
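The hazard can be made concrete with a deterministic simulation of one unlucky interleaving. The sketch assumes a hypothetical Processor class with a private register X and hypothetical operation names; the replacement target 1052 is chosen only for illustration. Real processors interleave nondeterministically, which the fixed ordering here does not capture.

```python
# Deterministic replay of one problematic interleaving. Names are hypothetical.
class Processor:
    def __init__(self, start=3):
        self.register_x = None
        self.pc = start

    def step(self, memory):
        """Execute the single instruction at the current program counter."""
        operation, operand = memory[self.pc]
        if operation == "LOAD_X":
            self.register_x = operand
            self.pc += 1
        elif operation == "JUMP_X":
            self.pc = self.register_x


memory = {3: ("LOAD_X", 7), 4: ("JUMP_X", None)}

processor_111 = Processor()
processor_111.step(memory)        # 111 executes set-up 0003: X = 7
memory[3] = ("LOAD_X", 1052)      # another processor changes the set-up instruction
processor_111.step(memory)        # 111 executes jump 0004 using the stale X
stale_target = processor_111.pc

late_processor = Processor()      # a processor that starts after the change
late_processor.step(memory)
late_processor.step(memory)
fresh_target = late_processor.pc
```

In this replay processor 111 lands at address 7 even though the instructions in memory now call for 1052, while a processor that starts later lands at 1052. The two processors therefore disagree about where the branch leads.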
Referring to FIG. 2, this problem is commonly handled in designing software by including in the instructions what are referred to as "lock" instructions such as that shown at address 0002. A lock instruction is a resource-protection facility that is provided by, e.g., the operating-system software of a computer system.
Somewhat analogous to a traffic light, the lock instruction 0002 permits one processor to seize control of the system and take action without fear of unintentional interference by other processors. The controlling processor can then later issue instructions to cause other processors to "synch up" (synchronize their operations) with the controlling processor (instruction 0008 in FIG. 2), followed by issuing a release-lock instruction to permit the other processors to resume their normal operations (instruction 0009).
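A rough software analogy, using threads in place of processors, is sketched below. It assumes Python's threading.Lock as a stand-in for the lock facility; the names code, branch, and patch are hypothetical, and the "synch up" step of instruction 0008 is not modeled.

```python
import threading

# Threads stand in for processors; the dictionary stands in for the
# set-up instruction held in shared memory. All names are hypothetical.
lock = threading.Lock()
code = {"setup_target": 7}
observations = []


def branch():
    """Perform set-up and jump as one protected unit."""
    with lock:                                   # acquire the lock
        register_x = code["setup_target"]        # set-up step
        # At "jump" time, check that the set-up instruction is unchanged.
        observations.append(register_x == code["setup_target"])
    # leaving the with-block releases the lock


def patch(new_target):
    """Edit the set-up instruction only while holding the lock."""
    with lock:
        code["setup_target"] = new_target


threads = [threading.Thread(target=branch) for _ in range(4)]
threads.append(threading.Thread(target=patch, args=(1052,)))
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Because the edit and each set-up/jump pair hold the same lock, no branch can observe the set-up instruction changing between its two steps, whichever order the threads happen to run in.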
Resulting Difficulties in "Hooking" Software Execution
The problems of multiple-instruction branching make it difficult to "hook" (seize control of) the execution of software instructions in a memory system 120. Hooking is a branching technique by which software can be edited or patched "on the fly" (normally by other software) for a variety of reasons, e.g., to improve the performance of the software. For example, the assignee of this application, BMC Software, distributes a number of software packages that use hooking techniques to change the operation of other software such as IBM's VTAM communications software.
Hooking is typically carried out by a processor that executes instructions causing one or more other instructions to be changed even while they are stored in the memory system 120. That has the effect of changing the behavior of the computer system when the edited instruction is executed.
Before-and-after examples of two different types of hooking are shown in FIGS. 3A, 3B, 4A, and 4B. In FIG. 3B, the instruction at address 0003 is patched so that the value "1052," not "0007" as shown in FIG. 3A, is loaded into register X; that causes the next instruction 0004 to jump to the instruction at address 1052 instead of the instruction at address 0007. On the other hand, in FIG. 4B an instruction to jump to a completely new segment of code is patched in between instructions 0008 and 0009 of FIG. 4A.
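The two kinds of patch can be sketched on toy data structures. The function names hook_operand and hook_insert_jump, the operation names, and the jump target 2000 are hypothetical; the figures do not specify where the new code segment resides.

```python
# Toy versions of the two patch types. All names and the target 2000 are hypothetical.
def hook_operand(program, address, new_target):
    """FIG. 3 style: change the value a set-up instruction loads."""
    patched = dict(program)                 # copy so the "before" state is preserved
    operation, _old_target = patched[address]
    patched[address] = (operation, new_target)
    return patched


def hook_insert_jump(instructions, index, target):
    """FIG. 4 style: place a jump to new code between two instructions."""
    return instructions[:index] + [("JUMP", target)] + instructions[index:]


before_3a = {3: ("LOAD_X", 7), 4: ("JUMP_X", None)}
after_3b = hook_operand(before_3a, 3, 1052)

before_4a = [("OP_0008", None), ("OP_0009", None)]
after_4b = hook_insert_jump(before_4a, 1, 2000)
```

The first patch leaves the instruction count unchanged and alters only an operand; the second adds a new instruction to the sequence.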
Hooking may be accomplished in different ways; two techniques are described here as examples. The first technique is accomplished via hardware instructions that allow direct write access to physical storage such as the memory system 120, bypassing the address translation hardware logic (known commonly as dynamic address translation or hardware address translation). In this method, the physical address of a location in the memory system 120 is acquired through the use of instructions or existing function calls provided by the operating environment. Once the physical address has been acquired, a jump instruction is written to that location; the instruction caches of the processors 111, 112, etc., may then be flushed to eliminate any residual information about that address in cache. As a result, the processors 111, 112, etc., fetch the jump instruction from the memory system 120 on their next reference. The flush operation causes all processors in the processor system 110 to flush their caches at substantially the same time. This may involve delaying further execution of the modified program code until all caches have been flushed (this delay is normally a guaranteed time within which all processors will have flushed their caches).
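The role of the cache flush can be illustrated with a toy model in which each processor keeps a private instruction cache. The class Processor, its methods, and the instruction text "JUMP 2000" are hypothetical; actual cache behavior is hardware-specific and considerably more involved.

```python
# Toy model of per-processor instruction caches. All names are hypothetical.
class Processor:
    def __init__(self, memory):
        self.memory = memory
        self.icache = {}

    def fetch(self, address):
        """Return the cached instruction, filling the cache from memory on a miss."""
        if address not in self.icache:
            self.icache[address] = self.memory[address]
        return self.icache[address]

    def flush(self):
        """Discard all cached instructions."""
        self.icache.clear()


memory = {8: "ORIGINAL"}
processors = [Processor(memory), Processor(memory)]
for processor in processors:
    processor.fetch(8)                    # both caches now hold the old instruction

memory[8] = "JUMP 2000"                   # the hook is written directly to memory
before_flush = [processor.fetch(8) for processor in processors]

for processor in processors:
    processor.flush()                     # all processors flush their caches
after_flush = [processor.fetch(8) for processor in processors]
```

Until the flush, both modeled processors continue to see the old instruction; on their next reference after the flush, both fetch the jump from memory.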
Another technique is to programmatically alter the information in the dynamic address translation, hardware address translation, or equivalent hardware translation tables to allow direct storage of the jump instruction. This may be required because in many implementations, executable code segments are treated as read-only segments of memory. By altering the translation tables, the write is temporarily allowed and then the translation tables are set back to their initial state. Flushes of the caches, as described above, may also be required, but on many architectures the required synchronization of the caches is handled by the hardware in a multi-CPU environment when the cache is written into, in order to ensure that all CPUs see the "new" data in memory.
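The allow-write-then-restore sequence can be sketched with a toy model in which a single flag stands in for the relevant translation-table entry. The class CodePage, the function patch, and the instruction text are hypothetical; real translation tables are manipulated through privileged, architecture-specific mechanisms that this sketch does not attempt to represent.

```python
# Toy model: a boolean flag stands in for a translation-table entry.
class CodePage:
    def __init__(self, instructions):
        self._instructions = dict(instructions)
        self.writable = False             # code segments start out read-only

    def read(self, address):
        return self._instructions[address]

    def write(self, address, instruction):
        if not self.writable:
            raise PermissionError("code page is read-only")
        self._instructions[address] = instruction


def patch(page, address, instruction):
    """Temporarily allow the write, then restore the initial state."""
    initial_state = page.writable
    page.writable = True
    try:
        page.write(address, instruction)
    finally:
        page.writable = initial_state     # restored even if the write fails


page = CodePage({8: "ORIGINAL"})
try:
    page.write(8, "JUMP 2000")            # a direct write is refused
    direct_write_refused = False
except PermissionError:
    direct_write_refused = True

patch(page, 8, "JUMP 2000")
```

The try/finally form is a design choice for the sketch: it restores the read-only state even when the write itself fails, so the page is never left writable by accident.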
Hooking can be dangerous in a multiprocessor computer system because it may be necessary to edit multiple instructions as part of the hooking process. That gives rise to the possibility that one processor might execute an edited instruction, but that other processors might race ahead to execute unedited instructions, perhaps with disastrous results.
Locks can sometimes be used to address this problem as described above. Locks are not always available, however, and may also be inefficient because they can adversely affect the performance of at least a portion of the computer system.