A watchpoint (also known as a hardware breakpoint) defines a contiguous range of addresses in computer memory that will trigger an interruption to program execution if any address within the range is referenced by the program. The interrupt so generated is known as a watchpoint interrupt, or alternatively as a debug interrupt or debug exception, since it is typically employed for debugging or monitoring purposes. A watchpoint interrupt causes current program execution to be suspended, and execution to continue in special code called a watchpoint (or debug) interrupt handler. Interrupts generated synchronously through execution of a machine instruction are frequently referred to as exceptions (the terms exception and synchronous interrupt are used interchangeably herein).
Watchpoint interrupts can be divided into three main classes, according to the type of access being made to the relevant address: execute, read or write. In the first case, known as an execution watchpoint, the watchpoint is triggered when the processor tries to execute the instruction stored in the relevant address. In the latter two cases, known as a storage watchpoint, the watchpoint is triggered when the processor tries to access data stored (or to be stored) in the relevant address.
Processors typically provide special registers and instructions for storage of watchpoint addresses and ranges. However, the number of available watchpoints is limited by the number of registers provided for this purpose. In one prior art architecture, known as IA32 (implemented by various 32-bit processors from Intel Corporation) there are debug registers to provide a watchpoint capability, and a maximum of four watchpoints can be defined, each with a corresponding address range of one, two or four bytes. This limitation can represent a significant constraint on the effectiveness of debugging operations using watchpoints. Another problem is that such restrictions on watchpoints tend to be processor specific, with little commonality across different architectures. For example, the IA64 architecture (implemented on 64-bit processors from Intel Corporation) has a significantly different watchpoint implementation from the IA32 architecture.
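The IA32 limits described above can be illustrated by a short sketch of how watchpoint settings are packed into the debug control register DR7. The bit positions follow the published IA32 register layout (local enable bits in the low byte, and a two-bit access field and two-bit length field per watchpoint from bit 16 upwards); the function and table names are purely illustrative assumptions and do not come from the architecture or from any library.

```python
# Sketch of packing IA32 watchpoint settings into debug register DR7.
# dr7_fields, ACCESS_BITS and LENGTH_BITS are illustrative names only.

ACCESS_BITS = {"execute": 0b00, "write": 0b01, "read_write": 0b11}
LENGTH_BITS = {1: 0b00, 2: 0b01, 4: 0b11}
MAX_WATCHPOINTS = 4  # DR0..DR3 hold the watched addresses


def dr7_fields(slot, access, length):
    """Return the DR7 bits that enable one watchpoint in the given slot."""
    if not 0 <= slot < MAX_WATCHPOINTS:
        raise ValueError("IA32 provides only four watchpoint slots")
    if length not in LENGTH_BITS:
        raise ValueError("range must be one, two or four bytes")
    if access == "execute" and length != 1:
        raise ValueError("execution watchpoints use a one-byte length")
    local_enable = 1 << (2 * slot)                        # L0..L3 bits
    access_field = ACCESS_BITS[access] << (16 + 4 * slot)  # R/Wn field
    length_field = LENGTH_BITS[length] << (18 + 4 * slot)  # LENn field
    return local_enable | access_field | length_field


# A four-byte write watchpoint in slot 0 plus a two-byte read/write
# watchpoint in slot 1.
dr7 = dr7_fields(0, "write", 4) | dr7_fields(1, "read_write", 2)
print(hex(dr7))  # 0x7d0005
```

A request for a fifth slot, or for a range of any other size, is rejected in this sketch, which mirrors the constraint noted above.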
One particular parameter that tends to vary according to processor architecture is the timing of the watchpoint interrupt. Thus some processor architectures generate a watchpoint interrupt before the relevant instruction executes, whilst others generate it afterwards. The case of the IA32 architecture is more complex in that the timing of the interrupt depends on the nature of the instruction; execution watchpoints generate an interrupt before instruction execution, whereas storage access watchpoints occur after successful access to the relevant storage location.
The specific code that is invoked by the operating system as a result of a watchpoint interrupt is referred to herein as a watchpoint interrupt handler. The watchpoint interrupt handler usually invokes a debugging function. Frequently this function is not part of the core operating system, but rather some ancillary utility or program, such as a kernel debugger or application debugger.
Interrupt handlers are generally identified to the processor by having their starting addresses stored in memory locations reserved for this particular purpose. FIG. 1 illustrates the implementation in IA32, which uses an Interrupt Descriptor Table (IDT) 110 stored in system memory 100 to identify the interrupt handler appropriate for any particular interrupt. The IDT itself can in turn be located from the Interrupt Descriptor Table Register (IDTR) 102.
FIG. 1 also illustrates two particular interrupt handlers referenced from the IDT (it will be appreciated that typically there are many others). The first of these arises from a single-stepping mechanism supported by most processors. This enables debugging applications to allow one instruction in a debugged program to execute at a time, control being returned to the debugging application after each instruction is executed. The hardware implementation of this mechanism usually involves the generation of an interrupt to signal the completion of a single step. Under IA32 a specific IDT entry (shown as 111 in FIG. 1) is reserved for this purpose. This entry in turn references an interrupt handler 121 that is given control after a single-step interrupt, and is known as the single-step interrupt handler.
The second interrupt handler in FIG. 1 is referenced from entry 112 in the IDT, and is a page fault interrupt handler 122. This handler is triggered by page fault exceptions generated in certain circumstances when storage is accessed, and is part of the system paging mechanism.
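The dispatch arrangement of FIG. 1 can be modelled, in a much simplified form, as a table that maps an interrupt vector to a handler. The vector numbers used (1 for the debug exception that reports a single step, 14 for a page fault) are those defined by IA32; the handler functions, the `context` dictionary and the `raise_interrupt` helper are hypothetical names introduced only for this sketch.

```python
# Toy model of interrupt dispatch through a descriptor table.
SINGLE_STEP_VECTOR = 1   # IA32 debug exception, raised after a single step
PAGE_FAULT_VECTOR = 14   # IA32 page-fault exception


def single_step_handler(context):
    return f"single-step at {context['ip']:#x}"


def page_fault_handler(context):
    return (f"page fault at {context['ip']:#x} "
            f"accessing {context['address']:#x}")


# The table maps a vector number to a handler entry point. On IA32 a
# register (the IDTR) would hold the location of the real table.
idt = {
    SINGLE_STEP_VECTOR: single_step_handler,
    PAGE_FAULT_VECTOR: page_fault_handler,
}


def raise_interrupt(vector, context):
    """Look up the handler registered for the vector and run it."""
    handler = idt[vector]
    return handler(context)


print(raise_interrupt(PAGE_FAULT_VECTOR, {"ip": 0x401000, "address": 0x8000}))
```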
Thus as is well known, a paging mechanism is provided on many processors to enable programs to execute as if much more system memory were present than is really the case (up to the maximum provided by the address space, e.g. 4 Gbytes on a 32 bit system). The apparent memory seen by programs is known as virtual memory (or virtual storage), whereas the actual amount of memory installed is called physical memory or physical storage (on some architectures the term real is used instead of physical). The mismatch between virtual and physical memory is accommodated by using some form of additional external storage, typically a hard disk drive, as an overflow for physical storage.
The paging mechanism generally divides all of virtual memory into evenly sized segments known as pages; the page size is typically 4K bytes or 4M bytes, or some other value depending upon the processor architecture. Any given page of virtual memory is present either in system memory, or in the external overflow area, stored in the latter case in what is termed a swap or page file.
Programs cannot access data directly in the page file. Instead, the page manager component of the operating system has the task of copying data saved in the page file into physical memory, from where it can be accessed by programs. This typically requires that other data in physical memory is first copied to the page file, in order to create some vacancy in physical memory.
The transfer of data between system and external memory (effectively between real and virtual memory) is carried out in units of pages, and hence is known as paging, or sometimes as swapping. This transfer is largely transparent to application programs, except that access to data in a page file will be much slower than access to system memory, to allow time for the necessary paging to be performed.
FIG. 2 illustrates the structures used for the management of virtual storage. A table (known as the page table) 220 is used to translate an address in virtual storage into an address in physical storage. A page table entry (PTE) 231, 232 is used to determine the location of the corresponding page 251, 252 in system memory 250.
In practice, the page table 220 is most commonly split into a hierarchy of tables, such as shown in FIG. 2. Thus the top level of the hierarchy, known as the page directory 210, contains entries that reference subtables in the next layer of the hierarchy. Each entry 211, 212 in the page directory specifies the range of pages detailed in its corresponding subtable 221, 222. Therefore, to locate a particular page, the page directory entries are first scanned to find the range that includes the desired page. This then allows the corresponding subtable to be searched, which in turn locates the PTE identifying the address of the desired page in system memory.
Also shown in FIG. 2 is a register 201. The processor uses this register (or some other reserved memory location) to locate the page table hierarchy. In the case of the IA32 architecture, there is a two-level page table hierarchy, and Control Register 3 (CR3) is used to locate the top of the hierarchy, i.e. the Page Directory 210.
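A two-level translation of the kind shown in FIG. 2 can be sketched as follows. The sketch assumes the IA32 arrangement for 4K-byte pages, in which a 32-bit virtual address is split into a 10-bit page directory index, a 10-bit subtable index and a 12-bit offset, so that entries are located by direct indexing rather than by scanning. The dictionaries standing in for the directory and subtables, and all function names, are illustrative assumptions.

```python
PAGE_SHIFT = 12                       # 4K-byte pages
INDEX_BITS = 10                       # 1024 entries per table
INDEX_MASK = (1 << INDEX_BITS) - 1
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


def split_address(virtual_address):
    """Split a 32-bit virtual address into directory index, table index, offset."""
    directory_index = (virtual_address >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK
    table_index = (virtual_address >> PAGE_SHIFT) & INDEX_MASK
    offset = virtual_address & OFFSET_MASK
    return directory_index, table_index, offset


def translate(page_directory, virtual_address):
    """Walk the directory, then the subtable, to find the physical address."""
    directory_index, table_index, offset = split_address(virtual_address)
    subtable = page_directory.get(directory_index)
    if subtable is None:
        raise LookupError("no subtable covers this address")
    frame_base = subtable.get(table_index)
    if frame_base is None:
        raise LookupError("no page table entry for this page")
    return frame_base + offset


# Hypothetical mapping: the page containing 0x00401ABC is held in the
# physical page starting at 0x0009F000.
page_directory = {1: {1: 0x0009F000}}
print(hex(translate(page_directory, 0x00401ABC)))  # 0x9fabc
```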
It will be appreciated that while the hierarchy shown in FIG. 2 has two levels, other known systems have a different arrangement, depending on the total number of pages and other system parameters. For example, in some cases there may simply be one overall page table, whereas in others the hierarchical structure may contain three or more levels. Since the precise page table structure is not relevant to understanding the present invention, reference will simply be made in general terms herein to the page table and page table entries, irrespective of the precise implementation details.
For performance reasons, processors generally cache the most frequently referenced page table entries in an internal area, frequently referred to as the Translation Look-aside Buffer (TLB) (not shown in FIG. 2). If changes are made to page table entries by the operating system then the TLB will have to be resynchronised using purpose-built machine instructions. The IA32 architecture provides the Purge TLB instruction for this purpose; some other architectures allow specific TLB entries to be purged.
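The need for resynchronisation can be seen in a minimal model of such a cache. The class below is a sketch only: the class name, its methods and the dictionary used as a page table are assumptions made for illustration, and a real TLB is of fixed size and managed by hardware.

```python
class TranslationCache:
    """Caches page table lookups until explicitly purged."""

    def __init__(self, page_table):
        self.page_table = page_table   # virtual page number -> physical page
        self.entries = {}              # cached copies of page table entries
        self.misses = 0

    def lookup(self, page_number):
        if page_number not in self.entries:
            self.misses += 1
            self.entries[page_number] = self.page_table[page_number]
        return self.entries[page_number]

    def purge(self):
        """Discard every cached entry."""
        self.entries.clear()

    def purge_entry(self, page_number):
        """Discard a single cached entry, where the architecture allows it."""
        self.entries.pop(page_number, None)


page_table = {5: 40}
tlb = TranslationCache(page_table)
first = tlb.lookup(5)        # miss: entry is read from the page table
page_table[5] = 41           # the operating system remaps the page
stale = tlb.lookup(5)        # hit: the cache still returns the old value
tlb.purge()
fresh = tlb.lookup(5)        # miss again: the new value is picked up
print(first, stale, fresh)   # 40 40 41
```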
Although FIG. 2 only illustrates PTEs corresponding to pages that are located in system memory 250, it will be appreciated that there is in fact a PTE for each page, whether currently located in real or virtual memory. The PTE is therefore used to indicate to the processor whether or not the corresponding virtual storage page is mapped to a physical storage page, and if so, the identity of the physical page. This is reflected in the structure of a PTE shown schematically in FIG. 2A. The PTE 271 includes an index 271 to the location of the page, plus a set of status flags 275. One of these flags 276 is set according to whether or not the page is currently present in physical memory.
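By way of illustration, a PTE of this general form can be packed and unpacked as shown below. The layout assumed is that of an IA32 entry for a 4K-byte page (present flag in bit 0, writable flag in bit 1, physical page address in bits 12 to 31); other architectures differ, and the helper names are hypothetical.

```python
PRESENT = 1 << 0          # page is currently in physical memory
WRITABLE = 1 << 1         # page may be written
FRAME_MASK = 0xFFFFF000   # bits 12..31 locate the physical page


def make_pte(frame_address, present=True, writable=True):
    """Pack a physical page address and status flags into one entry."""
    pte = frame_address & FRAME_MASK
    if present:
        pte |= PRESENT
    if writable:
        pte |= WRITABLE
    return pte


def decode_pte(pte):
    """Unpack an entry into its location index and status flags."""
    return {
        "frame_address": pte & FRAME_MASK,
        "present": bool(pte & PRESENT),
        "writable": bool(pte & WRITABLE),
    }


pte = make_pte(0x0009F000, present=True, writable=False)
print(hex(pte), decode_pte(pte))
```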
If the processor tries to access a page which is not in physical memory, but instead located in the page file, then a page fault interrupt or exception occurs. The current program execution is temporarily suspended and execution continues with the page fault handler. This is made known to the processor by the standard interrupt mechanism, which for the IA32 architecture is an appropriate entry in the IDT. Returning to FIG. 1, the page fault interrupt handler 122 is shown being referenced from IDT entry 112.
On receipt of an interrupt, the page fault handler 122 determines whether the interrupt occurred because of an erroneous storage access made by the program that was interrupted, or because the desired data is currently stored in the page file. In the latter case, the requested page is copied into an unused region of physical storage. The corresponding PTE is then updated, to indicate that the page is now present in system memory, and to identify the physical page where it is stored.
If there is no spare page in physical memory in which to store the incoming page, then space must first be made available. This is achieved by removing a page from physical memory, more particularly by copying its data to the page file, and then updating the page table entry accordingly. In addition, the page fault handler updates certain status information (not shown in FIG. 2) to indicate where in the page file to locate the removed page (or pages). The mechanism for maintaining this status information is operating system specific, and need not concern us here.
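The page fault handling just described can be sketched as follows. This is a simplified model under stated assumptions: the class and attribute names are hypothetical, the page file is a dictionary, and the page to be removed is chosen on a first-in, first-out basis, which is merely one possible policy and is not taken from the description above.

```python
class PageManager:
    """Toy page manager with a fixed number of physical pages."""

    def __init__(self, frame_count):
        self.free_frames = list(range(frame_count))
        self.frames = {}       # physical page number -> contents
        self.resident = {}     # virtual page -> physical page (present PTEs)
        self.page_file = {}    # virtual page -> contents held on disk
        self.load_order = []   # oldest resident page first (FIFO assumption)

    def handle_page_fault(self, page):
        if page not in self.page_file:
            # Not a paging matter: the program made an erroneous access.
            raise MemoryError(f"erroneous access to page {page!r}")
        if not self.free_frames:
            self._evict()                      # make space first
        frame = self.free_frames.pop()
        self.frames[frame] = self.page_file.pop(page)   # copy the page in
        self.resident[page] = frame            # PTE now marks it present
        self.load_order.append(page)
        return frame

    def _evict(self):
        victim = self.load_order.pop(0)
        frame = self.resident.pop(victim)      # PTE now marks it not present
        self.page_file[victim] = self.frames.pop(frame)  # copy the page out
        self.free_frames.append(frame)


manager = PageManager(frame_count=2)
manager.page_file = {"A": b"a", "B": b"b", "C": b"c"}
for name in ("A", "B", "C"):
    manager.handle_page_fault(name)
print(sorted(manager.resident), sorted(manager.page_file))
```

With only two physical pages available, the third fault forces page "A" back out to the page file before page "C" can be brought in.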
Note that in order to handle the page fault, the address of the instruction causing the page fault and the accessed memory address for which the fault occurred are necessarily made available to the page fault handler 122 by the processor. The precise mechanism for passing this information is processor specific; in the case of IA32, the instruction address is pushed onto the stack used by the page fault handler, a memory area accessible from the Extended Stack Pointer Register (ESP), and the accessed memory address is presented in Control Register 2 (CR2).
The page table may be updated not only in response to a page fault, but also in certain other circumstances. Of particular relevance here is the situation for operating systems that provide a multi-tasking environment, where different tasks have task-specific views of virtual storage. This requires the operating system to maintain a (logically) separate copy of the page table for each task that has a task-specific view. Whenever a new task is run, the page manager (of which the page fault handler is a part) refreshes the processor's page tables so that they contain the data appropriate for the new task.
U.S. Pat. No. 5,611,043 describes a debugging system, which utilises the operating system paging mechanism to provide watchpoints, thereby overcoming limitations on the number of hardware breakpoints that can be supported. Thus a page for which a watchpoint is set is marked in the page table as “non-writable”, whilst the correct information from the page is stored separately. When an attempt is made by the application being debugged to access the page, a page fault interrupt is raised because the page is apparently non-writable, which in turn passes control to the debugger. After any desired analysis, the page is restored to its original format, and the application allowed to proceed with a single step. This time the instruction executes correctly, rather than triggering the page fault, because of the updated page information, but hands control back to the debugger when it has completed its single step. The watchpoint is then restored by again altering the page table, so that it will trip again for any future accesses to the page, and the application is allowed to continue.
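The sequence of protect, fault, restore, single step and re-protect can be illustrated by the following simulation. It is a sketch of the general technique only, not the implementation of the cited patent: the class, its attributes and the exception type are hypothetical, and the debugger's analysis is reduced to recording the faulting address.

```python
PAGE_SIZE = 4096


class PageFault(Exception):
    def __init__(self, address):
        super().__init__(f"write to protected address {address:#x}")
        self.address = address


class WatchedMemory:
    def __init__(self):
        self.data = {}
        self.read_only_pages = set()   # pages marked non-writable
        self.watched_pages = set()     # pages protected only as watchpoints
        self.hits = []                 # addresses reported to the debugger

    def set_watchpoint(self, address):
        page = address // PAGE_SIZE
        self.watched_pages.add(page)
        self.read_only_pages.add(page)

    def _execute_write(self, address, value):
        # Models the store instruction: it faults on a non-writable page.
        if address // PAGE_SIZE in self.read_only_pages:
            raise PageFault(address)
        self.data[address] = value

    def write(self, address, value):
        try:
            self._execute_write(address, value)
        except PageFault as fault:
            page = fault.address // PAGE_SIZE
            if page not in self.watched_pages:
                raise                            # a genuine protection error
            self.hits.append(fault.address)      # debugger gains control
            self.read_only_pages.discard(page)   # restore the page
            self._execute_write(address, value)  # single step the instruction
            self.read_only_pages.add(page)       # restore the watchpoint


memory = WatchedMemory()
memory.set_watchpoint(0x2008)
memory.write(0x2010, 7)    # same page as the watchpoint: reported
memory.write(0x5000, 1)    # different page: not reported
memory.write(0x2FF0, 9)    # same page again: reported
print([hex(a) for a in memory.hits])  # ['0x2010', '0x2ff0']
```

Because protection is applied to a whole page, every write to the page is reported in this sketch, not only writes to the address originally of interest.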
Although the system described in U.S. Pat. No. 5,611,043 offers significant advantages over hardware breakpoints, it too has certain limitations. For example, it is only effective for operating systems that provide a debugging interface with access to the paging mechanism, and even for operating systems that do have such a facility, there is little commonality across the different implementations. It is also restricted to task-specific watchpoints (i.e. the application actually being debugged), and so cannot handle access from kernel-space rather than the user-space of the application.