In this program debugging method, a so-called boundary test is performed on each memory access made while a computer program executes. The test checks whether the access address lies within the effective range of the memory segment allocated at that address at the time of the access, thereby detecting invalid accesses beyond that range.
In developing various kinds of computer programs for PCs (personal computers), workstations, and embedded apparatuses, two processes are well known: a test process, which checks whether a created program operates as required without any problem, and a debugging process, which identifies the causes of detected errors and makes the necessary corrections. These processes are important for improving the reliability of an apparatus or system operated by the program.
A boundary test of a memory access is one of the tests performed in the test process or the debugging process. The boundary test determines whether the reference address of a memory access obtained by the execution of the program indicates a valid memory area that is allocated for executing the program at that time.
In various programming languages, such as C and C++, pointer-type variables can be used. A pointer variable can hold an arbitrary address value, regardless of whether valid data is located at the referenced destination. Therefore, when a program contains an erroneous instruction, a memory access may be performed beyond the allowed range, and serious problems are likely to arise in the apparatus or system. The boundary test removes such unintended out-of-range memory accesses from the program.
There are various kinds of boundary tests. One kind runs the program under test in an actual or virtual execution environment, traces the memory accesses that occur, and checks them for consistency with the memory allocation state. In this kind of boundary test, the program is executed while its memory accesses are monitored, using, for example, a processor or hardware capable of monitoring memory access signals, debugging software, or a simulator that emulates the operation of the processor. Alternatively, in some cases, an interpreter that directly analyzes and executes statements written in the C language is used.
Another kind of boundary test analyzes and transforms the program under test to generate a new program that performs both the original processing and a memory access boundary test, and then executes the generated program in an appropriate environment. This program transformation approach is either a source-code-level type, which operates on a program whose instructions are written in C or C++, or a binary-level type, which transforms an executable program with attached debugging information while analyzing the machine-language instructions it contains.
Examples of the source-code-level type are the technique of Non-patent Document 1 and Fail Safe C (Non-patent Document 2), although the main purpose of Fail Safe C is not testing but generating safe programs that do not malfunction during execution. Examples of the binary-level type are Valgrind (Non-patent Document 3) and Purify (Non-patent Document 4). Patent Document 1 relates to a program transformation technique applicable to the binary-level type.
In Valgrind and Purify, a simulator equipped with state-monitoring or test functions executes the executable program under test, so that the original processing and the test processing are performed together in a virtual environment. In this virtual simulator, auxiliary test information can be stored alongside each value held in memory during execution and in each register of the CPU (central processing unit). In Valgrind (Non-patent Document 3), this information is called a shadow value.
The present invention provides a method of determining whether each memory access traced during a program test is a valid access matched with memory allocation or an invalid access, as one of the techniques used in various kinds of boundary tests for the memory access. Techniques related to the present invention will be described below from this viewpoint.
Before proceeding, some terms are defined. A continuous memory area in which access is allowed, and which serves as the unit delimiting the boundary for determining invalid addresses in the boundary test, is called a memory segment, or simply a segment.
Segments include (1) heap segments, which are dynamically allocated on heap memory and dynamically deallocated, (2) static segments, whose placement is determined when the program's executable code is generated or when its execution starts, and (3) stack segments, which are placed on stack memory and accessed by addresses relative to the stack pointer value established dynamically when a function is called.
In the boundary test, each reference address traced during program execution is regarded as having been computed to refer to one of the memory segments usable at that time, and the access is judged valid only when the address lies within the effective range of that intended segment. Which segment is intended is determined by whether the beginning address allocated to that segment was used to compute the access address.
The problem with this determination is how to identify, at test time, the address used as the base for computing the access address. A pointer variable holds only an address value; it carries neither the base address from which that value was computed nor any information about the segment it is meant to refer to. When the base address cannot be identified, the offset added to it is unknown as well, and it cannot be determined whether the access lies within the valid range.
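The validity condition above reduces to a single range comparison once the intended segment is known. The following is a minimal sketch in C, assuming a hypothetical `struct segment` that holds a segment's beginning address and size, with addresses handled as `uintptr_t` integers so that comparisons between unrelated objects stay well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment descriptor: beginning address and size in bytes. */
struct segment {
    uintptr_t base;
    size_t len;
};

/* Returns 1 if an access of `size` bytes at `addr` lies entirely inside the
 * intended segment `seg`, 0 otherwise. The comparison is arranged so that
 * addr + size is never computed, avoiding overflow. */
static int access_is_valid(struct segment seg, uintptr_t addr, size_t size)
{
    if (addr < seg.base || size > seg.len)
        return 0;
    return addr - seg.base <= seg.len - size;
}
```

For an 8-byte segment at address 1000, a 1-byte access at 1007 is accepted and one at 1008 is rejected; a 4-byte access at 1004 is accepted and one at 1005 is rejected.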
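The ambiguity can be made concrete in a simulated address space. The sketch below assumes a hypothetical layout in which two 8-byte segments sit back to back with nothing between them, and a hypothetical lookup `segment_of_address` that sees only the raw address value, as a checker without base information would. The address "one past the end of segment 0" and "the start of segment 1" are the same integer, so the lookup cannot tell an overrun of segment 0 from a valid access to segment 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct segment { uintptr_t base; size_t len; };

/* Hypothetical layout: two 8-byte segments placed back to back. */
static const struct segment segs[] = { { 0, 8 }, { 8, 8 } };
enum { NSEGS = sizeof segs / sizeof segs[0] };

/* Address-only lookup: returns the index of the segment containing addr,
 * or -1 if none does. This is all that can be done when only the raw
 * pointer value is known. */
static int segment_of_address(uintptr_t addr)
{
    for (int i = 0; i < NSEGS; i++)
        if (addr >= segs[i].base && addr - segs[i].base < segs[i].len)
            return i;
    return -1;
}
```

An address computed as segment 0 plus 8 is attributed to segment 1, so an overrun of segment 0 would be judged valid.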
<Guard Zone Type>
The guard zone type is a boundary test method in which an access-prohibited area, called a guard zone, is placed before and after each allocated segment. The program is executed while its memory accesses are traced, and any access to a guard zone is determined to be invalid.
The guard zone type is used by many program debugging tools. For example, Valgrind's Memcheck tool and Purify both perform boundary tests based on the guard zone type.
An example of a boundary test process using the guard zone type is described below. FIG. 13 shows a C language program to be tested. The program declares two char-type arrays on lines 37 and 38, and in this test, the two static segments seg1 and seg2 that hold these arrays are the units of the boundary test.
FIG. 14 shows an example layout of the two segments seg1 and seg2 in the address space under the guard zone type. In FIG. 14, addresses increase to the right. Segments seg1 and seg2 are placed in this order starting from an appropriate base address, and each can store eight char-type variables. Guard zones for detecting invalid accesses are placed before seg1, between seg1 and seg2, and after seg2. Each guard zone is 1 byte in size.
When the C language program shown in FIG. 13 starts, the main function on lines 41 to 47 is executed first, and it calls four functions in sequence. The calls to func1 and func2 on lines 43 and 44 are examples of invalid memory accesses that a boundary test should detect.
A pattern that occurs frequently in pointer processing is access to a continuous memory area by a loop. The function func1 on lines 4 to 11 of FIG. 13 is such an example: it writes 0 to nine consecutive char-type integers starting at the address given by the argument p. However, the call to func1 on line 43 erroneously passes a pointer to segment seg1, which holds only eight elements, so an invalid memory access occurs.
When this program is executed with the segments laid out as shown in FIG. 14, func1 receives as its argument a pointer p to the beginning of the static segment seg1 and writes 0 to nine consecutive char variables. When the variable i reaches 8, the access through the pointer on line 9 falls in the guard zone. An invalid memory access is therefore detected at that point, and execution of the program stops. Thus, the guard zone type is effective in detecting programming errors in continuous memory accesses.
The guard zone type has the problem that it has difficulty detecting discontinuous invalid memory accesses. For example, the call to func2 on line 44 of the program shown in FIG. 13 causes a memory access that the guard zone type cannot detect, even though the access is invalid.
The function func2 shown in FIG. 13 writes 0 to the 13th element of the segment indicated by its pointer argument, and line 44 passes the inappropriate segment seg1, which holds only eight elements. When this is executed with the memory layout shown in FIG. 14, the write on line 16 makes an invalid memory access beyond the range of seg1 and corrupts segment seg2. However, because this access refers to memory outside the guard zones, the error goes undetected.
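The guard zone mechanism can be modelled in a few lines. The sketch below is an illustration, not the implementation of any particular tool: it assumes a hypothetical simulated address space `mem`, a parallel map `guard_map` marking guard bytes, and a hypothetical bump allocator `alloc_with_guards` that places a 1-byte guard zone before and after each segment, sharing one guard between neighbours. Every traced write consults the map before storing.

```c
#include <assert.h>
#include <stddef.h>

#define MEM_SIZE 64
#define GUARD 1                            /* guard zone size in bytes (assumed) */

static unsigned char mem[MEM_SIZE];        /* simulated address space */
static unsigned char guard_map[MEM_SIZE];  /* 1 = byte belongs to a guard zone */
static size_t next_free;                   /* bump allocator cursor */

/* Places a segment of `len` bytes with guard zones around it and returns
 * its start offset. Neighbouring segments share one guard zone. */
static size_t alloc_with_guards(size_t len)
{
    if (next_free == 0) {                  /* leading guard zone */
        for (size_t i = 0; i < GUARD; i++)
            guard_map[i] = 1;
        next_free = GUARD;
    }
    size_t start = next_free;
    assert(start + len + GUARD <= MEM_SIZE);
    for (size_t i = 0; i < GUARD; i++)
        guard_map[start + len + i] = 1;    /* trailing guard zone */
    next_free = start + len + GUARD;
    return start;
}

/* Traced write: returns 0 (invalid access, nothing stored) if the target
 * byte is outside the space or inside a guard zone, 1 otherwise. */
static int traced_write(size_t addr, unsigned char value)
{
    if (addr >= MEM_SIZE || guard_map[addr])
        return 0;
    mem[addr] = value;
    return 1;
}
```

Allocating two 8-byte segments this way yields offsets 1 and 10 with guards at 0, 9, and 18, and a write one byte past the first segment is rejected.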
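The detection of func1's overrun can be replayed in a simulated address space. This sketch does not reproduce FIG. 13; it assumes the FIG. 14 layout with hypothetical offsets (a guard at 0, seg1 at 1 to 8, a guard at 9, seg2 at 10 to 17, a guard at 18) and models func1 as a hypothetical function `sim_func1` that issues traced writes instead of real stores.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed FIG. 14 offsets: [G][seg1 x8][G][seg2 x8][G] */
enum { SEG1 = 1, SEG2 = 10, SPACE = 19 };

static unsigned char mem[SPACE];
static const unsigned char guard_map[SPACE] = { [0] = 1, [9] = 1, [18] = 1 };

/* Traced write: returns 0 and stores nothing if addr is in a guard zone. */
static int traced_write(size_t addr, unsigned char value)
{
    if (addr >= SPACE || guard_map[addr])
        return 0;
    mem[addr] = value;
    return 1;
}

/* Simulated func1: writes 0 to nine consecutive chars starting at p.
 * Returns the loop index i at which an invalid access was detected,
 * or -1 if all nine writes were accepted. */
static int sim_func1(size_t p)
{
    for (int i = 0; i < 9; i++)
        if (!traced_write(p + i, 0))
            return i;
    return -1;
}
```

`sim_func1(SEG1)` returns 8, matching the point at which the text says the invalid access is detected.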
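The missed detection can be replayed with the same kind of model. The sketch assumes the same hypothetical FIG. 14 offsets as a standalone layout and models func2 as a hypothetical `sim_func2` that writes to element 12 (the 13th element) of its argument. Starting from seg1 at offset 1, index 12 lands at offset 13, which is the 4th byte of seg2 rather than a guard byte.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed FIG. 14 offsets: [G][seg1 x8][G][seg2 x8][G] */
enum { SEG1 = 1, SEG2 = 10, SPACE = 19 };

static unsigned char mem[SPACE];
static const unsigned char guard_map[SPACE] = { [0] = 1, [9] = 1, [18] = 1 };

/* Traced write: only guard zone hits are reported as invalid. */
static int traced_write(size_t addr, unsigned char value)
{
    if (addr >= SPACE || guard_map[addr])
        return 0;
    mem[addr] = value;
    return 1;
}

/* Simulated func2: writes 0 to the 13th element (index 12) of p. */
static int sim_func2(size_t p)
{
    return traced_write(p + 12, 0);
}
```

`sim_func2(SEG1)` reports success, and the byte at `SEG2 + 3` is silently overwritten because the access jumps over the single guard byte at offset 9.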
In the debugging process, when data corruption caused by an out-of-range access is not detected immediately, identifying the cause of the error takes a long time. In the test process, when an out-of-range access happens not to cause an error under the operating conditions of the test, the test fails to reveal the problem in the program. Such chance factors include the addresses at which segments are placed and the order in which they are arranged.
For example, when the boundary test is performed on a program written in C, whether such an error is detected may depend on the order in which the compiler toolchain arranges the segments. Thus, the guard zone type has the problem that it detects discontinuous out-of-range accesses only probabilistically.
<FAT Pointer Type>
A FAT pointer (or safe pointer) is an extended pointer that carries more information than an ordinary pointer variable: in addition to the reference address value, it includes identification information for the segment it refers to. The FAT pointer type is a method that represents pointer variables as FAT pointers and dynamically performs a boundary test on each address value before the corresponding memory access is made during program execution. For example, Non-patent Documents 1 and 2 disclose techniques that transform a C language program written with ordinary pointer variables into a program that performs this test dynamically using FAT pointers.
Unlike the guard zone type, the FAT pointer type can detect discontinuous invalid accesses. For example, in the call to func2 of the program shown in FIG. 13, the FAT pointer argument p carries information identifying seg1 as its base segment, so a valid test can be performed inside the called function. Before the memory access on line 16 is performed, it is tested whether the pointer (p+12) points beyond the valid range of seg1. Since that address lies past the end of seg1, the memory access is detected as invalid.
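A FAT pointer can be sketched as a small struct. This is a minimal illustration, not the representation used by Non-patent Document 1 or Fail Safe C: it assumes a hypothetical `struct fat_ptr` that records the current address together with the bounds of its base segment, and hypothetical helpers `fat_from_array`, `fat_add`, and `fat_store`. Addresses are kept as `uintptr_t` so that out-of-range values can be formed and tested without undefined pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FAT pointer: reference address plus base segment bounds. */
struct fat_ptr {
    uintptr_t addr;      /* current reference address */
    uintptr_t seg_base;  /* beginning of the base segment */
    size_t seg_len;      /* size of the base segment in bytes */
};

/* Deriving a pointer from a segment records that segment as its base. */
static struct fat_ptr fat_from_array(char *base, size_t len)
{
    struct fat_ptr fp = { (uintptr_t)base, (uintptr_t)base, len };
    return fp;
}

/* Pointer arithmetic moves the address but never changes the base. */
static struct fat_ptr fat_add(struct fat_ptr fp, ptrdiff_t n)
{
    fp.addr += (uintptr_t)n;   /* modular arithmetic handles negative n */
    return fp;
}

/* Boundary test performed before the access: the store happens only if
 * the address lies inside the base segment. */
static int fat_store(struct fat_ptr fp, char value)
{
    if (fp.addr < fp.seg_base || fp.addr - fp.seg_base >= fp.seg_len)
        return 0;
    ((char *)fp.seg_base)[fp.addr - fp.seg_base] = value;
    return 1;
}
```

Because the base bounds travel with the pointer, the check needs no knowledge of where other segments happen to be placed.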
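The func2 case can be replayed with a FAT pointer to show that the outcome no longer depends on segment layout. The sketch uses a hypothetical `struct fat_ptr` and a hypothetical `sim_func2` standing in for func2; `seg1` is an ordinary 8-byte array rather than the actual segment of FIG. 13.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FAT pointer: address plus base segment bounds. */
struct fat_ptr { uintptr_t addr, seg_base; size_t seg_len; };

static char seg1[8];   /* stand-in for the 8-element static segment */

/* Checked store: rejects any address outside the recorded base segment. */
static int fat_store(struct fat_ptr fp, char value)
{
    if (fp.addr < fp.seg_base || fp.addr - fp.seg_base >= fp.seg_len)
        return 0;
    ((char *)fp.seg_base)[fp.addr - fp.seg_base] = value;
    return 1;
}

/* Simulated func2: p + 12 keeps p's base segment, so the test performed
 * before the store can compare the address against seg1's bounds. */
static int sim_func2(struct fat_ptr p)
{
    p.addr += 12;
    return fat_store(p, 0);
}
```

Called with a FAT pointer to `seg1`, `sim_func2` returns 0 wherever the next segment lies; called with a pointer to a 16-byte segment, the same write is accepted.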
The FAT pointer type generally tests programs written in a programming language such as C and cannot be applied to executable programs. However, Nethercote et al. have developed a technique that performs FAT-pointer-type boundary tests on executable programs using a virtual execution environment with shadow values, such as that of the Valgrind tool (Non-patent Document 5).
The technique disclosed in Non-patent Document 5 attaches a shadow value to each value stored in memory and to each register during program execution. The shadow value consists of a flag indicating whether the value is a pointer or a non-pointer and, for a pointer, an identifier of its base segment. The shadow values are updated while the execution of the program is traced. For example, when a function that allocates memory from the heap is called, the pointer flag and the base segment are stored in the shadow value of its return value. When that value is copied to another address, its type information is copied as well. Appropriate type information is also tracked through pointer arithmetic that adds an integer to or subtracts an integer from a pointer.
The technique disclosed in Non-patent Document 5 differs from the general FAT pointer type in that the type of each stored value (pointer or non-pointer) is determined from the machine-language instructions executed rather than from the language description. Nevertheless, it can be classified as a FAT pointer type, in that the base segment is stored in association with each value and tracked.
The boundary test method based on the FAT pointer type has the problem that it cannot handle subtraction between pointers referring to different segments, arithmetic or logical operations that treat a pointer variable as a plain address value, or splitting an address into its upper and lower bytes (Section 2.5 of Non-patent Document 5).
In the C language, such code is implementation-dependent and is not a valid program that works with every C compiler or execution environment. However, it is accepted by many environments, where it behaves predictably, so many programs contain such code. In addition, some programs contain no such address arithmetic in their source code, yet equivalent arithmetic appears in the generated machine-language instructions.
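The shadow value propagation described above can be sketched as follows. This is a simplified model, not the rule set of Non-patent Document 5: it assumes a hypothetical `struct tval` pairing each traced value with a `struct shadow` (pointer flag and segment identifier), a hypothetical `traced_alloc` that receives the address an allocator returned, and a deliberately simplified rule that pointer plus pointer yields a non-pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shadow value attached to each register and memory value. */
struct shadow {
    int is_ptr;   /* 0 = non-pointer, 1 = pointer */
    int seg_id;   /* base segment identifier; meaningful only if is_ptr */
};

/* A traced value: the ordinary value plus its shadow. */
struct tval {
    uintptr_t v;
    struct shadow sh;
};

static int next_seg_id = 1;

/* Heap allocation: the returned address is marked as a pointer whose base
 * is a newly created segment. */
static struct tval traced_alloc(uintptr_t addr)
{
    struct tval r = { addr, { 1, next_seg_id++ } };
    return r;
}

/* Copy (store to another address, register move): the shadow travels
 * with the value. */
static struct tval traced_copy(struct tval x)
{
    return x;
}

/* Addition: pointer + integer keeps the pointer's base segment.
 * int + int, and (as a simplification) ptr + ptr, give a non-pointer. */
static struct tval traced_add(struct tval a, struct tval b)
{
    struct tval r;
    r.v = a.v + b.v;
    if (a.sh.is_ptr && !b.sh.is_ptr)
        r.sh = a.sh;
    else if (!a.sh.is_ptr && b.sh.is_ptr)
        r.sh = b.sh;
    else
        r.sh = (struct shadow){ 0, 0 };
    return r;
}
```

A pointer obtained from `traced_alloc`, copied, and then offset by an integer still carries the original segment identifier.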
The call to func3 on line 45 of the program shown in FIG. 13 is an example of pointer subtraction. The function func3 writes the int value 0 to the memory indicated by the pointer obtained by adding an offset diff to a pointer p. The call on line 45 passes the beginning address of seg1 as p and, as diff, the difference between the beginning addresses of seg2 and seg1.
Here, seg2 - seg1 is a difference between pointers to different segments, which the C language standard does not allow. However, with many C compilers, seg2 - seg1 yields the difference between the addresses of seg2 and seg1, so p+diff computed in func3 points to the head of seg2.
When the FAT-pointer-based boundary test is applied to this program, line 22 is treated as adding an integer to the pointer argument p, so the pointer q is regarded as referring to the same segment seg1 as p. In reality, however, q points to the head of seg2, outside the range of seg1, so the access on line 23 is judged invalid by the test at run time, even though the program intends a valid access to seg2.
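This false positive can be reproduced with a FAT pointer model. To keep the subtraction well defined in the sketch, seg1 and seg2 are modelled as the two halves of one hypothetical buffer `space`; in FIG. 13 they are separate arrays, where the standard leaves the difference undefined. `struct fat_ptr`, `fat_store_int`, and `sim_func3` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical FAT pointer: address plus base segment bounds. */
struct fat_ptr { uintptr_t addr, seg_base; size_t seg_len; };

/* Assumed layout: seg1 = space[0..7], seg2 = space[8..15]. */
static unsigned char space[16];

/* Checked int store: the whole int must lie in the recorded base segment. */
static int fat_store_int(struct fat_ptr fp, int value)
{
    if (fp.addr < fp.seg_base || fp.addr - fp.seg_base > fp.seg_len ||
        fp.seg_len - (fp.addr - fp.seg_base) < sizeof value)
        return 0;
    memcpy((unsigned char *)fp.seg_base + (fp.addr - fp.seg_base),
           &value, sizeof value);
    return 1;
}

/* Simulated func3: q = p + diff inherits p's base segment (seg1), even
 * though the program intends q to point at the head of seg2. */
static int sim_func3(struct fat_ptr p, ptrdiff_t diff)
{
    struct fat_ptr q = p;
    q.addr += (uintptr_t)diff;
    return fat_store_int(q, 0);
}
```

The store through `p + diff` is rejected, while a store to the same address through a FAT pointer whose base is seg2 is accepted: the verdict depends on the tracked base, not on the address.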
The call to func4 on line 46 of the program shown in FIG. 13 is an example of converting a pointer to an integer and applying a logical operation. The function func4 holds the addresses of the memory accessed on lines 32 and 33 as ‘unsigned int’ integers rather than as pointer variables. In a compiler environment in which an address value can be represented by the ‘unsigned int’ type, conversion between pointers and ‘unsigned int’ is permitted, although this is implementation-dependent.
The function func4 exchanges the values of p and q by performing three exclusive OR (XOR) operations (lines 29 to 31), writes 0 to the char variable at address p (line 32), and writes 1 to the char variable at address q (line 33).
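Why a base segment cannot follow these operations becomes clear when the XOR exchange is written out. The sketch uses a hypothetical `xor_swap` on `uintptr_t` (chosen instead of ‘unsigned int’ so that 64-bit addresses are not truncated); it is not the code of FIG. 13. After the first XOR, the variable holds p ^ q, an integer that is neither address and so cannot be associated with either segment.

```c
#include <assert.h>
#include <stdint.h>

/* Exchanges two address values held as integers using three XORs.
 * The pointers must refer to different variables. */
static void xor_swap(uintptr_t *p, uintptr_t *q)
{
    *p ^= *q;   /* *p = p ^ q: belongs to no segment */
    *q ^= *p;   /* *q = q ^ (p ^ q) = original p */
    *p ^= *q;   /* *p = (p ^ q) ^ p = original q */
}
```

After the exchange, converting the integers back to `char *` and storing through them writes to the swapped targets, which is the behaviour func4 relies on.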
Thus, it is difficult to apply the FAT pointer type directly to programs that subtract pointers to different segments or convert pointers to integers. Programs containing only simple type conversions can be tested with Fail Safe C or the method of Nethercote et al. (Fail Safe C uses FAT integers that record the base segment in integer variables obtained by conversion, and the method of Nethercote et al. tracks the base segment independently of the language description), but arbitrary operations cannot be handled.
For example, on line 30 of the program shown in FIG. 13, it is practically impossible to associate a single segment with the result of a logical operation on address values, so a valid boundary test cannot be performed from line 30 onward. Thus, the FAT pointer type, which includes base segment information in pointer variables, has the problem that it restricts how a testable program may be written.
[Patent Document 1] U.S. Pat. No. 5,193,180.
[Non-patent Document 1] T. Austin et al., “Efficient Detection of All Pointer and Array Access Errors,” Computer Sciences Dept., Univ. of Wisconsin-Madison, pp. 1-29, 1993.
[Non-patent Document 2] Yutaka Oiwa, Tatsurou Sekiguchi, Eijiro Sumii, Akinori Yonezawa, “Fail-Safe ANSI-C Compiler: An Approach to Making C Programs Secure,” in Lecture Notes in Computer Science Vol. 2609, pp. 337-342, 2003.
[Non-patent Document 3] N. Nethercote and J. Seward, “Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation,” in Proceedings of PLDI 2007, San Diego, Calif., USA, June 2007.
[Non-patent Document 4] R. Hastings and B. Joyce, “Purify: Fast Detection of Memory Leaks and Access Errors,” in Proceedings of the Winter USENIX Conference, pp. 125-136, 1992.
[Non-patent Document 5] N. Nethercote and J. Fitzhardinge, “Bounds-Checking Entire Programs without Recompiling,” in Informal Proceedings of SPACE 2004, 2004.