This invention relates to the validation of a variable data item in a software routine in execution, and more particularly, to the generation of a profile rule for the variable data item which can be used to identify situations where a value of the variable data item is not a typical value.
Errors in a software application can be caused by the storage of inappropriate values in data items within the application. For example, a data item which is defined to store a numerical indicator for a calendar month can typically be expected to include values in the range of one to twelve inclusive, corresponding to the months January to December. A value of thirteen in such a data item may cause subsequent errors in the software application. An inappropriate value assigned to a data item in an application may not initially be identified as inappropriate, and only when the value of the data item is subsequently read or used by the application may errors occur.
Where a data item is intended to store values which can be anticipated at application development time, validity checks can be inserted into the application by programmers to ensure the data item is assigned valid values during execution. However, where the intended values of a data item cannot be anticipated at the time of application development, such validation checks cannot be used. For example, a memory pointer in an application contains an address to a location in a memory of a computer system. Typically, memory pointers are expressed in hexadecimal or binary notation and are provided by an operating system of the computer system when a unit of memory is allocated. The value of a memory pointer in an application is usually determined at runtime, and depends on the configuration of aspects of the computer system including, among other things: the implementation of memory in the computer system; the operating system of the computer system; and the architecture of the computer system. Thus, valid values of the memory pointer cannot be foreseen at the time of application development. It is therefore not usually possible for a programmer to validate a value of a memory pointer data item in a computer system.
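By way of illustration only, a validity check of the kind described above for the calendar-month example might be sketched as follows. The function names `is_valid_month` and `set_month` and the dictionary-based record are hypothetical and are not taken from any particular application. No equivalent fixed range can be written for a memory pointer at development time, which is the limitation discussed above.

```python
def is_valid_month(month: int) -> bool:
    """Return True when month lies in the range one to twelve inclusive."""
    return 1 <= month <= 12


def set_month(record: dict, month: int) -> None:
    """Assign the month indicator only after the range check has passed.

    Rejecting the value at assignment time avoids the situation where an
    inappropriate value is only discovered when it is later read or used.
    """
    if not is_valid_month(month):
        raise ValueError(f"invalid month indicator: {month}")
    record["month"] = month
```

In such a sketch, an attempt to store thirteen is rejected at the point of assignment rather than causing a later error.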
While the value of a memory pointer cannot be foreseen at the time of application development, memory pointers will usually exhibit common characteristics. One cause of such common characteristics is the way memory is allocated in a computer system, as described below.
FIG. 1a is a schematic diagram illustrating an arrangement of a memory in a computer system in the prior art. Memory 152 comprises multiple memory locations 154, each including a byte of storage 156 and a location address 158. Each byte of storage 156 is eight binary digits (bits) in length, and the memory locations 154 are therefore known as eight-bit memory locations. The location address 158 for each of the memory locations 154 is a reference to the memory location 154 in the memory 152. Location addresses 158 are numbered sequentially using binary notation. The memory 152 can be accessed by a software application using an operating system and a central processing unit (CPU) (none of which is shown).
At each memory access, a fixed quantity of data can be read from, or written to, the memory 152 by the CPU. This quantity of data is known as a “word”, and the size of a word may vary with different CPU configurations. For example, an Intel® Pentium® 3 microprocessor (Intel and Pentium are registered trademarks of Intel Corporation) has a word size of thirty-two bits. In contrast, an Intel® 80286 microprocessor has a word size of sixteen bits. In FIG. 1a, the multiple memory locations 154 are divided into words 160 in accordance with a CPU word size of sixteen bits. Thus each of the words 160 in FIG. 1a corresponds to two contiguous eight-bit memory locations 154. Each of the words 160 includes a word address 162 and data 164. The word address 162 of each of the words 160 is the location address 158 of a first of the two memory locations 154 in the word 160.
The data 164 of each of the words 160 comprises the two bytes of storage 156 in the first and second of the memory locations 154 in the word 160. Because the CPU only accesses the memory 152 a word 160 at a time, only the word addresses 162 are used by the CPU to access the memory 152. Similarly, the operating system and software application use only the word addresses 162 to access the memory 152. This is known as a “word aligned” memory model because all memory locations 154 are accessed as words 160.
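The sixteen-bit arrangement described with reference to FIG. 1a can be modelled, purely as an illustrative sketch, as follows. The names `WORD_SIZE_BYTES`, `word_addresses` and `read_word` are hypothetical, the memory is assumed to be a simple byte sequence whose location addresses start at zero, and byte order within a word is deliberately left uninterpreted.

```python
WORD_SIZE_BYTES = 2  # assumption: sixteen-bit words, i.e. two eight-bit locations


def word_addresses(memory_size: int, word_size: int = WORD_SIZE_BYTES) -> list:
    """Return the word address of every word in a memory of memory_size bytes.

    Each word address is the location address of the first memory
    location in that word.
    """
    return list(range(0, memory_size, word_size))


def read_word(memory: bytes, word_address: int,
              word_size: int = WORD_SIZE_BYTES) -> bytes:
    """Return the data of the word at word_address as raw bytes.

    Only word addresses are accepted, mirroring a word aligned memory
    model in which the second location of a word is never addressed directly.
    """
    if word_address % word_size != 0:
        raise ValueError(f"{word_address:#x} is not a word address")
    return memory[word_address:word_address + word_size]


# Eight eight-bit memory locations, giving four sixteen-bit words.
memory = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])
addresses = word_addresses(len(memory))   # [0, 2, 4, 6]
second_word = read_word(memory, 2)        # b"\x33\x44"
```

In this sketch the location addresses one, three, five and seven never appear as word addresses, which corresponds to the second memory location of each word not being referenced directly.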
It is the word alignment of data in a memory 152 which can give rise to common characteristics of memory pointers in a software application. In a sixteen-bit memory configuration (as illustrated in FIG. 1a), each of the words 160 has a word address 162 equal to the location address 158 of the first of the memory locations 154 in the word 160. Thus, the location address 158 of the second of the memory locations 154 in each of the words 160 is never referenced directly. An application uses only word addresses 162 to reference the memory 152, and so all memory pointers in the application will point to a memory location 154 which is the first memory location 154 in a word 160. No memory pointers will point to a memory location 154 which is the second memory location 154 in a word 160. In a sixteen-bit memory configuration this results in all memory pointers being a multiple of two, and thus the least significant bit of all memory pointers will be zero. Similarly, in a thirty-two-bit memory configuration, all memory pointers will be a multiple of four, and thus the two least significant bits of all memory pointers will be zero. Thus, in a word aligned configuration of a memory, memory pointers in an application exhibit common characteristics.
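The common characteristic described above can be expressed as a simple test on the least significant bits of a pointer value. The following is a minimal sketch only: the function name `is_word_aligned` is hypothetical, pointer values are represented as plain integers, and the word size in bytes is assumed to be a power of two.

```python
def is_word_aligned(pointer: int, word_size_bytes: int) -> bool:
    """Return True when pointer is a multiple of word_size_bytes.

    For a power-of-two word size, being a multiple of the word size is
    equivalent to the low-order bits selected by (word_size_bytes - 1)
    all being zero: one bit for two-byte words, two bits for four-byte words.
    """
    mask = word_size_bytes - 1
    return (pointer & mask) == 0


# Sixteen-bit configuration: pointers are multiples of two.
aligned_16 = is_word_aligned(0x1002, 2)      # least significant bit is zero
misaligned_16 = is_word_aligned(0x1003, 2)   # least significant bit is one

# Thirty-two-bit configuration: pointers are multiples of four.
aligned_32 = is_word_aligned(0x1004, 4)      # two least significant bits are zero
misaligned_32 = is_word_aligned(0x1002, 4)   # multiple of two but not of four
```

A value such as 0x1002 therefore has the characteristic expected of a pointer in a sixteen-bit configuration but not in a thirty-two-bit configuration.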