Computer programs are based on algorithms and data structures. Data structures hold and organize the information that algorithms manipulate to accomplish a particular result. The nature of the data can in many ways change the behavior of the algorithms that process it. Algorithms usually define or imply a set of valid data and often protect themselves from invalid data. Thus, each algorithm can be said to work within a pre-defined data domain. For instance, algorithms that perform arithmetic calculations usually cannot manipulate strings of characters, because the expected results are not defined for that data type. In some cases, the very nature of the data can cause a system malfunction, for instance when an algorithm tries to divide a number by zero. Such malfunctions can be severe enough to cause a computer to restart or, worse, to lose data.
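As a minimal sketch of an algorithm that defines its valid data domain and protects itself from invalid data, consider the division example. The function name `safe_divide` and the choice of exceptions are illustrative assumptions, not something prescribed by the text.

```python
def safe_divide(numerator, denominator):
    """Divide two numbers, rejecting input outside the valid data domain."""
    # Assumption: the valid domain is int or float. bool is excluded
    # explicitly because Python treats it as a subclass of int.
    for value in (numerator, denominator):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"expected a number, got {type(value).__name__}")
    # Division by zero is undefined, so zero is outside the domain
    # of the denominator.
    if denominator == 0:
        raise ValueError("denominator must not be zero")
    return numerator / denominator
```

Here `safe_divide(10, 4)` returns `2.5`, while a string argument or a zero denominator is rejected before the division is attempted.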
In software testing, it is therefore not surprising that there is great interest in defining what constitutes meaningful data for a particular algorithm. This includes not only the valid data that the algorithm expects, which can be used to verify correctness, but also invalid data that may cause a system malfunction, something programmers often struggle to detect and correct before customers are affected. Often the very nature of the data the algorithm expects defines which values are interesting and which are not. For example, computers are limited machines with finite storage capacity, so data that borders the limit of what a computer can handle is of interest for testing. For instance, assume a computer can only handle integer numbers up to 10. If an algorithm tries to add 10+10, an error will occur; thus 10 is an interesting, or relevant, number for testing an algorithm that adds two numbers. Another example is that the binary floating point formats most computers use cannot represent numbers such as 0.1 exactly; they store approximations of such numbers. Therefore, calculations with such numbers are useful for finding rounding errors in applications such as payroll or tax calculation programs.
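The two examples above can be sketched as follows. The limit of 10 comes from the text; the names `bounded_add` and `overflowing_pairs`, the lower bound of 0, and the use of `OverflowError` are assumptions made only for illustration.

```python
from decimal import Decimal

MAX_INT = 10  # hypothetical machine limit taken from the example in the text


def bounded_add(a, b, limit=MAX_INT):
    """Add two integers on a hypothetical machine that cannot exceed limit."""
    result = a + b
    if result > limit:
        raise OverflowError(f"{a} + {b} exceeds {limit}")
    return result


def overflowing_pairs(limit=MAX_INT):
    """Try values at and next to the range limits; return pairs that fail."""
    # Assumption: the valid range is 0..limit, so the values of interest
    # sit at each end of the range and one step inside it.
    candidates = [0, 1, limit - 1, limit]
    failures = []
    for a in candidates:
        for b in candidates:
            try:
                bounded_add(a, b, limit)
            except OverflowError:
                failures.append((a, b))
    return failures


# Binary floating point stores only an approximation of 0.1 and 0.2,
# so their sum differs slightly from 0.3.
float_sum = 0.1 + 0.2
# Decimal arithmetic on the same literals is exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")
```

With these definitions, `overflowing_pairs()` includes `(10, 10)` but not `(0, 10)`, and `float_sum == 0.3` is false while `decimal_sum == Decimal("0.3")` is true. This is the kind of discrepancy a payroll or tax calculation test would look for.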
Each type of data a computer can manipulate, together with the processing capacity of the computer itself, may imply both a valid data domain and an invalid data domain. Such domains may be continuous or discrete and are not necessarily easy to define. Even seemingly innocuous data types such as strings of characters, used to store sentences, can cause system malfunction or data loss, or even create security and privacy risks. It takes time and dedication to understand and define a set of data values that correctly and thoroughly exercises a particular algorithm. One may store sample data acquired over years of testing and use the same data set for the same class of computer programs. Unfortunately, storing, organizing and adapting such data to each program is hard and error-prone, because the input each program expects varies and the data often requires manipulation before being reused. It is also hard to inventory and categorize such data so that prospective testers can verify the completeness of their data domain coverage.
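To make the inventory problem concrete, the following is a rough sketch of how categorized sample data for strings might be stored and checked for coverage. The category names, the sample values, and the helper `uncovered_categories` are all hypothetical; a real inventory would be far larger and would need adapting to each program.

```python
# Hypothetical inventory of string test data, grouped by category.
STRING_CATALOG = {
    "empty": [""],
    "whitespace_only": [" ", "\t\n"],
    "very_long": ["a" * 10_000],
    "non_ascii": ["naïve", "日本語"],
    "control_characters": ["line1\x00line2"],
    "markup_or_query_fragments": ["<script>", "' OR '1'='1"],
}


def uncovered_categories(catalog, values_used):
    """Return the catalog categories from which no sample value was used."""
    used = set(values_used)
    return sorted(
        name for name, samples in catalog.items() if used.isdisjoint(samples)
    )
```

For example, a test run that used only `""` and `"日本語"` would leave the control character, markup or query fragment, very long, and whitespace-only categories uncovered.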