The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely sophisticated devices, and computer systems may be found in many different settings. Computer systems typically include a combination of hardware components (such as semiconductors, integrated circuits, programmable logic devices, programmable gate arrays, power supplies, electronic card assemblies, sheet metal, cables, and connectors) and software, also known as computer programs.
Computer programs often perform units of work, typically called transactions. A transaction may be performed entirely by one computer program, or its execution may be distributed across multiple computer programs or across multiple computer systems. Further, transactions may request data from a variety of data sources, such as files, data structures, or databases, either on the same computer system or distributed across other computer systems. Transactions may fail, that is, not run to completion successfully, for a variety of reasons. For example, data may be temporarily unavailable or locked for use by another program, a computer system or network may be slow or unavailable due to a high load, or an error may occur.
Some failures may be expected while others are unexpected. Moreover, success or failure may be subjective: a condition that one program considers a success, another program may consider a failure. For example, a utility that opens files may report two different conditions: that the file was found and opened, or that the file was not found. The open utility considers neither condition a failure, because it performed its job correctly in both cases. Similarly, the program that invokes the open utility may simply create the file if it does not exist, so that invoking program also does not consider the file-not-found condition to be a failure. But another program that invokes the open utility might interpret the file-not-found condition to mean that important data has been lost, so the transaction cannot continue.
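The subjectivity described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `OpenStatus` values and the three classifier functions are hypothetical names, not part of any real file API. Each function answers the same question ("is this outcome a failure?") from the point of view of one program, and the answers differ for the same file-not-found outcome.

```python
from enum import Enum


class OpenStatus(Enum):
    # Hypothetical outcomes of a file-open utility; both are normal results.
    OPENED = "opened"
    NOT_FOUND = "not_found"


def open_utility_is_failure(status: OpenStatus) -> bool:
    # The open utility did its job in either case, so nothing is a failure.
    return False


def create_if_missing_is_failure(status: OpenStatus) -> bool:
    # A hypothetical caller that creates the file when it is absent:
    # a missing file is handled, so it is not a failure either.
    return False


def critical_data_is_failure(status: OpenStatus) -> bool:
    # A hypothetical caller for which a missing file means lost data,
    # so the transaction cannot continue.
    return status is OpenStatus.NOT_FOUND


for status in OpenStatus:
    print(status.value,
          open_utility_is_failure(status),
          create_if_missing_is_failure(status),
          critical_data_is_failure(status))
# opened False False False
# not_found False False True
```

The point of the sketch is that "failure" is a property of the pairing between an outcome and the program interpreting it, not of the outcome alone.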
Because the success or failure of transactions is important, users would like to understand how reliable their transactions are. The reliability of a transaction, or of a program that executes multiple transactions, over a time interval (t) is the probability that the transaction or program can run without a failure over that time interval. Reliability differs from availability, which is the probability that the program or system is functioning correctly at a particular point in time. Availability is usually expressed as a percentage of uptime.
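To make the two definitions concrete, here is a minimal Python sketch that estimates both quantities from observed data. It assumes a failure log given as timestamps within an observation horizon; the function names and inputs are hypothetical. The empirical estimate counts non-overlapping windows of length t that contain no failure; the exponential estimate instead uses the common textbook constant-failure-rate model, R(t) = exp(-λt), which is an assumption introduced here and not something the text above prescribes.

```python
import math
from typing import Sequence


def empirical_reliability(failure_times: Sequence[float],
                          horizon: float, t: float) -> float:
    """Fraction of non-overlapping windows of length t in [0, horizon)
    that contain no failure (a simple estimate of R(t))."""
    windows = int(horizon // t)
    if windows == 0:
        raise ValueError("horizon must cover at least one interval t")
    # Index of the window each failure falls into; ignore a partial last window.
    failed = {int(ft // t) for ft in failure_times if 0 <= ft < windows * t}
    return (windows - len(failed)) / windows


def exponential_reliability(num_failures: int, horizon: float, t: float) -> float:
    """R(t) = exp(-rate * t), assuming a constant failure rate estimated
    as failures per unit of observed time."""
    rate = num_failures / horizon
    return math.exp(-rate * t)


def availability_percent(uptime: float, downtime: float) -> float:
    """Share of total time the system was functioning, as a percentage."""
    return 100.0 * uptime / (uptime + downtime)


# Example: failures at hours 5, 27, and 31 over a 100-hour observation.
print(empirical_reliability([5, 27, 31], horizon=100, t=10))       # 0.7
print(round(exponential_reliability(3, horizon=100, t=10), 4))     # 0.7408
print(availability_percent(uptime=99, downtime=1))                 # 99.0
```

Note that a system can show 99% availability while its probability of running a full 10-hour interval without failure is only about 0.7, which is why the two measures are kept distinct.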
The reliability of a program can change. For example, programmers often make changes to programs, creating different versions of the programs. The different versions may vary in reliability, and although programmers may have an intuitive feel for how the reliability has changed, they lack a way to quantify that change.
Thus, without a better way to quantify reliability, users will continue to experience difficulty assessing the reliability of their computer programs.