When people buy a car, they expect not only that it will perform as reliably as advertised, but also that they will not have to bring it into the shop constantly for proactive maintenance to sustain that reliability. The time lost to these frequent trips lowers the customer's perception of reliability: they would rather have spent that time on more desirable activities, such as a road trip.
Similarly, customers who purchase computers expect not to have to constantly apply patches and installations that require rebooting their system. The interruption, and the time taken away from intended activities such as completing a Word document or playing a game, detract from their experience. The time it takes to shut the system down and bring it back up lowers the customer's perceived reliability of the system.
The key to understanding software reliability, and to objectively measuring the reliability and availability of a customer's software system, is the ability to define, detect, and isolate user disruptions. Clearly defining and identifying these disruptions at the application, process, service, driver, and OS levels brings software creators closer to understanding customer disruption and dissatisfaction, and hence customers' perception of system reliability and availability.
Once these user disruptions are programmatically defined and identified, a time/state model can be used to partition downtime into its disruptive and non-disruptive parts. Reliability metrics can then be calculated from the disruptive downtime, which better isolates the reliability issues of a customer's system.
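As a rough illustration of this partitioning, the following Python sketch splits a list of downtime intervals into disruptive and non-disruptive groups and derives a few metrics from them. It is a minimal sketch under stated assumptions, not the model itself: the `Downtime` record, its `disruptive` flag, the function names, and the specific metric definitions (availability as a fraction of observed time, mean time between disruptions as non-disrupted hours divided by the number of disruptions) are all hypothetical choices made for the example. How a disruption is actually detected is left outside the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Downtime:
    """One downtime interval, in hours since observation began (hypothetical record)."""
    start: float
    end: float
    disruptive: bool  # assumption: set by some upstream detector of user disruption


def partition(events):
    """Split downtime events into (disruptive, non_disruptive) lists."""
    disruptive = [e for e in events if e.disruptive]
    non_disruptive = [e for e in events if not e.disruptive]
    return disruptive, non_disruptive


def total_hours(events):
    """Sum the durations of the given downtime events."""
    return sum(e.end - e.start for e in events)


def reliability_metrics(events, observed_hours):
    """Compute illustrative metrics from the two downtime partitions."""
    disruptive, non_disruptive = partition(events)
    d_hours = total_hours(disruptive)
    n_hours = total_hours(non_disruptive)
    return {
        "disruptive_hours": d_hours,
        "non_disruptive_hours": n_hours,
        # Availability when every downtime counts against the system.
        "availability_all_downtime": (observed_hours - d_hours - n_hours) / observed_hours,
        # Availability when only user-disrupting downtime counts.
        "availability_disruptive_only": (observed_hours - d_hours) / observed_hours,
        # Assumed definition: non-disrupted hours per disruption (None if no disruptions).
        "mean_time_between_disruptions": (
            (observed_hours - d_hours) / len(disruptive) if disruptive else None
        ),
    }


# Example: two short disruptive outages and one longer non-disruptive one
# (for instance, an update applied while the user was away) over 100 hours.
events = [
    Downtime(10.0, 10.5, disruptive=True),
    Downtime(50.0, 52.0, disruptive=False),
    Downtime(80.0, 80.5, disruptive=True),
]
metrics = reliability_metrics(events, observed_hours=100.0)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

In this example the non-disruptive interval accounts for most of the raw downtime, so the availability figure based on disruptive downtime is noticeably higher than the one based on all downtime, which is the distinction the partitioning is meant to expose.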