Backup and recovery software products are crucial for enterprise-level network clients. Customers rely on backup systems to efficiently back up and recover data in the event of user error, data loss, system outages, hardware failure, or other catastrophic events, so that business applications can remain in service or quickly return to service after a failure condition or an outage. Advanced network storage systems, such as those that utilize virtualization technology, have led to the increased use of virtual machines as data storage targets. Virtual machine (VM) disaster recovery systems using hypervisor platforms, such as vSphere from VMware or Hyper-V from Microsoft, among others, have been developed to provide recovery from multiple disaster scenarios, including total site loss. The immense amount of data involved in large-scale (e.g., municipal, enterprise, etc.) backup applications and the number of different potential problems that exist mean that backup performance and reliable operation are critical concerns for system administrators.
Software reliability is usually defined as the probability of failure-free operation for a specified time in a specified environment for a specific purpose. There is a general requirement for more reliable systems in all application domains. Achieving the required level of reliability calls for proper methods and techniques throughout the product development life cycle. Present methods of testing software generally rely on comparing software performance against known performance metrics. This may provide a measure of how well the software performs on a certain machine, but it does not give a picture of how reliable the software is when deployed over a period of time. To measure software reliability, system administrators often mine service records, trouble logs, or user feedback to derive a reliability profile for software products. This method provides only a retroactive view of product reliability and does not indicate whether a particular product is unreliable or operating sub-optimally while it is in use.
What is needed, therefore, is a method of measuring the reliability of a system through techniques that continuously monitor system performance and the failure patterns caused by any component in the system, and that provide analytical metrics for determining the reliability of the individual components and of the system as a whole.
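As an illustration only, the notion of reliability as a probability of failure-free operation over a specified time, evaluated for individual components and for the whole system, can be sketched numerically. The sketch below is not a method described in this document. It assumes a constant failure rate per component (an exponential model), assumes the system fails if any one component fails (a series arrangement), and uses hypothetical component names and monitoring figures.

```python
import math


def failure_rate(failure_count, observed_hours):
    """Estimate a constant failure rate (failures per hour) from monitoring data."""
    if observed_hours <= 0:
        raise ValueError("observed_hours must be positive")
    return failure_count / observed_hours


def component_reliability(failure_count, observed_hours, mission_hours):
    """Probability of failure-free operation for mission_hours.

    Assumption: exponential model, R(t) = exp(-rate * t).
    """
    rate = failure_rate(failure_count, observed_hours)
    return math.exp(-rate * mission_hours)


def system_reliability(component_reliabilities):
    """Assumption: series system, which works only if every component works."""
    return math.prod(component_reliabilities)


# Hypothetical monitoring data: name -> (failures observed, hours monitored)
observations = {
    "backup_server": (2, 1000.0),
    "storage_target": (1, 1000.0),
}
mission_hours = 24.0  # the "specified time" in the reliability definition

per_component = {
    name: component_reliability(failures, hours, mission_hours)
    for name, (failures, hours) in observations.items()
}
overall = system_reliability(per_component.values())

for name, value in per_component.items():
    print(f"{name}: {value:.4f}")
print(f"system: {overall:.4f}")
```

Because the failure counts come from ongoing monitoring rather than from past service records, such figures could in principle be recomputed while the system is in use. Other failure models or system arrangements would change the formulas.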
The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions. EMC, Networker, Data Domain, Data Domain Restorer, and Data Domain Boost are trademarks of EMC Corporation.