The subject application relates to failure detection in device networks. While the systems and methods described herein relate to distinguishing between hard and soft failures in device networks such as printing networks and the like, it will be appreciated that the described techniques may find application in other network systems, other failure detection applications, etc.
When an automatic failure detection system is used in a print infrastructure, a probabilistic approach that examines changes in printer usage patterns cannot guarantee perfect accuracy. Likewise, the existing approach of monitoring devices based on device sensors does not guarantee that all incidents will be detected correctly. One example is the occurrence of repetitive paper jams: the printer does not work properly, but the printer system declares that it is working properly once the jam has been removed. In contrast, the users know in this case that the printer will fail to print the next page. Conventional systems cannot solicit or employ user feedback, but rather rely on hardware to warn a network administrator about potential failures.
Several scenarios can be observed with regard to the output behavior of a conventional printing network. For instance, a large number of users who switch from a first device (e.g., a first printer) to a second device (e.g., a second printer) can thereby overload the second device and cause the network to report a failure condition at the second device. In another scenario, a small number of users change their device usage due to a specific print resource need (e.g., color printing, large format, etc.), causing the network to report a false failure. In a third scenario, the network operates normally and no failure is reported.
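The scenarios above can be illustrated with a minimal sketch that labels an observation by the fraction of users who have left their usual device. The function names, the `large_fraction` threshold, and the labels are hypothetical and chosen only for illustration; they are not taken from any conventional system.

```python
def switch_fraction(prior, current):
    """Fraction of users whose current device differs from their usual one.

    prior and current map a user name to a device name (assumed layout).
    """
    if not prior:
        return 0.0
    switched = sum(
        1 for user, dev in prior.items() if current.get(user, dev) != dev
    )
    return switched / len(prior)


def label_scenario(prior, current, large_fraction=0.5):
    """Label an observation; large_fraction is an assumed, illustrative threshold."""
    frac = switch_fraction(prior, current)
    if frac == 0:
        return "normal operation"
    if frac >= large_fraction:
        return "many users switched"
    return "few users switched"
```

For instance, with four users of whom three normally print on one device, moving all three to another device yields "many users switched", while moving only one of them yields "few users switched".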
Conventional systems are subject to errors and imperfect decisions due to the generative nature of their algorithms, which make use of “priors” (e.g., historical usage information and/or patterns) that may not be applicable for all users and/or all situations. One example is the false-positive alerts generated when individual users change their usage because they need specific features that are provided by fewer than all devices in the network (e.g., binding, high quality photo printing, collating, etc.).
Conventional monitoring systems make use of device data derived from sensor information that is aggregated by rules embedded in the device's firmware. The capabilities of such monitoring systems are limited by two aspects: the availability and quality of sensor data, and the capability of the device's embedded rules to explain failure states based on the limited sensor data. Current embedded device monitoring systems suffer from weaknesses in both aspects.
Typically, image quality problems are not detectable by the internal monitoring systems of the device due to the unavailability of image quality sensors (e.g., a camera) at the device's output. Adding new sensors to increase the monitoring capabilities of devices is only possible in high-end production devices, while in office devices, where sales margins need to remain high while products stay competitive, there is little possibility of adding new sensors.
Embedded rule-based diagnosis systems are limited not only by the data but also by the inherent limitations of rule systems. With rule-based systems, it is difficult to target complex failure patterns. For example, it is difficult to define the conditions of failures when complex temporal dependencies are involved. Uncertainty or variability in the way a failure can be inferred from sensor data is in general difficult to express using simple rules. Nevertheless, rule-based systems are today's standard commercial solution.
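The difficulty with temporal dependencies can be illustrated with a minimal sketch based on the repetitive paper jam example. A stateless rule reports the device as working as soon as the jam is cleared, whereas even a simple temporal rule needs an event history plus a window and a count threshold. The function names, the event layout, and the `window_s` and `max_jams` values are assumptions made for illustration and do not reflect any actual device firmware.

```python
def stateless_rule(status):
    """Hypothetical stateless rule: looks only at the latest sensor status."""
    return "fault" if status == "jam" else "ok"


def repeated_jam_rule(events, window_s=600, max_jams=3):
    """Hypothetical temporal rule over a time-sorted event history.

    events is a list of (timestamp_seconds, status) tuples (assumed layout).
    Reports a fault when max_jams jams fall within window_s seconds.
    """
    jam_times = [t for t, status in events if status == "jam"]
    for i in range(len(jam_times) - max_jams + 1):
        if jam_times[i + max_jams - 1] - jam_times[i] <= window_s:
            return "fault"
    return "ok"
```

With three jams inside ten minutes, each cleared by a user, the stateless rule applied to the latest status reports "ok" while the temporal rule reports "fault". The appropriate window and count vary by device and usage, which is part of what makes such rules hard to write in general.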
While embedded diagnosis systems are slowly evolving, users still suffer from device failures that are difficult to characterize and that make devices unavailable without always being identifiable as such from the device's sensor data. This results in users collectively switching from one device to another, in most cases without any user making a specific failure report to the IT administrator.
Accordingly, there is an unmet need for systems and/or methods that facilitate overcoming the aforementioned deficiencies.