Virtually any sophisticated software or hardware contains bugs. Many large-scale software or hardware systems have bug repositories containing rich information that may be leveraged to improve system quality. These repositories contain bug reports (sometimes referred to as tickets) comprising structured data as well as relatively unstructured data, including free-form text (e.g., tester logs, automated messages, email and IM conversations), images, audio, and so forth. Some bug reports are very large.
As a result of this volume and lack of structure, much of the rich data in bug repositories goes unused, even though it could be highly useful in a number of ways. For example, many questions about bugs as a trend or as a whole (as opposed to individual bugs) go unanswered, such as whether a particular part of the code or functionality (e.g., concurrency, security, the hardware-software interface, external dependencies) exhibits recurrent bugs, whether certain bugs or bug types are difficult to diagnose and troubleshoot, and so on.
Other examples of unleveraged bug report data include data that may help identify early symptoms of bugs that have not yet been discovered or reported. Patches and fixes can themselves contain bugs with unknown impact, and dependencies may exist among bug fixes. Successive software or firmware versions may contain some new parts that improve quality, others whose quality remains unchanged, and others that introduce more bugs. Moreover, some bugs do not cause crashes but instead cause rapid energy drain or reduced performance, for instance in software or applications running on mobile devices.
Bug repositories thus typically contain a wealth of useful information, yet it is difficult to perform automated problem inference on it. Attempts have been made to automatically use some of the fixed-field data in bug reports, but much of the rich information resides in the unstructured portion of each report, which goes unused unless and until it is manually reviewed.