Machine learning systems may use highly complex analysis algorithms to generate statistically valid document recommendations relative to the training documents of a training corpus. Although these recommendations are statistically valid, end users may prefer example-based explanations when trying to understand why the machine learning system recommended a certain document.
Some machine learning systems attempt to avoid non-intuitive algorithms altogether when human-understandability of results is of particular importance. This approach may be acceptable in domains where classification accuracy is not of highest importance and, typically, where the number of dimensions that are used to describe the problem is low. Unfortunately, a low dimensional space rarely, if ever, occurs in document classification processes.
Another approach is to use feature reduction algorithms to reduce either the input space or the complexity of the solution. Further approaches may involve interactive visualization methods that put the burden of finding the most suitable explanation (e.g., the most relevant match) on the user. These interactive visualization methods often require significant computing resources as well as excessive or otherwise undesirable end user effort.
The success of feature reduction algorithms is highly domain dependent. They are most suitable in domains where input dimensions differ in quality or meaning, or where dimensions are redundant. In document classification domains, irrelevant and redundant dimensions (e.g., words) are typically already filtered out via stop word lists or combined by phrase detection algorithms. As a result, a further reduction of dimensions often leads to a significant deterioration in classification accuracy.
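By way of illustration only, the stop word filtering and phrase combination mentioned above might be sketched as follows. The names STOP_WORDS, PHRASES, and reduce_dimensions, as well as the tiny word lists, are hypothetical and chosen for this example; practical systems typically rely on much larger lists and on statistical phrase detection.

```python
# Hypothetical, deliberately tiny lists used only for illustration.
STOP_WORDS = {"the", "a", "of", "and", "is", "to"}
PHRASES = {("machine", "learning"): "machine_learning"}


def reduce_dimensions(text):
    """Drop stop words, then merge known two-word phrases into one dimension."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    merged = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in PHRASES:
            # Two adjacent words become a single combined dimension.
            merged.append(PHRASES[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


reduced = reduce_dimensions("The accuracy of a machine learning system")
```

In this sketch every remaining token carries meaning for classification, which is consistent with the observation that removing still more dimensions tends to cost accuracy.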
End users are typically not interested in algorithmic explanations, whether those explanations are simple or complex. Instead, the end user desires to inspect the specific training examples that are most likely the cause of the given classification of a new document.
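A minimal sketch of what such an example-based explanation could look like is given below. It assumes, purely for illustration, that documents are compared by cosine similarity over raw term counts; this is not asserted to be the technique of any particular system. The names term_counts, cosine, nearest_training_examples, and TRAINING are hypothetical, and the training documents are invented sample data.

```python
import math
from collections import Counter


def term_counts(text):
    """Represent a document as a bag of lowercase whitespace-separated terms."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_training_examples(new_doc, training_docs, k=2):
    """Return the k training documents most similar to new_doc.

    Each result is a (similarity, label, text) tuple, so a user can
    inspect the concrete examples rather than the algorithm itself.
    """
    query = term_counts(new_doc)
    scored = [
        (cosine(query, term_counts(text)), label, text)
        for text, label in training_docs
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Invented sample corpus of (text, label) pairs.
TRAINING = [
    ("quarterly revenue forecast and budget", "finance"),
    ("merger agreement draft and contract terms", "legal"),
    ("budget review for quarterly revenue", "finance"),
]

top = nearest_training_examples("quarterly revenue budget", TRAINING, k=2)
```

Here the returned tuples pair each retrieved training document with its label and similarity score, so the user sees the examples themselves instead of a description of the underlying algorithm.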