Information technologies are increasingly prevalent in society, and produce a flood of data and information. A growing challenge is to make the knowledge contained in these data usable for different applications.
Data mining (e.g., extracting something valuable from a mountain of data) refers to the systematic application of primarily statistical/mathematical methods to a database with the aim of recognizing new patterns. Data mining may involve the processing of very large databases (e.g., databases that cannot be processed manually) using efficient methods whose time complexity is suitable for large volumes of data. However, the methods may also be applied to smaller volumes of data. In practice, the phrase “data mining” is often applied to the entire “knowledge discovery in databases” (KDD) process, which also includes steps such as preprocessing (see, e.g., http://de.wikipedia.org/wiki/Data-Mining).
In practice, data mining may raise a false expectation that interesting knowledge will be automatically extracted (e.g., without a substantial contribution from the user) via an approach known as “unsupervised machine learning.”
In recent decades, a plurality of algorithms have been developed that may extract interesting sub-aspects from large volumes of data. However, the interesting knowledge that may be automatically extracted may correspond to relatively simple aspects in the data (e.g., frequent patterns, specific clusters and structures that are searched for and in some cases found). The user is responsible for the interpretation and evaluation of the quality of the algorithmically extracted knowledge.
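As an illustration of the kind of relatively simple aspect that may be extracted automatically, the following minimal sketch counts frequently co-occurring item pairs in a small set of transactions. The function name `frequent_pairs`, the parameter `min_support`, and the sample data are assumptions chosen for illustration and do not correspond to any particular data mining system. Deciding whether a reported pair is meaningful remains with the user.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that occur in at least min_support transactions.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for items in transactions:
        # Deduplicate and sort so that each pair has one canonical form.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Illustrative data (assumed, not taken from any real database).
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
]

result = frequent_pairs(transactions, min_support=3)
print(result)
```

The algorithm reports only which pairs exceed the chosen support threshold; it does not indicate whether the pattern is useful for a given application.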
Furthermore, an interaction with the user may be needed. For example, an algorithm for anomaly recognition may be based on an advance definition of normal behavior or the provision of normal data by the user in an approach referred to as “supervised machine learning” or “active learning.” The more complex the demands made on a data mining system, the more elaborate the design of the interaction with the user.
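The dependence of anomaly recognition on user-provided normal data may be sketched as follows. This is a minimal example that assumes a simple mean and standard-deviation model with a fixed threshold. The names `fit_normal_model`, `is_anomaly`, and `threshold`, as well as the sample values, are hypothetical and are used only to show the user's contribution.

```python
from statistics import mean, stdev


def fit_normal_model(normal_values):
    """Summarize user-provided normal data as (mean, standard deviation)."""
    return mean(normal_values), stdev(normal_values)


def is_anomaly(value, model, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    mu, sigma = model
    return abs(value - mu) > threshold * sigma


# Normal data supplied by the user (assumed values for illustration).
normal = [9.8, 10.1, 10.0, 9.9, 10.2]
model = fit_normal_model(normal)

# New observations are judged only relative to the user's definition of normal.
flags = [is_anomaly(v, model) for v in [10.05, 14.0]]
print(flags)
```

Without the normal data supplied by the user, the sketch has no basis for deciding what counts as an anomaly.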
One problem is providing a suitable facility for communication between the user and the machine (e.g., in the form of a man-machine interface). The reason is that a discrepancy may exist between the machine-extracted information and the knowledge usable by the person. For example, model parameters may be influenced interactively in order to successively maximize the proportion of usable knowledge. Large volumes of data with complex correlations may pose considerable challenges to system performance.
Visual analytics (VA) is an interdisciplinary approach that combines different research fields. The aim of the VA method is to acquire knowledge from large and complex datasets. The approach combines the strengths of automatic data analysis with human capabilities for visually recognizing patterns or trends quickly. Data may be visually explored and knowledge acquired through suitable interaction mechanisms (see, e.g., http://de.wikipedia.org/wiki/Visual_Analytics).
The interaction on the graphical representation of conventional VA systems involves selecting interesting patterns that are already present in the data. However, the user is restricted to already existing patterns and is not allowed further flexibility.