1. Technical Field
This disclosure is directed to methods for preprocessing value series data, including time series data, in order to select an appropriate analysis method and tune its parameters.
2. Discussion of Related Art
Choosing the right analysis method and tuning its parameters appropriately is a prerequisite for building useful analytics applications. This is especially true for the analysis of time series or value series. Selecting and tuning the right analysis method requires, on the one hand, statistical expertise to understand the methods and their tuning process and, on the other hand, domain expertise to interpret the data and understand the task of interest. Domain experts frequently find the statistical analysis difficult to understand and use, while statisticians must spend considerable time acquiring the domain expertise needed to solve the task of interest.
A typical example is the denoising of time series derived from sensor data. Such series can exhibit anything from random noise added to the actual signal to extreme values or complete sensor failure.
Many methods are known for filtering noise and removing outliers from data. Simple examples include smoothing algorithms based on moving averages, spline-based methods, and filtering techniques such as low-pass filters.
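As an illustration of two of the simple methods named above, the following is a minimal sketch of a centered moving average and a first-order (exponential) low-pass filter. The function names, the synthetic sine-plus-noise series, and the parameter values (window=9, alpha=0.2) are assumptions chosen for illustration only; they are not taken from this disclosure.

```python
import math
import random


def moving_average(values, window):
    """Centered moving average; the window shrinks near the series edges.

    The effective width is 2 * (window // 2) + 1 samples.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def exponential_low_pass(values, alpha):
    """First-order low-pass filter: y[i] = alpha * x[i] + (1 - alpha) * y[i-1]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not values:
        return []
    out = [values[0]]
    for x in values[1:]:
        out.append(alpha * x + (1.0 - alpha) * out[-1])
    return out


def mean_squared_error(a, b):
    """Mean squared difference between two equally long series."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Assumed synthetic "sensor" series: a slow sine wave plus Gaussian noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * t / 100) for t in range(300)]
noisy = [c + rng.gauss(0.0, 0.3) for c in clean]

ma = moving_average(noisy, window=9)
lp = exponential_low_pass(noisy, alpha=0.2)

print("MSE raw series     :", mean_squared_error(noisy, clean))
print("MSE moving average :", mean_squared_error(ma, clean))
print("MSE low-pass filter:", mean_squared_error(lp, clean))
```

Even in this small sketch, each method exposes a parameter (window or alpha) whose suitable value depends on the noise level and on how quickly the underlying signal changes.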
There are challenges with these methods:
Setting the parameters is a non-trivial task that usually requires a considerable amount of background knowledge, e.g., about the properties of the sensors. The choice of the best denoising method among a large number of diverse and highly tunable methods requires statistical expertise.
The “right” filtering parameters may change over time, possibly even frequently. A sensor could, for example, exhibit different properties by day and by night.
The search space can be huge, which creates challenges in terms of computational complexity and statistical significance.
For these reasons, preprocessing large amounts of time series data for analytics is still a very labor-intensive task that requires profound statistical knowledge about the properties of filters and the distribution of the original data.
An improved method would instead:
1. be simple enough to be used by an expert without too much statistical knowledge;
2. reduce the amount of interaction to a minimum; and
3. allow for a fine-grained application of methods to a single series or a set of series.
The current state of the art is a trial-and-error approach, in which the expert tests different methods and parameter settings to find the most suitable combination. This approach, however, may require much manual work and is prone to errors.
One alternative, given a supervised learning task, is to use a wrapper with evolutionary computing to optimize the parameters for this task. As the search space for this optimization can be huge, these methods are likely to overfit and have a high computational complexity. In addition, these methods are only applicable to supervised tasks. There are also semi-supervised learning methods for clustering, which usually take pairs of entities and label them as similar or dissimilar. Based on these labels, optimal parameters and a distance metric can be learned. While these methods might work well for some data sets, they usually require many labeled pairs and rely on good existing features, which are usually not available for value series. Furthermore, those methods are usually tuned for clustering and are not appropriate for analyzing value series. Most importantly, the interaction with the user is limited to labels given by the user, which restricts the interaction between the user and the analysis system.
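The wrapper approach mentioned above can be sketched, under strong simplifying assumptions, as a (1+lambda) evolutionary search over a single filter parameter. Everything here is hypothetical and chosen for illustration: the names evolve_window and supervised_loss, the use of a moving-average window as the only tunable parameter, the window range of 1 to 51, and the use of a known clean series as the supervised target. A practical wrapper would search over many methods and parameters and would score candidates on held-out data, which this sketch does not do.

```python
import math
import random


def moving_average(values, window):
    """Centered moving average; the window shrinks near the series edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def supervised_loss(window, noisy, target):
    """Task loss the wrapper optimizes: MSE against labeled target values."""
    smoothed = moving_average(noisy, window)
    return sum((s - t) ** 2 for s, t in zip(smoothed, target)) / len(target)


def evolve_window(noisy, target, generations=20, offspring=4, seed=0):
    """(1+lambda) evolutionary search over one integer parameter in [1, 51]."""
    rng = random.Random(seed)
    best = rng.randint(1, 51)  # random initial parent
    best_loss = supervised_loss(best, noisy, target)
    evaluations = 1
    for _ in range(generations):
        parent = best  # all offspring of a generation mutate the same parent
        for _ in range(offspring):
            # Mutation: shift the parent window by up to 5, clamped to range.
            child = min(51, max(1, parent + rng.randint(-5, 5)))
            loss = supervised_loss(child, noisy, target)
            evaluations += 1
            if loss < best_loss:  # survivor is the best of parent and offspring
                best, best_loss = child, loss
    return best, best_loss, evaluations


# Assumed synthetic data: a slow sine wave (the labels) plus Gaussian noise.
data_rng = random.Random(1)
target = [math.sin(2 * math.pi * t / 100) for t in range(300)]
noisy = [v + data_rng.gauss(0.0, 0.3) for v in target]

best_window, best_loss, n_evals = evolve_window(noisy, target)
print("selected window:", best_window)
print("task loss      :", best_loss)
print("evaluations    :", n_evals)
```

Each candidate requires a full pass of the filter and the task evaluation, so the cost grows with the number of generations, offspring, and tunable parameters. Because the same labeled data both guides the search and scores the result, the selected parameters may fit that data more closely than they would fit new data.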