Conventionally, a data analyzing apparatus faces three problems [1] to [3] when analyzing big data. The first problem [1] is the inability to use an optimum combination of an analytical parameter and an input data format. The second problem [2] is that the accuracy decreases when the input data trend changes. The third problem [3] is the inability to solve the first and second problems [1] and [2] at the same time. Problems [1] to [3] will be described in detail below.
[1] It is becoming apparent that, when performing data analysis (e.g., machine learning), analysis using processed data (e.g., a weekly average) derived from the original data has a larger influence on the accuracy of the analytical results than analysis using only the original data as input data.
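As an illustration of what "processed data" can mean here, the following minimal sketch derives weekly averages from daily values. The function name weekly_averages and the sample values are hypothetical and are not taken from the cited publications; the sketch only shows one of many possible processed forms of the same original data.

```python
def weekly_averages(daily_values):
    """Return the average of each complete, non-overlapping 7-day window."""
    complete_days = len(daily_values) - len(daily_values) % 7
    return [
        sum(daily_values[start:start + 7]) / 7
        for start in range(0, complete_days, 7)
    ]


# Hypothetical original data: 14 daily measurements (two full weeks).
daily = [10, 12, 11, 13, 12, 30, 31, 11, 12, 10, 14, 13, 29, 33]

# Processed data: one value per week instead of one value per day.
weekly = weekly_averages(daily)
print(weekly)
```

The same original data could equally be processed into a monthly average, a moving maximum, or a day-of-week profile, and each choice gives the analysis a different input data format.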
Unfortunately, the number of types of processed data that can be prepared can be semi-infinite, especially for big data. In the techniques described in, e.g., Jpn. Pat. Appln. KOKAI Publication No. 2012-14659 and Jpn. Pat. Appln. KOKAI Publication No. 2009-87190, it is therefore impossible to use an optimum one of the semi-infinite variations, so a variation obtainable by a clear-cut solution is used instead.
Similarly, the number of combinations (analytical methods) of analytical parameters to be used in analysis such as machine learning and variations of the above-described input data format can be semi-infinite.
Accordingly, the techniques of Jpn. Pat. Appln. KOKAI Publication No. 2012-14659 and Jpn. Pat. Appln. KOKAI Publication No. 2009-87190 cannot use an optimum one of the semi-infinite analytical method combinations, so a variation obtainable by a clear-cut solution is used instead.
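The growth in the number of analytical methods can be sketched by enumerating combinations of input data formats and analytical parameters. All candidate values below are hypothetical and chosen only for illustration; they do not come from the cited publications.

```python
from itertools import product

# Hypothetical input data format choices.
window_sizes = [1, 7, 14, 28]          # days aggregated per processed record
aggregations = ["raw", "mean", "max"]  # how each window is summarized

# Hypothetical analytical parameters of a learning algorithm.
learning_rates = [0.001, 0.01, 0.1]
tree_depths = [2, 4, 8, 16]

# Each tuple is one candidate "analytical method":
# an input data format combined with a set of analytical parameters.
analytical_methods = list(
    product(window_sizes, aggregations, learning_rates, tree_depths)
)
print(len(analytical_methods))  # 4 * 3 * 3 * 4 = 144
```

Even with only four short candidate lists, 144 combinations result, and every additional option multiplies that count. This is why exhaustively evaluating every combination quickly becomes impractical.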
[2] When an input data trend has changed, the accuracy decreases if a previously extracted knowledge model is applied.
A knowledge model is extracted by performing machine learning on the immediately preceding input data using the same parameter as that used before. This makes it impossible to perform control when a new input attribute begins to have an influence.
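The effect described in [2] can be illustrated with a toy example. The sketch below assumes a simple least-squares line as the "knowledge model" and a hypothetical second attribute that starts to influence the output only after the trend change. Both are stand-ins chosen for illustration, not the method of any cited publication.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, xs, ys):
    """Average absolute difference between model predictions and ys."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


xs = list(range(10))

# Old trend: the output depends only on the attribute x.
old_ys = [2 * x for x in xs]

# New trend: a hypothetical new attribute z begins to influence the output.
new_attr = [x % 2 for x in xs]
new_ys = [2 * x + 5 * z for x, z in zip(xs, new_attr)]

# The model is extracted from old data and knows nothing about z.
model = fit_line(xs, old_ys)
old_error = mean_abs_error(model, xs, old_ys)
new_error = mean_abs_error(model, xs, new_ys)
print(old_error, new_error)
```

Because the model was extracted using only x, it fits the old trend closely but cannot account for the contribution of z, so its error on the new data is larger.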
[3] It is difficult to automatically solve problems [1] and [2] at the same time.
For example, Jpn. Pat. Appln. KOKAI Publication No. 2012-14659 shows neither the processing time of a recommendation generation process performed in response to a request from a user, nor any consideration of load control.
In addition, the method of Jpn. Pat. Appln. KOKAI Publication No. 2012-14659 does not store any analytical method based on accuracy results in association with problems [1] and [2]. This makes it difficult to sustain any improvement.
Furthermore, Jpn. Pat. Appln. KOKAI Publication No. 2012-14659 only describes that “necessary data can be formed beforehand by using a log, and can also be formed at the timing of recommendation pattern determination” in an embodiment, and describes neither an implementation method nor a possible problem.
On the other hand, in the technique described in Jpn. Pat. Appln. KOKAI Publication No. 2009-87190, input data in problems [1] and [2] is limited to stream data, and no other input data format can be processed.
As explained above, the conventional data analyzing apparatus has three problems [1] to [3]. Accordingly, demands have arisen for the data analyzing apparatus to be able to use an optimum combination of an analytical parameter and input data format, and at the same time maintain the accuracy even when an input data trend has changed.
It is an object of the present invention to provide a data analyzing apparatus and program capable of using an optimum combination of an analytical parameter and input data format, and at the same time maintaining the accuracy even when an input data trend has changed.