1. Field of the Invention
The present invention relates to a technology for extracting a collection of related data from randomly accumulated data.
2. Description of the Related Art
A vast amount of diverse information is now accumulated on computers. In raw form, however, such volumes of information cannot be comprehended by human beings. For this reason, data mining and multi-variable analysis have attracted much attention. The principal purpose of these technologies is twofold:
(1) to extract structures from information and use them, for example, in estimation; and
(2) to compress data to a size amenable to human comprehension and to visualize it.
As data accumulation technologies have become less expensive, the growing volume of blindly accumulated data has brought about qualitative changes in the data. In the past, data were sampled for a particular purpose, so the small number of causal relationships present in them could be anticipated in advance. In the new types of data, however, many unanticipated causal relationships are scattered throughout.
When a typical multi-variable analysis method, such as factor analysis, is simply applied to data in which multiple causal relationships are scattered, it is difficult to obtain valid results. Human beings therefore use their knowledge of the domain in which the data were accumulated to anticipate the types of relationships present in the data, and the problem must be divided in advance. Because this task is very costly, it is desirable to delegate it to computers as far as possible.
To date, research on technologies for extracting or selecting features has been undertaken in the fields of multi-variable analysis, pattern recognition, neural networks, and case-based reasoning. The term “feature”, as used herein, is defined as follows. Suppose that measurements are taken for a plurality of persons. A height or weight measurement, or age or gender information, for one of these persons is a quantity that indicates a feature of that person. Gender, for example, takes only two values, namely “male” and “female”, so calling it a “quantity” may seem awkward. Because it is one factor characterizing a person, however, the term is used in this case as well. Furthermore, one record, which holds the results of measuring the feature quantities of one person, corresponds to one act of measurement on that person. Accordingly, one such record in, for example, a database is referred to as one event. Similarly, where the operation of a system is measured as a time series, each hourly measurement can be called an event; in this case, each quantity characterizing the system operation acquired in one measurement is a feature quantity.
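The terminology above amounts to a table with one row per event and one column per feature quantity. The following is a minimal sketch of that layout, assuming hypothetical persons, the hypothetical column names `height_cm`, `weight_kg`, `age`, `gender`, and an assumed 0/1 coding for the two-valued gender feature; none of these values or names come from the source.

```python
# Hypothetical column names for the feature quantities (assumed, not from the source).
FEATURES = ["height_cm", "weight_kg", "age", "gender"]
GENDER_CODES = {"male": 0, "female": 1}  # assumed numeric coding of a two-class feature

# Hypothetical records: each record is one event (one person's measurements).
records = [
    {"height_cm": 172.0, "weight_kg": 68.0, "age": 35, "gender": "male"},
    {"height_cm": 158.0, "weight_kg": 51.0, "age": 29, "gender": "female"},
    {"height_cm": 181.0, "weight_kg": 77.0, "age": 42, "gender": "male"},
]

def to_matrix(records):
    """Return an events x feature-quantities table: one row per event, one column per feature."""
    matrix = []
    for rec in records:
        row = []
        for name in FEATURES:
            value = rec[name]
            # The categorical feature is still treated as a "quantity" via its code.
            row.append(GENDER_CODES[value] if name == "gender" else float(value))
        matrix.append(row)
    return matrix

matrix = to_matrix(records)
for row in matrix:
    print(row)
```

Hourly measurements of a system would fit the same layout, with each row holding one hour's measured quantities.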
To date, a variety of technologies have been proposed for extracting relationships from large volumes of data. Most of them, however, extract either only the feature quantities that are mutually related or only the events that share a specific relationship.
As discussed above, however, recent years have seen a trend toward the blind accumulation of data. The accumulated data are not necessarily related with respect to every kind of feature quantity obtained. Even when attention is confined to specific kinds of feature quantities, there is no guarantee that all of the accumulated events are related with respect to those feature quantities. For blindly accumulated data, it is therefore necessary to extract a combination of specific events that are related with respect to a combination of specific feature quantities.
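The need to select events and feature quantities together can be illustrated with a small sketch. It assumes hypothetical toy data (8 events, 3 feature quantities named `x`, `y`, `z`, all invented for illustration) and uses the Pearson correlation coefficient as a stand-in measure of relevance; the source does not prescribe this measure.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

# Hypothetical toy data: 8 events, 3 feature quantities (x, y, z).
# x and y move together only in events 0-3; z is unrelated throughout.
data = {
    "x": [1, 2, 3, 4, 1, 2, 3, 4],
    "y": [1, 2, 3, 4, 4, 3, 2, 1],
    "z": [3, 1, 4, 1, 5, 9, 2, 6],
}

def column(feature, events):
    """Values of one feature quantity restricted to the chosen events."""
    return [data[feature][e] for e in events]

all_events = list(range(8))
related_events = [0, 1, 2, 3]

r_all = pearson(column("x", all_events), column("y", all_events))
r_sub = pearson(column("x", related_events), column("y", related_events))
r_noise = pearson(column("x", related_events), column("z", related_events))

print(f"x-y over all events: {r_all:+.3f}")   # +0.000
print(f"x-y over events 0-3: {r_sub:+.3f}")   # +1.000
print(f"x-z over events 0-3: {r_noise:+.3f}")  # -0.258
```

Over all events the relationship between `x` and `y` vanishes, while restricting to events 0-3 reveals it completely; adding the unrelated feature `z` to those events dilutes it again. Only the specific combination of events {0, 1, 2, 3} and features {x, y} exposes the relationship.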
An objective of the present invention is to provide a technology that, from data in which a prescribed set of feature quantities is associated with each of a plurality of events, extracts mutually relevant data by selecting combinations of feature quantities and events.
The data decomposition apparatus contemplated by the present invention extracts partial data from whole data in which a plurality of attributes is cataloged for each record. It does so by selecting combinations of events, each event corresponding to one record, and combinations of feature quantities, the feature quantities being the attributes. The data decomposition apparatus according to the present invention comprises: (a) means for calculating, for a combination of specific feature quantities and a combination of specific events, an evaluation value that serves as the criterion for evaluating the relevance among the data; and (b) means for extracting a plurality of partial data for which the evaluation value reaches a maximum with respect to changes in both the feature quantity combination and the event combination.
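The evaluation value is not defined in this section. As one hypothetical choice, offered only as a sketch, let $x_{ij}$ be feature quantity $j \in F$ of event $i \in E$, let $S \subseteq E$ be an event combination and $T \subseteq F$ a feature quantity combination, and take the mean absolute Pearson correlation among the selected feature quantities, computed over the selected events:

$$
r_{jk}(S) = \frac{\sum_{i \in S} (x_{ij} - \bar{x}_j^{S})(x_{ik} - \bar{x}_k^{S})}{\sqrt{\sum_{i \in S} (x_{ij} - \bar{x}_j^{S})^2}\,\sqrt{\sum_{i \in S} (x_{ik} - \bar{x}_k^{S})^2}},
\qquad \bar{x}_j^{S} = \frac{1}{|S|} \sum_{i \in S} x_{ij}
$$

$$
V(S, T) = \binom{|T|}{2}^{-1} \sum_{\substack{j, k \in T \\ j < k}} \left| r_{jk}(S) \right|
$$

Under this assumption, a partial datum extracted by means (b) is a pair $(S, T)$ at which $V$ is a local maximum, where $\triangle$ denotes adding or removing a single element:

$$
V(S, T) \ge V(S \,\triangle\, \{i\}, T) \;\; \forall i \in E, \qquad V(S, T) \ge V(S, T \,\triangle\, \{j\}) \;\; \forall j \in F
$$

Minimum sizes such as $|S| \ge s_{\min}$ and $|T| \ge 2$ would be needed in practice, because any two events give $|r_{jk}| = 1$ trivially.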
The data decomposition method contemplated by the present invention extracts partial data from whole data in which a plurality of attributes is cataloged for each record. It does so by selecting combinations of events, each event corresponding to one record, and combinations of feature quantities, the feature quantities being the attributes. The data decomposition method according to the present invention comprises the steps of: (a) calculating, for a combination of specific feature quantities and a combination of specific events, an evaluation value that serves as the criterion for evaluating the relevance among the data; and (b) extracting a plurality of partial data for which the evaluation value reaches a maximum with respect to changes in both the feature quantity combination and the event combination.
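Steps (a) and (b) can be sketched as follows, under stated assumptions. The evaluation value used here (mean absolute Pearson correlation among selected feature quantities over selected events) is a hypothetical choice, and step (b) is approximated by a swapped-in technique, first-improvement hill climbing that toggles one event or one feature quantity at a time. The function names `evaluate` and `decompose`, the minimum-size parameters, and the toy data are all hypothetical.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def evaluate(X, events, features):
    """Step (a), hypothetical evaluation value V(S, T): mean |r| over all
    pairs of selected feature quantities, computed on the selected events."""
    events = sorted(events)
    pairs = list(combinations(sorted(features), 2))
    total = 0.0
    for j, k in pairs:
        total += abs(pearson([X[i][j] for i in events], [X[i][k] for i in events]))
    return total / len(pairs)

def decompose(X, events, features, min_events=4, min_features=2):
    """Step (b), sketched as first-improvement hill climbing: toggle one event or
    one feature quantity at a time while V increases; stop at a local maximum."""
    events, features = set(events), set(features)
    if len(events) < min_events or len(features) < min_features:
        raise ValueError("starting combination is smaller than the minimum sizes")
    best = evaluate(X, events, features)
    improved = True
    while improved:
        improved = False
        candidates = [(events ^ {i}, features) for i in range(len(X))]
        candidates += [(events, features ^ {j}) for j in range(len(X[0]))]
        for ev, ft in candidates:
            if len(ev) < min_events or len(ft) < min_features:
                continue  # size limits partly keep V from favouring tiny subsets
            value = evaluate(X, ev, ft)
            if value > best + 1e-12:
                events, features, best = ev, ft, value
                improved = True
                break  # restart the scan from the improved combination
    return sorted(events), sorted(features), best

# Hypothetical toy data: rows are events, columns are feature quantities x, y, z.
X = [[1, 1, 3], [2, 2, 1], [3, 3, 4], [4, 4, 1],
     [1, 4, 5], [2, 3, 9], [3, 2, 2], [4, 1, 6]]

print(decompose(X, events=[0, 1, 2, 3], features=[0, 1, 2]))
# ([0, 1, 2, 3], [0, 1], 1.0)
```

Starting from events 0-3 with all three feature quantities, the search finds no event change that raises the value, drops the unrelated feature `z`, and stops at events 0-3 with features `x` and `y`. Each run yields one local maximum that depends on the starting combination; obtaining a plurality of partial data would, under this sketch, mean running the search from several starting points.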
In the present invention, mutually relevant records (events), together with the feature quantities associated with them, are selected and extracted from the set of all events and the set of all feature quantities in the blindly accumulated data. Mutually relevant data can therefore be extracted easily, without human intervention to sift through the data.
This technology can be used effectively as a preprocessing step for processes that find interrelations within a fixed interval of the gathered data, such as multi-variable analysis, data mining, and pattern recognition.