Keywords or classification codes are conventionally used to search for specific information within an enormous amount of data. In the patent literature, for example, each application includes the application number, the title of the invention, the applicant's name, the inventors' names, and the IPC (International Patent Classification) code. A specific patent application can therefore be located in a database by using the title of the invention or the applicant's name as a keyword, or by using the application number or the IPC code. The intended patent application can be found reliably provided that a suitable keyword or classification code is chosen.
However, with the conventional methods described above, it is very difficult to obtain information that satisfies multiple criteria. Consider, for example, the task of identifying the inventors in each technical field from a database of patent publications. If the number of inventors concerned is very large, or if some inventors are active in multiple technical fields, it is difficult to obtain precise information simply by using keywords or classification codes. Also, if the inventors are grouped and the groups include inventors who appear only infrequently, a given group may contain too many inventors to be useful. Furthermore, it is almost impossible to determine the relationships among the inventors in a technical field, or to distinguish the primary contributors to a given field from the secondary ones.
Therefore, when such information is needed, the data is often organized by hand. This is time-consuming, inefficient, and expensive. Moreover, the analysis leaves considerable room for personal judgment, so the results may differ depending on who performs it.
The above example is drawn from the field of patents for convenience of description. In recent years, however, the same problem has become increasingly important for data sets in general, especially in the field of genome research. The present invention applies equally to such more general problems.