Technologies have been proposed to standardize the data formats of so-called “open data”, that is, information contained in documents that are created by, for example, central government offices and disclosed to the public, in order to facilitate use of the open data. For efficient use, it is preferable to publish open data as Linked Open Data (LOD) using the Resource Description Framework (RDF) format. RDF format data is expressed in the data structure “subject, predicate, object”, which can be automatically processed by a computer, and is therefore easily reusable.
However, documents generated at, for example, companies, local government offices, and central government offices in formats such as the Excel format and the comma-separated values (CSV) format cannot be easily converted into RDF format data. This is one of the factors hindering the publication of open data as LOD using the RDF format.
For example, one reason that CSV format data (which may also be referred to as “CSV data”) is difficult to convert into RDF format data (which may also be referred to as “RDF data”) is that an appropriate vocabulary for the conversion is difficult to select automatically. When input data is CSV data, a vocabulary is used to convert comma-separated character strings into the data structure “subject, predicate, object”. Here, a predicate is a character string indicating the relationship between the two character strings that represent the subject and the object. It is preferable to select an industry-standard vocabulary and use a predicate defined in that vocabulary. When industry-standard vocabularies are not used for conversion into RDF data, the resulting RDF data becomes difficult to reuse, and the goal of facilitating the use of open data becomes difficult to achieve. Currently, 1291 vocabularies listed at http://prefix.cc/ are registered as industry-standard vocabularies. To facilitate the use of open data, it is preferable to select an appropriate vocabulary from the industry-standard vocabularies for conversion into RDF data (see, for example, Japanese Laid-Open Patent Publication No. 2007-052723, Japanese Laid-Open Patent Publication No. 2005-258659, and Japanese Laid-Open Patent Publication No. 2014-021869).
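As an illustration only, the conversion of comma-separated character strings into the data structure “subject, predicate, object” may be sketched in Python as follows. The column-to-predicate mapping, the base URI http://example.org/resource/, the function name csv_to_triples, and the sample rows are all assumptions made for this sketch. The Dublin Core terms merely stand in for whichever industry-standard vocabulary would be selected, and the selection itself is written by hand here rather than automated.

```python
import csv
import io

# Hypothetical mapping from CSV column names to predicate URIs.
# Dublin Core terms are used only as an example of a shared vocabulary.
COLUMN_TO_PREDICATE = {
    "title": "http://purl.org/dc/terms/title",
    "date": "http://purl.org/dc/terms/date",
}

# Placeholder namespace used to build subject URIs (an assumption).
BASE_URI = "http://example.org/resource/"


def csv_to_triples(csv_text, id_column="id"):
    """Return (subject, predicate, object) tuples for each mapped, non-empty cell."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The subject is derived from the row identifier.
        subject = BASE_URI + row[id_column]
        for column, predicate in COLUMN_TO_PREDICATE.items():
            value = row.get(column)
            if value:  # skip missing or empty cells
                triples.append((subject, predicate, value))
    return triples


# Made-up sample data; the second row has an empty date cell.
SAMPLE = "id,title,date\n1,Annual budget,2014-04-01\n2,Census summary,\n"
TRIPLES = csv_to_triples(SAMPLE)

for triple in TRIPLES:
    print(triple)
```

In this sketch each row yields one subject, each mapped column yields one predicate, and each non-empty cell yields one object, so a different vocabulary can be tried by changing only the mapping table.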
Open Refine also uses vocabularies (see, for example, Japanese Laid-Open Patent Publication No. 2007-052723). In Open Refine, candidate character strings are presented by autocompleting the initial characters that have been input, based on a database (registered vocabulary) of metadata of existing subjects, predicates, and objects. For example, when “da” is input as the initial characters, “da” is autocompleted and candidate character strings such as “daily” and “date” are presented.
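The autocompletion behavior described above may be illustrated with the following minimal Python sketch of prefix matching against a list of registered terms. It is not the implementation used by Open Refine; the function name autocomplete and the term list are hypothetical, and a sorted list with binary search is simply one straightforward way to find all terms that share a prefix.

```python
import bisect


def autocomplete(prefix, vocabulary):
    """Return the registered terms that start with the given prefix, in sorted order."""
    terms = sorted(vocabulary)
    # Binary search for the first term that is not less than the prefix.
    start = bisect.bisect_left(terms, prefix)
    matches = []
    for term in terms[start:]:
        # In a sorted list, matching terms are contiguous; stop at the first miss.
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


# Hypothetical registered vocabulary, for illustration only.
REGISTERED_TERMS = ["date", "creator", "daily", "title", "description"]
CANDIDATES = autocomplete("da", REGISTERED_TERMS)
print(CANDIDATES)
```

With this term list, the input “da” yields the candidates “daily” and “date”, matching the example in the text. Such matching depends only on the initial characters that are input.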