In system or software development, requirement acquisition is the process of obtaining from the client the requirements, that is, the conditions or capabilities that the system under development must satisfy in order to solve problems or achieve objectives. In requirement acquisition, the requirements must be grasped by taking into account the important phrases in the related documents and the relations among them, so that the client's requirements are extracted without omission and can be used for the specifications and designs.
Conventionally, in requirement acquisition, an analyzer manually extracts important phrases and, using those phrases as clues, grasps the requirements by linking similar content described in different parts of the documents. However, extracting important phrases and linking them requires repeatedly reading a large volume of documents, which takes considerable time and effort. Furthermore, important parts may be overlooked due to human error.
There are methods that support the analyzer by extracting nouns, verbs, and the like through morphological analysis (syntactic analysis). For example, in the requirement acquisition method described in non-patent literature 1, nouns and verbs are extracted. Further, in the requirement acquisition support device described in patent literature 1 (Japanese Patent Application Publication JP-A-H06-067862), text is divided into words by syntactic analysis and detailed patterns are retrieved.
There is also a method that does not divide the text into words in advance and instead extracts, as important phrases, partial strings that appear multiple times in the related documents. For example, in the phrase extraction method described in non-patent literature 2, phrases that appear repeatedly are extracted as important phrases.
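The idea of extracting repeated partial strings without prior word division can be illustrated with a minimal sketch. This is not the algorithm of non-patent literature 2; the function name `repeated_substrings` and the parameters `min_len`, `max_len`, and `min_count` are hypothetical, and the sketch simply counts every character substring within a length range (overlapping occurrences included).

```python
from collections import Counter


def repeated_substrings(text, min_len=2, max_len=6, min_count=2):
    """Return substrings of text that occur at least min_count times.

    Illustrative only: counts all character n-grams for
    min_len <= n <= max_len, with no word division beforehand.
    """
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    # Keep only the substrings that appear repeatedly.
    return {s: c for s, c in counts.items() if c >= min_count}


print(repeated_substrings("abcxabcyabc"))
# → {'ab': 3, 'bc': 3, 'abc': 3}
```

Even in this tiny input, the repeated string "abc" is returned together with its fragments "ab" and "bc", which shows how such a method tends to produce many similar candidate phrases.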
However, with the methods of extracting important phrases described in patent literature 1 and non-patent literatures 1 and 2, the analyzer must refer back to the original document to determine the meaning of an extracted phrase. Normally, a large number of phrases are extracted, and the number of sentences containing these phrases is also large, so the checking operation requires a great deal of time and effort, just as conventional manual extraction does.
Further, a single requirement does not always contain only one important phrase; in most cases it contains a plurality of important phrases, such as a noun and a verb. It is therefore difficult to link important phrases to the original document, because it must be determined which combination of important phrases is to be linked.
Moreover, in the methods described in non-patent literature 1 and patent literature 1, in which text is divided into words in advance by morphological analysis before important phrases are extracted, there is a problem that words cannot be extracted correctly because of erroneous word division. For example, “gaikokujin-sanseiken”, which means “foreigners-suffrage”, may be divided into “gaikoku”, “ninjin”, and “seiken” (“ninjin” is another reading of “jinsan” in Chinese characters), which mean “foreign country”, “carrot”, and “regime”. In addition, there is a problem that unknown words, which are not registered in the dictionary used for the morphological analysis, cannot be extracted. For example, an abbreviation such as the English string “ABC” cannot be extracted.
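Both failure modes can be reproduced with a toy dictionary-based segmenter. The sketch below is a simplified stand-in for a morphological analyzer, assuming greedy longest-match lookup; the function `segment` and the example dictionaries are hypothetical, and the English strings are only an analogue of the Japanese example above.

```python
def segment(text, dictionary):
    """Greedy longest-match segmentation.

    Returns a list of (token, is_known) pairs. Characters that start
    no dictionary entry are emitted one at a time as unknown tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at i first, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in dictionary:
                tokens.append((text[i:j], True))
                i = j
                break
        else:
            # No dictionary word starts here: emit one unknown character.
            tokens.append((text[i], False))
            i += 1
    return tokens


# Erroneous division: "no" + "where" is also possible, but the greedy
# match picks "now" first and is left with "here".
print(segment("nowhere", {"no", "now", "where", "here"}))
# → [('now', True), ('here', True)]

# Unknown word: "ABC" is not in the dictionary, so it is broken into
# single characters and never surfaces as one phrase.
print(segment("useABC", {"use"}))
# → [('use', True), ('A', False), ('B', False), ('C', False)]
```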
In the method of extracting partial strings that appear multiple times in the related documents, as described in non-patent literature 2, many similar phrases are extracted, so it is necessary to check the original document to determine whether each extracted phrase is an important phrase and to accept or reject it accordingly. As a result, there is a problem that time and effort are needed for this determination.
As a related technique, patent literature 2 (Japanese Patent Application Publication JP2008-234049A) discloses an abstract text generation device and an abstract sentence generation program. In this related technique, when a sentence whose degree of similarity with another sentence is equal to or greater than a preset threshold value is selected, sentences are extracted in order without repeatedly calculating the degree of similarity, and the threshold value must be determined in advance.
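The dependence on a preset threshold can be illustrated as follows. This sketch does not reproduce the device of patent literature 2; it assumes Jaccard similarity over lowercase word sets as the degree of similarity, and the names `jaccard` and `similar_sentences` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between the word sets of two sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def similar_sentences(target, sentences, threshold):
    """Return, in order, the sentences whose similarity to target
    is equal to or greater than the preset threshold."""
    return [s for s in sentences if jaccard(target, s) >= threshold]


sentences = [
    "the system shall log warnings",
    "users can reset passwords",
    "the system shall log errors",
]
target = "the system shall log errors"

print(similar_sentences(target, sentences, 0.5))
# → ['the system shall log warnings', 'the system shall log errors']
print(similar_sentences(target, sentences, 0.8))
# → ['the system shall log errors']
```

The same input yields different selections at 0.5 and 0.8, so the result depends directly on a threshold value that has to be chosen beforehand.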