1. Field of the Invention
This invention relates generally to database retrieval systems for retrieving information stored in a database, and, more particularly, to database retrieval systems for retrieving information stored in a database using natural language expressions.
2. Description of the Prior Art
FIG. 1 is a diagram illustrating a conventional database retrieval system for retrieving data from a table formatted database in response to a natural language query. A natural language query is a request for data that is set forth in a natural language, such as English, Japanese, French, etc. The illustrated database retrieval system is described in more detail in Kinukawa, "A Natural Language Interface Processor Based on the Hierarchical-Tree Structure Model of Relation Table," Journal of Information Processing Society of Japan, Vol. 27, No. 5 (1986), pp. 499-509. This system is designed to process queries in Japanese. For the examples described below, the English translations of Japanese words and phrases are provided in parentheses.
The database retrieval system shown in FIG. 1 includes an input unit 2, such as a keyboard, for entering a natural language query 1. The system also includes a communications controller 3 for forwarding the natural language query 1 to a retrieval sentence analysis unit 5. The retrieval sentence analysis unit 5 processes the input query 1 to produce a hierarchical model of the query. The system additionally includes a word dictionary 4, that is constructed on the basis of the content of a database 9, and a hierarchical table model 6 for hierarchically expressing the content of the database. The dictionary 4 and hierarchical table model 6 are used by the retrieval sentence analysis unit 5 in analyzing the natural language query 1. The retrieval sentence analysis unit 5 performs both vocabulary analysis and syntactic/semantic analysis on the natural language query 1. The retrieval sentence analysis unit 5 produces a retrieval sentence analysis result 7 as output that is forwarded to a retrieval processing unit 8. The retrieval processing unit 8 uses the retrieval sentence analysis result 7 to retrieve data from the database 9.
The depiction of the conventional database retrieval system shown in FIG. 1 is a functional description intended to show the interaction between the respective components of the system. The components shown in FIG. 1 are, in fact, implemented in a data processing system 10, such as that shown in FIG. 2. The data processing system 10 includes a central processing unit (CPU) 11, a memory 12, the communications controller 3, an output device 17 and the input unit 2. Each of these components is coupled to a bus 13. The retrieval sentence analysis unit 5 and the retrieval processing unit 8 are implemented in software that is executed by the CPU 11 (FIG. 2). The software is stored in the memory 12. The word dictionary 4 (FIG. 1), the hierarchical table model 6 and the database 9 are stored within the memory 12 (FIG. 2).
FIG. 3a provides a more detailed depiction of an example of the word dictionary 4. As this Figure shows, the dictionary includes a plurality of entries, and each entry includes three fields. The header field identifies the term or phrase associated with the entry, whereas the part of speech field identifies the part of speech of the term or phrase. Lastly, the type field identifies the type of term or phrase that is used. In the example shown in FIG. 3a, the types are "item name" and "data expression word".
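The three-field entry structure described above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the cited system: the class name `DictionaryEntry`, the field names, and the helper `look_up` are hypothetical, and the two entries are limited to the phrases that this description attributes to FIG. 3a.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DictionaryEntry:
    """One word dictionary entry with the three fields described for FIG. 3a."""
    header: str          # the term or phrase itself
    part_of_speech: str  # e.g. "noun"
    entry_type: str      # "item name" or "data expression word"


# Hypothetical in-memory dictionary keyed by the header field.
WORD_DICTIONARY = {
    entry.header: entry
    for entry in (
        DictionaryEntry("chokoreeto rui", "noun", "data expression word"),
        DictionaryEntry("uriage", "noun", "item name"),
    )
}


def look_up(phrase: str) -> Optional[DictionaryEntry]:
    """Return the entry for a phrase, or None if the phrase is not recorded."""
    return WORD_DICTIONARY.get(phrase)
```

A phrase that is absent from the dictionary, such as a zyoshi, simply yields `None` in this sketch.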
FIG. 3b provides a more detailed depiction of the hierarchical table model 6. This model 6 sets forth the hierarchical relationship between the respective tables. Each table specifies a number of attributes. For instance, table 14 includes the attributes of "date", "commodity code", "commodity group code", and "sales". The "commodity code" attribute is also an attribute in table 16, which is hierarchically related with table 14. Similarly, the attribute of "commodity group code" is an attribute of both table 16 and table 18. The table 14 is a higher order table than tables 16 and 18. Moreover, table 16 is a higher order table than table 18. This hierarchical table model is consistent with the relational model for data proposed by E. F. Codd in "A Relational Model of Data for Large Shared Data Banks," Communications of the ACM, June 1970, pp. 377-387.
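The ordering and the shared attributes described for FIG. 3b might be represented as in the following Python sketch. The names `HIERARCHY`, `is_higher_order` and `linking_attributes` are hypothetical, and the attribute sets contain only the attributes named in the text; any further attributes of tables 16 and 18 are omitted because they are not stated here.

```python
# Tables listed from highest order to lowest order, keyed by reference numeral.
# Only the attributes named in the description are included (assumption:
# the actual tables may carry additional attributes).
HIERARCHY = [
    (14, {"date", "commodity code", "commodity group code", "sales"}),
    (16, {"commodity code", "commodity group code"}),
    (18, {"commodity group code"}),
]


def is_higher_order(table_a: int, table_b: int) -> bool:
    """Return True if table_a is a higher order table than table_b."""
    order = [table for table, _ in HIERARCHY]
    return order.index(table_a) < order.index(table_b)


def linking_attributes(table_a: int, table_b: int) -> set:
    """Return the attributes two tables share, which relate them hierarchically."""
    attributes = dict(HIERARCHY)
    return attributes[table_a] & attributes[table_b]
```

In this sketch the shared attributes play the role of the links between tables, so `linking_attributes(16, 18)` yields the "commodity group code" attribute that the two tables have in common.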
FIG. 3c provides an illustration of the database 9. The database 9 includes table A, table B and table C. Each of the tables A, B, C includes different types of information. For example, table A contains sales information, table B includes commodity information, and table C includes commodity group information. These tables are used in conjunction to obtain information requested by the natural language query 1 (FIG. 1).
Operation of the system shown in FIG. 1 will now be described. Initially, a natural language query 1 is entered using the input unit 2. When a keyboard is used as the input unit 2, the query is entered simply by typing the query. The query 1 is then passed to the communications controller 3, which forwards the query to the retrieval sentence analysis unit 5. The retrieval sentence analysis unit 5 parses the query into a hierarchical structure of words or phrases that is output as the retrieval sentence analysis result 7. In processing the query, the retrieval sentence analysis unit 5 first divides the query into words or phrases. In the present example, the query is divided into the phrases "chokoreeto rui" and "uriage". The terms "no" and "ha" are zyoshi, whose significance will be described in more detail below.
Once the query has been divided into words or phrases, vocabulary analysis is performed on the words or phrases to determine what each word or phrase in the query signifies. In performing such vocabulary analysis, the retrieval sentence analysis unit 5 references the word dictionary 4 to determine that "chokoreeto rui" (chocolates and the like) is a data expression word (see FIG. 3a). The retrieval sentence analysis unit 5 also determines that "uriage" (sales) is an attribute item name. The word dictionary 4 indicates that both of these phrases are nouns. The dictionary 4 is not referenced for the zyoshi "ha" and "no".
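The division of the query and the subsequent vocabulary analysis can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the query is assumed to be the romanized, space-separated string "chokoreeto rui no uriage ha" (reconstructed from the phrases and zyoshi named above), whereas real Japanese text is not delimited by spaces. The names `PARTICLES`, `WORD_DICTIONARY`, `segment` and `vocabulary_analysis` are hypothetical.

```python
# Zyoshi (particles) named in the description; they are not looked up in the dictionary.
PARTICLES = {"no", "ha"}

# Hypothetical dictionary: phrase -> (part of speech, type), per the description of FIG. 3a.
WORD_DICTIONARY = {
    "chokoreeto rui": ("noun", "data expression word"),
    "uriage": ("noun", "item name"),
}


def segment(query: str) -> list:
    """Split a romanized query into (phrase, particle) pairs at each zyoshi."""
    pairs, current = [], []
    for token in query.split():
        if token in PARTICLES:
            if current:  # ignore a particle that has no preceding phrase
                pairs.append((" ".join(current), token))
            current = []
        else:
            current.append(token)
    if current:  # trailing phrase with no particle after it
        pairs.append((" ".join(current), None))
    return pairs


def vocabulary_analysis(pairs: list) -> list:
    """Attach the dictionary information (or None) to each segmented phrase."""
    return [(phrase, particle, WORD_DICTIONARY.get(phrase)) for phrase, particle in pairs]
```

With the assumed query, `segment` produces the pairs ("chokoreeto rui", "no") and ("uriage", "ha"), and only the phrases, not the zyoshi, are looked up.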
Syntactic and semantic analysis is then performed on the query. In particular, syntactic analysis is performed to process the syntax of the query in order to understand the role each phrase serves in the query. Semantic analysis, on the other hand, is performed to understand what is being requested by the query.
Subsequently, semantic analysis is performed to relate the meaning of the query to the database entries. The semantic analysis relies on the hierarchical table model 6 (see FIG. 3b) to ascertain that "chokoreeto rui" (chocolates and the like) is an attribute data expression word of a commodity group in table 18 (i.e., table C in FIG. 3c) and "uriage" (sales) is an item name in the table 14 (i.e., table A in FIG. 3c). Moreover, the hierarchical table model 6 (FIG. 3b) indicates that table 14 is a higher order table than table 18. Since the attribute item appearing in the low order table is a noun, and a zyoshi "no" is added thereto, it is recognized that the attribute "chokoreeto rui" in table 18 modifies the attribute "uriage" (sales), which appears in a higher order table 14. Using these results, a retrieval formula "retrieval condition: (commodity group name=chokoreeto rui), retrieval object: uriage" is obtained and is output from the retrieval sentence analysis unit 5. Subsequently, retrieval from the database 9 is performed by the retrieval processing unit 8 to obtain the desired data.
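The modification rule described above, in which a noun from a lower order table followed by the zyoshi "no" modifies an attribute in a higher order table, might be sketched in Python as follows. This is a hedged illustration rather than the method of the cited system: the names `TABLE_ORDER`, `ANALYZED` and `build_retrieval_formula`, and the dictionary layout of the result, are hypothetical. The table numerals, phrases and the "commodity group name" attribute are taken from the description.

```python
# Position in the hierarchy: a smaller number means a higher order table.
TABLE_ORDER = {14: 0, 16: 1, 18: 2}

# Hypothetical result of vocabulary analysis for the example query.
ANALYZED = [
    {"phrase": "chokoreeto rui", "particle": "no",
     "type": "data expression word", "table": 18, "item": "commodity group name"},
    {"phrase": "uriage", "particle": "ha",
     "type": "item name", "table": 14, "item": "uriage"},
]


def build_retrieval_formula(analyzed: list) -> dict:
    """Derive a retrieval condition and retrieval object from analyzed phrases."""
    conditions, target = [], None
    for index, entry in enumerate(analyzed):
        following = analyzed[index + 1] if index + 1 < len(analyzed) else None
        # A phrase followed by "no" modifies the next phrase when it comes
        # from a lower order table than that next phrase.
        modifies_next = (
            entry["particle"] == "no"
            and following is not None
            and TABLE_ORDER[entry["table"]] > TABLE_ORDER[following["table"]]
        )
        if entry["type"] == "data expression word" and modifies_next:
            conditions.append((entry["item"], entry["phrase"]))
        elif entry["type"] == "item name":
            target = entry["phrase"]
    return {"retrieval condition": conditions, "retrieval object": target}
```

Applied to `ANALYZED`, the sketch yields a single condition pairing "commodity group name" with "chokoreeto rui" and the retrieval object "uriage", mirroring the retrieval formula quoted above.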
FIGS. 4a, 4b and 4c show dictionaries used in a second conventional database retrieval system, as disclosed in Japanese Patent Laid-Open Publication No. 59-99539. In these dictionaries, information on column name in a file, information on data item name, and information on a file name that possesses a common column name or data name, are stored according to file names of a data file that is contained in a database. FIG. 4a represents a dictionary in which one of the database files contains the column name of a file. The dictionary also holds information regarding the order in which the column is contained in the file and additionally holds information regarding synonyms of the column name (i.e., file numbers and column attribute numbers of columns that are synonymous with the named column). FIG. 4b shows an analogous dictionary in which one of the files contains a data column name, and the dictionary stores a position at which the named column is contained in the file. Lastly, the dictionary stores information regarding synonyms of the data column name. FIG. 4c shows a dictionary holding information as to semantically identical data columns that are connected as synonyms.
FIG. 5 shows the designated format for input queries for the second conventional system. This format requires that queries be entered as a number of entries, wherein each entry includes two fields: a noun field and a particle or auxiliary field. Thus, for the example query 1 (FIG. 1) used in the discussion of the first conventional system, the input query for the second conventional system would be as follows. The first noun field would be entered as "chokoreeto rui" and the corresponding particle field would be entered as "no". Further, the second noun field would be entered as "uriage" and the particle field would be entered as "ha".
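The fixed two-field entry format can be sketched in Python as follows. The names `QueryEntry` and `make_query` are hypothetical and are used only to illustrate the restriction that every noun field must be paired with a particle or auxiliary field; the actual layout of FIG. 5 is not reproduced here.

```python
from typing import NamedTuple


class QueryEntry(NamedTuple):
    """One entry of the designated input format: a noun and its particle."""
    noun: str
    particle: str


def make_query(*fields: str) -> list:
    """Group alternating noun and particle fields into query entries."""
    if len(fields) % 2 != 0:
        raise ValueError("each noun field needs a matching particle field")
    return [QueryEntry(fields[i], fields[i + 1]) for i in range(0, len(fields), 2)]


# The example query entered in the designated format.
example = make_query("chokoreeto rui", "no", "uriage", "ha")
```

Because the format accepts only noun and particle pairs, a free-form sentence cannot be entered in this sketch without first being broken into such pairs by the user.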
In this second conventional system, queries in a natural Japanese format cannot be analyzed. Likewise, the retrieval object is determined in view of the restriction of the designated format shown in FIG. 5. A pertinent data file may, thus, be accessed only by limited terminology including synonyms recorded in the dictionaries.
In the first conventional information retrieval system described above, it is necessary to have previously constructed a hierarchical table model. Since, however, it is not always possible to place the content of a database into a hierarchy, input sentences which do not fall under the defined hierarchical structure cannot be processed. Further, there is no flexibility in receiving natural language phrases or words, such as "sengetsu" (last month), which are not in the database. The system is limited solely to the phrases included in the database. Still further, no information is provided on "zyoshi" (particles). Thus, there is also the problem that the omission of a "zyoshi" cannot be detected.
In addition, when there is an ambiguous word (for example, time periods or seasons), syntactic analysis is impossible unless the definition of the ambiguous word is recorded in detail. In some cases, each interrogator must record the definition on an individual basis according to his usage of the ambiguous term.
Information retrieval is performed for each of the items recorded in a file. Thus, an answer cannot be obtained for a question in which a plurality of files are retrieved as a result of analyzing the input sentence and in which it is necessary to process such a retrieval result to obtain a final result.