1. Field of the Disclosure
The present disclosure relates to a natural language processing method that parses an input string using a parsing algorithm and regular production rules. The disclosure further relates to an integrated circuit and an electronic device for language processing.
2. Description of Related Art
Language processing methods segment a user utterance into sentences and the sentences into tokens, e.g. words or phrases. Syntax parsers use the tokens to determine the syntactical structure of a sentence. To this end, syntax parsers use algorithms based on a grammar that describes the syntactical relationships between the words of a sentence. The grammar is embodied by a plurality of production rules, wherein each production rule corresponds to a grammatical rule that describes how pairs of words and/or multi-word phrases can be combined with each other to obtain multi-word phrases of a certain phrase type. A grammatically correct sentence can be represented by a parse tree. Information in the terminal cells of the parse tree describes the lexical categories of the tokens. Each possible multi-word phrase within the sentence is assigned to a non-terminal cell. Information in the non-terminal cells describes (i) the phrase type of the multi-word phrase and (ii) how the multi-word phrase is constructed from the words. Accordingly, information in a root cell describes how the sentence is constructed from the words and multi-word phrases and which grammatical rules are used to build up the sentence.
Natural languages exhibit ambiguities with respect to both the lexical categories of tokens and the grammatical rules, such that often more than one grammatical rule can be applied and a parse forest with a plurality of parse trees may result for the same sentence. In advanced parsers, probability values may accompany the grammatical rules and/or the tokens, and, when applying matching production rules, the syntax parser may take these probabilities into account to prefer a parse tree with a higher probability.
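The chart-based parsing described above, with terminal cells, non-terminal cells, a root cell, and probability-weighted production rules, can be illustrated with a probabilistic variant of the well-known Cocke-Younger-Kasami (CYK) algorithm. This is a minimal sketch under stated assumptions and not the method of the present disclosure: the lexicon, the production rules, and their probability values are hypothetical and illustrative (not a normalized grammar), the grammar is assumed to be in Chomsky normal form (binary rules only), and only the most probable analysis per cell and phrase type is retained (Viterbi-style), so the full parse forest is not enumerated. The names `LEXICON`, `BINARY_RULES`, `cyk_parse`, and `build_tree` are hypothetical. In the chart, `chart[i][i]` plays the role of a terminal cell, `chart[i][j]` with `i < j` a non-terminal cell, and `chart[0][n-1]` the root cell.

```python
# Hypothetical toy lexicon: token -> list of (lexical category, probability).
# "fish" is deliberately ambiguous between a noun phrase and a verb.
LEXICON = {
    "she": [("NP", 1.0)],
    "eats": [("V", 1.0)],
    "fish": [("NP", 0.6), ("V", 0.4)],
    "with": [("P", 1.0)],
    "forks": [("NP", 1.0)],
}

# Hypothetical binary production rules in Chomsky normal form:
# (left child type, right child type) -> list of (parent phrase type, probability).
BINARY_RULES = {
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 0.7)],
    ("VP", "PP"): [("VP", 0.3)],
    ("NP", "PP"): [("NP", 0.2)],
    ("P", "NP"): [("PP", 1.0)],
}


def cyk_parse(tokens):
    """Fill a CYK chart; chart[i][j] maps a phrase type to (best probability, backpointer)."""
    n = len(tokens)
    chart = [[{} for _ in range(n)] for _ in range(n)]
    # Terminal cells: lexical categories of the individual tokens.
    for i, tok in enumerate(tokens):
        for cat, p in LEXICON.get(tok, []):
            chart[i][i][cat] = (p, tok)
    # Non-terminal cells: multi-word spans of increasing length.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cell = chart[i][j]
            for k in range(i, j):  # split tokens[i..j] into [i..k] and [k+1..j]
                for left, (p_left, _) in chart[i][k].items():
                    for right, (p_right, _) in chart[k + 1][j].items():
                        for parent, p_rule in BINARY_RULES.get((left, right), []):
                            p = p_rule * p_left * p_right
                            # Keep only the most probable analysis for this phrase type.
                            if p > cell.get(parent, (0.0, None))[0]:
                                cell[parent] = (p, (k, left, right))
    return chart


def build_tree(chart, i, j, label):
    """Follow backpointers from chart[i][j][label] to a nested-tuple parse tree."""
    _, back = chart[i][j][label]
    if i == j:
        return (label, back)  # terminal cell: back is the token itself
    k, left, right = back
    return (label, build_tree(chart, i, k, left), build_tree(chart, k + 1, j, right))


tokens = "she eats fish with forks".split()
chart = cyk_parse(tokens)
root = chart[0][len(tokens) - 1]  # root cell spanning the whole sentence
if "S" in root:
    print(root["S"][0], build_tree(chart, 0, len(tokens) - 1, "S"))
```

In this toy sentence, "with forks" can attach either to the verb phrase "eats fish" or to the noun phrase "fish", so two competing analyses reach the cell spanning "eats fish with forks". The sketch keeps the verb-phrase attachment because, under the assumed rule probabilities, 0.3 × 0.42 ≈ 0.126 exceeds 0.7 × 0.12 ≈ 0.084 for the noun-phrase attachment, which mirrors how a probabilistic syntax parser prefers the parse tree with the higher probability.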
It is an object of the embodiments to provide an improved natural language processing method as well as an integrated circuit and an electronic device for improved natural language processing.