The act of parsing a sentence of a natural language involves resolving the sentence into its component parts and determining the structural relationships among the words from which the sentence is constructed. As is well known in the theory of language, there are two main syntactic structures: the constituent structure and the dependency structure of a sentence.
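By way of illustration only, the two structures named above can be sketched as simple data structures. The example sentence, the category labels (S, NP, VP, Det, N, V), the head assignments, and the helper names `leaves` and `dependents` are all assumptions chosen for this sketch and are not taken from any particular grammar or system.

```python
# Hypothetical illustration: two views of the sentence "the dog chased the cat".

# Constituent structure: nested (label, child, ...) tuples; leaves are words.
constituent_tree = (
    "S",
    ("NP", ("Det", "the"), ("N", "dog")),
    ("VP", ("V", "chased"), ("NP", ("Det", "the"), ("N", "cat"))),
)

# Dependency structure: heads[i] is the 1-based position of the head of
# words[i]; 0 marks the root of the sentence.
words = ["the", "dog", "chased", "the", "cat"]
heads = [2, 3, 0, 5, 3]


def leaves(tree):
    """Return the words of a constituent tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    _label, *children = tree
    out = []
    for child in children:
        out.extend(leaves(child))
    return out


def dependents(head_position):
    """Return the words whose head sits at the given 1-based position."""
    return [words[i] for i, h in enumerate(heads) if h == head_position]
```

In this sketch the constituent view groups words into nested phrases, whereas the dependency view records only word-to-word links: `dependents(3)` lists the words attached directly to "chased", and `dependents(0)` gives the root word.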
Although considerable progress has been made in linguistic theories of parsing and grammar, the problem of automatically parsing natural language text by machine has not yet been satisfactorily solved. This is largely because people use natural language in a free and creative way.
Previous attempts to devise an automatic parsing machine for natural languages have been mostly rule-based. A prerequisite of a rule-based system is a set of language-dependent rules written and refined by linguists over many years. For a rule-based parsing machine to work, the rules of the particular language have to be identified and incorporated into the system. Writing grammar rules for rule-based parsing machines is a daunting task that only computational linguists can effectively perform. In devising such rule-based systems, it has been necessary to formalize the grammar in systems such as the generalized context-free grammar formalism and the tree-adjoining grammar formalism.
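The dependence of a rule-based parser on hand-written rules can be sketched with a toy example. The grammar, the lexicon, and the function names `parse` and `accepts` below are hypothetical and cover only a tiny fragment of English; the sketch uses a plain context-free grammar with a top-down recognizer and does not represent any of the formalisms named above.

```python
# Hypothetical hand-written rules for a tiny fragment of English.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {
    "Det": {"the", "a"},
    "N": {"dog", "cat"},
    "V": {"chased", "slept"},
}


def parse(symbol, tokens, start):
    """Yield every end position at which `symbol` can span tokens[start:end]."""
    if symbol in LEXICON:
        # Lexical category: consume one token if it belongs to the category.
        if start < len(tokens) and tokens[start] in LEXICON[symbol]:
            yield start + 1
        return
    for rhs in GRAMMAR[symbol]:
        # Thread the set of reachable positions through the rule's right-hand side.
        positions = [start]
        for part in rhs:
            positions = [end for pos in positions for end in parse(part, tokens, pos)]
        yield from positions


def accepts(sentence):
    """Return True if the rules derive the whole sentence from S."""
    tokens = sentence.split()
    return len(tokens) in parse("S", tokens, 0)
```

Under these assumed rules, `accepts("the dog chased the cat")` succeeds, while `accepts("the bird slept")` fails simply because "bird" is absent from the lexicon. Every new word or construction requires a further rule to be written by hand, which is the burden described above.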