In general, a language consists of a set of symbols, called its alphabet, and a set of rules for forming sentences from those symbols. This set of rules is called a grammar of the language, and it imposes a structure on the language. A language is normally described by its grammar; the language and the grammar are denoted by L and G, respectively. The language L can be a natural language, a computer programming language, or any other language with a well-defined grammar.
A grammar consists of a set of productions, or rewriting rules. Each production maps a non-terminal symbol to a string of non-terminal and terminal symbols. One special non-terminal symbol is designated as the START symbol of the grammar. Each production can be represented as a tree structure with the left-hand side non-terminal as the root node and the symbols on the right-hand side of the production as child nodes of the root node. Starting from the START symbol, each non-terminal node is recursively expanded by applying productions of the grammar until the resulting string contains only terminal symbols.
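The structure described above can be illustrated with a minimal sketch. The toy grammar, the names GRAMMAR, START, expand and leaves, and the rule of always choosing the first production are assumptions made for illustration only; they are not taken from any particular system.

```python
# Hypothetical toy grammar: each key is a non-terminal, each value lists the
# alternative right-hand sides. Any symbol that is not a key is a terminal.
# The sketch assumes there are no empty (epsilon) productions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"]],
}
START = "S"


def expand(symbol, choose):
    """Return a derivation tree (symbol, children) rooted at `symbol`."""
    if symbol not in GRAMMAR:
        return (symbol, [])  # terminal symbol: a leaf node
    rhs = choose(GRAMMAR[symbol])  # pick one production for this non-terminal
    return (symbol, [expand(child, choose) for child in rhs])


def leaves(tree):
    """Collect the terminal symbols of a derivation tree, left to right."""
    symbol, children = tree
    if not children:
        return [symbol]
    return [leaf for child in children for leaf in leaves(child)]


# Always apply the first listed production (an arbitrary illustrative policy).
tree = expand(START, lambda alternatives: alternatives[0])
sentence = " ".join(leaves(tree))
print(sentence)  # the dog sees the dog
```

Each node of the returned tree corresponds to one applied production: the left-hand side non-terminal is the root and the right-hand side symbols are its children, and the leaves read from left to right form a string of terminal symbols only.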
Conventional methods exist to describe a grammar for processing in a computer environment and to check whether a set of sentences (or programs) conforms to the grammar. Such methods can serve as a basis for program translation tools, such as compilers and assemblers, and for natural language processing systems. A grammar parser is used to check whether a set of sentences conforms to a given grammar. A parser is either created manually or generated automatically using tools such as YACC and BISON. A parser accepts a stream of tokens, which can be part of a program written in a programming language, an assembly language statement, or a natural language sentence, and checks whether the tokens form a valid sentence in the underlying language.
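As an illustration of the conformance check, the following is a minimal sketch of a hand-written recognizer that decides whether a stream of tokens forms a valid sentence. It is a simple backtracking top-down recognizer, not the table-driven technique that tools such as YACC and BISON generate, and it assumes a grammar without left recursion. The toy grammar and the names derives and accepts are hypothetical.

```python
# Hypothetical toy grammar; any symbol that is not a key is a terminal.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"]],
}
START = "S"


def derives(symbols, tokens):
    """Return True if the symbol sequence can derive exactly `tokens`.

    Assumes the grammar has no left recursion; otherwise this would not
    terminate.
    """
    if not symbols:
        return not tokens  # success only if every token has been consumed
    head, rest = symbols[0], symbols[1:]
    if head not in GRAMMAR:
        # Terminal symbol: it must match the next token.
        return bool(tokens) and tokens[0] == head and derives(rest, tokens[1:])
    # Non-terminal symbol: try each production in turn, backtracking on failure.
    return any(derives(rhs + rest, tokens) for rhs in GRAMMAR[head])


def accepts(tokens):
    """Check whether the token stream is a sentence of the toy language."""
    return derives([START], list(tokens))


print(accepts("the dog sees the cat".split()))  # True
print(accepts("dog the sees".split()))          # False
```

The recognizer only answers yes or no; a full parser would additionally build the derivation tree for an accepted sentence.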
It is often necessary to generate some or all sentences of a language automatically. For example, an automated response system for providing information to users would require generation of specific sentences in English. As another example, it may be necessary to generate different instruction sequences of a microprocessor to test the behavior of the microprocessor. As a further example, it may be necessary to generate different DNA sequences as part of a molecular biology experiment. Given a grammar for a language, it is possible to generate the sentences of the underlying language by starting at the START symbol and repeatedly applying different productions of the grammar. However, it may be necessary to automatically generate only a subset of the sentences of a language, based on a specification of which sentences should be generated. Therefore, it is desirable to provide a system and method for automatically generating sentences of a language in a controlled manner.
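The exhaustive generation described above, starting at the START symbol and repeatedly applying productions, can be sketched as a breadth-first enumeration of sentential forms. The toy grammar and the function name sentences are hypothetical, and expressing the specification as a plain filter over the generated sentences is an assumption made for illustration, not the controlled generation method itself.

```python
from collections import deque
from itertools import islice

# Hypothetical toy grammar; any symbol that is not a key is a terminal.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"]],
}
START = "S"


def sentences(grammar, start):
    """Lazily yield sentences by breadth-first leftmost derivation."""
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Find the leftmost non-terminal in the current sentential form.
        index = next((i for i, s in enumerate(form) if s in grammar), None)
        if index is None:
            yield " ".join(form)  # only terminals remain: a sentence
            continue
        # Apply every production of that non-terminal.
        for rhs in grammar[form[index]]:
            queue.append(form[:index] + tuple(rhs) + form[index + 1:])


# islice bounds the output, which matters when the language is infinite.
all_sentences = list(islice(sentences(GRAMMAR, START), 10))

# Assumed example of a specification: keep only sentences whose subject is "cat".
selected = [s for s in all_sentences if s.startswith("the cat")]
```

For this toy grammar the enumeration yields four sentences, of which two satisfy the filter. Filtering after generation becomes wasteful as the language grows, because every sentence is still produced before most are discarded.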