Discourse theory is an approach to understanding the content and significance of natural language documents and other units of natural language. According to discourse theory, each natural language document has a "discourse structure" that reflects the author's purposes in writing the document. Discerning the discourse structure of a natural language document is commonly regarded as an important component of understanding the document.
The discourse structure of documents is frequently modeled using hierarchical "discourse structure trees," or simply "trees," such as the "rhetorical structure theory trees" ("RST trees") proposed by Mann and Thompson, "Relational Propositions in Discourse," Discourse Processes 9:57-90 (1986). Such discourse structure trees characterize the relative significance of the constituent segments of the documents, called "propositions." These propositions are generally clauses or phrases. A discourse structure tree identifies the relationships, or "discourse relations," that exist between propositions in the document.
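As an illustration only, the hierarchical structure described above can be sketched as a small data type. The sketch is not drawn from Mann and Thompson or from any particular system. The class name DiscourseNode, its fields, the is_nucleus flag used to mark the more significant segment, the relation label "CAUSE," and the example sentence are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiscourseNode:
    """One node of a hypothetical discourse structure tree (illustrative only)."""
    relation: Optional[str] = None   # discourse relation label; None for a leaf
    text: Optional[str] = None       # proposition text; None for an internal node
    is_nucleus: bool = True          # assumed flag: True marks the more significant segment
    children: List["DiscourseNode"] = field(default_factory=list)

    def propositions(self) -> List[str]:
        """Return the leaf propositions in left-to-right order."""
        if not self.children:
            return [self.text] if self.text is not None else []
        result: List[str] = []
        for child in self.children:
            result.extend(child.propositions())
        return result


def build_example_tree() -> DiscourseNode:
    """Build a two-proposition tree joined by an assumed 'CAUSE' relation."""
    return DiscourseNode(
        relation="CAUSE",
        children=[
            DiscourseNode(text="The road was closed", is_nucleus=True),
            DiscourseNode(text="because the bridge had flooded", is_nucleus=False),
        ],
    )
```

In this sketch, leaves hold the propositions, internal nodes hold the discourse relation that joins their children, and the flag on each child records its relative significance within that relation.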
Discourse structure trees are typically generated manually, at significant cost, by experts trained as linguists. Because manual generation is expensive, such trees remain largely a theoretical tool used to study discourse in general. An automated approach to inexpensively generating discourse structure trees for natural language documents, however, would permit discourse theory to be applied to the analysis of arbitrary documents.