1. Field of Invention
This invention relates to machine-mediated instruction and computer-based learning systems.
2. Description of Related Art
Current writing analysis relies on human review processes, in which an instructor or a peer reviews a written work. These processes can be computer-enhanced by conducting the review over a computer network, for example via email. Alternatively, the review can take place in a computer-enhanced collaborative peer-review environment, in which other participants critique the written work.
These writing analysis processes suffer from a number of problems. In writing workshops, the instructor reviews the text from the perspective of the target audience and provides feedback on problematic language. The instructor often indicates that a sentence is incorrect but usually does not provide a reader-centric microanalysis of the text explaining precisely why it is incorrect.
In fact, reviewers tend to focus on two levels of structure. Because reviewers feel comfortable with grammar checking, they may check grammar throughout the entire text. Less frequently, they identify vague or ambiguous sentences. However, reviewers do not explain why a sentence is vague or ambiguous: they identify the location of the problem and expect the writer to understand how to correct it.
A second problem stems from each reviewer's stylistic preferences. A reviewer's subjective style preferences in turn affect that reviewer's critique of the written work. Thus, instead of the writer receiving an objective indication of how well the work communicates the desired information to the reader, the critique may reflect the reviewer's subjective stylistic preferences.
Further, the sheer volume of material to be reviewed tends to force a reviewer to focus on relatively easy problems. For example, the writer's overall organization may be critiqued, with only the occasional exemplary sentence construction analyzed in detail.
Outlining tools, such as the “outline view” in the Microsoft Word® word processor application, as well as Xerox Corporation's “Notecards” and EastGate Corporation's “StorySpace” tools, help the writer frame higher-level concepts and issues. The writer may then concentrate on developing each of the ideas within that framework. However, these tools do not address the structure and organization of the text below the concept and topic level. Because these tools do not support analyzing how the units of meaning interrelate, they cannot be used to show the writer how to improve the text below that level.
Grammar checking tools, such as “Grammatik II” and the grammar tool in Microsoft Word®, apply statistical readability formulas, such as the Flesch Reading Ease index or the Flesch-Kincaid Grade Level index, to a selected text. These tools also provide sets of rules that can be applied to a written text to identify run-on sentences, sentence fragments, archaic expressions and gender-specific expressions. However, these tools merely indicate whether a text satisfies the rules. Although they may suggest improvements based on the identified rule violations, these tools do not provide a structural representation of the text. Therefore, they cannot show the user how to improve the relationships of meaning between units of text.
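To make the readability formulas concrete, the following Python sketch computes the Flesch Reading Ease and Flesch-Kincaid Grade Level scores using their standard published coefficients. The sentence splitter and the vowel-group syllable counter are simplifying assumptions for illustration; they are not how Grammatik II or Microsoft Word® count syllables, and the function names `count_syllables` and `readability` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Readability:
    words: int
    sentences: int
    syllables: int
    reading_ease: float   # Flesch Reading Ease (higher = easier)
    grade_level: float    # Flesch-Kincaid Grade Level (approximate school grade)


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> Readability:
    # Simplification: a sentence is any non-empty run of text ending in ., ! or ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_s = max(len(sentences), 1)
    n_w = max(len(words), 1)
    n_syl = sum(count_syllables(w) for w in words)
    wps = n_w / n_s    # average words per sentence
    spw = n_syl / n_w  # average syllables per word
    ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade = 0.39 * wps + 11.8 * spw - 15.59
    return Readability(len(words), len(sentences), n_syl, ease, grade)
```

For example, `readability("The cat sat on the mat.")` counts six words, one sentence and six syllables, giving a Reading Ease of about 116.1 and a Grade Level of about -1.45. Both scores depend only on average sentence length and average word length, which is why such a figure can indicate how hard a passage is to read but not how its units of meaning relate to one another.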
Text analysis tools, such as that described in Marcu, “The Rhetorical Parsing of Natural Language Texts,” Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and the 8th Conference of the European Chapter of the Association for Computational Linguistics, Madrid, July 1997, pp. 96–103, can provide a structural representation of a text based on an analytic framework. The system described there, for example, is directed to analyzing texts for improved discourse-level automatic natural language processing. It is not directed to improving the structure or style of the text, or to teaching the writer how to improve the organization of future prose works.
Rhetorical Structure Theory is a highly complex theory. Its basic units of analysis have never been clearly articulated, and the several variations of Rhetorical Structure Theory differ from one another in the basic relations between units that they employ. The relations themselves are very complex, overlapping and ambiguous. For example, classical Rhetorical Structure Theory, as developed by Mann and Thompson (1988), includes at least ten presentational relations, five multinuclear relations and fifteen subject matter relations. Training coders is a very lengthy task, requiring weeks of intensive study and supervision. Strong differences of opinion arise among experienced coders about the relationships that link units together, and intercoder reliability is very low. Depending on the level of analysis they choose, different coders may build quite different structural trees labeled with Rhetorical Structure Theory relations. Therefore, different Rhetorical Structure Theory analyzers may produce significantly different structural representations of the same text. Together, these factors pose particular problems when applying Rhetorical Structure Theory in a learning environment whose goal is to help students improve their written communication skills through applied text microanalysis.
The Summarist system, as discussed by Hovy and Lin, “Automated Text Summarization in SUMMARIST,” in Proceedings of the Workshop on Intelligent Scalable Text Summarization, July 1997, uses statistical techniques, along with symbolic world knowledge of word meaning drawn from dictionaries, in attempting to discern a writer's intent. Because the Summarist system uses statistical techniques to identify important keywords, it produces only topical keyword summaries.
Thus, these conventional natural language processing systems attempt to identify the intended meaning in a text corpus. They do not exploit the linguistic constraint information provided in the text, but instead rely on statistical analysis and word frequency counts. From this statistical information, they determine what information the author intended to convey, and then use that determination to facilitate queries.
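The following Python sketch illustrates why a purely frequency-based approach yields only topical keyword summaries. It is a plain word-frequency count with a small hypothetical stop list, not the SUMMARIST algorithm, which also draws on dictionary-based knowledge of word meaning; the function name `keyword_summary` and the stop list are assumptions for illustration.

```python
import re
from collections import Counter

# Small illustrative stop list (an assumption; real systems use much larger lists).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def keyword_summary(text: str, k: int = 3) -> list[str]:
    """Return the k most frequent non-stop-words.

    Counter.most_common orders equal counts by first occurrence, so ties
    are broken by where a word first appears in the text.
    """
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    return [word for word, _ in Counter(words).most_common(k)]
```

Applied to the text "The reviewer reads the essay. The reviewer marks errors in the essay, but the reviewer does not explain why the essay is unclear.", `keyword_summary(text, k=2)` returns `["reviewer", "essay"]`. The result names the topic but says nothing about what the text asserts, namely that the reviewer fails to explain why the essay is unclear.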