1. Technical Field
The present invention relates generally to an improved data processing system and, in particular, to a method and apparatus for processing documents and text streams.
2. Description of Related Art
Jargon is the bane of one's life in the modern world. Scientists and engineers are confronted with journals and papers that are filled with technical jargon. Government agencies produce voluminous reports with special terms and acronyms.
Many of these documents are written by persons familiar with the technology but not necessarily by technical writers. Therefore, these documents are often not written in a manner sympathetic to a reader who is unfamiliar with the subject matter. The documents tend to contain many cryptic terms and acronyms that are never defined. It may be difficult or impossible to read and comprehend a document with undefined terms.
The problem of comprehending technical papers grows steadily worse as technology marches onward. Many corporations have attempted to incorporate open standards into their products, which results in some terms for proprietary technology becoming obsolete and unused. Nevertheless, the number of special terms continues to increase rather than decrease, because technological growth spurs further innovation that requires special words to be coined for new concepts. Almost no one is immune from this problem. With the explosive growth of the World Wide Web, persons searching the Web are likely to view documents and Web pages containing subject matter and terms with which they are unfamiliar.
Therefore, it would be advantageous to have a method and apparatus for rendering documents more readable. It would be further advantageous if the method and apparatus provided a user with some manner of understanding unfamiliar terms.
The present invention provides a method, apparatus, and instructions for processing a datastream in a data processing system. A datastream is automatically scanned and unidentified words are detected. The automatic scanning may be initiated in response to a spell checking operation on the datastream. A user provides an indication that an unidentified term is a correctly spelled new term, and the user provides a definition of the new term. A glossary of terms, comprising the new term and the definition of the new term, is automatically constructed for the document.
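The flow summarized above (scan, detect unidentified words, ask the user, build a glossary) can be illustrated with a minimal Python sketch. It is not the claimed implementation. The function names, the `known_words` set standing in for a spell-checker dictionary, and the `ask_user` callback are all hypothetical names introduced here for illustration, and the word-matching pattern is a simplifying assumption.

```python
import re
from typing import Callable, Dict, List, Optional, Set

# Assumption: a "word" starts with a letter and may contain letters,
# digits, apostrophes, and hyphens. A real spell checker would tokenize
# more carefully.
WORD_RE = re.compile(r"[A-Za-z][A-Za-z0-9'-]*")


def find_unidentified_words(text: str, known_words: Set[str]) -> List[str]:
    """Scan the text and return words absent from known_words.

    Matching is case-insensitive; each word is reported once, in the
    order it first appears.
    """
    seen: Set[str] = set()
    unidentified: List[str] = []
    for match in WORD_RE.finditer(text):
        word = match.group(0)
        key = word.lower()
        if key in known_words or key in seen:
            continue
        seen.add(key)
        unidentified.append(word)
    return unidentified


def build_glossary(
    text: str,
    known_words: Set[str],
    ask_user: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Build a glossary from the unidentified words in the text.

    ask_user(word) is a hypothetical callback standing in for the user
    interaction: it returns a definition when the user confirms the word
    is a correctly spelled new term, or None otherwise (for example,
    when the word is simply a misspelling).
    """
    glossary: Dict[str, str] = {}
    for word in find_unidentified_words(text, known_words):
        definition = ask_user(word)
        if definition is not None:
            glossary[word] = definition
    return glossary
```

In this sketch the user interaction is passed in as a callback, so the same scanning logic could be driven by a spell-check dialog, a batch file of definitions, or a test stub.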