1. Technical Field
The present invention relates generally to an improved data processing system and, in particular, to a method and apparatus for processing documents and text streams.
2. Description of Related Art
Jargon is the bane of one's life in the modern world. Scientists and engineers are confronted with journals and papers that are filled with technical jargon. Government agencies produce voluminous reports with special terms and acronyms.
Many of these documents are written by persons who are familiar with the technology but who are not trained technical writers. As a result, the documents are often not written with a reader who is unfamiliar with the subject matter in mind, and they tend to contain many undefined terms, including cryptic terms and acronyms. A document with undefined terms may be difficult or impossible to read and comprehend.
The problem of simplifying technical papers grows steadily worse as technology advances. Although many corporations have attempted to incorporate open standards into their products, and some terms for proprietary technology have consequently fallen out of use, the number of special terms continues to increase rather than decrease. Technological growth spurs further innovation, which requires new words to be coined for new concepts, so ever more special terms are created.
The coining of new words and acronyms follows no particular rules, and no central authority acts as a clearinghouse for reserving words. This state of affairs compounds the problem of new jargon because identically spelled acronyms with entirely different meanings may be created. Within an organization, a new word may also have different meanings depending on the person using it, the context in which it appears, or both. If a user is aware of two identically spelled terms with different meanings, it can be especially difficult to determine which meaning the author of a document intended.
Moreover, the definition or description of a new word or acronym for a newly developed concept or product may spread slowly through an organization, so that some persons understand the new term while others are unaware of its meaning.
Therefore, it would be advantageous to have a method and apparatus for rendering documents more readable. It would be further advantageous if the method and apparatus enabled a user to understand documents in a manner consistent with the user's context. It would be particularly advantageous if information about new terms were disseminated throughout an organization in a timely manner.
The present invention is a method and apparatus for providing a central dictionary and glossary server. An application executing on a client can access a local copy of a dictionary or glossary. When a master dictionary or glossary is updated at a server, the update is served to the application on the client, which uses it to update the local copy of the dictionary or glossary.
A datastream may also be processed by automatically scanning the datastream and automatically detecting in it a word that cannot be matched to any word in a dictionary or glossary. When the unmatched word is identified as an acronym, data associated with the acronym, selected from a hierarchical set of glossaries, is inserted into the datastream in close proximity to the acronym. In another aspect of processing a datastream, in response to an indication that the unmatched word is a properly spelled new term, a dictionary or glossary may be updated with the new term, where that dictionary or glossary is a member of a hierarchically ordered set of dictionaries and/or glossaries.
The system may also contain an organizational database comprising information for organizational units associated with a data processing system, with each glossary in the hierarchical set of glossaries associated with an organizational unit.
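The mechanisms summarized above can be illustrated with a minimal Python sketch. It is not the claimed implementation, and every name in it (Glossary, annotate, MasterGlossary, LocalGlossary) is hypothetical. It assumes that each organizational unit owns one glossary linked to its parent unit's glossary, that the nearest definition in that chain wins, that an "acronym" is any unmatched 2 to 10 character token of uppercase letters and digits, that the associated data is inserted as a parenthetical right after the acronym, and that the server distributes updates as a version-numbered change log that each client replays.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Glossary:
    """Glossary for one organizational unit; `parent` is the enclosing unit (hypothetical)."""
    unit: str
    entries: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Glossary"] = None

    def lookup(self, term: str) -> Optional[str]:
        # Walk from the most specific unit toward the root; the nearest definition wins.
        node = self
        while node is not None:
            if term in node.entries:
                return node.entries[term]
            node = node.parent
        return None


# Assumed acronym heuristic: 2 to 10 characters, uppercase letters or digits, leading letter.
ACRONYM = re.compile(r"[A-Z][A-Z0-9]{1,9}")


def annotate(text: str, dictionary: Set[str], glossary: Glossary) -> str:
    """Insert a parenthetical expansion right after each unmatched, acronym-shaped word."""

    def expand(match: re.Match) -> str:
        word = match.group(0)
        if word.lower() in dictionary or not ACRONYM.fullmatch(word):
            return word  # matched dictionary word, or not acronym-shaped: leave as-is
        meaning = glossary.lookup(word)
        return f"{word} ({meaning})" if meaning else word

    return re.sub(r"\b\w+\b", expand, text)


@dataclass
class MasterGlossary:
    """Server-side master copy that logs every update under an increasing version number."""
    version: int = 0
    changes: List[Tuple[int, str, str]] = field(default_factory=list)

    def update(self, term: str, definition: str) -> None:
        self.version += 1
        self.changes.append((self.version, term, definition))

    def changes_since(self, version: int) -> List[Tuple[int, str, str]]:
        return [change for change in self.changes if change[0] > version]


@dataclass
class LocalGlossary:
    """Client-side copy that applies only the updates it has not seen yet."""
    version: int = 0
    entries: Dict[str, str] = field(default_factory=dict)

    def sync(self, server: MasterGlossary) -> int:
        applied = 0
        for version, term, definition in server.changes_since(self.version):
            self.entries[term] = definition
            self.version = version
            applied += 1
        return applied


if __name__ == "__main__":
    company = Glossary("company", {"SLA": "service level agreement",
                                   "RAS": "reliability, availability, and serviceability"})
    storage = Glossary("storage team", {"RAS": "redundant array slice"}, parent=company)
    print(annotate("The RAS report cites the SLA.", {"the", "report", "cites"}, storage))
    # prints: The RAS (redundant array slice) report cites the SLA (service level agreement).
```

In this sketch, the storage team's own definition of "RAS" overrides the company-wide one, while "SLA" falls through to the company glossary, which is one way to reconcile a term with the reader's organizational context. Recording a properly spelled new term would amount to adding it to the `entries` of the appropriate unit's glossary (or calling `MasterGlossary.update` so that clients receive it on their next `sync`); the unit-to-glossary mapping stands in for the organizational database.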