This invention relates to apparatus for detecting misspelled words in electronically stored documents and more particularly relates to a spelling error detector and methods of operation.
In modern technology there are many devices which store documents electronically and which, when accessed, create a printed image of the stored data. Such devices include word processors.
The word processor is a machine on which a printed image can be corrected and manipulated before it is printed out in final form. The word processor uses computer technology to operate on words. There are of course many other types of systems which essentially store data in the form of words and subsequently can print out the data. These systems include data processing computers and the like.
Modern word processors utilize four basic elements: a visual display unit with an input keyboard, a memory, a text storage medium and a printer. The combination of a typewriter keyboard and the visual display unit is generally referred to as the work station. The display enables the operator or typist to see the text before it is finally printed. Displays vary from single line displays to full page displays. As indicated, every word processor has an internal memory unit where the text or words are stored and manipulated. The space available in the word processor memory for the text is normally not very large, and in most word processors the memory can hold only one or two pages of text. Hence, in more sophisticated units, additional pages of text are transferred into a remote memory designated as a text storage medium. This additional memory usually consists of a cassette tape or a floppy disk (diskette).
Essentially, the most common device employed is a diskette or floppy disk, and this memory device can hold between 80 and 160 typical pages of text. Of course, new developments are continuously being made, and there exist hard disk memories which permit higher storage levels and faster access times. In any event, there is a need in conjunction with such equipment to detect misspelled words in documents which are stored as above described.
Using conventional techniques, small computer systems as well as word processors have neither sufficient storage capacity nor sufficient processing power to check the spelling of the stored words. There are large, expensive typesetting machines which typically use a 20 to 80 million character mass storage device to store an abridged dictionary. These machines use well-known indexing methods to check the spelling of stored documents. However, small computers and word processors have only one hundred thousand to two million characters of storage, and hence this memory is not enough to hold a sizable dictionary.
More important is the fact that a small computer or word processor cannot search through long word lists in a reasonable period of time; hence, checking spelling by prior art techniques would be extremely time consuming.
In the prior art, one system employed in conjunction with a small computer attempts to solve the problem by dividing words into lists of prefixes, suffixes and word roots. The time to locate a word root is rather small; therefore, the search operation is relatively rapid. However, these techniques do not permit automatic hyphenation and also allow certain invalid words to appear as correctly spelled. For example, a word such as "perfix" is considered to be a correct spelling, since "per" is a valid prefix and "fix" is a valid root.
It is therefore an object of the present invention to provide apparatus for use with processing systems and general purpose computers, which apparatus will rapidly isolate misspelled words in electronically stored documents.
It is a further object to provide such apparatus which enables electronically stored documents to be hyphenated automatically.
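By way of illustration only, the weakness of the prior art prefix, suffix and root technique discussed above may be sketched as follows. The word lists and the function name accepts are hypothetical and chosen solely for this example; they are not taken from any actual prior art system, and the sketch is not the apparatus of the present invention.

```python
# Illustrative sketch of a prior-art style checker that accepts any word
# which can be split into prefix + root + suffix.
# The lists below are hypothetical and deliberately tiny.
PREFIXES = {"", "per", "pre", "un", "re"}
ROOTS = {"fix", "form", "do", "view"}
SUFFIXES = {"", "s", "ed", "ing"}


def accepts(word: str) -> bool:
    """Return True if word can be split as prefix + root + suffix."""
    word = word.lower()
    # Try every split point i for the end of the prefix.
    for i in range(len(word) + 1):
        if word[:i] not in PREFIXES:
            continue
        # Try every split point j for the end of the root.
        for j in range(i, len(word) + 1):
            if word[i:j] in ROOTS and word[j:] in SUFFIXES:
                return True
    return False


if __name__ == "__main__":
    for candidate in ("prefix", "perfix", "undoing", "fxi"):
        print(candidate, accepts(candidate))
```

In this sketch "prefix" is accepted as expected, but "perfix" is accepted as well, because "per" appears in the prefix list and "fix" appears in the root list. A checker which looked up whole words would reject "perfix", but, as noted above, would require storage for a full dictionary.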