Most word processing applications incorporate some form of automatic spelling or grammar checking to aid an individual editing a document. For example, Microsoft® Word® indicates that a word is misspelled by underlining it with a red line, and indicates that a phrase is grammatically incorrect by underlining it with a green line. The individual can click on the word to view alternative spellings or grammatical constructions. By selecting one of the alternatives, the individual can edit the document to improve its readability.
Unfortunately, spelling and grammar checking systems have limited capabilities, especially when a checking system is migrated from one language to another. For example, an English document checker is useless for Japanese because of differences in grammar, writing systems, and character representation. Furthermore, many spelling and grammar checking systems do not find subtle errors. Nor do they flag uncommon spellings or phrasings that fall outside common usage.
Consider the following properly formed sentence: “The engineer walked into the lab.” Someone who works in an engineering group might accidentally write the sentence as follows: “The engineering walked into the lab.” Notice the accidental “ing” added to the end of the word “engineer.” Microsoft Word's spelling and grammar checker does not catch this problem (at the time of writing this document) because the word “engineering” can be a noun and, therefore, can be the subject of the sentence. Although the sentence might be as intended, it is unlikely to be correct because the phrase is highly unusual with respect to common usage.
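The intuition behind the example above can be illustrated with a minimal Python sketch. It is not the method of this application; it simply counts adjacent word pairs (bigrams) in a tiny, hypothetical reference corpus and flags the pairs in a sentence that occur fewer than a threshold number of times in that corpus. The corpus, the whitespace tokenizer, the function names, and the `min_count` threshold are all assumptions made for illustration; whitespace tokenization in particular would not carry over to a language such as Japanese.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, drop a trailing period, split on whitespace.
    return text.lower().rstrip(".").split()


def bigrams(tokens):
    # Pair each token with the token that follows it.
    return list(zip(tokens, tokens[1:]))


def build_counts(corpus):
    # Count how often each adjacent word pair appears in the reference corpus.
    counts = Counter()
    for sentence in corpus:
        counts.update(bigrams(tokenize(sentence)))
    return counts


def unusual_bigrams(sentence, counts, min_count=1):
    # Flag word pairs seen fewer than min_count times in the corpus.
    # Counter returns 0 for pairs it has never seen.
    return [bg for bg in bigrams(tokenize(sentence)) if counts[bg] < min_count]


# Hypothetical reference corpus standing in for a large body of common usage.
corpus = [
    "The engineer walked into the lab.",
    "The engineer wrote the report.",
    "The engineering group met in the lab.",
    "A technician walked into the office.",
]
counts = build_counts(corpus)

print(unusual_bigrams("The engineer walked into the lab.", counts))
print(unusual_bigrams("The engineering walked into the lab.", counts))
```

Every word in “The engineering walked into the lab.” is spelled correctly, and “the engineering” itself occurs in the corpus, so a word-by-word check passes; only the pair (“engineering”, “walked”) is absent, which is the kind of unusual construction described above. A practical system would need a far larger corpus and a smoothed frequency estimate rather than a raw count threshold.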
As used in this application, the term “common language” refers not to similarities of one language with another, but to similarities in the usage of languages. With that distinction in mind, the current inventors have appreciated that methods are needed that can identify similarities in usage in any language, i.e., in a language-independent fashion.
Previous methods include rule-based systems that attempt to incorporate knowledge of semantics, syntax, or extensive databases of correct word forms. The following patent applications, for example, reference using natural language rules to aid document users in editing documents: 20060004563; 20050273336; 20050273318; 20040059730; 20040059718; 20040059564; 20030097252; 20030069877; 20030061201; 20030061200; 20030004716; 20030033288.
Similarly, the following issued patents reference using natural language rules to aid individuals editing documents: 6928425; 6820075; 6778979; 6658627; 5995920; 5666442; 4914590.
While these references adequately address their respective problems, they do not fully provide the capabilities desired by individuals editing documents. Natural language processing has existed for many years and focuses on applying the “rules” of a natural language so that a software program can help an individual identify potential problems within a document.
U.S. Patent Application Publication No. 20030033288 and its corresponding U.S. Pat. No. 6,820,075 offer autocomplete capabilities to users based on the surrounding text within a document. Contextual information surrounding a document fragment forms the basis for a query into a database, which returns candidates for completing the fragment or for correcting errors. However, neither these references nor the others listed above teach how to use a statistical approach to provide guidance on the usage of a common language in a language-independent manner.
A publication accepted at the 2006 Society for Industrial and Applied Mathematics (SIAM) Conference on Data Mining, held Apr. 20 to 22, 2006, titled “Using Compression to Identify Classes of Inauthentic Texts” and authored by M. Dalkilic, W. Clark, J. Costello, and P. Radivojac, teaches a method for using compression algorithms to indicate whether documents have the characteristics of authentic documents written by humans. Although the paper offers several insights into statistical document analysis, it does not teach, suggest, or motivate using a guidance filter to offer insight into creating a document that conforms to a common language.
Thus, there remains a considerable need for methods or apparatus that guide an individual on the usage of a common language.