The following relates generally to methods, and apparatus therefor, for performing string replacements using natural language processing.
Generally, search and replace operations for searching in a document for a string and replacing it with another string are known. While such search and replace operations have become standard in most document processing applications, they have limited linguistic awareness. That is, such search and replace operations are not known to assess interrelationships between strings in the document (i.e., direct or indirect connections between the string that is being replaced and other strings in the document) to anticipate ambiguities that may arise when the replacement string is introduced into the document.
Accordingly, it would be advantageous to provide a search and replace function that is adapted to anticipate, or warn of, inconsistencies in agreement that may be introduced when performing a search and replace in a document (i.e., when elements in a document that linguistically depend on the string being replaced require agreement with the replacement string). Further, it would be advantageous to provide a search and replace function that is also adapted to compare the possible senses of the string to be replaced with those of the replacement string to determine whether the replacement's use in the document is semantically coherent.
In accordance with various embodiments described herein, there is provided a method, and apparatus, for replacing an existing string in textual content of a document or a collection of documents with a replacement string while taking into account morpho-syntactic properties of the existing string and the replacement string (i.e., morphological features and part-of-speech categories). The morpho-syntactic properties of the contents of the document are assessed before the string replacement takes place, thereby allowing only those occurrences that satisfy user specifications to be replaced, and thereby resolving ambiguous relations (which resolution may be automatically determined and/or determined through user intervention) that may be introduced when the string replacement occurs (e.g., when plural and singular replacements exist for the replacement string in a document).
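By way of illustration only, the following Python sketch shows how occurrences of an existing string might be restricted to those that satisfy user specifications. It assumes the textual content has already been morpho-syntactically tagged; the `Token` type, its fields, and the function `matching_occurrences` are hypothetical names introduced for this example and are not part of the embodiments described herein.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Token:
    """A tagged token (hypothetical structure, assumed for this sketch)."""
    text: str
    lemma: str
    pos: str      # part-of-speech category, e.g. "NOUN" or "VERB"
    number: str   # "sing", "plur", or "" when not applicable


def matching_occurrences(tokens: List[Token], source_lemma: str,
                         pos: Optional[str] = None,
                         number: Optional[str] = None) -> List[int]:
    """Return the positions of tokens whose lemma is the source string and
    whose tags agree with every user specification that was supplied."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lemma != source_lemma:
            continue
        if pos is not None and tok.pos != pos:
            continue
        if number is not None and tok.number != number:
            continue
        hits.append(i)
    return hits


# "They book a room and read a book": "book" occurs as a verb and as a noun.
tokens = [
    Token("They", "they", "PRON", "plur"),
    Token("book", "book", "VERB", ""),
    Token("a", "a", "DET", "sing"),
    Token("room", "room", "NOUN", "sing"),
    Token("and", "and", "CCONJ", ""),
    Token("read", "read", "VERB", ""),
    Token("a", "a", "DET", "sing"),
    Token("book", "book", "NOUN", "sing"),
]

# Only the noun occurrence satisfies the user specification pos="NOUN".
noun_hits = matching_occurrences(tokens, "book", pos="NOUN")
```

In this sketch, a purely string-based search would select both occurrences of "book", whereas the part-of-speech specification leaves the verb untouched.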
In accordance with others of the various embodiments described herein, the method for replacing the existing string with the replacement string in textual content corrects other strings in the textual content that linguistically depend on the replacement string. A string in the textual content that linguistically depends on the replaced string may have a morphologic relation (e.g., in its person, number, or gender), a syntactic relation (e.g., in a part-of-speech), and/or an anaphoric relation (e.g., in pronoun/antecedent dependencies) with the replaced string. Dependencies may thus be identified that result from either direct links or indirect links between strings in the textual content and the replacement string. Advantageously, linguistically related strings that are linked directly or indirectly with the string to be replaced are identified so that, when the replacement string is introduced, grammatical inconsistencies may be identified and corrected.
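As one possible illustration of direct and indirect links, the following Python sketch treats string dependencies as a graph and collects every string reachable from the string to be replaced using a breadth-first traversal. The graph representation, the function name `linked_strings`, and the use of the strings themselves as node labels (rather than positions in the document) are assumptions made for this example only.

```python
from collections import deque
from typing import Dict, List


def linked_strings(dependencies: Dict[str, List[str]],
                   source: str) -> Dict[str, int]:
    """Return each string linked to `source`, mapped to its link distance:
    1 for a direct link, 2 or more for an indirect link."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in dependencies.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[source]  # the source itself is not one of its dependents
    return distance


# "The car is red. It was expensive."
# "The" and "is" agree with "car"; "It" refers back to "car" (anaphoric);
# "was" agrees with "It", so it is linked to "car" only indirectly.
dependencies = {
    "car": ["The", "is", "It"],
    "It": ["was"],
}

links = linked_strings(dependencies, "car")
direct = sorted(s for s, d in links.items() if d == 1)
indirect = sorted(s for s, d in links.items() if d > 1)
```

Here "was" is never directly connected to "car", yet it would need correction if "car" were replaced with a plural string, which is why indirect links are followed as well.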
In accordance with others of the various embodiments described herein, the method for replacing the existing string with the replacement string in textual content detects, and alerts a user to, semantic relationships that may cause variations in sense (i.e., meaning). The existence of semantic relationships is assessed at a first level by evaluating the meaning of the strings on their own (i.e., evaluating whether the senses of the existing string and the replacement string are semantically coherent), and at a second level by identifying contextual inconsistencies introduced when the replacement string is introduced in a single-word or multiword expression that is within a larger string or within other strings that define a larger linguistic unit (i.e., evaluating whether the single-word or multiword expressions in which the replacement string is found are semantically coherent).
In accordance with yet another of the various embodiments described herein, a method for replacing in a document a source string with a target string includes: morpho-syntactically disambiguating textual content of the document; identifying a set of string dependencies by detecting grammatical or anaphoric dependencies, or both, between strings in the textual content of the document; disambiguating one or more of gender, number, or part of speech with user specifications when the source string or the target string has more than one possible meaning; identifying occurrences of the source string in the document that satisfy the user specifications; identifying string relations from the set of string dependencies that define direct or indirect links, or both, to the source string; replacing each occurrence of the source string in the document that satisfies the user specifications with the target string; correcting grammatical or anaphoric inconsistencies, or both, in the string relations in the document that are introduced when the source string is replaced with the target string; and outputting the document.
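The replacing and correcting steps recited above may be sketched, in a deliberately simplified form, as follows. The sketch assumes that the occurrences satisfying the user specifications and their directly and indirectly linked strings have already been identified, and that a singular source string is replaced with a plural target string. The table `PLURAL_FORMS` is a hypothetical stand-in for a morphological generator, and `replace_with_agreement` is a name introduced for this example.

```python
from typing import Dict, List

# Hypothetical stand-in for a morphological generator: maps a singular
# form to the plural form that agrees with a plural target string.
PLURAL_FORMS = {"is": "are", "was": "were", "It": "They", "it": "they"}


def replace_with_agreement(tokens: List[str],
                           occurrences: List[int],
                           target: str,
                           dependents: Dict[int, List[int]]) -> List[str]:
    """Replace the tokens at `occurrences` with `target`, then correct each
    linked token so that it agrees with the (plural) target string."""
    out = list(tokens)  # leave the input document unchanged
    for i in occurrences:
        out[i] = target
        for j in dependents.get(i, ()):
            # Tokens with no entry in the table need no correction.
            out[j] = PLURAL_FORMS.get(out[j], out[j])
    return out


tokens = ["The", "car", "is", "red", ".", "It", "was", "expensive", "."]
# Positions linked to the occurrence at position 1, directly ("The", "is",
# "It") or indirectly ("was", through the pronoun "It").
dependents = {1: [0, 2, 5, 6]}

result = replace_with_agreement(tokens, [1], "cars", dependents)
sentence = " ".join(result)
```

A plain string substitution would yield "The cars is red. It was expensive."; the correcting step is what restores agreement in the verb and in the pronoun and its verb.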
In accordance with a further of the various embodiments described herein, a method for replacing in a document a source string with a target string includes: morpho-syntactically disambiguating textual content of the document; identifying occurrences of the source string in the document that satisfy user specifications; identifying a first set of possible senses for the source string and a second set of possible senses for the target string; assessing whether replacing the source string having the first set of possible senses with the target string having the second set of possible senses is semantically coherent; replacing each occurrence of the source string in the document that satisfies the user specifications with the target string; outputting a warning when the replacement of the source string with the target string is not semantically coherent; and outputting the document.
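By way of example only, the assessment of semantic coherence may be pictured as a comparison of two sets of senses. The Python sketch below assumes, purely for illustration, that a replacement is coherent when the two sets share at least one sense; the embodiments described herein do not prescribe this criterion. The sense inventory `SENSES` and the functions `coherence_check` and `replace_with_warning` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical sense inventory; a real system would consult a lexical
# resource rather than a hand-written table.
SENSES: Dict[str, Set[str]] = {
    "bank": {"financial_institution", "river_side"},
    "shore": {"river_side", "sea_side"},
    "invoice": {"billing_document"},
}


def coherence_check(source: str, target: str,
                    inventory: Dict[str, Set[str]] = SENSES
                    ) -> Tuple[bool, Set[str]]:
    """Return whether the two sense sets overlap, and the shared senses.
    Overlap is the coherence criterion assumed for this sketch."""
    shared = inventory.get(source, set()) & inventory.get(target, set())
    return bool(shared), shared


def replace_with_warning(tokens: List[str], source: str, target: str
                         ) -> Tuple[List[str], Optional[str]]:
    """Replace every occurrence of `source` and return the new tokens
    together with a warning when the replacement is not coherent."""
    coherent, _ = coherence_check(source, target)
    warning = None
    if not coherent:
        warning = f"'{target}' shares no sense with '{source}'"
    replaced = [target if tok == source else tok for tok in tokens]
    return replaced, warning


document = ["they", "walked", "along", "the", "bank"]
ok_tokens, ok_warning = replace_with_warning(document, "bank", "shore")
bad_tokens, bad_warning = replace_with_warning(document, "bank", "invoice")
```

In both calls the replacement is still carried out, consistent with the method above, in which the warning accompanies the output document rather than blocking the replacement.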
In accordance with yet a further of the various embodiments described herein, a method for replacing in a document a source string with a target string includes: morpho-syntactically disambiguating textual content of the document; identifying a set of string dependencies by detecting grammatical dependencies between strings in the textual content of the document; disambiguating one or more of gender, number, or part of speech with user specifications when the source string or the target string has more than one possible meaning; identifying occurrences of the source string in the document that satisfy the user specifications; identifying string relations from the set of string dependencies that define direct or indirect links, or both, to the source string; replacing each occurrence of the source string in the document that satisfies the user specifications with the target string; correcting grammatical inconsistencies in the string relations in the document that are introduced when the source string is replaced with the target string; and outputting the document; wherein the disambiguation of the source string or the target string is performed before replacing each occurrence of the source string in the document that satisfies the user specifications with the target string.