The English language, and particularly English language literature, contains many idioms. Idioms are sayings or phrases that cannot be understood from the individual meanings of the words because the combination of words contains a secondary meaning. The word “idiom” as used herein includes, but is not limited to, metaphors, clichés, slang, colloquialisms, proverbs and sayings. There are over 20,000 idioms in the English language.
An example of an English language idiom is “it's water under the bridge.” This idiom refers to a past experience or conflict that has been forgotten, or is no longer important or relevant. The idiom compares the memories of the past event to water that has passed under a bridge.
When learning a new language, idioms can be particularly difficult to identify and understand. One reason for this difficulty is that the same idiom can be expressed in different ways. For example, the “it's water under the bridge” idiom can be expressed in at least five different ways as: (1) “it's water under the proverbial bridge”; (2) “its water under the bridge”; (3) “it's like water under a bridge”; (4) “it's just water under the bridge”; or (5) “it is merely water under a very old-fashioned bridge.” Merely memorizing the phrase “it's water under the bridge” may not help a reader recognize each variation of the idiom. Variations in punctuation, verb tense and the addition of modifiers like adjectives and adverbs can obscure the idiom.
Using a computer to search electronic text to identify and define idioms would be useful for people learning a new language. Computerized text searches, however, share the difficulties human readers have when identifying idioms. A standard text search for the exact string “it's water under the bridge” would not identify any of the five variations above. Variations in punctuation, verb tense or the addition of modifiers like adjectives and adverbs interrupt the text string, preventing the text search algorithm from identifying the idiom. A more sophisticated search algorithm is required to identify idioms in electronic text.
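The limitation of an exact-string search can be illustrated with a minimal Python sketch. The names `CANONICAL`, `VARIATIONS` and `exact_matches` are hypothetical and chosen only for illustration; the sketch assumes a plain case-insensitive substring test as the "standard" search.

```python
# Hypothetical illustration; these names are not part of any described system.
CANONICAL = "it's water under the bridge"

# The five variations listed in the text above.
VARIATIONS = [
    "it's water under the proverbial bridge",
    "its water under the bridge",
    "it's like water under a bridge",
    "it's just water under the bridge",
    "it is merely water under a very old-fashioned bridge",
]


def exact_matches(phrase, texts):
    """Return the texts containing phrase as an exact, case-insensitive substring."""
    needle = phrase.lower()
    return [t for t in texts if needle in t.lower()]


# None of the five variations contains the canonical string verbatim.
print(len(exact_matches(CANONICAL, VARIATIONS)))  # prints 0
```

Each variation breaks the search string at a different point: an inserted modifier, a dropped apostrophe, a changed article, or an expanded contraction.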
An enhanced text search algorithm known in the art searches for a combination of words within close proximity of each other rather than searching for an exact string. In the “it's water under the bridge” example above, a search on the terms “water” next to “under” and within several words of the terms “it's” and “bridge” would identify only three of the variations listed above, namely variations (1), (3) and (4). Variation (2), with “its,” and variation (5), with “it is,” are still not identified.
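A proximity search of this kind might be sketched as follows. This is an assumption-laden illustration, not the algorithm of any particular search product: the function names `tokenize` and `proximity_match`, the tokenizing pattern, and the window of four words standing in for "several words" are all hypothetical choices.

```python
import re

# The five variations listed earlier in the text.
VARIATIONS = [
    "it's water under the proverbial bridge",
    "its water under the bridge",
    "it's like water under a bridge",
    "it's just water under the bridge",
    "it is merely water under a very old-fashioned bridge",
]


def tokenize(text):
    """Split text into lowercase words, keeping internal apostrophes and hyphens."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def proximity_match(text, window=4):
    """True if "water" is directly followed by "under", with "it's" within
    `window` words before the pair and "bridge" within `window` words after it.
    The window size is an assumed value for "several words"."""
    tokens = tokenize(text)
    for i in range(len(tokens) - 1):
        if tokens[i] == "water" and tokens[i + 1] == "under":
            before = tokens[max(0, i - window):i]
            after = tokens[i + 2:i + 2 + window]
            if "it's" in before and "bridge" in after:
                return True
    return False


# Report the 1-based numbers of the variations that the proximity search finds.
found = [n for n, v in enumerate(VARIATIONS, start=1) if proximity_match(v)]
print(found)  # prints [1, 3, 4]
```

The search tolerates inserted modifiers and a changed article, but it still depends on the literal token “it's,” so the spellings “its” and “it is” are missed.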
A need exists for a system and method that can search a text document, identify any variation of an idiom, and provide a way for readers to click on the identified idiom for a simple definition.