1. The Field of the Invention
The present invention relates to systems and methods for identifying word patterns referenced in text, and more specifically, to identifying the word patterns substantially in real time.
2. The Relevant Art
The Internet may be the most significant technological development of recent times. It allows inexpensive and almost instantaneous communication throughout the world. As more and more users begin to take advantage of the Internet, more resources are being directed to enhancing the ability of users to make use of information available on the Internet.
Particularly, various tools that assist in speeding up Internet transmission, searching the Web, and conducting research are continually being developed and distributed for the benefit of Internet users. One type of tool that has been developed, and which may be used on information downloaded from the Internet, is a text parser. Much of the content available on the Internet is in the form of text documents, and a vast amount of information on the Web is stored in text document format. To help users more readily understand the contents of text documents, developers have provided document parser programs.
Such programs typically receive a text document that a user wishes to have parsed and store that text document persistently in static memory. The parser then makes repeated passes over the text, combing it for identified words. Those words can then be identified and presented to the user, generally with some type of enhancement, such as a dictionary reference, a link to an identified web site, or the like.
Such programs suffer from the drawbacks of being somewhat cumbersome and slow. They require significant processing resources and are therefore typically used only on powerful computers such as mainframes, workstations, servers, and the like. Additionally, a user generally must wait a considerable time while the text is parsed, because the multiple passes necessary for such parsing take time and generally must be conducted remotely. This slows down research and, in general, detracts somewhat from the Internet experience.
Therefore, what is needed is a way to identify word patterns in text quickly and efficiently, in order to improve research efforts and enhance the ability of users to make profitable use of the Internet.
The system and method of the present invention have been developed in response to the present state of the art, and in particular, in response to the problems and needs in the art that have not yet been fully solved by currently available systems and methods. Accordingly, it is an overall object of the present invention to provide a system and method that overcomes many or all of the above-discussed shortcomings in the art.
To achieve the foregoing object, and in accordance with the invention as embodied and broadly described herein in the preferred embodiment, an improved system and method for identifying word patterns in text is provided. In certain disclosed embodiments, the system for identifying objects referenced in a stream of text comprises an input pipeline configured to receive an incoming stream of text comprised of words; a text analysis module configured to consult a semantic network to automatically identify one or more word patterns in the incoming stream of text with a single examination of each word; and an object association module configured to reference a known object identified by a word pattern of the semantic network.
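The three components described above can be sketched as a chain of Python generators. This is a minimal illustration under stated assumptions, not the disclosed implementation: the semantic network is modeled as a hypothetical word trie (`Node`, `build_network`), the object IDs such as `obj:nyc` and the `knowledge_base` dictionary are invented for the example, and the function names merely mirror the module names in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

PUNCT = ".,;:!?\"'()"


@dataclass
class Node:
    """One node of a hypothetical in-memory semantic network (a word trie)."""
    children: Dict[str, "Node"] = field(default_factory=dict)
    object_id: Optional[str] = None  # set when a complete word pattern ends here


def build_network(patterns: Dict[str, str]) -> Node:
    """Build the trie from {word pattern: object ID} pairs."""
    root = Node()
    for phrase, object_id in patterns.items():
        node = root
        for word in phrase.lower().split():
            node = node.children.setdefault(word, Node())
        node.object_id = object_id
    return root


def input_pipeline(chunks: Iterable[str]) -> Iterator[str]:
    """Divide incoming text chunks into normalized words. A word cut off at the
    end of a chunk is carried over and joined with the start of the next one."""
    carry = ""
    for chunk in chunks:
        text = carry + chunk
        parts = text.split()
        carry = parts.pop() if parts and not text[-1].isspace() else ""
        for part in parts:
            word = part.strip(PUNCT).lower()
            if word:
                yield word
    word = carry.strip(PUNCT).lower()
    if word:
        yield word


def text_analysis(words: Iterable[str], root: Node) -> Iterator[Tuple[int, int, str]]:
    """Read each word once, advancing every partial match in parallel.
    Yields (start, end, object_id) for each complete pattern ending at this word."""
    active: List[Tuple[int, Node]] = []
    for i, word in enumerate(words):
        advanced: List[Tuple[int, Node]] = []
        # Try to extend each open partial match, and also start a new one here.
        for start, node in active + [(i, root)]:
            child = node.children.get(word)
            if child is not None:
                advanced.append((start, child))
                if child.object_id is not None:
                    yield start, i + 1, child.object_id
        active = advanced  # partial matches that cannot be extended are dropped


def object_association(
    matches: Iterable[Tuple[int, int, str]], knowledge_base: Dict[str, dict]
) -> Iterator[Tuple[int, int, dict]]:
    """Replace each object ID with the known object it references."""
    for start, end, object_id in matches:
        yield start, end, knowledge_base[object_id]


# Example wiring (patterns, IDs and knowledge base are made up for illustration).
network = build_network({"new york": "obj:nyc", "new york city": "obj:nyc-city"})
knowledge_base = {"obj:nyc": {"name": "New York"}, "obj:nyc-city": {"name": "New York City"}}
stream = ["I flew to New Yo", "rk City last week."]
results = list(object_association(text_analysis(input_pipeline(stream), network), knowledge_base))
```

Each word is pulled from the pipeline once and never revisited, although at that moment it is compared against every partial match still open, and the stream itself is never stored. The sketch reports overlapping matches (here both "New York" and "New York City"); choosing among them is left to the caller.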
The semantic network may be configured to be loaded substantially entirely into the RAM of a processor, and the text analysis module may be configured to consult the semantic network within that RAM. Additionally, the input pipeline may be configured to divide the text.
In certain disclosed embodiments, the method comprises receiving an incoming stream of text comprised of words; consulting a semantic network to automatically identify one or more word patterns in the incoming stream of text with a single examination of each word; and referencing a known object identified by a word pattern of the semantic network.
The method may also comprise loading the semantic network substantially entirely into the RAM of a processor, in which case the step of consulting the semantic network may be conducted within that RAM.
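Loading the network substantially entirely into RAM can be illustrated with a flat prefix table built once from a text source. The tab-separated `pattern<TAB>object_id` format, the `load_network` name, and the object IDs are hypothetical; the sketch only shows that, once loaded, every consultation is an in-memory dictionary lookup rather than a disk or remote access.

```python
import io
from typing import Dict, Optional, TextIO, Tuple

Prefix = Tuple[str, ...]


def load_network(source: TextIO) -> Dict[Prefix, Optional[str]]:
    """Read hypothetical 'pattern<TAB>object_id' lines into an in-memory table.

    Every proper prefix of a pattern is stored with the value None, so a matcher
    can ask "could more words still complete a pattern?" with one dict lookup;
    each complete pattern maps to the ID of the object it identifies.
    """
    table: Dict[Prefix, Optional[str]] = {}
    for line in source:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, object_id = line.split("\t", 1)
        words = tuple(pattern.lower().split())
        for k in range(1, len(words)):
            table.setdefault(words[:k], None)  # never overwrite a complete pattern
        table[words] = object_id
    return table


# Loaded once at start-up; afterwards the whole network lives in RAM.
network = load_network(io.StringIO(
    "# pattern\tobject\n"
    "new york\tobj:nyc\n"
    "new york city\tobj:nyc-city\n"
))
```

A flat table trades some memory (each prefix is stored as its own key) for constant-time lookups; a trie, which shares prefixes, is a common alternative when memory is tighter.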
The semantic network may be consulted in a hierarchical order moving from identified nodes to related nodes linked with the identified nodes. In one embodiment, the method examines words in the stream of text in a sequential order as the words are received and formats the stream of text to represent identified objects without persistently storing the stream of text. The method may also involve breaking the stream of text into individual words and analyzing each word in an order of occurrence of the word in the stream of text by comparing the individual words to identified words in the semantic network.
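The hierarchical consultation can be pictured as a breadth-first walk that starts at the node for an identified word and moves outward, one tier at a time, to the pattern nodes linked to it and then to the object nodes those patterns identify. The three node tiers and the `word:`/`pattern:`/`object:` naming scheme are assumptions made for this sketch, not a structure taken from the disclosure.

```python
from collections import deque
from typing import Deque, Dict, List, Set

# Hypothetical three-tier network: word nodes link to the pattern nodes that
# contain them, and pattern nodes link to the object nodes they identify.
Links = Dict[str, List[str]]


def related_objects(links: Links, word: str) -> List[str]:
    """Breadth-first walk from the node for `word` to linked nodes, tier by tier.
    Returns the object nodes reached, in the order they were first found."""
    start = "word:" + word
    seen: Set[str] = {start}
    queue: Deque[str] = deque([start])
    found: List[str] = []
    while queue:
        node = queue.popleft()
        if node.startswith("object:"):
            found.append(node)  # leaf of the hierarchy: a known object
            continue
        for neighbor in links.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return found


links: Links = {
    "word:york": ["pattern:new york", "pattern:york minster"],
    "pattern:new york": ["object:nyc"],
    "pattern:york minster": ["object:minster"],
}
```

Because the queue is first-in, first-out, all pattern nodes linked to a word are visited before any object node, which matches the "identified nodes, then related nodes" ordering described above.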
In addition, the method may involve finding a match between an individual word in the stream of text and a word within the semantic network. Upon finding the match, the method compares the individual word and an adjacent word of the stream of text to a word pattern in the semantic network to find a word pattern involving the word. Additionally, words of the stream of text may be continually added to recognized word patterns and the result compared to other word patterns in the semantic network until no more word patterns containing the individual word are located. Links are preferably followed between the word patterns and recognized objects, and the identified known objects presented to a user.
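The matching steps above can be read as a greedy longest-match loop: find a word that begins a known pattern, keep appending the adjacent word while the sequence is still part of some pattern, and record the longest complete pattern reached. This is one possible interpretation, sketched over an already-divided list of words; the prefix-table layout and the `find_patterns` name are hypothetical. Unlike a strictly single-pass matcher, this simple version may look at a word again after an unsuccessful extension (for example, "jersey" after "new" fails to extend into a known pattern).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: every prefix of a known pattern maps to None,
# every complete pattern maps to the ID of the object it identifies.
Table = Dict[Tuple[str, ...], Optional[str]]


def find_patterns(words: List[str], table: Table) -> List[Tuple[int, int, str]]:
    """Return (start, end, object_id) for the longest pattern found at each step."""
    results: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(words):
        best: Optional[Tuple[int, str]] = None  # (end, object_id) of longest match
        j = i + 1
        # Keep adding the adjacent word while the words so far are still a
        # prefix of some pattern in the network.
        while j <= len(words) and tuple(words[i:j]) in table:
            object_id = table[tuple(words[i:j])]
            if object_id is not None:
                best = (j, object_id)
            j += 1
        if best is None:
            i += 1  # no pattern starts with this word; move to the next word
        else:
            results.append((i, best[0], best[1]))
            i = best[0]  # resume after the matched pattern
    return results


table: Table = {
    ("new",): None,
    ("new", "york"): "obj:nyc",
    ("new", "york", "city"): "obj:nyc-city",
    ("paris",): "obj:paris",
}
words = "flights from new york city to paris via new jersey".split()
```

The returned object IDs are the points at which links to recognized objects would be followed before the objects are presented to the user.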
In one embodiment, the identified objects are presented to a user by providing links between identified word patterns in the stream of text and the objects in a knowledge base that the word patterns identify. The links may be provided in the form of URLs.
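Presenting identified objects as links might look like the following, which rewrites the word stream as HTML with each matched span wrapped in an anchor pointing to the object's URL. The `render_with_links` helper, the object IDs, and the `example.com` URLs are placeholders; the sketch assumes the matches are already non-overlapping and sorted by position.

```python
import html
from typing import Dict, List, Tuple


def render_with_links(
    words: List[str], matches: List[Tuple[int, int, str]], urls: Dict[str, str]
) -> str:
    """Rebuild the text, wrapping each matched word span in an HTML link to the
    URL of the object it identifies. `matches` holds non-overlapping
    (start, end, object_id) spans sorted by start (assumed)."""
    out: List[str] = []
    i = 0
    for start, end, object_id in matches:
        out.extend(html.escape(w) for w in words[i:start])  # unmatched words
        phrase = html.escape(" ".join(words[start:end]))
        href = html.escape(urls[object_id], quote=True)
        out.append(f'<a href="{href}">{phrase}</a>')
        i = end
    out.extend(html.escape(w) for w in words[i:])  # words after the last match
    return " ".join(out)


words = ["Flights", "to", "New", "York", "City"]
matches = [(2, 5, "obj:nyc-city")]
urls = {"obj:nyc-city": "https://example.com/kb/new-york-city"}
```

Escaping both the visible words and the `href` value with `html.escape` keeps text from the incoming stream from being interpreted as markup.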