1. Field of the Invention
This invention relates to information retrieval systems, and more particularly relates to a computer implemented method and system for determining the information density of a target body of digital text.
2. Description of the Related Art
With the emergence of the Internet has come access to almost limitless amounts of information. Users typically seek access to this information with a computer in logical communication with an information-retrieval system such as Google® or another search engine. These search engines typically comprise a graphical user interface for accepting keywords used to identify relevant information, typically in the form of online electronic documents.
Typical search engines rank the importance of a document by the number of times the keyword appears in the body of text forming the document, the number of backlinks to the document, and the importance of the documents comprising the backlinks. This is problematic because many documents included in a given set of search results contain only superficial references to the keywords, and users are left to manually filter through irrelevant results in search of the documents conveying denser amounts of information.
There exists no efficient means in the art of valuing the context in which keywords appear, or of determining the information density of sentences and clauses comprising the keywords and their synonyms. Methods, computer program products, and systems which determine the semantic density of textualized digital media are lacking in the art. Semantic density is a measure of how much information is conveyed in a sentence or clause relative to its length in a given block of text. The more semantically dense text is, the more information it conveys in a given space.
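By way of illustration only, the notion of semantic density described above can be sketched as a ratio of information-bearing words to total words in a sentence. The sketch below is an assumption made for clarity and is not the claimed method: the stopword list, the `tokenize` helper, and the `semantic_density` function are hypothetical names, and treating non-stopwords as a proxy for "information conveyed" is a simplification.

```python
import re

# Hypothetical, deliberately small stopword list (an assumption for this sketch).
# Words in this set are treated as carrying little information on their own.
STOPWORDS = {
    "a", "an", "the", "of", "in", "on", "at", "to", "and", "or",
    "is", "are", "was", "were", "it", "that", "this", "for", "with",
    "as", "by", "be",
}


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def semantic_density(sentence):
    """Return the fraction of tokens that are not stopwords.

    This is a crude proxy for 'information conveyed relative to length':
    0.0 means no content words, 1.0 means every word is a content word.
    """
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    content_words = [t for t in tokens if t not in STOPWORDS]
    return len(content_words) / len(tokens)
```

Under this simplified measure, a terse sentence such as "Senator wins election" scores higher than "The senator is in the city", since a larger share of its words carry content.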
The semantic density of text with regard to a particular topic is a useful metric for users desiring to measure trends in online chatter and news, such as coverage of a political campaign.
The present invention beneficially teaches a unique computer implemented methodology for determining the semantic word density of bodies of text in digitized documents which overcomes prior shortcomings in the art.