Due to an increased knowledge base, the number of documents across different subject matter areas continues to grow. For example, with the advent of the Internet and the World Wide Web (WWW), the number of documents on the different web sites on the Internet continues to expand as the number of networks and servers connected thereto continues to increase on a global scale. Accordingly, the fields of information retrieval, document summarization, information filtering and/or routing, as well as topic tracking and/or detection systems, continue to grow in order to track and service the vast amount of information.
The determination of the similarities and differences between various text passages plays an important role in such processes. Different approaches are currently employed to make this determination. Typically, such approaches determine the keywords for the documents being compared and employ a mathematical technique, such as logarithmic frequency, to determine the similarities or differences between the documents. However, such mathematical techniques at times require high-dimensional vector manipulation in order to determine the similarity between the documents.
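By way of illustration only, a conventional keyword-based comparison of this kind might be sketched as follows. The text above names only "logarithmic frequency"; the particular weighting (1 + ln of the term count), the cosine measure, and the function names `log_tf_vector` and `cosine_similarity` are assumptions chosen for this sketch and are not taken from any specific prior system. Note that each document is represented by one weight per distinct keyword, so the vector dimension grows with the vocabulary, which is the source of the high-dimensional manipulation mentioned above.

```python
import math
import re
from collections import Counter


def log_tf_vector(text):
    """Map each keyword in `text` to a log-scaled frequency weight.

    Assumption: weight = 1 + ln(count), one common form of
    logarithmic term frequency. Tokenization is a simple
    lowercase alphanumeric split, also an assumption.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)
    return {term: 1.0 + math.log(n) for term, n in counts.items()}


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse keyword vectors."""
    # Only keywords present in both documents contribute to the dot product.
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


doc_a = "The server indexes web documents for retrieval."
doc_b = "Documents on the web are indexed by the retrieval server."
score = cosine_similarity(log_tf_vector(doc_a), log_tf_vector(doc_b))
print(f"similarity: {score:.3f}")
```

In this sketch a score near 1 indicates passages that share most of their keywords, and a score of 0 indicates passages with no keywords in common. Even for two short sentences, each vector carries one dimension per distinct keyword, and the dimensionality increases further as longer documents are compared.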