Technological advances in computer hardware, software and networking have led to increased demand for electronic information exchange in place of conventional techniques such as paper and telephone correspondence. Such electronic communication can provide split-second, reliable data transfer between essentially any two locations throughout the world. Many enterprises and consumers are leveraging such technology to improve efficiency and decrease the costs associated with traditional office environments. In other words, many offices are transitioning to a ‘paperless’ office structure rather than storing an overabundance of hardcopy paper files and documents.
As the amount of available electronic data grows, it becomes more important to store and/or utilize such data in a manageable manner that facilitates user-friendly and quick data searches and retrieval. For example, the data produced by office productivity tools (e.g., word processing, spreadsheets, presentation applications, electronic mail (email) applications, personal information manager (PIM) applications, etc.) can easily grow to occupy megabytes, gigabytes or even more of storage.
The use of physical paper to store hardcopies of information has been noted to contribute to the degradation of the environment, as large numbers of trees are harvested for paper production. In addition to its ill effects on the environment, physical paper can be costly, unorganized, inefficient, and space-consuming. As many users and enterprises become more and more ‘paperless,’ organization of data becomes increasingly important. Likewise, locating a desired document within a vast storage device can be difficult using traditional search mechanisms.
Conventional computer-based search, in general, is extremely text-centric in that search engines typically analyze the content of alphanumeric search queries in order to return results. These traditional search engines merely parse an alphanumeric query into ‘keywords’ and subsequently perform searches based upon a defined number of instances of each of the keywords in a reference.
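By way of illustration only, the keyword-centric matching described above might be sketched as follows. The function names (`parse_keywords`, `keyword_hits`, `matches`) and the `min_hits` threshold are hypothetical and are introduced here solely for the example; they are not taken from any particular search engine.

```python
import re
from collections import Counter


def parse_keywords(query):
    """Split an alphanumeric query into lowercase keywords."""
    return re.findall(r"[a-z0-9]+", query.lower())


def keyword_hits(query, document):
    """Count how many times each query keyword occurs in the document."""
    word_counts = Counter(re.findall(r"[a-z0-9]+", document.lower()))
    return {keyword: word_counts[keyword] for keyword in parse_keywords(query)}


def matches(query, document, min_hits=1):
    """Return True if every keyword occurs at least min_hits times.

    min_hits is an assumed stand-in for the 'defined number of instances'.
    """
    hits = keyword_hits(query, document)
    return bool(hits) and all(count >= min_hits for count in hits.values())
```

For example, `keyword_hits("paperless office", "The paperless office is an office without paper.")` counts one occurrence of `paperless` and two of `office`. The word `paper` is not counted, because the comparison is purely literal and operates on whole words.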
In the Internet world, search engine agents, often referred to as spiders or crawlers, navigate websites in a methodical manner and retrieve information from available websites. For example, a crawler can make a copy of all or a portion of a website and related information. The search engine then analyzes the content captured by one or more crawlers to determine the criteria by which to index a particular site. Some engines will index all words on a website while others may only index terms associated with particular tags such as, for example, title, header or metatag(s).
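The two indexing strategies mentioned above (all words versus only terms associated with particular tags) can be contrasted in a minimal sketch. It assumes that the crawler's captured copies are already available as an in-memory mapping from URL to HTML text, so no network access is shown. The names `TermExtractor` and `build_index`, and the specific tag and metatag names chosen, are assumptions made for the example.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Assumption: these tags stand in for "title, header or metatag(s)".
INDEXED_TAGS = {"title", "h1", "h2", "h3", "h4", "h5", "h6"}
META_NAMES = {"keywords", "description"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TermExtractor(HTMLParser):
    """Collect terms from a page, either from all text or from selected tags."""

    def __init__(self, all_words=False):
        super().__init__()
        self.all_words = all_words
        self.depth = 0  # number of currently open indexed tags
        self.terms = []

    def handle_starttag(self, tag, attrs):
        if tag in INDEXED_TAGS:
            self.depth += 1
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() in META_NAMES:
                self.terms.extend(tokenize(attributes.get("content") or ""))

    def handle_endtag(self, tag):
        if tag in INDEXED_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text if indexing everything, or if inside an indexed tag.
        if self.all_words or self.depth > 0:
            self.terms.extend(tokenize(data))


def build_index(pages, all_words=False):
    """Build an inverted index mapping each term to the URLs containing it."""
    index = defaultdict(set)
    for url, html in pages.items():
        extractor = TermExtractor(all_words=all_words)
        extractor.feed(html)
        extractor.close()
        for term in extractor.terms:
            index[term].add(url)
    return index
```

With `all_words=False`, a term appearing only in a paragraph of body text is left out of the index, whereas `all_words=True` includes it. In this simplified sketch the all-words mode also picks up the text of script and style elements, which a practical indexer would likely exclude.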
Nonetheless, conventional search engines merely consider literal ‘content,’ for example keyword occurrences, when indexing documents or locating results for a search query. In other words, conventional search engines do not perform any sophisticated analysis or interpretation but rather merely use keywords, phrases, titles, headers, etc. to index a document.
Once indexes are generated, they are typically assigned a ranking with respect to occurrences of certain keywords and stored in a database. An algorithm is often employed to evaluate the index for relevancy based upon the keyword ‘hit rate,’ for example, based upon the frequency of words on a webpage, among other things. A distinguishing factor in performance among conventional search engines is the ranking algorithm each employs.
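One simple reading of a keyword ‘hit rate’ is the fraction of words on a page that match the query keywords. The sketch below ranks pages on that basis. This formula is an assumption chosen for illustration; actual ranking algorithms differ from engine to engine and weigh other factors as well. The names `hit_rate` and `rank_pages` are hypothetical.

```python
import re
from collections import Counter


def hit_rate(keywords, page_text):
    """Return the fraction of words on the page that are query keywords."""
    words = re.findall(r"[a-z0-9]+", page_text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    return sum(counts[keyword] for keyword in keywords) / len(words)


def rank_pages(keywords, pages):
    """Order URLs by descending hit rate, dropping pages with no hits.

    Ties are broken alphabetically by URL so the ordering is deterministic.
    """
    scored = [(hit_rate(keywords, text), url) for url, text in pages.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [url for score, url in ordered if score > 0]
```

A page consisting of "office office paper" scores 2/3 for the keyword `office`, and therefore ranks ahead of a page consisting of "office chair desk lamp", which scores 1/4.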
Upon entry of one or more keywords into a search query, traditional search engines most often retrieve indexed information that matches the query from the database, generate a snippet of text associated with each of the matching sites and display the results to a user. The user can thereafter scroll through a plurality of returned sites to determine whether the sites are related to the interests of the user. However, this can be an extremely time-consuming and frustrating process, as search engines often return a substantial number of sites. More often than not, the user is forced to narrow the search iteratively by altering and/or adding keywords and operators to converge on websites that provide the sought-after information.
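The retrieve, snippet and display sequence, together with the iterative narrowing it tends to require, can be sketched as below. The sketch assumes a small in-memory collection of page texts in place of a database. The names `make_snippet` and `search` and the `radius` parameter are hypothetical.

```python
import re


def make_snippet(text, keywords, radius=30):
    """Return a short excerpt of text around the first keyword occurrence."""
    pattern = r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b"
    match = re.search(pattern, text, flags=re.IGNORECASE)
    if match is None:
        return text[: 2 * radius]
    start = max(0, match.start() - radius)
    end = min(len(text), match.end() + radius)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix


def search(query, pages):
    """Return (url, snippet) pairs for pages containing every query keyword."""
    keywords = query.lower().split()
    results = []
    for url, text in pages.items():
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        if keywords and all(keyword in words for keyword in keywords):
            results.append((url, make_snippet(text, keywords)))
    return results
```

Searching a small collection for `paper` may return several pages, each with its snippet, while adding a second keyword, as in `paper office`, returns only those pages that contain both terms. That narrowing step is the one the user is forced to repeat by hand.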