Over the past few decades, information technology (IT) has developed rapidly and has changed the way files and documents are stored and managed. Increasingly, files and documents are stored in electronic form. These electronic documents can be stored in an electronic database and searched using computerized search technologies. As more searchable electronic documents become available on a local machine, on a remote machine within a local area network (LAN), or over the Internet, the quality of search results becomes increasingly important in helping searchers find the information they want.
The following documents pertain to web and database searching and results ranking techniques:
U.S. Patent Documents:
6,285,999    September 2001    Page
6,560,600    May 2003          Broder
6,871,202    March 2005        Broder
Other Publications:
Michael W. Berry, et al., “Understanding Search Engines: Mathematical Modeling and Text Retrieval,” 2005.
John Battelle, “The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture,” 2005.
S. Brin, et al., “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” http://www-db.stanford.edu/~backrub/google.html, Stanford University, 1999.
L. Barlow, “How To Use Web Search Engines—Tips on using internet search sites like Google, alltheweb, and Yahoo.—Page 4—How Search Engines Work,” http://www.monash.com/spidap4.html, The Spider's Apprentice, Monash Information Services, 2004.
Fluid Dynamics Software Corporation, “Sorting Results: How Relevance is Calculated,” http://www.xav.com/scripts/search/help/1074.html, August 2003.
Webconcerns, “ASP .Net Scripts—site crawler, indexer and search engine (page 1 of 5),” http://www.webconcerns.co.uk/aspnet/searchdb/default.asp, September 2005.
K-Praxis, “Emerging Face of Information Search Part 2: Relevance Ranking of Results,” http://www.k-praxis.com/archives/000111.html, July 2004.
L. Zeltser, et al., “High Precision Information Retrieval with Natural Language Processing Techniques,” http://www.zeltser.com/info-retrieval/, 1997.
T. Viall, untitled, http://www.ri.gov/downloads/search_wp.doc, State of Rhode Island, June 2005.
Greg R. Notess, “Unusual Power Web Searching Commands,” http://www.infotoday.com/online/nov03/OnTheNet.shtml, Online, Vol. 27 No. 6, November/December 2003.
Many Internet search engines, such as Google™, Yahoo!® and Microsoft's msn.com™, have tried to improve the quality of their search results. Google, Inc., for example, adopted its well-known PageRank™ algorithm to help searchers find the most popular and important web sites by ranking the pages. The rank that the PageRank™ algorithm assigns to a page is defined recursively and depends on the number and the PageRank™ metric of all pages that link to it by hyperlink. A hyperlink to a page counts as a vote of support. A page that is linked to by many pages with high rank receives a high rank itself. In other words, the PageRank™ algorithm treats the importance of a page as determined by the number and the rank of the pages that link to it.
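The recursive definition described above can be illustrated with a minimal sketch. This is not Google's implementation; it is a simplified power iteration under stated assumptions: the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages without outgoing links, and the four-page link graph are all hypothetical choices made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified link-based ranking by repeated vote passing.

    links maps each page to the list of pages it links to.
    damping and iterations are assumed values, not taken from the text.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform rank

    for _ in range(iterations):
        # Every page receives a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to C, only C links to A,
# and nothing links to B or to the page named "new".
links = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A"],
    "new": ["C"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.4f}")
```

In this sketch, page C ranks highest because it receives the most votes, and A ranks second because its single inbound link comes from the highly ranked C. Pages with no inbound links receive only the baseline share, whatever their content.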
However, the PageRank™ algorithm has two major disadvantages. First, it favors old pages, because a new page, even a very good one, will not have many links or citations unless it is part of an existing, highly ranked site. Therefore, it does not treat all web sites and web pages equally. Second, in most cases, searchers do not care how important or popular a web site is. They just want to get the results that are most relevant to their search query. Moreover, in the case of a desktop search, the PageRank™ algorithm would not work because, unlike web pages, most files stored on a local computer, such as resumes and letters, have no back links.
Relevancy is normally considered to be the appropriateness of a document to a searcher's need. One of the most common approaches to relevancy searching and ranking is the vector space model. Using a vector space information retrieval (IR) model, a term-by-document matrix is constructed. The columns of the matrix are the document vectors and the rows of the matrix are the term vectors. The cosine of the angle between the query vector and each document vector is commonly used to measure similarity for query matching. The vector space model has a significant advantage over traditional indexing methods for searchers, because the retrieved target documents can be ranked, which largely eliminates the empty result sets that occur in exact-match systems. However, because the vector space model starts with a term-by-document matrix, it inevitably loses the information about whether the search terms appear standalone or grouped together in the target document. Therefore, like other methods, it processes only the individual search terms and has no way to handle cases in which the search terms are grouped together in the target document.
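The limitation described above can be made concrete with a small sketch of the vector space model. It assumes raw term counts as matrix entries and whitespace tokenization; the function names and sample documents are hypothetical. Practical systems usually apply term weighting, which is omitted here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for this example.
    return text.lower().split()


def build_term_document_matrix(docs):
    """Return (vocabulary, matrix) with rows as terms and columns as documents."""
    vocab = sorted({term for doc in docs for term in tokenize(doc)})
    counts = [Counter(tokenize(doc)) for doc in docs]
    matrix = [[count[term] for count in counts] for term in vocab]
    return vocab, matrix


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_documents(query, docs):
    """Return (score, document index) pairs, best match first."""
    vocab, matrix = build_term_document_matrix(docs)
    query_counts = Counter(tokenize(query))
    query_vector = [query_counts[term] for term in vocab]
    scores = []
    for j in range(len(docs)):
        doc_vector = [row[j] for row in matrix]  # column j of the matrix
        scores.append((cosine(query_vector, doc_vector), j))
    return sorted(scores, key=lambda s: (-s[0], s[1]))


docs = [
    "cheap hotel in new york",   # query terms grouped together
    "new hotel in york cheap",   # same terms, scattered
    "boston travel guide",       # no query terms
]
results = rank_documents("new york", docs)
for score, index in results:
    print(f"{score:.4f}  {docs[index]}")
```

The first two documents contain the same terms in a different order, so their columns in the term-by-document matrix are identical and they receive the same cosine score. The model cannot tell that only the first one contains the query terms as an adjacent group.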