1. Field of the Invention
This invention relates to the fields of classification, categorization, sorting and filtering of sites linked to a web document such as a search results page. The invention relates especially to systems and methods for visually enhanced presentation of linked sites and web objects using iconic characteristic representations.
2. Background of the Invention
The Internet and general purpose search engines are well known in the art. A “web surfer” may access a search engine, such as Yahoo!™ or Google™, using his or her browser device. The browser device may be any networked computing platform ranging from a personal computer equipped with a browser such as Netscape's Navigator™ or Microsoft's Internet Explorer™, to a portable device such as a personal digital assistant (“PDA”) equipped with a wireless network interface and a microbrowser program. Wireless telephones such as Personal Communications Systems (“PCS”) telephones, and some television set-top devices (e.g. WebTV), may also be used to browse the Internet.
Generally, these search engines work on one of a few premises. Initially, the operator or owner of a web site server submits the new site to each search engine for indexing, “spidering” or “crawling”. The submitter may also indicate certain keywords, descriptive phrases, or categories which the submitter believes are appropriate for the site content. In some cases, the search engine's indexing operation is completely automatic, and the web site is added to the engine's categories and keyword lists as suggested and as determined by analysis of the content of the submitted site (e.g. word frequency analysis, hyper text header tags, etc.). Some other search engines provide for manual review and categorization of the site, as well.
Periodically, a web site server operator may resubmit the site for indexing, spidering or crawling to include updates and additions to the site. Some search engines also provide for periodically reviewing the site content without initiation or submission by the site operator.
The result, then, is an index maintained by the search engine which is more or less current as to the content of each site which has been submitted to the search engine. When a web surfer accesses the search engine, he or she may search by keywords, phrases, or categories. For example, a surfer may search for sites based on the keywords “childhood” and “medical treatments”, or a search may be made through a hierarchical arrangement of categories, as shown in Table 1. Each indexed web site may appear in one or more categories, and may be included in the results of many different sets and combinations of keyword searches.
TABLE 1
Example Subject Hierarchy for Search Engine Index

> Health and Personal Care
    > Exercise and Fitness
        > Aerobic Exercise
        > Muscle and Body Building
        ...
    > Medical, Illness and Disease
        > Adult Medical
        > Youth Medical
        > Geriatrics
        > Women's Health
        > Men's Health
        ...
...
If a web surfer performs a keyword search, such as looking for sites containing the keywords “childhood” and “medical treatments”, the search engine provides a results “page” to the web surfer. This page typically includes a short description of each site (or the first few words of the site's main page), a hyperlink to each site's web server, and a relevance ranking, as shown in Table 2.
TABLE 2
Example Keyword Search Results

Results of keyword search: childhood, medical treatments

(1) “American Pediatrics Organization” - A web site for professional collaboration regarding pediatric practices, research . . .
    www.apo-sample.org (cached) 98%
(2) Help for new parents - A survival guide to childhood diseases, colds, and infections...
    www.kidkolds.org 94%
(3) Doctor speak made easy - What did my doctor actually say about my child's condition? Plain-english medical terminology . . .
    www.plain-meds4kids.com (cached) 82%
(4) Discount Children's Medical Source - online, virtual community of parents united to purchase health goods at a discount . . .
    www.DCMS-Group.com (cached) 78%
(5) Prescription and surgical alternatives for childhood ailments - holistic resources for parents . . .
    www.holistic-youth-meds.org 52%
To provide this search results page to the web surfer, the search engine is equipped with a software means to dynamically generate a “page” containing the search results and links to the sites listed, using a page definition language which allows for embedded links, such as Hyper Text Markup Language (“HTML”) or Wireless Markup Language (“WML”). Methods and systems to transmit such pages from a server to a browser client include Hyper Text Transfer Protocol (“HTTP”) and Wireless Application Protocol (“WAP”), among others. These methods and systems are well known in the art, and are readily available in software packages for web servers and search engines.
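As an illustration of such dynamic page generation, the following is a minimal sketch in Python, not taken from any particular search engine. The function name `render_results_page`, the tuple layout of each result, and the prefixing of `http://` to bare host names such as those in Table 2 are all assumptions made for this example.

```python
from html import escape


def render_results_page(query, results):
    """Build a minimal HTML results page with one hyperlink per result.

    `results` is a list of (title, description, url, relevance) tuples.
    This layout is hypothetical and chosen only for illustration.
    """
    items = []
    for title, description, url, relevance in results:
        # Escape all text so that site-supplied content cannot inject markup.
        items.append(
            '<li><a href="http://{u}">{t}</a> - {d} <small>{u} {r}%</small></li>'.format(
                u=escape(url, quote=True),
                t=escape(title),
                d=escape(description),
                r=int(relevance),
            )
        )
    return (
        "<html><head><title>Results of keyword search: {q}</title></head>"
        "<body><ol>{body}</ol></body></html>"
    ).format(q=escape(query), body="".join(items))


page = render_results_page(
    "childhood, medical treatments",
    [("Help for new parents", "A survival guide to childhood diseases",
      "www.kidkolds.org", 94)],
)
```

The resulting string could then be returned to the browser client as the body of an HTTP response. An equivalent generator for WML would follow the same pattern with different markup.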
Some existing technologies attempt to enhance the perceptibility of these results pages using graphics, such as bar graphs or pie charts indicating the relevance factor, or small “thumbnail” images representing each linked web page. Other search sites, such as the United States Patent and Trademark Office's patent database search engine, provide a small icon next to each search result item which indicates whether the linked site has available images. While these methods do provide some increase in the comprehensibility of the results page, they generally do not represent all of the site factors which the web surfer may seek out or wish to avoid.
For example, a particular web surfer using a wireless device (e.g. cell phone or PDA) may not be interested in “visiting” sites which are slow to load. Another web surfer may not want to visit sites which spawn multiple windows or browser frames such as sites which use JavaScript, or which use cookies or other forms of anonymous session tracking.
Yet another web surfer may not want to waste time clicking through to sites which require a subscription and/or login and password. Other web surfers may prefer sites which have been “quality checked” or verified by third parties for privacy and security, or prefer sites which are operated by government and institutional entities.
Many of these characteristics could be determined by analysis of each site's contents, such as JavaScript code to spawn browser frames, HTML which deposits or retrieves cookies, HTML which provides login forms, links to graphics “seals” of approval of third party sites, and keywords which indicate adult or family-friendly subject matter. Other characteristics may be determined manually, such as by search engine administrator review or by cooperative surfer feedback (e.g. a surfer who visits a linked site may provide feedback to the search engine that it is slow, or that it is a “spoof” of a family-friendly site which actually contains adult material).
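The kind of content analysis described above might be sketched as follows. This is an illustrative Python example only, not a method disclosed in the art discussed here. The function name `characterize` and the specific heuristics (a `window.open(` call as a sign of spawned windows, a `document.cookie` reference as a sign of cookie use, and a password input field as a sign of a login form) are assumptions, and a practical analyzer would need considerably more robust tests.

```python
import re
from html.parser import HTMLParser


class _FormScanner(HTMLParser):
    """Records whether the page contains a password input field."""

    def __init__(self):
        super().__init__()
        self.has_password_field = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "input" and (dict(attrs).get("type") or "").lower() == "password":
            self.has_password_field = True


def characterize(html_text):
    """Return a dict of simple, heuristic characteristics for one page."""
    scanner = _FormScanner()
    scanner.feed(html_text)
    return {
        # Hypothetical heuristic: script code that opens new browser windows.
        "spawns_windows": bool(re.search(r"window\.open\s*\(", html_text)),
        # Hypothetical heuristic: script code that reads or writes cookies.
        "uses_cookies": bool(re.search(r"document\.cookie", html_text)),
        # Hypothetical heuristic: a password field suggests a login form.
        "requires_login": scanner.has_password_field,
    }
```

Each entry in the returned dictionary corresponds to one characteristic that could later be shown to the surfer or used for filtering. Characteristics that cannot be detected automatically, such as surfer feedback that a site is slow, would have to be supplied from another source.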
However, to date, no technology is available to characterize linked web sites and present them in such ways. Therefore, there is a need in the art for a method and system for characterizing linked site content, providing a readily comprehensible summary of such site characterization in a list of linked sites, and providing for additional filtering and sorting of such a list based upon user preferences.