The present invention relates to information retrieval technology, and more particularly, to a method and system of web search.
The World Wide Web, together with other resources available over the Internet, provides a mechanism by which users, using computers or other information access devices, can obtain large amounts of information about a wide variety of subjects from a large number of web sites. Information provided by web sites is generally in the form of Web pages, typically in HTML (HyperText Markup Language) format, a text-based format that describes how the respective Web page is to be displayed by a computer. A Web page provides textual information, typically in ASCII form, and graphical information, generally in a compressed format such as “GIF” or “JPEG.” In addition, a Web page will typically have hypertext links to other Web pages, which may be provided by the same site as the original Web page or by other web sites.
The Internet has over ten billion Web pages, and is still rapidly growing. There are at least two basic approaches to finding suitable information: using a search engine, or using a search directory such as Yahoo®, LookSmart®, or Open Directory®. Search directories are useful when browsing general topics, and search engines work well when searching for specific information. Results can be improved by spending time learning the advanced search features of several search tools (usually found on Help pages at each site).
Most search engines maintain huge databases of web sites that can be searched by entering words, phrases or sentences in a text field of a web page. Such a database is a full-text index built over the entire HTML file of each indexed page. To index their databases, search engines rely on computer programs called “robots” or, more specifically, “spiders.” These programs “crawl” across the web by following links from site to site and indexing each site they visit. Each search engine uses its own set of criteria to decide what to include in its database. For example, some search engines index each page in a web site, while others index only the main page. Currently, one of the most famous search engines, Google, indexes over 3 billion web pages.
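By way of illustration only, the crawling and indexing behavior described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular search engine: the in-memory PAGES table stands in for the web, and the names PAGES, crawl, and search are hypothetical and do not appear in the description.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real spider would fetch and parse HTML over the network instead.
PAGES = {
    "a.example/": ("welcome to the home page", ["a.example/news", "b.example/"]),
    "a.example/news": ("latest search engine news", ["a.example/"]),
    "b.example/": ("search directory of web sites", []),
}


def crawl(seed, pages):
    """Follow links breadth-first from seed and build a full-text index.

    Returns a mapping from each word to the set of URLs containing it.
    """
    index = defaultdict(set)
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url not in pages:
            continue  # dead link: nothing to index
        text, links = pages[url]
        # Index every word of the page (full-text indexing).
        for word in text.lower().split():
            index[word].add(url)
        # Queue links that have not been visited yet.
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def search(index, keyword):
    """Return the URLs whose text contains keyword, in sorted order."""
    return sorted(index.get(keyword.lower(), set()))
```

In this sketch, calling crawl("a.example/", PAGES) visits all three pages by following links from the seed, and search on the resulting index performs a simple keyword lookup. A search engine that indexes only the main page of each site would simply skip the link-following step for pages of the same site.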
Almost all search engines perform keyword searches against a database of Web pages, but various factors influence the results of each search, such as the size of the search engine's database, the frequency of database updates, search capability and design, and speed. Google® offers both simple and advanced search capabilities. Advanced searching allows the search to be limited by including or excluding desired words or phrases, and allows for language-specific requests. FIG. 1 illustrates a conventional search result screen. The search result contains hundreds to thousands of resulting items, each comprising a title with a URL linked to a particular Web page 101a or 101b, a short passage (e.g., an abstract or highlighted search keywords) 102a or 102b, a file size in bytes 103a or 103b, and other information.
Such numerous resulting items are difficult to browse efficiently; thus, many ranking techniques have been introduced to move irrelevant items lower on the list. One of the main rules in a ranking algorithm involves the location and frequency of keywords on a Web page. Search engines typically check whether the search keywords appear near the top of a web page, such as in the headline or in the first few paragraphs of text. The search engine assumes that any page relevant to the topic will mention those words at or near the beginning thereof. Frequency is another major factor in how search engines determine relevancy. Most search engines analyze how often search keywords appear in relation to other words in a web page. Web pages in which the keywords appear with a higher frequency are often deemed more relevant than other web pages.
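The location and frequency rule described above can be illustrated with a short sketch. The scoring formula, the parameters lead_words and lead_bonus, and the function names relevance and rank are assumptions introduced purely for illustration; actual search engines use their own, generally undisclosed, weightings.

```python
def relevance(text, keyword, lead_words=50, lead_bonus=1.0):
    """Score a page by keyword frequency plus a bonus for early placement.

    lead_words and lead_bonus are hypothetical tuning parameters.
    Words are split on whitespace only; punctuation is not stripped.
    """
    words = text.lower().split()
    if not words:
        return 0.0
    keyword = keyword.lower()
    # Frequency: occurrences of the keyword relative to all words on the page.
    frequency = words.count(keyword) / len(words)
    # Location: does the keyword appear near the top of the page?
    in_lead = keyword in words[:lead_words]
    return frequency + (lead_bonus if in_lead else 0.0)


def rank(pages, keyword):
    """Order page URLs from most to least relevant for keyword.

    pages maps each URL to its text.
    """
    return sorted(pages, key=lambda url: relevance(pages[url], keyword),
                  reverse=True)
```

Under these assumptions, a page that mentions the keyword often and early outranks a page that mentions it once near the end, and a page that never mentions it scores zero. Note that the score depends only on page content, which is the limitation discussed in the following paragraph.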
Although such ranking techniques are feasible, several problems remain. Specifically, conventional ranking algorithms determine the order of resulting items based on the location and frequency of keywords, without considering such important factors as user browsing behavior. It is contemplated that users often select one or more resulting items according to the displayed short passages. Therefore, a need exists for a different system and method of Web search.