1. Field of the Invention
The present invention generally relates to online search technologies and document summarization. More specifically, the present invention relates to a method and apparatus for efficiently processing search results obtained in response to a user query.
2. Description of the Related Art
An important use of computers is the transfer of information over a network. Currently, the largest computer network in existence is the Internet, which, as is well known, is a worldwide interconnection of computer networks that communicate using a common protocol. Millions of computers, from low-end personal computers to high-end supercomputers, are connected to the Internet.
In the late 1980s, a new type of information system, known as the World Wide Web (“the Web”), was introduced to the Internet. As is well known, the Web is a wide-area hypermedia information retrieval system that aims to give wide access to a large universe of documents.
The architecture of the Web follows a conventional client-server model. The terms “client” and “server” refer to a computer's general role as a requester of data (i.e., the client) or a provider of data (i.e., the server). In the Web environment, Web browsers are clients and Web documents reside on servers. Web clients and Web servers communicate using a protocol called “Hypertext Transfer Protocol” (HTTP). A browser opens a connection to a server and initiates a request for a document. The server delivers the requested document, typically in the form of a text document coded in a standard Hypertext Markup Language (HTML) format.
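By way of illustration only, the request that a browser sends for a document can be sketched as a short text message. The following is a minimal sketch, assuming a simple HTTP/1.0 GET request; the function name `build_get_request` and the host `www.example.com` are hypothetical and are not part of the original disclosure.

```python
def build_get_request(host: str, path: str) -> str:
    """Format a minimal HTTP/1.0 GET request for one document.

    Hypothetical helper for illustration; no network connection is opened.
    """
    # Request line, a single Host header, then a blank line that ends the headers.
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"


# Example host and path are placeholders.
request = build_get_request("www.example.com", "/index.html")
print(request.splitlines()[0])
```

In such a sketch, the server would answer with a status line, headers, and the HTML document itself.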
Portions of documents displayed on the Web may contain hypertext links. The hypertext links link graphics or text in one document with another document on the Web. Each hypertext link is associated with a Uniform Resource Locator (URL). A URL specifies a server and a particular document on that server. When a user selects a hypertext link, using, for instance, a cursor, the browser connects to the server and retrieves the document(s) specified by the URL(s).
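As an illustration of how a URL identifies both a server and a document on that server, the following minimal sketch splits a URL into those two parts using the Python standard library. The helper name `split_url` and the example URL are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> tuple[str, str]:
    """Return (server, document path) for a URL. Hypothetical helper."""
    parts = urlsplit(url)
    # An empty path is treated as the server's root document.
    return parts.netloc, parts.path or "/"


# Placeholder URL, not taken from the original text.
server, document = split_url("http://www.example.com/docs/report.html")
print(server)    # www.example.com
print(document)  # /docs/report.html
```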
Some servers provide a means for searching a collection of documents. Upon initial request, the server supplies a form to the browser. The user, using the browser, enters data such as keywords on this form as part of a search query; the browser then opens a new connection to the server and submits this data. The server responds to this request with a new document listing some or all of the documents matching those keywords or other data submitted by the browser. Each listed document normally includes a hypertext link to the actual document so that the user may easily retrieve that document.
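The form submission described above may, for example, be carried as an encoded query string appended to the address of the search form. The following is a minimal sketch under that assumption; the parameter name `q`, the helper `build_search_url`, and the address `http://search.example.com/find` are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(form_action: str, keywords: list[str]) -> str:
    """Encode the keywords as a query string on the form's address."""
    # The field name "q" is an assumption; a real form defines its own names.
    query = urlencode({"q": " ".join(keywords)})
    return f"{form_action}?{query}"


url = build_search_url("http://search.example.com/find", ["patent", "thumbnail"])
print(url)

# On the server side, the same keywords can be recovered from the query string.
submitted = parse_qs(urlsplit(url).query)["q"][0].split()
print(submitted)
```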
Today, finding information as easily and quickly as possible has become a crucial problem. The World Wide Web contains millions of documents spread over hundreds of thousands of computers throughout the world. Although hypertext links tie all these documents together, the distributed architecture of the Web produces an incoherent system that often makes it very difficult for users to locate documents of interest.
As the volume of information continues to grow, search engines have become increasingly important for finding and retrieving information from large repositories such as the Internet and databases. As is well known, current search technology is usually based on an electronic search form, where the user enters keywords to form a query. As discussed above, the query is submitted to the search engine, which in turn presents links to the matching resources in the repository, each typically accompanied by a document title and possibly by summary information in the form of a short abstract of the original document. This abstract may be generated automatically and may contain the essence of the document. The user must then determine the relevance or importance of a document by reviewing the title and/or the abstract of the document presented in the result page of the search.
The larger the result set, the longer it takes the user to review the document titles and abstracts of the search results. Research has shown that a typical user will only carefully review the first five to ten result summaries for a particular search. However, search results may contain several hundred or several thousand hits. Techniques such as a Boolean query language may be used to narrow down the number of hits.
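For illustration, narrowing a result set with a Boolean query can be sketched as a filter that keeps only hits containing all required terms and none of the excluded terms. This is a simplified sketch using whole-word matching on titles; the function `matches` and the sample hits are hypothetical and do not represent any particular search engine's query language.

```python
def matches(text: str, required: list[str], excluded: list[str]) -> bool:
    """Evaluate 'all required AND NOT any excluded' against one hit."""
    words = set(text.lower().split())
    has_all = all(term in words for term in required)
    has_none = not any(term in words for term in excluded)
    return has_all and has_none


# Hypothetical result titles.
hits = [
    "Web search engines compared",
    "Search tips for web images",
    "Cooking with garlic",
]

# Equivalent to the query: web AND search AND NOT images
narrowed = [h for h in hits if matches(h, ["web", "search"], ["images"])]
print(narrowed)
```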
A result set of ten to twenty hits may still take considerable time and effort to review because of the time required for reading each title and abstract. To determine whether a document is actually a good match for the search query, a user still has to open (i.e., view) the document. This means, however, that when the user clicks on a hyperlink (URL) to access a document resource with a web browser, the document content must be downloaded from the server to the client before viewing. It may take a considerable amount of time to access the document, which slows down the whole process. After downloading a document from the server, the user may then determine that the downloaded document is not a good match for the original search query. The user may then continue to read through the rest of the original result page and skim other abstracts looking for a more promising document. As a result of this process, a user typically has to download several documents before finding a good match for the original search query.
Documents with large amounts of text data may be rendered and then resized in order to create a visual abstract (also known as a thumbnail). As is well known to one skilled in the art, rendering means processing a document for representation. For example, an HTML document includes data and format instructions (i.e., tags). The format instructions need to be processed before the document can be displayed in its intended way. Rendering is typically done with a web browser such as Netscape Navigator or MS Internet Explorer. The rendering engine of the web browser essentially processes the format instructions and converts them into graphical elements, determines the layout, and calculates the overall appearance of the document.
However, after rendering and resizing, the original body text in the visual abstract may not be readable because the font is too small. Moreover, at today's standard screen resolutions, it may not be possible to produce a readable font at this size. It would be helpful for the user to be able to read the headings or title and thereby determine whether a document merits further reading. However, resizing algorithms use proportional resizing, so the headings are reduced by the same factor as the body text, which cannot be displayed legibly at this size in any case. It would be helpful to shrink the body text and use the space gained to enlarge the headings and titles so that the user can read them.
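The effect of proportional resizing can be illustrated with simple arithmetic. The following sketch assumes a page rendered 800 pixels wide and reduced to a 200 pixel thumbnail, a 24 point heading, a 12 point body font, and a legibility threshold of 7 points; all of these values, and the function `scale_font_sizes`, are assumptions chosen for illustration and are not taken from the original text.

```python
def scale_font_sizes(sizes_pt: dict[str, float],
                     page_width_px: int,
                     thumb_width_px: int) -> dict[str, float]:
    """Apply one common scale factor to every font size (proportional resizing)."""
    factor = thumb_width_px / page_width_px
    return {name: size * factor for name, size in sizes_pt.items()}


MIN_LEGIBLE_PT = 7.0  # assumed threshold, for illustration only

original = {"heading": 24.0, "body": 12.0}  # assumed font sizes
scaled = scale_font_sizes(original, page_width_px=800, thumb_width_px=200)
legible = {name: size >= MIN_LEGIBLE_PT for name, size in scaled.items()}

print(scaled)   # {'heading': 6.0, 'body': 3.0}
print(legible)  # {'heading': False, 'body': False}
```

Under these assumed values, the heading is reduced by the same factor of four as the body text, so neither remains legible in the thumbnail.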