Personal computer systems are well known in the art. They have attained widespread use for providing computer power to many segments of today's modern society. A personal computer (PC) may be defined as a desktop, floor-standing, or portable microcomputer that includes a system unit having a central processing unit (CPU) and associated volatile and non-volatile memory, including random access memory (RAM) and basic input/output system read only memory (BIOS ROM), a system monitor, a keyboard, one or more flexible diskette drives, a CD-ROM drive, a fixed disk storage drive (also known as a “hard drive”), a pointing device such as a mouse, and an optional network interface adapter. Examples of such personal computer systems are International Business Machines Corp.'s (IBM's) ThinkCentre™, ThinkPad™, Aptiva™, and IntelliStation™ series of personal computers. The use of mobile computing devices, such as notebook PCs, personal digital assistants (PDAs), tablet PCs, sophisticated wireless phones, etc., has also become widespread. Mobile computing devices typically sacrifice some functionality or performance when compared to traditional PCs in exchange for smaller size, portable power, and mobility.
The widespread use of PCs and mobile computing devices in various segments of society has resulted in a reliance on computer systems both at work and at home, such as for telecommuting, news, stock market information and trading, banking, shopping, shipping, communication by way of Hypertext Transfer Protocol (HTTP) and e-mail, as well as other services. Many of these functions take advantage of the communication abilities offered by the Internet. Such connectivity has facilitated unprecedented amounts of collaboration and sharing of information between individuals, both within organizations and outside organizational structures. This collaboration has resulted in individuals having access to and sharing vast amounts of information, often in the form of electronic documents. Electronic documents are digitized documents that contain text, graphics, photographs, etc., and can be read by various computer systems. Electronic documents may be in any file format, such as Portable Network Graphics (PNG), Joint Photographic Experts Group (JPEG), Graphics Interchange Format (GIF), Tagged Image File Format (TIFF), Microsoft Word (DOC), Hypertext Markup Language (HTML), Adobe Systems Inc.'s Portable Document Format (PDF), etc. For many applications, electronic documents, particularly PDF documents, have supplanted printed material for the dissemination of information, as many journals, newsletters, books, articles, etc., are now distributed either exclusively or non-exclusively in electronic form.
While electronic documents improve on hard copies in many ways, such as cost, ease of distribution, and preparation time, they also have disadvantages when compared to paper documents. One deficiency of electronic documents is that it is often difficult to find the most interesting or useful part of an electronic book or other document. With a paper book, individuals may observe which pages are the most worn or the pages to which the book naturally opens due to frequent reading of those pages. One can easily discern which book in, say, a library is the most useful based on its wear, and one can also often find the most useful part of the book by noting the wear caused by frequent reading. Because of their fixed nature, electronic documents fail to provide such indications of frequently read or particularly useful sections of the document.
The vast amount of content, including electronic documents, available on public networks such as the Internet often makes it difficult for users to find useful and relevant information. Accordingly, many people utilize search engines to assist them in their search. Search engines are programs that search documents on a network for specified keywords and return to the requester a list of documents in which the keywords were found. Typically, a search engine works by sending out a “spider” to fetch as many documents as possible, after which an “indexer” reads the documents and creates an index of the words contained in each document. Each search engine then typically creates indices using a proprietary algorithm so that meaningful results are returned for each query. Examples of publicly available search engines include those provided by Microsoft Corporation, Google Inc., Yahoo! Inc., etc.
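The spider-and-indexer arrangement described above can be illustrated with a minimal sketch. It is not the algorithm of any particular search engine, whose indexing methods are proprietary. The in-memory DOCUMENTS collection and the spider, tokenize, build_index, and search functions are hypothetical names introduced only for illustration, and the "network" is simulated by a dictionary rather than by fetching real documents.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for documents reachable on a network:
# document identifier -> document text.
DOCUMENTS = {
    "doc1": "Search engines index documents by keyword.",
    "doc2": "A spider fetches documents from the network.",
    "doc3": "The indexer reads each document and records its words.",
}


def spider(source):
    """Simulated spider: yield (doc_id, text) pairs for every document."""
    yield from source.items()


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Simulated indexer: map each word to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in pages:
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted identifiers of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


index = build_index(spider(DOCUMENTS))
print(search(index, "documents"))       # ['doc1', 'doc2']
print(search(index, "spider network"))  # ['doc2']
```

Note that such an index records only which documents contain a keyword. It carries no indication of which portion of a document readers have found most useful.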
While search engines are quite powerful, they suffer from some flaws. First, search engines do not always identify the most relevant links early in the search results, forcing the user to spend time reviewing multiple results to find the information they are seeking. Additionally, search engines do not always identify the portion of a document most likely to satisfy the customer or requester, as they only identify, at best, the portion of the document in which the search words were found. This problem is exacerbated for larger documents, as a user may not know where to look in a very long document for the most relevant information. Furthermore, search engines are often misled by the frequent appearance of keywords, such as when document developers attempt to obtain a higher priority for a particular site or document by incorporating large numbers of keywords in the document, a process known as “keyword spamming”. Keyword spamming often results in erroneous or misleading query results, making the search engine less desirable for the user. Improving the performance of a search engine will likely increase its usage and thus the revenue generated from it.
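The effect of keyword spamming can be shown with a deliberately naive ranking sketch that orders documents purely by how often a keyword appears. Real search engines use more elaborate, proprietary ranking methods. The DOCS collection and the raw_count_score and rank functions are hypothetical and serve only to show why raw keyword frequency is easy to manipulate.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def raw_count_score(text, keyword):
    """Naive relevance score: number of times the keyword occurs in the text."""
    return Counter(tokenize(text))[keyword.lower()]


def rank(documents, keyword):
    """Order document identifiers by descending raw keyword count."""
    return sorted(
        documents,
        key=lambda doc_id: raw_count_score(documents[doc_id], keyword),
        reverse=True,
    )


# Hypothetical documents: one genuinely useful, one stuffed with the keyword.
DOCS = {
    "useful": "This guide explains how to repair a bicycle tire step by step.",
    "spam": "bicycle bicycle bicycle bicycle cheap deals bicycle bicycle",
}

print(rank(DOCS, "bicycle"))  # ['spam', 'useful']
```

Under this frequency-only scoring, the keyword-stuffed document outranks the useful one, which is the kind of misleading result described above.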
There is, therefore, a need for an effective system to improve the functionality of search engines, particularly when search engines are used to find information contained in portions of electronic documents. In particular, there is a need to find information relevant to a user contained in portions of electronic documents.