The present invention is in the field of data processing systems and, in particular, relates to systems, methods and media for utilizing electronic document usage information with search engines.
Personal computer systems are well known in the art. They have attained widespread use for providing computer power to many segments of modern society. A personal computer (PC) may be defined as a desktop, floor-standing, or portable microcomputer that includes a system unit having a central processing unit (CPU) and associated volatile and non-volatile memory, including random access memory (RAM) and basic input/output system read only memory (BIOS ROM), a system monitor, a keyboard, one or more flexible diskette drives, a CD-ROM drive, a fixed disk storage drive (also known as a “hard drive”), a pointing device such as a mouse, and an optional network interface adapter. Examples of such personal computer systems are International Business Machines Corp.'s (IBM's) ThinkCentre™, ThinkPad™, Aptiva™, and IntelliStation™ series of personal computers. The use of mobile computing devices, such as notebook PCs, personal digital assistants (PDAs), tablet PCs, sophisticated wireless phones, etc., has also become widespread. Mobile computing devices typically sacrifice some functionality or performance when compared to traditional PCs in exchange for smaller size, portable power, and mobility.
The widespread use of PCs and mobile computing devices in various segments of society has resulted in a reliance on computer systems both at work and at home, such as for telecommuting, news, stock market information and trading, banking, shopping, shipping, communication in the form of hypertext transfer protocol (HTTP) and e-mail, as well as other services. Many of these functions take advantage of the communication abilities offered by the Internet. Such connectivity has facilitated unprecedented amounts of collaboration and sharing of information between individuals, both within organizations and outside organizational structures. This collaboration has resulted in individuals having access to and sharing vast amounts of information, often in the form of electronic documents.
Electronic documents are digitized documents that contain text, graphics, photographs, etc., and can be read by various computer systems. A wide variety of file formats have been used for electronic documents, such as Portable Network Graphics (PNG), Joint Photographic Experts Group (JPEG), Graphics Interchange Format (GIF), Tagged Image File Format (TIFF), Microsoft Word (DOC), etc. Other file formats capable of handling text and graphics include Hypertext Markup Language (HTML) and Adobe Systems Inc.'s Portable Document Format (PDF). For many applications, electronic documents, particularly PDF documents, have supplanted printed material for the dissemination of information, as many journals, newsletters, books, articles, etc., are now distributed either exclusively or non-exclusively in electronic form.
The vast amount of content, including electronic documents, available on public networks such as the Internet often makes it difficult for users to find useful and relevant information. Accordingly, many people utilize search engines to assist them in their search. Search engines are programs that search documents on a network for specified keywords and return to the requester a list of documents where the keywords were found. Typically, a search engine works by sending out a “spider” to fetch as many documents as possible, after which an “indexer” reads the documents and creates an index of the words contained in each document. Each search engine then typically creates indices using a proprietary algorithm so that meaningful results are returned for each query. Examples of publicly available search engines include those provided by Microsoft Corporation, Google Inc., Yahoo! Inc., etc.
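The indexing and keyword lookup described above can be illustrated with a minimal sketch. It assumes a simple inverted index that maps each word to the documents containing it; it does not represent the proprietary algorithm of any particular search engine. The function names (`tokenize`, `build_index`, `search`) and the sample documents are hypothetical, and the documents stand in for pages that a spider would have fetched.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted IDs of documents containing every query keyword."""
    words = tokenize(query)
    if not words:
        return []
    # Intersect the posting sets so only documents with all keywords remain.
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


# Hypothetical documents standing in for pages fetched by a spider.
documents = {
    "doc1": "Search engines index electronic documents.",
    "doc2": "Electronic documents are often distributed as PDF files.",
    "doc3": "A spider fetches documents for the indexer.",
}
index = build_index(documents)
```

In this sketch, `search(index, "electronic documents")` returns only the documents that contain both keywords. The result identifies whole documents; it gives no indication of which portion of a document is most useful to the requester.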
While search engines are quite powerful, they suffer from some flaws. First, search engines do not always identify the most relevant links early in the search results. Search engines also do not always identify the portion of a document most likely to satisfy the requester, as they only identify, at best, the portion of the document in which the search words were found. This problem is exacerbated for larger documents, as a user may not know where to look in a very long document for the most relevant information. Additionally, search engines are often misled by the frequent appearance of keywords, such as when document developers attempt to mislead a search engine into giving a higher priority to a particular site or document by incorporating large numbers of keywords in the document in a process known as “keyword spamming”. Keyword spamming often results in erroneous or misleading query results, making the search engine less desirable for the user. Improving the performance of a search engine will likely increase the usage of that search engine and thus the revenue generated from it.
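The effect of keyword spamming can be illustrated with a short sketch. It assumes a deliberately naive ranking that orders documents by the raw number of times a keyword appears; this is an illustrative assumption, not the ranking method of any actual search engine. The function names and the two sample documents are hypothetical.

```python
import re


def term_count(text, keyword):
    """Count how many times the keyword appears as a whole word."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return words.count(keyword.lower())


def rank_by_count(documents, keyword):
    """Order document IDs by raw keyword count, highest first."""
    return sorted(
        documents,
        key=lambda doc_id: term_count(documents[doc_id], keyword),
        reverse=True,
    )


# Hypothetical documents: one informative page and one stuffed with keywords.
documents = {
    "guide": "This guide explains how to repair a laptop battery safely.",
    "spam": "laptop laptop laptop cheap laptop best laptop buy laptop now",
}
ranking = rank_by_count(documents, "laptop")
```

Under this naive scoring, the keyword-stuffed document is ranked ahead of the informative one, because keyword frequency alone carries no information about how useful a document is to the requester.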
There is, therefore, a need for an easy and effective system to improve the functionality of search engines, particularly when search engines are used to find information contained in portions of electronic documents.