The present invention relates to an electronic file retrieval method and system, and more particularly to a method and system which allow a computer system user to retrieve previously processed files which are relevant to a file which is currently being considered by the user.
Given the recent dramatic increases in the computer memory available to users of Personal Computers (PCs), users are now able to create massive personal document archives. Added to this is the possibility of retrieving documents over the World Wide Web (WWW), which provides an almost unlimited source of information. Whilst these developments greatly increase the knowledge available to a PC user, it is often not easy for the user to locate information which is relevant to a current task.
Sophisticated search engines have been developed to enable WWW users to “surf” the WWW. These engines, for example AltaVista™, generally operate by exhaustively extracting words from Web pages published on the WWW. Links to these pages are then added to a database alongside the corresponding word entries. Search algorithms have also been developed for searching documents stored, for example, on the hard disc drive of a PC. Again these tend to conduct an exhaustive search of the stored documents for a user defined keyword.
It is an object of the present invention to provide a method and system which is able to identify electronic documents or files which are relevant to another electronic document downloaded from a data network, and to incorporate direct or indirect links to those relevant documents or files into the downloaded document.
According to a first aspect of the present invention there is provided a method of operating a computer system, the computer system being connected to a data network and comprising a display and a database storing a set of documents and/or document identifiers, the method comprising:
downloading an electronic document into the computer system over the data network, the document being in the form of computer readable code;
identifying keywords in the downloaded document;
modifying said computer readable code to introduce thereinto hyper-links to enable a user to link to documents stored or identified in said database and containing at least one of said keywords; and
displaying the downloaded document on the computer system display, where the introduced hyper-links appear as user selectable items.
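By way of illustration only, the step of modifying the computer readable code may be sketched as follows for an HTML document. This is a minimal sketch under stated assumptions and not a definitive implementation: the function name `add_keyword_links` and the look-up address `/related?kw=` are hypothetical, and the keywords are assumed to have been identified already. Only text lying outside markup tags is rewritten, so that tag names and attribute values are left intact. Text inside existing anchors or script elements is not treated specially in this sketch.

```python
import re
from urllib.parse import quote


def add_keyword_links(html: str, keywords: set[str],
                      lookup_url: str = "/related?kw=") -> str:
    """Wrap each keyword found in the text of `html` in a hyper-link.

    `lookup_url` is a hypothetical address of a service that lists the
    stored documents containing the keyword.
    """
    if not keywords:
        return html

    # Longest keywords first, so that a longer keyword wins over a shorter
    # one starting at the same position.
    alternatives = "|".join(
        re.escape(k) for k in sorted(keywords, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)

    def link(match: re.Match) -> str:
        word = match.group(0)
        return f'<a href="{lookup_url}{quote(word.lower())}">{word}</a>'

    # Splitting on a capturing group yields text at even indices and
    # tags at odd indices; only the text segments are modified.
    parts = re.split(r"(<[^>]+>)", html)
    return "".join(
        part if index % 2 == 1 else pattern.sub(link, part)
        for index, part in enumerate(parts)
    )
```

The modified code may then be passed unchanged to the display step, where the introduced anchors appear as user selectable items.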
Preferably, said step of identifying keywords comprises defining a global keyword list on the basis of the stored and/or identified documents and the downloaded document, and identifying those of the global keywords present in the downloaded document. More preferably, the global keyword list is defined by analysing word occurrence rate distribution over the documents.
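One possible way of deriving a global keyword list from the word occurrence rate distribution is sketched below. The selection rule is an assumption made for illustration only: a word is kept when the fraction of documents containing it lies between two thresholds, so that words occurring in nearly every document and words occurring in very few documents are both discarded. The function names and the threshold values `min_rate` and `max_rate` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split text into lower-case alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def global_keywords(documents: list[str],
                    min_rate: float = 0.2,
                    max_rate: float = 0.8) -> set[str]:
    """Return words whose document occurrence rate lies within the thresholds.

    The thresholds are illustrative assumptions, not prescribed values.
    """
    doc_freq: Counter = Counter()
    for text in documents:
        # Count each word once per document.
        doc_freq.update(set(tokenize(text)))
    total = len(documents)
    return {
        word for word, count in doc_freq.items()
        if min_rate <= count / total <= max_rate
    }


def keywords_in(document: str, keywords: set[str]) -> set[str]:
    """Identify those of the global keywords present in one document."""
    return keywords & set(tokenize(document))
```

In use, `documents` would hold the text of the stored and/or identified documents together with the downloaded document, and `keywords_in` would then be applied to the downloaded document.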
Preferably, said step of modifying the computer readable code comprises creating a hyper-link for each identified keyword. More preferably, the method comprises the further steps of:
activating one of said introduced hyper-links;
determining which of the stored and/or identified documents contain the associated keyword;
determining a similarity/dissimilarity coefficient for each of these documents; and
displaying a list of these documents together with the similarity/dissimilarity coefficient.
The displayed list preferably includes hyper-links to the listed documents.
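The further steps set out above may be sketched as follows. This is a sketch under stated assumptions: no particular similarity/dissimilarity coefficient is prescribed here, and the Jaccard coefficient computed over keyword sets is used purely as an example. The database is assumed, for illustration, to be a mapping from a document identifier to the set of keywords that the document contains, and all function names are hypothetical.

```python
from html import escape


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two keyword sets (an illustrative choice)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_documents(keyword: str,
                      current_keywords: set[str],
                      database: dict[str, set[str]]) -> list[tuple[str, float]]:
    """List documents containing `keyword`, most similar first."""
    hits = [
        (doc_id, jaccard(current_keywords, doc_keywords))
        for doc_id, doc_keywords in database.items()
        if keyword in doc_keywords
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def render_list(hits: list[tuple[str, float]]) -> str:
    """Render the list as HTML with a hyper-link to each listed document."""
    rows = [
        f'<li><a href="{escape(doc_id)}">{escape(doc_id)}</a> ({score:.2f})</li>'
        for doc_id, score in hits
    ]
    return "<ul>" + "".join(rows) + "</ul>"
```

Here `current_keywords` denotes the keywords identified in the downloaded document, so that each coefficient expresses how closely a listed document resembles the document currently being viewed.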
Preferably, hyper-links are displayed by highlighting, e.g. using a colour change, underlining, or italicizing, the keywords identified in the downloaded document. Links may also be displayed as specific characters, e.g. “.”, “!”, “?”.
Preferably, said computer readable code is Hyper Text Markup Language (HTML), in which case said steps of downloading and displaying may be performed by a Web browser.
Preferably, the data network over which the electronic document is downloaded is the World Wide Web. The step of displaying the downloaded document (with added links) may involve interpreting the document with an Internet Browser.
The identifier may be a document title, a computer drive path, or a Uniform Resource Locator (URL) of a Web page, or a combination of these. The documents used to construct the database may include Web pages, word processed documents, and electronic mail items.
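A document identifier of the kind described above might, for example, be represented by a record such as the following. This is a minimal sketch only: the class name `DocumentIdentifier`, its field names, and the rule of preferring a URL over a drive path as the link target are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DocumentIdentifier:
    """Hypothetical record combining the identifier forms named above."""
    title: Optional[str] = None
    drive_path: Optional[str] = None
    url: Optional[str] = None

    def __post_init__(self) -> None:
        # An identifier carrying no information cannot identify a document.
        if not (self.title or self.drive_path or self.url):
            raise ValueError("at least one of title, drive_path or url is required")

    def link_target(self) -> Optional[str]:
        """Return the address a hyper-link could point to, if any.

        Preferring the URL over the drive path is an illustrative choice.
        """
        return self.url or self.drive_path
```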
According to a second aspect of the present invention there is provided a programmed computer system comprising:
communication means coupled to a data network for downloading an electronic document over the data network, the document being in the form of a computer readable code;
an electronic database storing a set of documents and/or document identifiers;
first processing means arranged in use to identify keywords in said downloaded document;
second processing means arranged in use to modify said computer readable code to introduce thereinto hyper-links to enable a user to link to documents stored or identified in said database and containing at least one of said keywords; and
a display and display driver means arranged in use to display the downloaded document on the computer system display in a form where the introduced hyper-links appear as user selectable items.
In certain embodiments of the above second aspect of the present invention, the computer system is provided by a suitably programmed computer, where the communication means is a data modem of the computer, and the first and second processing means comprise a microprocessor or a digital signal processor.
In other embodiments of the invention the system comprises a personal computer connected to a local area network, which is coupled to the WWW via a router. Said database and said first and second processing means may be provided in said personal computer or in a second computer also connected to the local area network and accessible by other personal computers. Alternatively, the database and first and second processing means may be replicated in the personal computer and in one or more other computers of the local area network, to provide a hierarchy of “knowledge” servers.
According to a third aspect of the present invention there is provided a computer memory encoded with executable instructions representing a computer program for causing a computer system connected to a data network to:
construct an electronic database storing a set of documents and/or document identifiers;
download an electronic document into the computer system over the data network, the document being in the form of a computer readable code;
identify keywords in said document;
modify said computer readable code to introduce thereinto hyper-links to enable a user to link to documents stored or identified in said database and containing at least one of said keywords; and
display the downloaded document on the computer system display, where the introduced hyper-links appear as user selectable items.