The present invention concerns a method of recording information relating to a document visited by a user of a computer communication network.
It also concerns a method of searching for a document on a computer communication network from information recorded by means of the recording method according to the invention.
Correlatively, the present invention concerns a recording device and a search device adapted respectively to implement the recording method and searching method according to the invention.
The present invention fits more generally within the field of computer communication networks which make it possible to transfer documents between computer servers storing electronic documents, and one or more users able to surf the network by means of a browser.
In communication networks, a multitude of computers and peripherals are connected. The peripherals can, by way of example, be printers, storage units, or means of acquiring or storing documents. The computers and peripherals in a network can in turn be computer servers or clients on the communication network.
The documents exchanged are of very varied natures: texts, images, videos, sound, computer programs, etc.
Given the size and complexity of a wide area network, the user cannot surf it completely in order to seek information. This is notably the case with the World Wide Web, built on top of the Internet.
Search tools have been set up to facilitate this search. They generally make it possible, using an indexing of the documents stored, to make searches by key words. However, the results are very often so numerous that they are very difficult to use.
In addition, when the user has in the past found a document liable to meet the object of his search, it will be highly advantageous to him to attempt to find this document again using the history of his browser.
This is because the history of a browser can contain information such as the title of documents visited and their electronic address on the communication network.
However, when the history contains many entries, the search is tedious, all the entries having to be examined one after another. In addition, the title stored can be deceptive with regard to the exact content of the document. Moreover, the storage of the entries in the history is limited in time in order to limit the space needed for storing the history.
The aim of the present invention is to propose a method of recording information which makes it possible to store, in reduced form, documents visited by the user, and an associated search method which then enables the user to find a document visited in the past.
In accordance with the invention, a method of recording information relating to a document visited by a user of a computer communication network is characterised in that it includes the following steps:
extracting key words associated with said visited document;
associating a binary code with each extracted key word;
storing said associations in a dictionary; and
storing said binary codes associated with the electronic address of the document on the computer communication network in information storage means of the user.
Correlatively, the present invention also concerns a device for the recording of information relating to a document visited by a user of a computer communication network, characterised in that it has:
means of extracting key words associated with said visited document;
means of associating a binary code with each extracted key word;
a dictionary for storing said associations; and
information storage means adapted to store said binary codes associated with the electronic address of the document on the computer communication network.
Thus, by reducing each document visited to a certain number of key words, compressing these by means of a binary coding and storing the result of this compression, it is possible to store locally, in reduced form, a very large number of documents visited by the user.
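By way of illustration only, the recording steps set out above can be sketched as follows in Python. The names `record_document`, `dictionary` and `store` are hypothetical, plain integers stand in for the binary codes, and the key words are assumed to have been extracted already.

```python
from typing import Dict, List


def record_document(url: str,
                    keywords: List[str],
                    dictionary: Dict[str, int],
                    store: Dict[str, List[int]]) -> None:
    """Store the codes of a document's key words under its electronic address."""
    codes = []
    for word in keywords:
        # Associate a code with the key word, creating one if it is unknown.
        if word not in dictionary:
            dictionary[word] = len(dictionary)
        codes.append(dictionary[word])
    # Only the codes are kept for the document, not its text.
    store[url] = codes


dictionary: Dict[str, int] = {}
store: Dict[str, List[int]] = {}
record_document("http://example.com/a", ["patent", "search"], dictionary, store)
record_document("http://example.com/b", ["search", "history"], dictionary, store)
```

In this sketch the second document reuses the code already created for "search", so that each key word is stored once in the dictionary however many documents contain it.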
According to a preferred characteristic of the invention, the step of associating a binary code with a key word comprises the following substeps:
checking the existence or not of said key word in the dictionary;
in the negative, creating a new binary code; or
in the affirmative, reading the binary code associated with said key word in the dictionary.
Generating the dictionary as new key words are extracted from documents visited by the user makes it possible to create a dictionary peculiar to each user and to limit the size thereof solely to the key words extracted locally.
According to an advantageous characteristic of the invention, particularly simple to implement, the binary codes of the dictionary are fixed-length codes.
Alternatively, the binary codes are variable-length codes, thus making it possible to take account of the frequency of appearance of a key word when it is coded in order to limit still further the space necessary for storing the binary codes in the information storage means.
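The text above does not name a particular variable-length code. As one possible illustration, and purely as an assumption, the sketch below uses Huffman coding, which gives shorter codes to the key words appearing most frequently. The function name `huffman_codes` and the sample frequencies are hypothetical.

```python
import heapq
from typing import Dict


def huffman_codes(freq: Dict[str, int]) -> Dict[str, str]:
    """Build prefix-free bit strings; frequent key words get shorter codes."""
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {key word: partial code}).
    heap = [(w, i, {word: ""}) for i, (word, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two lightest groups, prefixing their codes with 0 and 1.
        merged = {k: "0" + v for k, v in left.items()}
        merged.update({k: "1" + v for k, v in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


codes = huffman_codes({"network": 8, "browser": 4, "signature": 2, "entropic": 1})
```

With these sample frequencies, the most frequent key word receives a one-bit code and the two rarest receive three-bit codes.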
According to a preferred characteristic of the invention, the binary codes have a length of M bits determined according to a maximum number 2^M of associations stored in the dictionary and, at the step of creating a new binary code, if the number of associations stored in the dictionary is greater than said maximum number 2^M, the binary codes of the dictionary are reconstructed on binary codes of length M+1.
The size of the binary codes is thus adapted in real time to the increasing number of key words which have to be stored in the dictionary associated with the user.
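The adaptation of the code length can be sketched as follows, under stated assumptions. The class `FixedLengthDictionary` and its methods are hypothetical names; codes are kept as integers and rendered on M bits, and the length is increased just before a new association would exceed 2^M entries. Any codes already written to the information storage means would have to be rewritten on the new length, which this sketch does not show.

```python
from typing import Dict


class FixedLengthDictionary:
    """Key word -> integer code, rendered on M bits, M growing on demand."""

    def __init__(self, initial_bits: int = 2) -> None:
        self.bits = initial_bits          # M, the current code length
        self.codes: Dict[str, int] = {}   # key word -> integer code

    def code_for(self, word: str) -> str:
        if word not in self.codes:
            # All 2**M codes of the current length are used: move to M+1.
            if len(self.codes) >= 2 ** self.bits:
                self.bits += 1
            self.codes[word] = len(self.codes)
        return self.as_bits(word)

    def as_bits(self, word: str) -> str:
        # Every code is read on the current length M, zero-padded.
        return format(self.codes[word], f"0{self.bits}b")


d = FixedLengthDictionary(initial_bits=2)
for w in ["a", "b", "c", "d"]:
    d.code_for(w)
fifth = d.code_for("e")   # a fifth key word no longer fits on 2 bits
```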
Preferably, in order to limit still further the space necessary for storage of the dictionary, the associations of key words and binary codes stored in the dictionary are compressed by an entropic coding method.
According to another preferred version of the invention, the information storage means are incorporated in the history of a browser of the user.
Thus it suffices to add a supplementary field to the existing history in order to store the binary codes associated with the key words of each document.
This arrangement affords a saving in space, avoiding notably storing in independent information storage means the electronic addresses of the visited documents already stored conventionally in the history of the browser of the user.
According to a preferred embodiment of the invention, the recording method also comprises a step of storing, in the information storage means, an authentication signature associated with the document.
The storage of this authentication signature, obtained for example by means of a Cyclic Redundancy Check (CRC) algorithm, makes it possible to check subsequently whether the content of a document at a given electronic address has or has not been modified.
Thus, still according to this preferred embodiment, the recording method also includes the following prior steps:
checking the existence or not of the electronic address of the document visited in the information storage means of the user;
in the affirmative, calculating the authentication signature associated with the document visited;
comparing the calculated authentication signature and the stored authentication signature in the information storage means; and
reiterating the steps of extracting key words, associating a binary code, storing said associations, storing said binary codes and storing the calculated authentication signature in the information storage means of the user when the calculated and stored authentication signatures are different.
Thus, each time the user once again visits a given document, the different steps of the recording method are implemented only if the content of this document has been modified since the last storage of its electronic address associated with a certain number of key words in the information storage means of the user.
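A minimal sketch of this check is given below, assuming a CRC-32 signature computed with the `crc32` function of the Python standard `zlib` module. The function name `needs_recording` and the `signatures` mapping from electronic address to stored signature are hypothetical.

```python
import zlib
from typing import Dict


def needs_recording(url: str, content: bytes, signatures: Dict[str, int]) -> bool:
    """Return True when the document is new or its content has changed."""
    signature = zlib.crc32(content)
    if signatures.get(url) == signature:
        # Known address and unchanged signature: nothing to record again.
        return False
    # New address or modified content: remember the calculated signature.
    signatures[url] = signature
    return True


signatures: Dict[str, int] = {}
first = needs_recording("http://example.com/a", b"hello", signatures)
second = needs_recording("http://example.com/a", b"hello", signatures)
third = needs_recording("http://example.com/a", b"hello, world", signatures)
```

Here the extraction and coding steps would be run after the first and third visits only, the second visit finding the content unchanged.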
According to another preferred characteristic of the invention, the step of extracting the key words comprises the following steps:
determining the format of the document;
eliminating, in said document, one or more commands from a list of commands to be eliminated for a given format;
determining the language of the document;
eliminating, in said document, a series of common words using a list of common words to be eliminated for a given language;
eliminating, in the document, a series of terminations from a list of terminations to be eliminated for a given language;
making uniform the format of writing the words of the document; and
eliminating doubles in said document.
This extraction step makes it possible to condense a document, such as text, into a series of key words which are significant with respect to the content of the document.
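The extraction steps listed above can be illustrated by the following simplified sketch. It assumes an HTML document in English, so the steps of determining the format and the language are not shown; the lists `STOP_WORDS` and `SUFFIXES` are short illustrative stand-ins for the lists of common words and terminations, and the writing is made uniform before the common words are removed, which is an ordering chosen here for convenience.

```python
import re
from typing import List

STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to", "are"}  # illustrative
SUFFIXES = ("ing", "ed", "s")                                    # illustrative


def extract_keywords(document: str) -> List[str]:
    # Eliminate the commands of the format (here, HTML tags).
    text = re.sub(r"<[^>]+>", " ", document)
    # Make the writing uniform: lower case, words made of letters only.
    words = re.findall(r"[a-z]+", text.lower())
    keywords: List[str] = []
    for word in words:
        if word in STOP_WORDS:
            continue
        for suffix in SUFFIXES:
            # Strip one termination, keeping at least three letters.
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                break
        if word not in keywords:   # eliminate doubles, keep first-seen order
            keywords.append(word)
    return keywords


keywords = extract_keywords("<p>The browsers are storing the visited documents</p>")
```

On this sample sentence the sketch yields the four key words "browser", "stor", "visit" and "document".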
According to another advantageous characteristic of the invention, which then facilitates a search using binary codes on the stored documents, the recording method also includes a step of indexing the electronic addresses of the documents by means of binary codes in the information storage means of the user.
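Such an indexing could, for example, take the form of an inverted index from each binary code to the electronic addresses of the documents containing it. The sketch below is an assumption about one possible layout; the names `build_index` and `store` are hypothetical, and integers stand in for the binary codes.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_index(store: Dict[str, List[int]]) -> Dict[int, Set[str]]:
    """Map each code to the set of addresses of documents containing it."""
    index: Dict[int, Set[str]] = defaultdict(set)
    for url, codes in store.items():
        for code in codes:
            index[code].add(url)
    return dict(index)


store = {
    "http://example.com/a": [0, 1],
    "http://example.com/b": [1, 2],
}
index = build_index(store)
```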
The recording device according to the invention has characteristics and advantages identical to those previously disclosed for the recording method, these advantages not being disclosed again here.
The present invention also concerns a method of searching, by a user, for a document on the computer communication network from information recorded by the recording method according to the invention, characterised in that it comprises the following steps:
supplying a search criterion containing at least one key word by the user;
reading, in the dictionary, the binary code associated with the key word if such exists;
extracting, from the information storage means, the electronic address of the document or documents associated with the binary code read;
downloading the document or documents if such exist.
Correlatively, a device for searching, by a user, for a document on the computer communication network using information recorded by a recording device according to the invention, is characterised in that it has:
means of supplying a search criterion containing at least one key word by the user;
means of reading, in the dictionary, the binary code associated with the key word;
means of extracting, from the information storage means, the electronic address of the document or documents associated with the binary code read;
means of downloading the document or documents.
This search method thus enables the user to find once again, amongst the documents to which he has already gained access, those which are able to meet certain criteria, such as, for example, the presence of key words.
The user can thus easily find again a document already displayed using the information which it contains rather than only from a title or an electronic address, which do not directly give information on the content of the document.
In addition, this search being performed locally, at the level of each user of the communication network, it does not depend on the load on the communication network nor the rate of the communication links.
The search is also carried out on a lesser number of documents than those stored on the communication network and the results can thus be obtained more rapidly.
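The local part of the search method described above can be sketched as follows. The names `search`, `dictionary` and `index` are hypothetical, the index is assumed to map each code to a set of electronic addresses, and the final downloading step is left out since it depends on the network access available to the user.

```python
from typing import Dict, List, Set


def search(keywords: List[str],
           dictionary: Dict[str, int],
           index: Dict[int, Set[str]]) -> Set[str]:
    """Return the addresses of documents associated with any given key word."""
    addresses: Set[str] = set()
    for word in keywords:
        code = dictionary.get(word)   # read the code, if such exists
        if code is None:
            continue                  # key word never met in a visited document
        addresses |= index.get(code, set())
    return addresses


dictionary = {"patent": 0, "search": 1, "history": 2}
index = {
    0: {"http://example.com/a"},
    1: {"http://example.com/a", "http://example.com/b"},
    2: {"http://example.com/b"},
}
found = search(["history", "unknown"], dictionary, index)
```

In practice the key words of the search criterion would need to be put in the same form as those produced by the extraction step before being looked up in the dictionary.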
According to a preferred characteristic of the invention, when the search criterion contains several key words, the search method also includes a step of filtering the extracted electronic address or addresses, including the following substeps:
comparing the number of binary codes read associated with the extracted electronic address or addresses with a threshold value; and
eliminating the electronic address or addresses associated with a number of binary codes lower than said threshold value.
This filtering step makes it possible to limit the number of documents downloaded to the documents associated with a minimum number of key words of the search criterion.
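A possible form of this filtering is sketched below. The names `filter_addresses`, `query_codes` and `store` are hypothetical; for each address, the number of codes of the search criterion found among the codes stored for the document is compared with the threshold value.

```python
from typing import Dict, List


def filter_addresses(query_codes: List[int],
                     store: Dict[str, List[int]],
                     threshold: int) -> List[str]:
    """Keep the addresses matching at least `threshold` codes of the criterion."""
    wanted = set(query_codes)
    kept = []
    for url, codes in store.items():
        matches = len(wanted & set(codes))
        # Addresses with fewer matching codes than the threshold are eliminated.
        if matches > 0 and matches >= threshold:
            kept.append(url)
    return kept


store = {
    "http://example.com/a": [0, 1],
    "http://example.com/b": [1, 2],
    "http://example.com/c": [3],
}
kept = filter_addresses([1, 2], store, threshold=2)
```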
According to a preferred embodiment of the invention, when the recording method comprises a step of storing an authentication signature associated with the document, the search method also comprises a step of updating the information storage means comprising the following substeps:
eliminating, in the information storage means, the document or documents which no longer exist at the associated electronic address;
calculating the authentication signature of the downloaded document or documents;
comparing the calculated authentication signature and the stored authentication signature in the information storage means; and
reiterating the steps of extracting key words, associating a binary code, storing said associations, storing said binary codes and storing the calculated authentication signature of said recording method when the calculated and stored authentication signatures are different.
This updating makes it possible to eliminate, in the information storage means, the documents no longer existing on the communication network and to update the binary codes associated with the documents when the content of the latter has been modified.
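The updating step can be outlined as follows, assuming CRC-32 signatures computed with `zlib.crc32`. The function `update_store` and the `fetch` callable are hypothetical; `fetch` is assumed to return the content of a document, or `None` when the document no longer exists at its electronic address. The sketch returns the addresses whose key words would have to be extracted and coded again.

```python
import zlib
from typing import Callable, Dict, List, Optional


def update_store(signatures: Dict[str, int],
                 fetch: Callable[[str], Optional[bytes]]) -> List[str]:
    """Drop vanished documents; return addresses whose content has changed."""
    changed = []
    for url in list(signatures):
        content = fetch(url)
        if content is None:
            del signatures[url]        # document no longer exists
            continue
        signature = zlib.crc32(content)
        if signature != signatures[url]:
            signatures[url] = signature
            changed.append(url)        # to be recorded again
    return changed


pages = {"http://example.com/a": b"old text", "http://example.com/b": b"same"}
signatures = {url: zlib.crc32(body) for url, body in pages.items()}
signatures["http://example.com/gone"] = 0
pages["http://example.com/a"] = b"new text"
changed = update_store(signatures, pages.get)
```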
According to one advantageous characteristic of the invention, which makes it possible to search effectively for documents dealing with a subject, the search criterion comprises a regular expression of key words.
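One possible reading of this characteristic, given here only as an assumption, is that a pattern is matched against the key words of the dictionary in order to obtain every binary code concerned by the search. The sketch below uses the Python standard `re` module; the names `codes_matching` and `dictionary` are hypothetical.

```python
import re
from typing import Dict, List


def codes_matching(pattern: str, dictionary: Dict[str, int]) -> List[int]:
    """Return the codes of all key words entirely matched by the pattern."""
    regex = re.compile(pattern)
    return sorted(code for word, code in dictionary.items()
                  if regex.fullmatch(word))


dictionary = {"compress": 0, "compression": 1, "network": 2}
matched = codes_matching(r"compress.*", dictionary)
```

The codes obtained in this way can then be used to extract the electronic addresses in the same manner as for a single key word.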
The search device according to the invention has characteristics and advantages identical to those previously disclosed for the search method, these advantages not being disclosed again here.
In a practical manner, the recording device according to the invention is incorporated in a microprocessor, a read only memory containing a program for recording information and a random access memory containing registers adapted to record variables modified during the running of the program.
Likewise, the search device according to the invention is incorporated in a microprocessor, a read only memory containing a program for searching for documents and a random access memory containing registers adapted to record variables modified during the running of the program.
An information storage means, possibly partially or totally removable, which can be read by a computer or microprocessor and which stores instructions of a computer program, is characterised in that it is adapted to implement a recording method and/or a search method in accordance with the invention.
The present invention also concerns a computer, a computer server and a communication network having means adapted to implement the recording method or the search method according to the invention.
Correlatively, the present invention also concerns a computer, a computer server and a communication network having a recording device or a search device according to the invention.
It also concerns a computer communication network characterised in that it has several computer servers according to the invention, and notably forming a wide area network.
The characteristics and advantages of this information storage means, of this computer, of this computer server and of this communication network being identical to those of the recording and search methods and devices described above, they will not be detailed any further below.
Other particularities and advantages of the invention will also emerge from the following description of a particular embodiment of the invention.