The present invention relates to a method of thematically classifying documents, intended in particular for setting up or updating thematic databases, for example for a search engine.
The invention also relates to a module for thematically classifying documents, and to a search engine fitted with such a thematic classification module.
At present, two main computer tools are known for searching for documents on a computer network such as the Internet.
These tools are search engines and guides.
A search engine is a tool that serves to extract the words or terms that are most representative of information, mainly in the form of text, and to store them in a database, also known as an “index” base.
Such index bases are generally updated relatively frequently.
In response to a request made by a user, the same tool scans through the index bases in order to identify the terms that are most relevant to those of the request, and then sorts the information obtained in return.
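By way of illustration only, the indexing and request steps described above can be sketched as follows. This is a minimal sketch and not the implementation of any particular search engine: the stop-word list, the use of raw term frequency as the measure of how "representative" a term is, the parameter top_k, and the names tokenize, build_index and search are all assumptions introduced for the example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def tokenize(text):
    """Lower-case the text and return its words, minus stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_index(documents, top_k=5):
    """Build the "index" base: term -> {doc_id: count}.

    Assumption: the top_k most frequent terms of a document stand in
    for its "most representative" terms.
    """
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term, count in Counter(tokenize(text)).most_common(top_k):
            index[term][doc_id] = count
    return index


def search(index, request):
    """Return doc ids sorted by the summed counts of matching request terms."""
    scores = Counter()
    for term in tokenize(request):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    return [doc_id for doc_id, _ in scores.most_common()]


# Toy corpus (invented for the example).
documents = {
    "d1": "jaguar speed jaguar habitat jungle",
    "d2": "jaguar car engine car dealer",
    "d3": "python snake habitat",
}
index = build_index(documents)
results = search(index, "jaguar")
```

With this toy corpus, the request "jaguar" returns both the page about the animal and the page about the car, ranked only by term frequency and with no indication of the theme of either page.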
The other technique for searching for documents on a computer network consists in using a guide. That tool proposes searches by category, with document pages being classified manually by researchers.
Both types of tool present various drawbacks.
Firstly, search engines do not classify document pages by category. The pages provided in response to a request are not typified. As a result, an ambiguous request can give rise to a very wide variety of responses, many of which the user perceives as noise.
In contrast, guides provide a user with responses that are typified, i.e. that relate to the same theme(s) as the request.
Another method, described in U.S. Pat. No. 5,625,767, enables thematic classification to be performed on the basis of a statistical analysis of the document. However, that method requires the documents to be manually classified beforehand.
The object of the invention is to mitigate the drawbacks of search engines and of guides.