1. Field of the Invention
The present invention relates to a document searching apparatus that searches a large group of document files stored in an information processing device for a desired file based on the content of each document, the link relations among the documents, the storage locations of the documents, and so on. The invention also relates to a corresponding method and recording medium.
2. Description of the Related Art
As computer networks have advanced, a huge amount of online document information (web pages) has emerged. Indexing services are known as a means of searching and organizing such information.
For example, a directory service is known as an Internet web page searching service. In a directory service, links to web pages are hierarchically categorized and listed. The service has the following advantages:
    Simply by selecting (clicking) a category, the user can obtain links to the web pages that he or she wants to browse.
    Since web pages are categorized, unnecessary information is not searched.
    Since web pages are manually categorized, irrelevant information is kept from being mixed with relevant information.
Because of these advantages, the service is very widely used on the Internet. However, such a service requires manual work to categorize and manage web pages, so its operating cost is high.
To maintain the entire directory service automatically, the following problems must be solved:
    Important documents should be selected.
    The category hierarchy should be managed (for example, topics should be added and deleted over time).
    Documents should be automatically categorized.
Next, the selection of important documents will be described. On the Internet and on intranets, the number of web pages is increasing rapidly over time, and pages containing similar information are created by different people everywhere. Consequently, even when web pages are searched for desired information using a keyword, a very large number of pages are hit, and the user cannot tell which information is important among the many pages returned as search results. To solve this problem, the following methods are available:
    Search results are sorted by the degree to which they satisfy the search request. In other words, search results are sorted and ranked based on, for example, the number of keywords contained in each web page.
    Search results are visualized to assist access. In other words, the documents obtained as search results are grouped (clustered) based on their contents.
    Search results are sorted based on attributes of each document (such as size and date/time of creation).
    Search results are sorted by priority levels assigned by some means. For example, search results are sorted based on meta data such as link relations, an analysis of users' access logs, or ratings assigned by a third party.
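The first method above, ranking by the number of keywords contained in each page, can be illustrated with a minimal sketch. The function names `keyword_score` and `rank_by_keyword_count` and the sample pages are hypothetical, and simple case-insensitive substring counting is an assumption made only for illustration.

```python
def keyword_score(text, keywords):
    """Count occurrences of the keywords in text (case-insensitive substrings)."""
    lowered = text.lower()
    return sum(lowered.count(keyword.lower()) for keyword in keywords)


def rank_by_keyword_count(pages, keywords):
    """Return page names sorted by descending keyword count.

    pages: dict mapping a page name to its text (hypothetical structure).
    Pages with equal scores keep their original order because sorted() is stable.
    """
    return sorted(pages, key=lambda name: -keyword_score(pages[name], keywords))


pages = {
    "a": "cat dog",
    "b": "cat cat dog",
    "c": "bird",
}
ranking = rank_by_keyword_count(pages, ["cat"])
```

Such a ranking reflects only how well each page matches the request; it says nothing about which of many similar pages is important, which is the difficulty the text goes on to discuss.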
As a notable example, assigning importance to documents using the link relations of hypertext such as web pages is becoming an important technology at both the research and service stages. The simplest form of importance assignment based on link relations rests on the intuition that “the importance of a document that is linked from many documents is high”.
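As a minimal sketch of this simplest intuition, the importance of a document can be taken as the number of distinct documents that link to it. The function name `inlink_counts`, the dictionary representation of the link relation, and the choice to ignore self-links are assumptions made for illustration.

```python
from collections import Counter


def inlink_counts(links):
    """Count, for each page, how many distinct other pages link to it.

    links: dict mapping a page to the list of pages it links to
    (a hypothetical representation of the link relation).
    """
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):      # count each linking page only once
            if target != source:         # assumption: self-links are ignored
                counts[target] += 1
    return counts


links = {
    "top": ["a", "b"],
    "a": ["top"],
    "b": ["top", "a"],
}
counts = inlink_counts(links)
```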
However, to allow the user to navigate information easily, web pages stored on the same server tend to link to each other. For example, personal web pages contain many links to their top page, such as “return to the top of XX”. Thus, if importance is determined simply by counting the documents that refer to a document, a document on a server or personal home page that contains a large number of documents receives a high importance. In addition, when a malicious person knows that a searching system determines the importance of documents based on the number of linking documents, he or she can needlessly split pages, or add pages that link meaninglessly to other documents, so as to raise the importance of his or her web pages.
To deal with this problem, in addition to the intuition that “the importance of a document that is linked from many documents is high”, two further intuitions, “the importance of a document that is linked from an important document is high” and “the importance of a page linked from a page that links to fewer pages becomes higher”, are suggested in a web page that can be browsed at “http://www.elsevier.nl/cas/tree/store/comnet/free/www7/1921/com1921.htm”.
The second intuition is based on the observation that “the importance of a web page guided by a famous directory service is higher than the importance of a web page guided by a non-famous personal link list”. The third intuition is based on the idea that “the importance of a document linked from a link list that links to 50 documents is higher than the importance of a document linked from a link list that links to 1000 documents”. In an importance determining algorithm based on these intuitions, to calculate the importance of a page A, a temporary importance is first calculated using the number of other pages linking to page A. The temporary importance is then updated using the link relations. These operations are repeated until the values converge.
However, such an algorithm favors a site that has a large number of pages, because its pages are linked from many pages. Thus, when the importance of pages is calculated, pages in similar sites tend to be ranked as important pages.
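The iteration described above can be sketched as follows. This is only an illustrative reading of the three intuitions, not the algorithm of the cited web page: the function name `iterate_importance`, the damping factor, the convergence tolerance, and the even redistribution of importance from pages without outgoing links are all assumptions introduced to obtain a small, convergent example.

```python
def iterate_importance(links, damping=0.85, tol=1e-9, max_iter=1000):
    """Iteratively update page importance from the link relation.

    links: dict mapping a page to the list of pages it links to.
    damping, tol and max_iter are assumed parameters, not from the source.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    # Distinct outgoing links per page, ignoring self-links (assumption).
    out = {p: sorted(set(links.get(p, [])) - {p}) for p in pages}

    # Temporary importance: each page's share of all incoming links.
    inlinks = {p: 0 for p in pages}
    for p in pages:
        for target in out[p]:
            inlinks[target] += 1
    total = sum(inlinks.values())
    score = {p: (inlinks[p] / total if total else 1.0 / n) for p in pages}

    for _ in range(max_iter):
        # Importance held by pages with no outgoing links is spread evenly.
        dangling = sum(score[p] for p in pages if not out[p])
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            for target in out[p]:
                # A link from an important page passes on more importance,
                # and a page with fewer links passes more through each link.
                new[target] += damping * score[p] / len(out[p])
        if sum(abs(new[p] - score[p]) for p in pages) < tol:
            return new
        score = new
    return score


links = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "list": ["a"],
}
importance = iterate_importance(links)
```

In this small example, "a" and "b" are both linked from "hub", but "a" also receives the only link of "list", so "a" ends up ranked above "b".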
When the user searches web pages for desired data, he or she needs an interface for inputting a keyword for the desired data. As related art for such a keyword input interface, a Kana-Kanji converting interface is known.
For example, Japanese Patent Laid Open Publication No. 03-241456 discloses a technology of a Kana-Kanji converting interface using a touch-panel type device. According to the technology, after inputting the pronunciation characters of a keyword using a software keyboard on a screen, the user presses a “convert” key so that the input characters are converted into a regular Japanese character string that contains Kanji characters. Here, the term “pronunciation characters” means characters that represent the speech sound of a word.
In addition, Japanese Patent Laid Open Publication Nos. 10-154144 and 10-154033 and a web page that can be browsed at “http://www.csl.sony.co.jp/person/masui/POBox/index.htm” disclose a pen-type text inputting system. According to the technology, the pronunciation characters of a keyword are input using a software keyboard on a screen, and whenever part of the pronunciation character string is input, Kanji candidates are output based on the user's character input history.
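The history-based candidate output can be illustrated with a minimal sketch. It does not reproduce the cited system: the function name `suggest`, the representation of the history as (reading, word) pairs, and the "most recently used first" ordering are assumptions for illustration only.

```python
def suggest(prefix, history, limit=5):
    """Return candidate words whose reading starts with the typed prefix.

    history: list of (reading, word) pairs in order of use, oldest first
    (a hypothetical representation of the user's input history).
    Candidates are ordered most recently used first, without duplicates.
    """
    candidates = []
    for reading, word in reversed(history):
        if reading.startswith(prefix) and word not in candidates:
            candidates.append(word)
    return candidates[:limit]


history = [
    ("あきはばら", "秋葉原"),  # akihabara
    ("あさくさ", "浅草"),      # asakusa
    ("あきた", "秋田"),        # akita
    ("あきはばら", "秋葉原"),  # akihabara, used again
]
after_two_characters = suggest("あき", history)
```

Candidates appear after only part of the reading has been typed, but the reading itself still has to be entered character by character.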
However, according to the above-described related art references of Japanese Patent Laid Open Publication Nos. 03-241456, 10-154144, and 10-154033 and the web page, the pronunciation characters (spelling) of a keyword must be input character by character to perform a Kana-Kanji converting operation, so the user sometimes has to input a long character string.
Moreover, an interface for inputting obvious pronunciation characters is known. As an example of such an interface, keyword lists are created for individual initial characters such as “あ (a)”, “い (i)”, and so forth, and the user selects a desired keyword from the lists. However, in this example, when a list contains many keywords starting with a particular pronunciation character, it is difficult for the user to select a particular keyword from that list. An example of such an interface is an automatic transfer machine used in a bank.
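The keyword lists per initial character can be sketched as a simple index. The function name `build_initial_index` and the sample keywords are hypothetical and serve only to illustrate the structure.

```python
from collections import defaultdict


def build_initial_index(keywords):
    """Group keywords by the first pronunciation character of their reading.

    keywords: iterable of (reading, word) pairs (hypothetical sample data).
    """
    index = defaultdict(list)
    for reading, word in keywords:
        if not reading:          # skip entries without a reading
            continue
        index[reading[0]].append(word)
    return dict(index)


keywords = [
    ("あきはばら", "秋葉原"),  # akihabara
    ("あさくさ", "浅草"),      # asakusa
    ("いけぶくろ", "池袋"),    # ikebukuro
]
index = build_initial_index(keywords)
```

Every keyword sharing an initial character lands in the same list, so a list grows without bound as keywords are added, which is the selection difficulty noted above.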
In another example of the obvious pronunciation character input interface technology, when pronunciation characters are successively input (or clicked with a pointing device) and they match the character string of a keyword, the keyword appears as a regular character string containing Kanji characters. FIG. 1 shows a system in which successively input pronunciation characters, once they match the character string of a keyword, are converted into a regular character string containing Kanji characters. FIG. 1 shows an example in which the character string “秋葉原 (akihabara)” appears. Referring to FIG. 1, the user successively inputs the pronunciation characters using a list of 50-Kana characters. To cause the character string “秋葉原 (akihabara)” to appear on the screen, the user successively inputs the pronunciation characters “あ (a)”, “き (ki)”, “は (ha)”, “ば (ba)”, and “ら (ra)”. After all the pronunciation characters “あきはばら (akihabara)” have been input and they match a keyword, the regular character string containing Kanji characters “秋葉原 (akihabara)” appears. However, in such a system, the user must input many pronunciation characters for a long keyword.
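The behavior shown in FIG. 1 can be sketched as follows, assuming a hypothetical dictionary that maps complete readings to keywords. The function name `convert_on_full_match` is an assumption; the sketch returns the first keyword whose whole reading matches the input so far, together with the number of characters the user had to input.

```python
def convert_on_full_match(inputs, dictionary):
    """Accumulate pronunciation characters until they equal a keyword reading.

    inputs: sequence of pronunciation characters in input order.
    dictionary: dict mapping a complete reading to its keyword
    (hypothetical data). Returns (keyword or None, characters consumed).
    """
    typed = ""
    for character in inputs:
        typed += character
        if typed in dictionary:
            # The keyword appears only once the whole reading has been input.
            return dictionary[typed], len(typed)
    return None, len(typed)


dictionary = {"あきはばら": "秋葉原"}  # akihabara
result = convert_on_full_match(["あ", "き", "は", "ば", "ら"], dictionary)
partial = convert_on_full_match(["あ", "き"], dictionary)
```

Here the keyword is obtained only after all five characters have been input, and nothing is shown for the two-character input, which illustrates why long keywords are burdensome in such a system.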