1. Field of the Invention
The present invention relates to an apparatus, method and computer program product for checking web page links, and more particularly, to an apparatus, method and computer program product for detecting errors in hyperlinks and relationships between links and target web pages.
2. Description of the Related Art
In recent years, companies, organizations, and individuals have had many occasions to make computerized information public on the Internet. Most of the information published on their websites is hypertext.
A first example of the conventional technology of hypertext link checking is disclosed in non-patent literature describing “LinkScan™” produced by Elsop™ (Electronic Software Publishing Corporation), available on the Elsop website, last searched on Oct. 9, 2002. This tool automatically scans hypertext links and compiles logs of detected link errors. The disclosed link checker comes in two types: one adapted to diagnose a target online in accordance with the specified address of the target, and another adapted to perform offline diagnosis of a website downloaded to a folder on a hard disk.
A second example of the conventional technology, which detects a physical mismatch in a link, is disclosed in Japanese Non-examined Patent Publication No. 2001-273185. The conventional method comprises the steps of: storing an address of the link to be managed in a database; and checking whether or not there is a document at the stored address of the link, thereby making it possible to detect a physical mismatch such as a dead link. The conventional method further comprises the step of registering in advance, on a system, a keyword and an image for identifying each of the documents in the database. In the conventional method, when a dead link is detected, it is possible to search for the vanished page with a search engine and then provide a correction candidate.
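The steps of this second conventional method can be illustrated with a minimal sketch. It is not the implementation of the cited publication: the names `find_dead_links` and `suggest_candidate`, the in-memory dictionaries standing in for the database and for the search engine, and the keyword-overlap scoring are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Set


def find_dead_links(link_db: Dict[str, Set[str]],
                    exists: Callable[[str], bool]) -> List[str]:
    """Return the stored addresses at which no document is found."""
    return [address for address in link_db if not exists(address)]


def suggest_candidate(keywords: Set[str],
                      live_pages: Dict[str, str]) -> Optional[str]:
    """Return the live page whose text contains the most registered keywords.

    Stands in for the search-engine lookup; the scoring rule is hypothetical.
    """
    best_address, best_hits = None, 0
    for address, text in live_pages.items():
        hits = len(keywords & set(text.lower().split()))
        if hits > best_hits:
            best_address, best_hits = address, hits
    return best_address


# Hypothetical database: stored link address -> keywords registered in advance.
link_db = {
    "/products/a.html": {"widget", "alpha"},
    "/about.html": {"company"},
}
# Hypothetical current state of the site: address -> page text.
live_pages = {
    "/about.html": "our company history",
    "/catalog/a.html": "alpha widget specifications",
}

dead = find_dead_links(link_db, lambda address: address in live_pages)
candidate = suggest_candidate(link_db[dead[0]], live_pages)
print(dead, candidate)
```

In this simulated data the page at /products/a.html has vanished, and the page at /catalog/a.html is offered as the correction candidate because it contains both registered keywords.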
A third example of the conventional technology is a typical document checking system that includes a document correcting system, such as the auto-correcting function in Microsoft® Word produced by Microsoft Corporation. Such document correcting systems are operable to detect an inappropriate expression and then output a correction candidate.
A first problem to be solved is that, in the aforementioned first and second examples of the conventional technology, only a physically mismatched link can be detected, and a logically mismatched link cannot be detected. This is because, in these conventional technologies, the judgment as to whether or not there is a mismatch is based only on whether or not an error is returned from a server when a connection to the address of a link is made. At present, the detection of a logical mismatch must rely on manual, visual confirmation on a browser, because no error occurs in the case of a logical mismatch.
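The limitation described above can be shown with a small sketch under stated assumptions: the server is simulated by a dictionary, a link is modeled as a pair of link text and target address, and the function name `physical_mismatches` is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Link = Tuple[str, str]  # (link text, target address)


def physical_mismatches(links: List[Link],
                        get_status: Callable[[str], int]) -> List[Link]:
    """Flag a link only when the server returns an error status for its address."""
    return [link for link in links if get_status(link[1]) >= 400]


# Simulated server: every address not listed here answers 404.
SITE_STATUS: Dict[str, int] = {"/pricing.html": 200, "/contact.html": 200}


def get_status(address: str) -> int:
    return SITE_STATUS.get(address, 404)


links = [
    ("Pricing", "/contact.html"),  # logical mismatch: wrong target, but the page exists
    ("Support", "/support.html"),  # physical mismatch: no such page
]

flagged = physical_mismatches(links, get_status)
print(flagged)  # [('Support', '/support.html')]
```

The "Pricing" link leads to the contact page, yet the server answers without error, so a check based only on the returned status lets it pass.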
A second problem to be solved is that, in the aforementioned first and second examples of the conventional technology, a correction candidate can be provided only for a physical mismatch and not for a logical mismatch. The reason for this problem is similar to that of the first problem above.
A third problem to be solved is that manual, visual confirmation on the browser incurs enormous cost. The reason for this problem is that a large-scale site, such as that of a company, has between thousands and tens of thousands of links, and the number of links between documents reaches between tens of thousands and hundreds of thousands. Confirming all of these links is not realistic in terms of time and cost. Confirmation on the browser is also apt to overlook a phantom link and the like.
A fourth problem to be solved is that, in the aforementioned third conventional technology, a logical mismatch such as inconsistency among hyperlinks cannot be detected. Such inconsistency causes confusion because hyperlinks use different expressions for links to the same document. The reason for this problem is that any hyperlink having appropriate syntax may be regarded as normal.
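As an illustration of the kind of inconsistency meant here, the following sketch groups links by target and reports targets that are reached through more than one distinct link text. It assumes that the "expression" of a hyperlink is its link text and that texts differing only in case or surrounding whitespace count as the same; the function name `inconsistent_link_texts` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def inconsistent_link_texts(links: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each target to its link texts when more than one distinct text is used."""
    texts_by_target: Dict[str, Set[str]] = defaultdict(set)
    for text, target in links:
        # Normalize so that "Contact us" and "Contact Us" are treated as one text.
        texts_by_target[target].add(text.strip().lower())
    return {target: texts
            for target, texts in texts_by_target.items()
            if len(texts) > 1}


links = [
    ("Contact us", "/contact.html"),
    ("Inquiries", "/contact.html"),
    ("Contact Us", "/contact.html"),
    ("Home", "/index.html"),
]

report = inconsistent_link_texts(links)
print(report)
```

Each of these hyperlinks is syntactically valid, so a checker that only validates syntax reports nothing, while the grouping shows that /contact.html is reached through two different texts.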