The present invention relates to the field of information processing. More specifically, the present invention relates to a method and device for correcting erroneous content in web pages, an apparatus and method for providing a web content correction service, and a computer program product.
With the development of information search technology, Internet search service providers such as Google® and Yahoo® have offered web users an increasingly good search experience. Users can quickly find the information they are interested in with a few clicks. Because of this convenience, users are becoming increasingly dependent on the web to obtain information, and at the same time more institutions have been prompted to publish information on the web to attract users. Since the web has become one of the major sources of knowledge, it is important to improve the correctness of the content published on the web so as to prevent the public from being misled by erroneous content.
Generally speaking, there are two ways to publish information (content) on the web: centralized content publication and distributed content publication. In centralized content publication, the website owner has full control over the web content, including its creation, update, and deletion. Typical examples of centralized websites include Sohu and Sina. Because the website owner's knowledge is limited, content published in this way is likely to contain errors. If these errors are not corrected in a timely manner, they may mislead the users who browse the content.
In an existing method for dealing with this problem, the website owner maintains a feedback channel through which users report erroneous content, and corrects the erroneous content on the web page according to the users' feedback. However, this method is unsatisfactory for the following reasons. First, there is normally a delay between the time a user submits feedback and the time the website owner processes it. During this delay, the erroneous content remains uncorrected, so the above-mentioned problem persists. Second, whether the user's feedback is adopted to correct the web content depends on the website owner's judgment, and, as stated above, the website owner may make a wrong decision because of the limits of his/her knowledge. Third, if a user wants to correct erroneous content on several websites, he/she must submit feedback to the owner of each website separately. This requires the user to be familiar with the feedback mechanisms of various websites, which may discourage the user from suggesting corrections. Worse still, many website owners never update their web content after publishing it. For these websites, users have no way to correct the erroneous content in the web pages.
In distributed content publication, on the other hand, the public has control over the creation, update, and deletion of the web content. A typical example of distributed content publication is Wikipedia. If a user finds erroneous content on such a website, he/she can correct it directly. However, centralized websites far outnumber distributed websites and are unlikely to be replaced by them for a long time. Therefore, the error correction mechanism of distributed websites cannot solve the problem of centralized websites. Moreover, because the users who correct the web content also have limited knowledge, the corrections they make may themselves be erroneous. Finally, distributed websites share the problem that users must become familiar with the error correction mechanisms of different websites, which is inconvenient for them.