1. Field of the Invention
The present invention relates to a method and system for simplifying Web contents, and more particularly to a technique for simplifying the contents on the fly, even in the case of Web pages which do not have history information or whose URL (Uniform Resource Locator) changes day by day.
2. Background of the Invention
In recent years, use of the Internet has become widespread owing to advances in network technologies, improvements in the functions of information apparatuses, and falling costs. Because detailed information can be transmitted at low cost by corporations and individuals alike, without regard to national borders, the number of Web pages serving as sources of information is increasing explosively day by day. Furthermore, vast amounts of information are updated under the control of the administrators of those Web pages. In this context, the Internet and the Web pages that use it are becoming an important information-gathering medium that replaces or complements conventional broadcasting and mass media.
Meanwhile, the roles of Web pages are diversifying. For example, beyond mere information transmission, business transactions (electronic commerce) and collaborations are being conducted via Web pages. To implement these diversified functions, Web pages offering greater convenience are provided. Also, so that users can reach the intended information more rapidly, Web pages incorporate functions that improve the user operability of, for example, a search screen. Examples include a link list used in common throughout the site, an image map, and a form. These are included in every page and provide functions that are very convenient for general users.
However, these general Web pages are designed on the premise of a desktop computer screen. That is, their layouts are designed for the size of a desktop computer screen. Hence, with a device having a small screen (hereinafter referred to as a small screen device), such as a PDA (Personal Digital Assistant) or a cellular phone, or with software that reads a Web page aloud (hereinafter referred to as a voice browser), there is a problem that the user cannot reach the necessary information quickly. Namely, in general Web pages a form and an image map are laid out at the top of the page, so a small screen device must display screen after screen of these forms and other elements before the necessary information is reached. Likewise, a voice browser reads the necessary information aloud only after these forms and other elements have been read aloud. A small screen device generally does not need the visual multi-functionality of a desktop computer, and a voice browser does not need visual functions for improving operability. Rather, these visual functions are an obstacle for small screen devices and voice browsers.
Therefore, simplification techniques that omit part of a Web page have been attempted, for example the “Dharma Transcoding” technique described by Masahiro Hori et al. and the “DiffWeb” (difference) technique.
The “Dharma Transcoding” technique divides an existing Web page into several pages while keeping a layout similar to the original, thereby creating pages that are easily displayed on a small screen device. This technique needs external annotation information that describes in detail the structure of the pages and the significance of each part.
The “DiffWeb” technique calculates and presents the difference between a Web page that was registered and saved in advance and the current Web page. According to this technique, a list of pages can be registered per user, and the differences for these pages can be calculated. With this difference technique, all processing, such as page registration, storage, and the difference operation, is performed at the direction of users.
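The difference operation described above can be pictured with a minimal sketch. This is not the DiffWeb implementation, whose details are not given here; it only assumes that pages are compared line by line, and the function name changed_lines and the sample page text are hypothetical.

```python
import difflib


def changed_lines(saved_page: str, current_page: str) -> list:
    """Return the lines of current_page that are new or altered
    relative to saved_page (assumption: a line-by-line comparison)."""
    old = saved_page.splitlines()
    new = current_page.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    result = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "replace" and "insert" mark content that exists only in the current page.
        if tag in ("replace", "insert"):
            result.extend(new[j1:j2])
    return result


# Hypothetical page contents: the menu and form repeat, only the headline changes.
saved = "Site menu\nSearch form\nOld headline"
current = "Site menu\nSearch form\nNew headline"
print(changed_lines(saved, current))
```

Under these assumptions, the repeated menu and form lines drop out and only the changed headline remains, which is the effect the difference technique relies on.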
However, the “Dharma Transcoding” technique needs the annotation information, as described above. Providing the annotation information requires human intervention, for example by a volunteer, so the technique is difficult to automate completely.
With the “DiffWeb” technique, page registration, storage, and the difference operation are performed at the direction of a user, as described above. Thus, the difference operation cannot be performed as on-the-fly processing. Also, in the case of a pull-down menu, there is a concern that the character strings making up its contents are deleted, so that the form does not work properly after simplification.
Moreover, according to the prior techniques, the simplification is implemented by calculating a difference against a comparative page which has been saved in advance. Therefore, the following problems exist.
First, if the comparative page has not been saved in advance, simplification cannot be performed. That is, only a page for which a comparative page has been recorded can be a target for simplification, so a page that is encountered for the first time cannot be simplified.
Secondly, even if a comparative page has been saved, a page whose URL changes day by day cannot be simplified. For example, an article page of the Asahi Shinbun (www.asahi.com) includes the date in its URL, as in “http://www.asahi.com/0530/news/business30010.html”. In this case, no past page has the same URL; therefore, simplification cannot be performed.
Thirdly, even necessary information might be deleted. For example, important information such as the title of a link list or of a form might be deleted. Conversely, unnecessary subtle changes in character strings might be retained.
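The first two problems can be illustrated with a short sketch, assuming the prior techniques look up the comparative page by its exact URL. The store saved_pages, the function find_comparative_page, and the previous-day URL are hypothetical; only the dated Asahi URL is taken from the example above.

```python
from typing import Optional

# Hypothetical store of pages saved in advance, keyed by exact URL.
# The previous-day URL below is invented for illustration only.
saved_pages = {
    "http://www.asahi.com/0529/news/business29004.html": "<html>saved article</html>",
}


def find_comparative_page(url: str) -> Optional[str]:
    """Return the saved page for this exact URL, or None if none was saved."""
    return saved_pages.get(url)


# The URL embeds the date, so no saved page shares it and the lookup misses.
todays_url = "http://www.asahi.com/0530/news/business30010.html"
if find_comparative_page(todays_url) is None:
    print("No comparative page: a difference cannot be calculated")
```

A page visited for the first time fails in the same way, because the store holds no entry for it.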
It is therefore a feature of the present invention to provide a technique for the simplification of Web pages in order to access necessary information rapidly, when displaying or outputting Web pages using a small screen device or a voice browser.
It is another feature of the invention to provide a technique for performing the simplification of Web pages even if there is no past page of the same URL.
It is a further feature of the invention to provide a technique for performing simplification of Web pages on the fly.
It is a still further feature of the invention to provide a technique for simplifying unnecessary information with high precision, without losing important information upon simplification of Web pages.