1. Field of the Invention
The present invention relates to an information processing method, an information processing apparatus, and a program.
2. Description of the Related Art
With the spread of the Internet, Web pages made public via the Internet have come to include diverse digital information. From the viewpoint of a user, such digital information includes both useful and useless information. Therefore, efforts have been made to develop techniques for automatically extracting desired information from Web pages.
For example, a technique called LR Wrapper is proposed in “Wrapper induction: efficiency and expressiveness”, by Nicholas Kushmerick (Artificial Intelligence, vol. 118, pp. 15-68 (2000)) to extract desired information based on a positional relationship of tags included in a HyperText Markup Language (HTML) document. According to the LR Wrapper, a template of a positional relationship between tags is stored in advance, and each Web page is matched against the template to extract desired information. However, because the LR Wrapper performs matching over the entire Web page, it has the disadvantage that unintended information could be extracted when the page contains information about different areas. On the other hand, Japanese Patent Application Laid-Open Nos. 2007-279964 and 2004-70405 suggest a technique for segmenting a Web page into a plurality of blocks and matching each of the blocks against keywords.
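The tag-template matching described above, and the disadvantage attributed to it, can be illustrated with a minimal sketch. It assumes that the template is a list of (left delimiter, right delimiter) tag pairs, one pair per attribute. The function name `lr_extract`, the delimiter strings, and the sample pages are hypothetical and are not taken from the cited reference.

```python
def lr_extract(page, delimiters):
    """Scan the whole page and return tuples of the text found between
    each (left, right) delimiter pair, taken in order.

    `delimiters` plays the role of the stored template (an assumption
    made for this sketch).
    """
    rows = []
    if not delimiters:
        return rows
    pos = 0
    while True:
        row = []
        for left, right in delimiters:
            start = page.find(left, pos)
            if start == -1:
                return rows  # no further match anywhere on the page
            start += len(left)
            end = page.find(right, start)
            if end == -1:
                return rows  # left delimiter without a closing match
            row.append(page[start:end])
            pos = end + len(right)
        rows.append(tuple(row))


# Hypothetical template: a name in <b>...</b> followed by a code in <i>...</i>.
template = [("<b>", "</b>"), ("<i>", "</i>")]

# Hypothetical page containing only the intended listing.
listing = "<html><b>Congo</b> <i>242</i><br><b>Egypt</b> <i>20</i><br></html>"

# The same listing preceded by an unrelated area that uses the same tags.
mixed = "<div><b>Sale</b> <i>Buy now</i></div>" + listing

print(lr_extract(listing, template))  # [('Congo', '242'), ('Egypt', '20')]
print(lr_extract(mixed, template))    # also returns ('Sale', 'Buy now')
```

Because the matching runs over the entire page, the unrelated area in `mixed` satisfies the same template and is extracted together with the intended entries.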