1. Field of the Invention
The present invention relates to an apparatus, method and computer program product for extracting a structured document accessible via a network.
2. Description of the Related Art
Conventionally, technologies are known for judging whether a document on the Web is positive or negative (p/n) and for extracting hot topics from such documents. For example, "Main Topic Extraction in a Blog Space" by Kazumi Saito et al., SIG-KBS-A501-02 study group material of the Japanese Society for Artificial Intelligence, pp. 5-10, 2005, discloses a technology for obtaining a large-scale document stream from blogs, electronic mail, news, and the like on the Internet. Further, for example, JP-A 2005-182803 (KOKAI) discloses a technology for generating an information digest by extracting predetermined information from a document.
In conventional document extraction, the target texts are often corpora prepared in advance, and the situation in which a user encounters various opinions while browsing the Web is not taken into account. In practice, however, opinions in a blog that have been endorsed by a large number of trackbacks are considered to affect users' perceptions differently from opinions that have received no trackbacks.
Moreover, even when a large number of links point to an opinion, its effect on users is considered to differ depending on when those links were made, for example, one year ago versus today. A document extraction technology that takes such information into account is therefore desired.