1. Field of the Invention
The present invention relates to a technology for generating search condition expressions for a system in which a computer searches for information (files, emails, etc.).
2. Description of the Related Art
There are two representative methods for generating search condition expressions (hereinafter referred to as “search queries” or simply “queries”) when searching for information.
Method 1: Query Generation by a User
In method 1, a user generates queries in consideration of keywords and attributes (hereinafter referred to as “metadata”) relating to the information being sought. The user then inputs the generated queries into the search system, and the system searches for the information. This method is commonly used in web search services such as Google™ and MSN™ Search and in file search software such as Google Desktop Search and Windows Desktop Search.
Method 2: Automatic Query Generation
In method 2, a computer rather than a user generates queries automatically. The computer extracts the keywords and attributes to be designated in the queries by automatically analyzing the information that the user is currently processing. The information currently being processed can be, for example, a document or a web page that the user is currently creating or browsing. The computer either presents the generated queries to the user or uses them to automatically search for information related to the information currently being processed (related information).
FIG. 1 is a diagram showing an overview of an existing apparatus for automatically generating queries. As shown in FIG. 1, in the existing apparatus, a computer (PC) 101 on which a user performs a task comprises an information manipulation monitor unit 102, an information detection unit 103, a search feature information extraction unit 104, and a query generation unit 105. The information manipulation monitor unit 102 monitors information manipulations performed in the computer 101 by the user and detects the information that the user is handling. Note that, in FIG. 1, an information record unit 106 for storing information that users can manipulate is shown. The information detection unit 103 detects information currently being processed by the user on the basis of the detection result of the information manipulation monitor unit 102. The search feature information extraction unit 104 extracts search feature information from the information detected by the information detection unit 103. The query generation unit 105 generates queries by combining the search feature information extracted by the search feature information extraction unit 104. Using the generated queries, an information search is executed.
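The flow through units 102 to 105 described above can be sketched roughly as follows. This is only an illustrative sketch, not the implementation of the existing apparatus: the function names, the `Document` class, the stop-word list, the frequency-based keyword selection, and the `AND`-joined query syntax are all assumptions introduced here for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "on"}


@dataclass
class Document:
    """One item held in the information record unit (106)."""
    title: str
    text: str
    metadata: dict = field(default_factory=dict)


def detect_current_information(manipulation_events, record):
    """Units 102/103: return the item the user manipulated most recently.

    `manipulation_events` is an assumed, time-ordered list of record keys.
    """
    if not manipulation_events:
        return None
    return record[manipulation_events[-1]]


def extract_search_features(doc, top_n=3):
    """Unit 104: take the most frequent non-stop-words plus the metadata."""
    words = re.findall(r"[a-z]+", doc.text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    keywords = [w for w, _ in counts.most_common(top_n)]
    return keywords, dict(doc.metadata)


def generate_query(keywords, attributes):
    """Unit 105: combine keywords and attributes into one query string."""
    terms = list(keywords)
    terms += [f"{key}:{value}" for key, value in sorted(attributes.items())]
    return " AND ".join(terms)


if __name__ == "__main__":
    record = {
        "report.txt": Document(
            title="report",
            text="budget budget budget plan plan schedule",
            metadata={"author": "sato"},
        )
    }
    current = detect_current_information(["report.txt"], record)
    keywords, attributes = extract_search_features(current, top_n=2)
    print(generate_query(keywords, attributes))
```

Note that the sketch, like the apparatus in FIG. 1, draws its search feature information from a single item: the one the user is currently handling.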
Additionally, Japanese Patent Application Publication No. 11-265378 describes a method for automatically extracting, from a document that a user is working on, information representing the features of the document (feature information) and for searching for related documents on the basis of this feature information. The feature information includes keywords in the document and the attributes of the document. Similarly, the Blinx search system generates queries by extracting keywords from the context of the information that a user is currently working on (documents, emails, Web pages, etc.) and executes the search. Both of these methods generate queries relating to the information that a user is currently working on.
Japanese Patent Publication No. 3547069 describes a method for extracting a user's information need from the search condition expressions that the user has input. A computer compiles the user-input search condition expressions at regular intervals and obtains the user's information need by calculating, for example, the frequency with which each search condition appears.
In conventional automatic query generation technology, a computer generates queries only from the information source (a document, a Web page, etc.) currently being processed by a user. In this case, there is only one information source for generating the queries (a document being referred to, an email being referred to, an email being composed, etc.). If this information source includes all of the search conditions (keywords or attributes) corresponding to the user's information need, the computer can generate appropriate queries. If the information source does not contain sufficient information to satisfy the user's information need, the computer will probably be unable to generate queries sufficient for searching for the related information. If a search is conducted using queries generated from insufficient information, the search result will include much information unrelated to the user's information need (noise information).
When a user actually works on a task, the user often carries out the task with reference to various pieces of information related to it. For example, when creating a document, the user might refer to other documents, Web pages, and emails related to the task. In such a case, it is highly probable that the feature information of the task is scattered across these multiple pieces of information. However, since the conventional art focuses only on the information currently being processed by the user, the feature information included in the other information sources used in the task cannot be extracted. As a result, the computer will probably be unable to collect sufficient feature information on the task and therefore unable to generate appropriate queries for accurately searching for the task-related information.
In addition, when the user starts creating new information (a document, an email, etc.) in a task, that information initially has very little content. Since the conventional art focuses on the information being processed, the queries for searching for related information must be generated from this small volume of content. Even if other information has been used as reference information in the task, or data has been copied from other information, the content of those information sources cannot be used for query generation. Therefore, because so little content is available, queries that sufficiently reflect the user's information need cannot be generated.