1. Field of the Invention
The present invention relates to document processing technology and, more particularly, to a method and system for expanding a document set that serves as a search data source in the field of enterprise search.
2. Description of the Related Art
Enterprises today hold an ever-growing variety of electronic documents and data. How to use this information to support an enterprise's business development and strategic decision-making has attracted considerable attention. Enterprise search technology offers an effective way to help enterprises process this ever-increasing volume of information. However, not all data are suitable as a search data source for enterprise search.
A general-purpose search engine traditionally draws on a massive information source, and its search results are correspondingly massive. Many of these results may be of no interest to the user, and an enterprise search user can seldom obtain the desired information by filtering the noise out of such a mass of data. Moreover, in the enterprise search field, for a particular business need such as performing market analysis on an industry or selecting an enterprise for investment, it is impossible to collect all Web data for searching because resources are limited. Nevertheless, as much relevant information as possible must be acquired.
With the rapid growth of Internet-based documents, the data source for enterprise search must be constantly updated and extended. A significant challenge in enterprise search technology is therefore to extend the search data source for enterprise search services effectively and automatically, so as to help an enterprise collect business-relevant information from a mass of web data. Doing so helps eliminate unnecessary “noisy” information, improves the utility of the data source, and saves storage resources for the search data source.
In the related art, a user of an enterprise search service recommends relatively valuable documents that he or she has obtained from the enterprise search service system. These documents are stored in an information memory device of the enterprise search service system and then become part of the public enterprise search data source. In addition, a system administrator of the enterprise search service continually monitors changes in web information and adds useful information to the enterprise search data source. However, these approaches cannot automatically expand the enterprise search data source based on the documents it already contains; they depend entirely on the actions of users of the enterprise search service and of the system administrator. As a result, they are time-consuming and labor-intensive, and they expand the data source inefficiently.