The last decade has witnessed a substantial increase in the amount of electronic data available. The growth of the Internet and the conversion of traditional paper-based documents into electronic formats have led to a vast amount of information being available on the World Wide Web and in enterprise information sources. Extracting useful information from this large amount of accessible data is a challenge for researchers.
Several search engines exist that facilitate searching for information across different information sources. For example, Google™ is a commonly used search engine for obtaining information over the World Wide Web. When a user is looking for a specific document in the information sources, he/she can use a combination of relevant keywords to obtain the desired result. Several techniques exist that optimize the search process to present the most relevant documents to the user. However, if a user is looking for generic information on a particular entity, he/she needs to browse through all the documents returned by a search that uses the entity as the search query. Only then can the user create a profile of the entity from the obtained results. For example, in order to create a profile for IBM® (® trademark of IBM Corporation in the U.S.A. or other countries or both), a user would enter ‘IBM’ as the search query. This search query will return a large number of documents from the information sources. These documents may relate to different aspects of the entity ‘IBM’, such as the products, services, employees, and competitors of IBM. Moreover, there may be no website or other information source dedicated to ‘IBM’ that lists the various aspects of ‘IBM’. Profiling the entity ‘IBM’ would involve summarizing these aspects, such as the products, services, and competitors of ‘IBM’. To do this, the user would have to browse through each search result, which is a very tedious task.
For example, a search for the entity ‘IBM’ over the Internet using the search engine Google™ returned a total of 22,700,000 results. It is almost impossible for a user to read all these documents and profile the entity ‘IBM’. Moreover, it is very difficult for the user to organize the relevant documents in a manner that summarizes the various aspects of the entity ‘IBM’.
Certain existing patent documents address the profiling of entities. Some of them are discussed below.
US Patent Application Publication No. 2002/0024532, titled “Dynamic personalization method of creating personalized user profiles for searching a database of information”, discloses a method of profiling an entity. However, this method presents to the user an index of choices representing content items stored in the information source. Each displayed choice in the index is associated with a set of related keywords representing categories of the content items stored in the information source. Hence, this method is not suitable for searching non-indexed information sources.
US Patent Application Publication No. 2001/0013029, titled “Method of constructing and displaying an entity profile constructed utilizing input from entities other than the owner”, discloses a method of constructing a profile of users, clients, or other entities based on electronic documents exchanged between two entities. However, this method requires electronic documents to be exchanged before the profile can be constructed.
Therefore, in light of the drawbacks associated with the existing art, there is a need for a method and system for automatically summarizing the various aspects of an entity. Further, there is a need for a method and system for constructing a profile of an entity based on information obtained from at least one information source.