An information retrieval system attempts to match user queries (i.e., the user's statement of information needs) against the information available to the system. That information may be stored in an internet environment, an intranet system, or proprietary databases. In recent years, data sources accessible via the Hypertext Transfer Protocol (HTTP) have been added rapidly. The number of independent proprietary databases, and of proprietary databases associated with applications, has also increased drastically. For example, a company may have more than one database, each of which may be used by a different group or department within the company. A company may also use a database that is different from, or unassociated with, a database used by another company.
Due to the tremendous increase in web-based information and proprietary databases, and because many of these databases are disconnected or unassociated with one another, a user may not be able to efficiently retrieve all the information he/she needs. In large companies in particular, one department may not know what another department is doing. One department may use applications that are disconnected from, or incompatible with, applications used by another department. One department may also have access to a database that is unknown to another department. As such, an employee looking for particular information may not realize that the information he/she is seeking is categorized differently in different databases. For example, an employee looking for a list of works produced by a person named “John” may not realize that the person is identified as “author” in a first database and as “work creator” in another database. Without knowledge of the data categories or classifications that exist in separate data sources, an individual's ability to retrieve the relevant information he/she needs may be compromised.
The same problem also exists among companies that use different proprietary databases or databases that are associated with proprietary applications. Certain proprietary information that is accessible by two or more companies may be stored in one database of a company, and in a different database of another company. As such, an employee of one company may not be aware of the existence of another company's database that contains the information he/she is seeking.
Furthermore, existing information retrieval systems may be limited in their ability to provide feedback with high precision. Precision, a common measure of retrieval effectiveness in information retrieval systems, is defined as the number of relevant documents retrieved divided by the total number of documents retrieved. Its value therefore ranges from zero to one, and an ideal information retrieval system has a precision equal to one. Retrieval effectiveness is typically based on document relevance judgments, which are problematic because they are subjective and unreliable.
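The precision definition above can be computed directly. The following is a minimal sketch, assuming each document is identified by a distinct string ID and that the set of relevant IDs has already been fixed by some relevance judgment; the function name `precision` and the convention of returning 0.0 for an empty result (where the ratio is otherwise undefined) are illustrative assumptions, not part of the described system.

```python
def precision(retrieved: list[str], relevant: set[str]) -> float:
    """Return |relevant retrieved| / |retrieved|, a value between 0 and 1.

    Assumes document IDs in `retrieved` are distinct.
    """
    if not retrieved:
        # Assumption: precision of an empty result is undefined; report 0.0.
        return 0.0
    hits = sum(1 for doc_id in retrieved if doc_id in relevant)
    return hits / len(retrieved)


# Four documents retrieved, two of which were judged relevant.
print(precision(["d1", "d2", "d3", "d4"], {"d1", "d3", "d9"}))  # 0.5
```

Note that the relevant document "d9" was never retrieved; it lowers recall but has no effect on precision, which only looks at what was returned.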
In particular, one obstacle to high precision is that a user often enters query terms that match data indexed in a database, while the search attribute of a query term differs from the attribute of the indexed data. For example, a user who wishes to obtain a list of articles written by the author John may enter the query terms “John, author.” The information retrieval system, however, may return as feedback articles written by someone else that merely contain the word “John” in their contents. This happens because the system does not recognize that the query term “John” represents a subject with an author attribute, not a content attribute. This problem has long been recognized as a major difficulty in information retrieval systems.
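To illustrate the mismatch, the sketch below contrasts a plain keyword match, which ignores which attribute a term belongs to, with a match restricted to a requested attribute. The `Article` records, titles, and both function names are hypothetical examples invented for this illustration; they are not taken from any particular retrieval system.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    author: str
    content: str


# Hypothetical corpus: only the first article is written by John.
ARTICLES = [
    Article("Indexing Basics", "John", "An overview of inverted indexes."),
    Article("Letters", "Mary", "A reply to John about search engines."),
]


def keyword_search(term: str, articles: list[Article]) -> list[str]:
    # Matches the term in any attribute, so "John" in the content also counts.
    t = term.casefold()
    return [a.title for a in articles
            if t in a.author.casefold() or t in a.content.casefold()]


def attribute_search(term: str, attribute: str,
                     articles: list[Article]) -> list[str]:
    # Matches the term only against the requested attribute, e.g. "author".
    t = term.casefold()
    return [a.title for a in articles
            if getattr(a, attribute).casefold() == t]


print(keyword_search("John", ARTICLES))              # ['Indexing Basics', 'Letters']
print(attribute_search("John", "author", ARTICLES))  # ['Indexing Basics']
```

If only "Indexing Basics" is relevant, the keyword match has a precision of 0.5, while the attribute-restricted match has a precision of 1.0.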
For these reasons, there is a need for a system that allows knowledge of data to be shared and transferred within and among business entities. There is also a need for a system that allows retrieval of information spread over numerous otherwise disconnected and incompatible applications within a network. The system should also be able to provide effective information retrieval having improved precision.
An information retrieval system that allows a user to retrieve data having different attributes or attribute labels from different databases is described below. The information retrieval system associates the search attribute of a subject represented by a user input term with one or more attributes of data from one or more databases. Based on this association, data whose attribute is associated with the search attribute is collected and provided to the user as feedback.
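One way to picture the association step is a lookup table that maps a search attribute to the label each database uses for it. The sketch below is a hypothetical illustration only, assuming in-memory records stored as dictionaries; the database names, field labels, `ATTRIBUTE_MAP`, and `federated_search` are invented for this example and do not represent the detailed implementation of the described system.

```python
# Hypothetical sources: each database labels the "author" concept differently.
DATABASES: dict[str, list[dict[str, str]]] = {
    "db_a": [
        {"title": "Search Basics", "author": "John"},
        {"title": "Index Design", "author": "Mary"},
    ],
    "db_b": [
        {"name": "Query Logs", "work_creator": "John"},
        {"name": "Ranking Notes", "work_creator": "Ann"},
    ],
}

# Hypothetical association table: search attribute -> per-database label.
ATTRIBUTE_MAP: dict[str, dict[str, str]] = {
    "author": {"db_a": "author", "db_b": "work_creator"},
}


def federated_search(term, search_attribute, databases, attribute_map):
    """Collect records whose associated attribute matches `term`."""
    results = []
    labels = attribute_map.get(search_attribute, {})
    for db_name, records in databases.items():
        label = labels.get(db_name)
        if label is None:
            continue  # this source has no attribute associated with the search attribute
        for record in records:
            value = record.get(label)
            if isinstance(value, str) and value.casefold() == term.casefold():
                results.append((db_name, record))
    return results


for db_name, record in federated_search("John", "author", DATABASES, ATTRIBUTE_MAP):
    print(db_name, record)
# db_a {'title': 'Search Basics', 'author': 'John'}
# db_b {'name': 'Query Logs', 'work_creator': 'John'}
```

Keeping the association in a separate table means a new source can be added by registering its label, without changing the query the user enters.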
Other embodiments of the information retrieval system and methods of using the same are also described. Other and further aspects and features of the invention will be evident from reading the following detailed description of the preferred embodiments, which are intended to illustrate, not limit, the invention.