Information retrieval systems are designed to store and retrieve information provided by publishers covering different subjects. Both static information, such as works of literature and reference books, and dynamic information, such as newspapers and periodicals, are stored in these systems. Information retrieval engines are provided within prior art information retrieval systems in order to receive search queries from users and perform searches through the stored information. It is an object of most information retrieval systems to provide the user with all stored information relevant to the query. However, many existing searching/retrieval systems are not adapted to identify the best or most relevant information yielded by the query search. Such systems typically return query results to the user in such a way that the user must retrieve and view every document returned by the query in order to determine which document(s) is/are most relevant. It is therefore desirable to have a document searching system which not only returns a list of relevant information to the user based on a query search, but also returns the list to the user in such a form that the user can readily identify which information returned from the search is most relevant to the query topic.
Existing systems for searching and retrieving files from databases based on user queries are directed primarily to the searching and retrieval of textual documents. However, there is a growing volume of multi-media information being published that is not textual. Such multi-media information corresponds, for example, to still images, motion video sequences and digital audio sequences, which may be stored and retrieved by digital computers. It would be desirable from the point of view of an individual using an information searching/retrieval system to be able to query a library or database and identify not only text documents, but also multi-media files that are relevant to the user's query. Moreover, it would be desirable if the searching system could return to the user not only a single list having both text and multi-media information relevant to the query search, but also a list which enabled the user to readily identify which of the text and multi-media files were most relevant to the query topic.
Each publisher providing documents that may be retrieved by information retrieval systems typically uses its own information format to store and transmit its information files. Thus, an information searching/retrieval system which has a library database based upon information from various publishers must be compatible with many different publisher formats. This compatibility requirement can serve to slow the performance of an information searching/retrieval system.
It is well known in the prior art of information retrieval systems to permit a user to specify a single subject from among a number of subjects for searching. For example, a user may wish to search only sports literature, medical literature or art literature. This avoids unnecessary searching through database documents that are not relevant to the subject of interest to the user. In order to provide this capability, information retrieval systems must categorize documents received from publishers according to their subject prior to adding them to the database. Subjecting of incoming documents often requires an individual to read each incoming document and make a determination regarding its subject. This process is very time consuming and expensive, as there is often a large number of incoming documents to be processed. The subjecting process may be further complicated if certain documents should properly be categorized in more than one subject. It would be desirable to have an automated system for processing incoming documents which categorized each incoming document into one or more subjects, and which did not require an individual to read each incoming document and make a separate judgment categorizing the subject of such document.
When a user of an information searching/retrieval system enters a search query into the system, the query must be parsed. Based on the parsed query, a listing of stored documents relevant to the query is provided to the user for review. In the prior art, it is known to use semantic networks when parsing a query. Semantic networks make it possible to identify words not appearing in the query, but which correspond to or are associated with the words used in the query. The number of words used to search the database is then expanded by including the corresponding words or associated words identified by the semantic network in the search instructions. This procedure is used to increase the number of relevant documents located by the information searching/retrieval system. Although semantic networks may be useful for finding additional relevant documents responsive to a query, it is believed that use of such networks also tends to increase the number of irrelevant documents located by the search. In fact, it is generally believed that the number of additional relevant documents identified through the use of semantic networks is roughly equal to the number of irrelevant documents which are also brought into the search results list as a result of the semantic network. It would be desirable to have a system for implementing a semantic network which maximized the number of relevant documents identified during the search, without substantially increasing the number of irrelevant documents found by the search.
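The query expansion described above can be illustrated with a minimal sketch. The toy network contents, the name `SEMANTIC_NETWORK`, and the function `expand_query` are hypothetical and chosen only for illustration; they are not taken from any particular prior art system, and a real semantic network would be far larger and would typically distinguish between kinds of association.

```python
from typing import Dict, List, Set

# Hypothetical toy semantic network: each word maps to a set of
# corresponding or associated words. Contents are illustrative only.
SEMANTIC_NETWORK: Dict[str, Set[str]] = {
    "car": {"automobile", "vehicle"},
    "doctor": {"physician"},
}


def expand_query(query: str, network: Dict[str, Set[str]]) -> List[str]:
    """Return the parsed query terms followed by associated words.

    Parsing here is a simple lowercase whitespace split, which is an
    assumption made to keep the sketch short.
    """
    terms = query.lower().split()
    expanded = list(terms)
    seen = set(terms)
    for term in terms:
        # Sort the associated words so the output order is deterministic.
        for related in sorted(network.get(term, ())):
            if related not in seen:
                seen.add(related)
                expanded.append(related)
    return expanded


if __name__ == "__main__":
    print(expand_query("car repair", SEMANTIC_NETWORK))
```

In this sketch every associated word is added unconditionally. That mirrors the drawback noted above: a loosely associated word such as "vehicle" may retrieve documents that are irrelevant to the user's original query.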
Many publishers that provide documents to information retrieval systems require record-keeping in order to ensure accurate royalty payments. Record-keeping permits the publishers to determine the interest level in various documents produced by the publisher, and the demographics of users retrieving such documents. Thus, it would be desirable to have a searching/retrieval system that tracked not only how often each document stored in the system database was retrieved by users, but also the demographics of the users retrieving the documents and the query searches used to identify and retrieve such documents.
It is therefore an object of the present invention to provide a searching/retrieval system which can query a library or database and identify not only text documents, but also multi-media files stored on the library or database that are relevant to the query.
It is a further object of the present invention to provide a searching/retrieval system that accepts a query and returns a single search results list having both text and multi-media information, which list is presented in a format that enables the user to readily identify which of the text and multi-media files are most relevant to the query topic.
It is a still further object of the present invention to provide a scalable computer architecture for implementing a searching/retrieval system which can query a database and identify text documents and multi-media files stored on the database that are relevant to the query.
It is a still further object of the present invention to provide an information searching/retrieval system which has a library database based upon information from various publishers, and which is compatible with many different publisher formats.
It is a still further object of the present invention to provide an information searching/retrieval system which has a library database based upon information from various publishers, and wherein such information is stored in a central database in one or more common information formats.
It is a still further object of the present invention to provide an automated system for processing incoming documents to be stored on a library or database, which system categorizes each incoming document into one or more subjects, and which does not require an individual to read each incoming document and make a separate judgment categorizing the subject of such document.
It is a still further object of the present invention to provide a system for implementing a semantic network which maximizes the number of relevant documents identified during the query search, without substantially increasing the number of irrelevant documents found by the search.
It is a still further object of the present invention to provide a system for using a semantic network which maximizes the number of relevant documents identified during a query search by semantically expanding the search in response to the part of speech associated with each query term in the search.
It is a still further object of the present invention to provide a searching system that queries a database to determine text documents and multi-media files relevant to the query, wherein weightings associated with proper nouns and slow words are adjusted prior to searching the database.
It is a further object of the present invention to provide a searching/retrieval system that accepts a query and returns a single search results list including document relevance values, wherein the document relevance values are independent of the number of terms in the query.
It is yet a still further object of the present invention to provide a searching/retrieval system that tracks not only how often each document stored in the system database was retrieved by users, but also the demographics of the users retrieving the documents and the query searches used to identify and retrieve such documents.
These and other objects and advantages of the invention will become more fully apparent from the description and claims which follow or may be learned by the practice of the invention.