The present invention relates to an information system and to a method of retrieving information. In particular, the invention relates to an information system for retrieving and rating targeted electronic information.
The advent of networks, and particularly the Internet with its World Wide Web ("Web") facility, has caused a huge increase in the amount of electronic information available to individual users and to organizations. This information is typically made available as documents on Web sites, electronic news feeds, subscription data feeds, and such like. Much of this electronic information is document based.
One problem associated with having a vast amount of available electronic information is how to locate relevant items in the mass of information.
Internet search engines are available that allow users to locate only those Web pages that contain certain keywords or relate to certain topics or subjects. However, one problem with search engines is that the user must repeat the search regularly to locate new information. Another problem is that, even if a user performs a search regularly, it is difficult to determine short-term or long-term trends from such a search. If the searches are not performed frequently enough, time-critical information items may be missed. Yet another problem is that Internet search engines may not adequately indicate how the volume of information items has changed since the last search was performed.
A large organization typically has a substantial number of people who are interested in a specific subject of importance to that organization. The specific subject may be, for example, a particular technology, a market segment, new legislation, or such like. To remain up to date with developments in the specific subject, the organization typically has one or more subject matter experts (SMEs). The SMEs are people who monitor developments in the specific subject and provide other members of the organization with synopsis information relating to the specific subject.
One problem with relying on SMEs is that the information they have is typically retained by the individuals rather than in electronic systems. This means that it is difficult to make the information available across a large organization that may span several countries.
It is among the objects of an embodiment of the present invention to obviate or mitigate one or more of the above disadvantages or other disadvantages associated with information retrieval, classification, and retention.
According to a first aspect of the present invention there is provided an information system comprising: means for selecting subject matter of interest to a user; means for retrieving information items; means for classifying information items to identify information items relating to the selected subject matter; means for rating the identified information items; and means for notifying the user about identified information items meeting a predetermined criterion.
Preferably, the means for selecting subject matter of interest to a user includes means for allowing a user to select an interest value, so that only those information items rated above that interest value will be notified to the user.
Preferably, the means for selecting subject matter of interest to a user is implemented by an application presenting a user with an interface through which the user may select subject matter of interest.
Preferably, the means for retrieving information items retrieves items prior to the means for classifying items classifying the retrieved items. Thus, all new information items are retrieved, regardless of whether they relate to a selected subject matter or not; those new information items relating to a selected subject matter are then identified, and those new information items not relating to a selected subject matter are discarded.
Alternatively, the means for retrieving information items only retrieves those items that have been identified by the classifying means as relating to the selected subject matter. This is less preferable because it is more difficult to classify information items at a third party's Web site, as this may require some form of mobile intelligent agent infrastructure both at the third party's Web site and in the information system.
Preferably, the means for retrieving information items is operable to retrieve information via a network, such as a TCP/IP network. Conveniently, the retrieving means is operable to retrieve information using conventional protocols, such as HTTP (hypertext transfer protocol), FTP (file transfer protocol), and such like. In a preferred embodiment, a retrieval intelligent agent is used to make HTTP requests to certain pre-defined Web sites to retrieve newly-updated information from those Web sites.
Preferably, the means for retrieving information items is activated at regular intervals so that data sources are checked for relevant information on a regular basis. The information retrieving means may be activated during a night period, or some other period of low network traffic.
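The scheduling of the retrieval means during a low-traffic period can be sketched as below. This is only an illustration: the window of 01:00 to 05:00 and the names `NIGHT_START`, `NIGHT_END` and `next_retrieval_run` are hypothetical, since the specification says only "a night period".

```python
from datetime import datetime, time, timedelta

# Hypothetical low-traffic window; the specification only says "a night period".
NIGHT_START = time(1, 0)  # 01:00
NIGHT_END = time(5, 0)    # 05:00


def in_night_window(moment: datetime) -> bool:
    """Return True if `moment` falls inside the assumed low-traffic window."""
    return NIGHT_START <= moment.time() < NIGHT_END


def next_retrieval_run(now: datetime) -> datetime:
    """Return the next moment at which the retrieval agent should run.

    Inside the window the agent runs immediately; otherwise it runs at the
    start of the next window (today if still ahead, else tomorrow).
    """
    if in_night_window(now):
        return now
    candidate = datetime.combine(now.date(), NIGHT_START)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```

A real deployment would more likely delegate this to an operating-system scheduler. The function above simply makes the "regular interval, low-traffic period" rule explicit.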
Preferably, the means for retrieving information items includes an extraction routine for extracting text from the information items (that is, for removing any images, control characters, tags, document format data, or such like that may be contained in the information items).
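An extraction routine of the kind described above might, for HTML information items, look like the following sketch. It uses the standard-library `html.parser` module. The choice to also drop `<script>` and `<style>` content, and the names `TextExtractor` and `extract_text`, are assumptions made for illustration. Other document formats would need their own extractors.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from HTML, discarding tags, images, scripts and styles."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._chunks: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Tags such as <img> carry no data, so images are dropped implicitly.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        joined = " ".join(self._chunks)
        # Remove non-printable control characters, keep ordinary whitespace.
        cleaned = "".join(ch for ch in joined if ch.isprintable() or ch.isspace())
        # Collapse runs of whitespace into single spaces.
        return " ".join(cleaned.split())


def extract_text(html: str) -> str:
    """Return the plain text contained in an HTML information item."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```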
Preferably, the means for classifying information items includes a filtering routine for filtering out any information items that do not relate to the selected subject matter. Conveniently, the filtering routine operates by keyword searching on the extracted text, and by weighting the keywords using a concept hierarchy.
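The filtering routine could combine keyword searching with a concept hierarchy in several ways. The sketch below shows one possible scheme, assuming that keywords of the selected concept count in full and that keywords of each more specific sub-concept count at half the weight of the level above. The concept names, keywords, `DECAY` factor and `minimum` score are hypothetical.

```python
import re

# Hypothetical concept hierarchy: concept -> (parent concept, keywords).
CONCEPTS = {
    "displays": (None, ["display", "screen"]),
    "liquid crystal displays": ("displays", ["lcd", "liquid crystal"]),
    "plasma displays": ("displays", ["plasma"]),
}
DECAY = 0.5  # assumed weight reduction per level below the selected concept


def depth_below(concept, ancestor):
    """Number of levels from `ancestor` down to `concept`, or None if unrelated."""
    depth = 0
    while concept is not None:
        if concept == ancestor:
            return depth
        concept = CONCEPTS[concept][0]
        depth += 1
    return None


def relevance(text, selected):
    """Keyword score of extracted `text`, weighted by position in the hierarchy."""
    lowered = text.lower()
    score = 0.0
    for concept, (_, keywords) in CONCEPTS.items():
        depth = depth_below(concept, selected)
        if depth is None:
            continue  # concept is not the selected one or beneath it
        weight = DECAY ** depth
        for kw in keywords:
            hits = len(re.findall(r"\b" + re.escape(kw) + r"\b", lowered))
            score += weight * hits
    return score


def filter_items(texts, selected, minimum=1.0):
    """Keep only items whose relevance to `selected` reaches `minimum`."""
    return [t for t in texts if relevance(t, selected) >= minimum]
```

For example, "New plasma screen launched" scores 1.5 for "displays": 1.0 for "screen" plus 0.5 for the sub-concept keyword "plasma". An item about interest rates scores 0 and is filtered out.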
Preferably, the means for rating the identified information items is implemented automatically by an intelligent agent. Conveniently, the rating intelligent agent includes a rating component for performing the rating function. The rating component may comprise: a rules based system, such as an Expert system; or an artificial neural network; or a fuzzy system; or such like.
Information items may be documents or parts of documents, for example text extracted from a document.
The interface may provide a user with a hierarchical list of subject matter. For example, the highest level may comprise a list including: 'technology' information, 'legal' information, 'economic' information, 'financial' information, and such like. If a user selects, for example, 'technology' information, the next level may comprise a list of different technology areas, such as: 'displays', 'connectors', 'processors', and such like. Each of these technology areas would include a list of technology types within that area. For example, the next level after the 'displays' area may include: 'liquid crystal displays', 'plasma displays', 'cathode ray tubes', and such like.
Preferably, the interface allows a user to add new subject matter categories, for example, by adding new concepts and keywords relating to the new concepts. This allows the system to be adaptable so that it can gather information relating to emerging concepts.
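The hierarchical subject list and the ability to add new categories can be pictured as a nested mapping that the interface walks one level at a time. This is a minimal sketch: the nested-dictionary representation and the functions `options_at` and `add_subject` are hypothetical. Keywords for a new concept would be stored separately, for example alongside the filtering routine's concept hierarchy.

```python
# Hypothetical nested representation of the subject-matter hierarchy.
SUBJECTS = {
    "technology": {
        "displays": {
            "liquid crystal displays": {},
            "plasma displays": {},
            "cathode ray tubes": {},
        },
        "connectors": {},
        "processors": {},
    },
    "legal": {},
    "economic": {},
    "financial": {},
}


def _node(path):
    """Walk down the hierarchy along `path` (a list of labels)."""
    node = SUBJECTS
    for label in path:
        node = node[label]  # raises KeyError for an unknown selection
    return node


def options_at(path):
    """Choices the interface offers once the user has selected `path`."""
    return list(_node(path))


def add_subject(path, label):
    """Add a new subject category beneath `path` (no effect if it already exists)."""
    _node(path).setdefault(label, {})
```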
Conveniently, the interface may be implemented by a Web browser.
The predetermined criterion includes the information item relating to a subject matter selected by the user, and preferably also includes the information item having a rating above the interest value set by the user for that subject matter.
Preferably, the information system is implemented using an intelligent agent infrastructure. Suitable conventional intelligent agent infrastructures are available, such as the Infosleuth (trade mark) infrastructure, as described in more detail at "http://www.mcc.com/projects/infosleuth/". Other agent systems, such as the Aglets (trade mark) infrastructure or the Concordia (trade mark) infrastructure, may be used. An Aglets Software Development Kit is available from IBM (trade mark). A Concordia infrastructure is available from Mitsubishi Electric Company at the Web URL http://www.meitca.com/HSL/Projects/Concordia/.
Software intelligent agents are well known and are explained in, for example, "Developing Intelligent Agents for Distributed Systems: Exploring Architecture, Technologies, and Applications" by Michael Knapik and Jay B. Johnson, McGraw-Hill; ISBN: 0070350116.
The advantage of using an intelligent agent infrastructure is that each component in the system can be programmed to perform a specific task; this allows the system to be scaled very easily, without having to re-write large amounts of software.
Alternatively, the system may be implemented as a single software program.
Preferably, the means for notifying the user of identified information items meeting a predetermined criterion is implemented by a notifying intelligent agent that uses an electronic delivery channel, such as electronic mail.
In preferred embodiments, a notice of updated information may be sent, and the new information items may be stored on, for example, a Web server that can be accessed by the user. In other embodiments, the new information items may be sent to the user.
The notifying means may include a comparing routine for determining whether the rating of a retrieved information item exceeds a predetermined threshold (the interest value). The comparing routine may comprise: a rules based system, such as an Expert system; or an artificial neural network; or a fuzzy system; or such like.
The system may further include feedback request means for requesting a subject matter expert to apply a rating to the retrieved information items.
The rating means may further comprise a feedback routine whereby the rating means automatically applies an initial rating using artificial intelligence, receives a rating applied by a subject matter expert, and modifies both the rating and the rating process so that they more closely approximate the expert's rating.
The feedback routine may be configured to learn about non-text based evaluation factors. For example, the feedback routine may learn that every article written by a certain author is always rated by a subject matter expert higher than the rating of the rating component.
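The feedback routine can be illustrated with a simple adaptive model. The sketch below swaps in a linear rating model trained with a least-mean-squares update, which is only one of many possibilities alongside the Expert systems, neural networks and fuzzy systems named above. The model has a per-author bias term, so it can learn a non-text factor such as an author whose articles the SME consistently rates higher. The class name, features and learning rate are hypothetical.

```python
from collections import defaultdict


class AdaptiveRater:
    """Hypothetical linear rating component adapted towards SME ratings.

    rating = text_weight * relevance + author_bias[author]
    Each SME rating triggers one least-mean-squares (gradient) step.
    """

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.text_weight = 1.0
        self.author_bias = defaultdict(float)  # learned offset per author

    def rate(self, relevance, author):
        """Initial automatic rating for an information item."""
        return self.text_weight * relevance + self.author_bias[author]

    def feedback(self, relevance, author, sme_rating):
        """Move the model's rating for this kind of item closer to the SME's rating."""
        error = sme_rating - self.rate(relevance, author)
        self.text_weight += self.learning_rate * error * relevance
        self.author_bias[author] += self.learning_rate * error
        return error
```

With a small learning rate each feedback step shrinks the error. A very large `relevance` value combined with a large learning rate can make the update overshoot, so features would normally be scaled.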
The system may further comprise means for allowing an SME to subscribe to a concept (as an SME for that concept), and to enter a threshold value for that concept, so that the system will only request feedback from the SME for any item relating to that concept and having a rating exceeding the threshold value set by the SME.
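SME subscriptions with per-concept thresholds could be held in a small registry such as the one sketched below. The `SmeRegistry` class and its methods are hypothetical. As in the text, an SME is asked for feedback only when the item's automatic rating exceeds that SME's threshold.

```python
from dataclasses import dataclass, field


@dataclass
class SmeRegistry:
    """Hypothetical registry of SMEs subscribed to concepts, with feedback thresholds."""

    # concept -> {sme: threshold}
    subscriptions: dict = field(default_factory=dict)

    def subscribe(self, sme, concept, threshold):
        """Register `sme` as an SME for `concept` with the given threshold."""
        self.subscriptions.setdefault(concept, {})[sme] = threshold

    def feedback_recipients(self, concept, rating):
        """SMEs to ask for feedback on an item about `concept` with this rating."""
        return sorted(
            sme
            for sme, threshold in self.subscriptions.get(concept, {}).items()
            if rating > threshold
        )
```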
One advantage of using a feedback routine is that an SME is able to participate in the evaluation of information items, so that the SME's knowledge is used to rate items. One advantage of allowing an SME to enter a threshold is that the SME only receives feedback requests for the most relevant items.
By virtue of this aspect of the invention, an information system is provided that automatically searches for new information items relating to selected subject matter, applies a value of importance to any relevant information items found, and notifies a user if an information item is obtained having an importance rating that exceeds a predetermined threshold value. This provides the users of the system with an array of information items relating to topics that are important to the users and that are rated according to their importance. Thus, the system behaves like an automated subject matter expert.
According to a second aspect of the invention there is provided a method of collecting selected information, the method comprising the steps of: identifying subject matter of interest to a user; retrieving information items relating to the identified subject matter; rating the retrieved information items; and notifying the user of retrieved information items meeting a predetermined criterion.
The step of retrieving information items relating to the identified subject matter may include the sub-steps of: identifying a plurality of sources of information; accessing each of the plurality of sources of information; and, for each source of information, extracting any information items relating to the identified subject matter.
The sub-step of extracting any information items relating to the identified subject matter preferably includes the sub-step of only extracting any information item which is more recent than the information item retrieved on a previous visit to the information source.
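One way to extract only items more recent than those retrieved on the previous visit is to remember, per source, the newest publication time seen so far. This sketch assumes each information item carries a publication timestamp, which the specification does not state. The function name and data shapes are hypothetical.

```python
from datetime import datetime


def extract_new_items(source, items, last_seen):
    """Return items from `source` newer than anything retrieved on a previous visit.

    `items` is a list of (published, text) pairs. `last_seen` maps a source
    identifier to the newest publication time retrieved so far and is updated
    in place, so the next visit starts from the new cutoff.
    """
    cutoff = last_seen.get(source)
    fresh = [(ts, text) for ts, text in items if cutoff is None or ts > cutoff]
    if fresh:
        last_seen[source] = max(ts for ts, _ in fresh)
    return fresh
```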
The sub-step of extracting any information items relating to the identified subject matter may be replaced by the sub-steps of extracting any new information items and then selecting, from the new information items, those items relating to the identified subject matter.
The step of rating the retrieved information items may include the sub-steps of: automatically applying an initial rating using artificial intelligence; and transmitting the retrieved information to a subject matter expert (SME) for the expert to apply a rating. The expert may also annotate the information item, or provide additional information about the information item to assist any users who read the information item.
The method may include the further sub-step of using the rating applied to the information item by the SME to modify the automatic rating system. This is particularly advantageous where the automatic rating system is implemented by an Expert system, an artificial neural network, a fuzzy system, or some other adaptive intelligent system.
The step of notifying the user of retrieved information items meeting a predetermined criterion is preferably implemented using an electronic communication channel, such as electronic mail, FTP, or such like.
According to a third aspect of the present invention there is provided a method of disseminating targeted information to a plurality of users within an organization, the method comprising the steps of: receiving from each of a plurality of users a selected subject matter of interest to that user; storing for each user the subject matter selected by that user; accessing a plurality of information sources; retrieving information items relating to any of the stored selected subject matter; for each retrieved information item, applying an importance value to that item; and notifying each user of any retrieved information items meeting a predetermined criterion.
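The final steps of this method can be sketched as grouping rated items into per-user notices. The sketch assumes the stored selections take the form user → {subject: interest value}. It also assumes the predetermined criterion is the one described earlier: the item relates to a subject the user selected, and its importance value is above the user's interest value for that subject. The function name and data shapes are hypothetical.

```python
from collections import defaultdict


def disseminate(subscriptions, rated_items):
    """Group rated information items into per-user notifications.

    subscriptions: user -> {subject: interest value}
    rated_items:   list of (item_id, subject, importance)
    """
    notices = defaultdict(list)
    for item_id, subject, importance in rated_items:
        for user, interests in subscriptions.items():
            threshold = interests.get(subject)
            # Notify only if the user selected this subject and the item is rated above it.
            if threshold is not None and importance > threshold:
                notices[user].append(item_id)
    return dict(notices)
```

Each resulting list could then be handed to the notifying agent, for example as one electronic mail message per user.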
According to a fourth aspect of the present invention there is provided a business intelligence system comprising: registering means for allowing a user to select subject matter of interest to that user; searching means for accessing different information sources and for extracting information items from these sources; evaluating means for applying an importance value to each extracted information item relating to the subject matter of interest; and notifying means for notifying the user about any information items meeting a predetermined criterion.
The registering means may be implemented by a Web browser having an interface allowing a user to enter key words or other identifiers relating to a selected subject matter.
The evaluating means may include a facility for transmitting an information item to a subject matter expert and for receiving from the subject matter expert a rating for that information item. The evaluating means may use this received rating to adapt its own rating system.
The system may be based on intelligent agents that:
1. access Web sites to identify relevant information items,
2. store the relevant information items (or data extracted from the relevant information items) in a server,
3. apply an importance value to each information item stored, and
4. allow users to access the server, for example, from a Web browser.
In one embodiment, a Java (trade mark) applet may be included in a Web browser to allow a user to subscribe to a subject matter and to view the results of the searches for information items relating to that subject matter.
According to a fifth aspect of the present invention there is provided a client-server information system, the system comprising: a client having an interface for selecting subject matter of interest to a user; and a server for retrieving information items relating to the selected subject matter, rating the retrieved information items, and notifying the user of retrieved information items meeting a predetermined criterion.
According to a sixth aspect of the invention there is provided a method of configuring an information system, the method comprising the steps of: defining a subject matter of interest; identifying sources of information; identifying infrastructures to be used; and configuring the identified infrastructures to employ resources to access the sources of information to retrieve information items relating to the subject matter of interest.
The step of defining a subject matter of interest may include the step of selecting which Web sites are to be visited to search for information items, defining the parts of those Web sites in which information items are to be searched for, and defining the page structure of those parts.
The step of identifying infrastructures to be used may include the step of selecting what functions are to be performed, for example, searching for information items, rating information items, notifying users, and such like.
The step of configuring the identified infrastructures to employ resources may include defining how components in the system communicate with each other, what a component should do if an error occurs, and such like.
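The outcome of these configuration steps could be captured in a configuration object such as the sketch below. It records the subject matter, the sources with the parts of each site and their page structure, the selected functions, and how components communicate and handle errors. All field names, the allowed values and the `example.com` URL are hypothetical placeholders, not part of any particular agent infrastructure.

```python
from dataclasses import dataclass, field


@dataclass
class SourceConfig:
    url: str            # Web site to visit
    sections: list      # parts of the site in which to search for items
    item_selector: str  # hypothetical description of the page structure


@dataclass
class SystemConfig:
    subject: str
    sources: list = field(default_factory=list)
    functions: list = field(default_factory=lambda: ["search", "rate", "notify"])
    message_channel: str = "agent-bus"  # assumed: how components communicate
    on_error: str = "retry"             # what a component does if an error occurs
    max_retries: int = 3

    def validate(self):
        """Return a list of configuration problems (empty if none)."""
        problems = []
        if not self.sources:
            problems.append("no information sources identified")
        allowed = {"search", "rate", "notify"}
        unknown = [f for f in self.functions if f not in allowed]
        if unknown:
            problems.append(f"unknown functions: {unknown}")
        if self.on_error not in {"retry", "skip", "abort"}:
            problems.append(f"unknown error policy: {self.on_error}")
        return problems
```

For example, `SystemConfig("plasma displays", [SourceConfig("https://example.com", ["/news"], "div.article")])` validates cleanly, whereas a configuration with no sources reports that no information sources have been identified.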