1. Field of the Invention
This invention relates to a method and apparatus for identifying items of information from an information system, and is especially useful for identifying information from the world wide web.
2. Description of the Prior Art
The world wide web and many other information systems contain vast amounts of information. It is hard for users to keep up with new information that may be relevant to them because of the difficulty of doing searches. The user needs to spend valuable time planning and constructing search strategies and ensuring these are kept up to date as new terminology comes into use. Even when this task has been completed, the search needs to be run and then the results managed. Many irrelevant items may be found in the search and the user still needs to spend valuable time evaluating these and pinpointing relevant items. Because of these problems many of the advantages of the information system are lost. The user finds the system too difficult to use or manage and does not gain the maximum benefit from the information system.
Known systems that have been developed to find relevant information on behalf of users include those based on profiles. These systems build up a profile of the user, which is a description of the types of information that he or she is interested in, and then use this profile as a gauge against which to assess the relevance of a piece of information to the user. However, these systems are problematic because, in order to be effective, the profiles have to reflect the interests of the user accurately. Consequently the profiles are difficult to create and to maintain as the user's interests change and adapt. Also, these systems are typically used in conjunction with a conventional search process, and the profiles are used to prioritise or filter the search results before presentation to the user. This means that the whole process is still limited by the performance of the conventional search process in finding the required information quickly and without finding spurious information. Conventional search processes are designed to search, for example, a database or an index of terms. However, for large unstructured systems like the world wide web, conventional search processes are difficult to implement.
International patent application number WO 96/29661 in the name of Interval Research Corporation describes a system for the retrieval of hyperlinked information resources using heuristics. In this system a first "exploration" heuristic is used to search an information system (such as a network of linked textual or multi-media information) and this process finds at least one information resource for presentation to the user. Then a second "presentation" heuristic is used to present selected resources to the user and the user provides feedback indicative of the degree of relevance of the presented information in the form of a rating, score, or binary parameter, such as yes/no. The first and second heuristics are then modified on the basis of the ranking functions. This process is disadvantageous because it requires the user to provide relevance feedback, which is time consuming. Also, the system searches the information system itself, which makes it less useful for large unstructured databases.
International application number WO 95/29451 in the name of Apple Computer, Inc. describes a system for ranking the relevance of information objects accessed by computer users. This system involves storing a profile of interests for each user having access to the system and items of information are displayed in order of ranking. This means that the system has the disadvantages of "profile" systems as described above. Also, this system is not intended to be used to search large databases but rather is designed for use with electronic mail messages and bulletin board systems.
U.S. Pat. No. 5,537,586 describes a method for extracting a preferred set of textual records from a database. However, this is fundamentally a criteria-based search system. It will not find data that is not indexed and it relies on maintaining a search profile for each user.
International Patent Application Number WO 97/26729 describes a system for identifying which advertisements to present to a particular user of the world wide web. Using a "Smart Ad Box", simultaneous viewers of the same web page can be presented with different advertisements, and this document describes a way in which advertisers can decide which of a number of adverts to present. In order to do this a measure of similarity between several individual users of the World Wide Web is generated. For one user the individuals with the greatest calculated similarity to that user become that user's community. The system then determines which advertisements to show to the user based on characteristics of that user's community.
This system is designed for identifying advertisements from a limited number of possible items, rather than for identifying any items of potential interest from the whole World Wide Web. Also, under this system, users are presented with advertisements that they have previously seen. The system is described as being used in conjunction with demographic data such as the age and domicile of a user. This is similar to profile based systems and is therefore subject to the same disadvantages. The demographic data is complex to obtain, maintain and use.
It is accordingly an object of the present invention to provide an apparatus and method for identifying items of information from an information system which overcomes or at least mitigates one or more of the problems noted above.
According to a first aspect of the present invention there is provided a method of identifying items of information from an information system, for at least one of a group of users of the information system, said method comprising the steps of:
(i) obtaining a first record of items of information requested from the information system by each user in the group;
(ii) obtaining a second record of items of information requested from the information system on more than one occasion by the same user;
(iii) determining a score for each pair of users in the group said score being determined on the basis of a number of items from the second record requested by one user in the pair, that were also requested by the other user in the pair;
(iv) for each user allocating one or more group members as friends for that user on the basis of the scores for pairs containing that user; and
(v) for each user identifying items of information that have been requested by a friend of the user that have not been requested by the user.
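Steps (i) to (v) can be illustrated with a minimal Python sketch. It is not the implementation described in this specification: the request log format, the function names, the directional score (counted from one user's second record against the other user's first record), the top-k rule for allocating friends, and the use of the friend's first record in step (v) are all assumptions made for illustration.

```python
from collections import Counter


def build_records(request_log):
    """Steps (i) and (ii): build both records from (user, item) request pairs."""
    counts = {}
    for user, item in request_log:
        counts.setdefault(user, Counter())[item] += 1
    # First record: every item each user has requested.
    first = {u: set(c) for u, c in counts.items()}
    # Second record: items the same user requested on more than one occasion.
    second = {u: {i for i, n in c.items() if n > 1} for u, c in counts.items()}
    return first, second


def pair_scores(first, second):
    """Step (iii): count repeat-requested items of `a` also requested by `b`."""
    return {
        (a, b): len(second[a] & first[b])
        for a in first
        for b in first
        if a != b
    }


def allocate_friends(scores, users, k=1):
    """Step (iv): assumed rule, keep the k highest-scoring users with score > 0."""
    friends = {}
    for u in users:
        ranked = sorted(
            (v for v in users if v != u),
            key=lambda v: (-scores[(u, v)], v),  # ties broken by name
        )
        friends[u] = [v for v in ranked[:k] if scores[(u, v)] > 0]
    return friends


def identify_items(first, friends):
    """Step (v): items requested by a friend but not by the user."""
    return {
        u: set().union(*(first[f] for f in fs)) - first[u]
        for u, fs in friends.items()
    }


# Hypothetical request log: one (user, item) entry per request.
log = [
    ("ann", "a"), ("ann", "a"), ("ann", "b"),
    ("bob", "a"), ("bob", "c"), ("bob", "c"),
    ("cat", "d"),
]
first, second = build_records(log)
scores = pair_scores(first, second)
friends = allocate_friends(scores, sorted(first), k=1)
suggestions = identify_items(first, friends)
```

In this example only "ann" receives a suggestion (item "c", via "bob"), because hers is the only second record that overlaps another user's requests. No profile is stored and the information system itself is never searched; the sketch works only from the log of requests.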
The invention also encompasses a corresponding apparatus for identifying items of information from an information system, for at least one of a group of users of the information system, said apparatus comprising:
(i) an obtainer arranged to obtain a first record of items of information requested from the information system by each user in the group and also to obtain a second record of items of information requested from the information system on more than one occasion by the same user;
(ii) a determiner arranged to determine a score for each pair of users in the group said score being determined on the basis of a number of items from the second record requested by one user in the pair, that were also requested by the other user in the pair;
(iii) an allocator arranged to allocate, for each user, one or more group members as friends for that user on the basis of the scores for pairs containing that user; and
(iv) an identifier arranged to identify, for each user, items of information that have been requested by a friend of the user that have not been requested by the user.
A corresponding information system is also provided comprising an apparatus for identifying items of information from the information system, for at least one of a group of users of the information system, said apparatus comprising:
(i) an obtainer arranged to obtain a first record of items of information requested from the information system by each user in the group and also to obtain a second record of items of information requested from the information system on more than one occasion by the same user;
(ii) a determiner arranged to determine a score for each pair of users in the group said score being determined on the basis of a number of items from the second record requested by one user in the pair, that were also requested by the other user in the pair;
(iii) an allocator arranged to allocate, for each user, one or more group members as friends for that user on the basis of the scores for pairs containing that user; and
(iv) an identifier arranged to identify, for each user, items of information that have been requested by a friend of the user that have not been requested by the user.
This method, apparatus and system each provide the advantage that "interesting" information that a user has not previously requested is identified. The user does not have to spend time searching for information and is quickly and simply provided with new information that is potentially interesting. This helps the user to cope with and make use of the ever-increasing amount and variety of available information in the information system (for example, the world wide web). Information is identified for both novice and expert users of the information system, and information-finding tasks of various kinds are aided, for example, answering a particular query, or casually browsing or surfing for information of interest in the absence of a specific query. Advantageously, no user profiles are created or maintained, which greatly simplifies the process and helps to reduce errors that may arise, and no demographic data needs to be obtained, maintained or used. Also, no actual search process is carried out and no access to the information system itself is needed. Information is identified by reference to information that other members of a group of users have requested. This means that no complex and lengthy search processes need to be carried out and the method is suitable for large unstructured information systems such as the world wide web as well as well-organised databases.
Another advantage is that, by using the second record, information identified for a user is removed from the identification process unless that user requests it more than once. This helps to avoid suggesting things to users simply because these have previously been suggested to that user's friend(s). This is especially important when the method has been repeated several times. Also, use of the second record in conjunction with the first record was found to produce unexpectedly good results. Users were presented with items identified by the system that were particularly useful, and the number of irrelevant or uninteresting items was reduced.
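The effect of the second record on scoring can be shown with a small hedged example. The item names and the helper `repeat_requested` are hypothetical, and the score shown is one possible reading of the scoring step: an item that a user opened only once (for instance after it was suggested) does not count towards similarity, whereas an item the user returned to does.

```python
from collections import Counter


def repeat_requested(requests):
    """Return the items that appear more than once in a user's request list."""
    counts = Counter(requests)
    return {item for item, n in counts.items() if n > 1}


# Hypothetical histories: the friend opened "suggested-page" only once.
friend_requests = ["news", "news", "suggested-page", "forum", "forum", "forum"]
user_requests = ["news", "suggested-page"]

# Score using only the friend's repeat-requested items (the second record).
second_record = repeat_requested(friend_requests)
score_with_second_record = len(second_record & set(user_requests))

# For comparison: score using everything the friend requested (the first record).
score_with_first_record = len(set(friend_requests) & set(user_requests))
```

Here the score based on the second record counts only "news", while the score based on the first record alone would also count the one-off "suggested-page" request.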