The present invention relates to electronic resource annotation. It has particular utility when applied in electronic information retrieval, whether that information be in the form of documents or photos, or a description of a software component in a distributed system.
The dominant electronic information retrieval system in the world today is the World Wide Web. The largely unstructured nature of the Web means that the primary method of identifying a web-page containing the information which a user requires is to use a search engine. Search engines normally generate full-text indices which can be used to quickly identify web pages which contain all the words included in the user's search query. Page-ranking algorithms are then used to present the most relevant of those web-pages to the user.
Whilst this represents an effective method of retrieving electronic information relevant to a query, the only stage at which human intelligence is exploited is in the page-ranking algorithm (which captures humans' recognition of the worth of a site by counting the number of web-pages which link to the site in question). The creation of the full-text index is purely automatic.
It is hoped that ‘tagging’ systems will improve search results by allowing a user to decide which labels or keywords should be attributed to a resource.
When a user finds a web-page which contains useful information he can save the address (URL) of the webpage on the computer which he is using to browse the Web. This is the familiar ‘bookmarking’ process. The ‘bookmarking’ interface enables a user to store bookmarks in a hierarchical folder system. Hence, the user is able to navigate to a useful page by drilling down to a relevant folder in the hierarchical folder system.
So-called social bookmarking is a development of this idea in which a user can upload the bookmarks stored on their own computer to a server computer. That server computer then offers the bookmark information to others.
Some such sites offer users the ability to add annotations to the shared bookmarks. These annotations might be user ratings for the web-page or keywords which the user has assigned to the web-page (the latter often being referred to as ‘tags’). An example of such a site is the website del.icio.us, which allows users to see a list of sites to which users have applied a given tag. It is trivial to rank those sites by the number of users who have given a web-page the same tag. This gives some idea of users' perception of the quality of the web-page and also its relevance to that tag.
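By way of illustration only, ranking by the number of users who applied a given tag might be sketched as follows. The function name `rank_pages_for_tag`, the `(user, url, tags)` record layout and the example URLs are assumptions made for this sketch; they are not taken from del.icio.us or any other real site.

```python
from collections import defaultdict

def rank_pages_for_tag(bookmarks, tag):
    """Rank URLs by how many distinct users applied `tag` to them.

    `bookmarks` is an iterable of (user, url, tags) records; this layout
    is hypothetical and chosen only for the example.
    """
    users_per_url = defaultdict(set)
    for user, url, tags in bookmarks:
        if tag in tags:
            users_per_url[url].add(user)
    # Most users first; ties are broken alphabetically by URL.
    return sorted(
        ((url, len(users)) for url, users in users_per_url.items()),
        key=lambda item: (-item[1], item[0]),
    )

bookmarks = [
    ("alice", "http://example.org/a", {"python", "tutorial"}),
    ("bob", "http://example.org/a", {"python"}),
    ("carol", "http://example.org/b", {"python", "reference"}),
    ("bob", "http://example.org/c", {"cooking"}),
]
ranking = rank_pages_for_tag(bookmarks, "python")
```

Counting distinct users, rather than tag occurrences, prevents a single user who bookmarks the same page repeatedly from inflating its rank.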
Unlike top-down centralized approaches, collaborative tagging systems (sometimes referred to as folksonomies) like del.icio.us provide users with the freedom to use tags of their choice and thus capture the way in which a community of users describes and categorises resources. The community of users is thus provided with a set of resources which are tagged in a way which allows them to quickly retrieve relevant resources.
Where a community of users includes users who describe and categorise resources in different ways, the above benefits are diluted. To overcome this, some systems suggest tags to the user which better fit with the way other members of the community of users have chosen to tag the resource.
A straightforward way of doing this is to present the user with tags which have proved popular amongst the community of users. A common way of providing a user with a visualisation of this is to use tag clouds, visual representations in which each tag is displayed with a font size which is proportional to its popularity. Second generation tag clouds integrate the notion of relationships among tags or their meaning, as seen in the paper entitled “Improving Tag-Clouds as Visual Information Retrieval Interfaces” presented by Y. Hassan-Montero and V. Herrero-Solana at the International Conference on Multidisciplinary Information Sciences and Technologies, in October 2006.
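A minimal sketch of the font sizing used in such a tag cloud is given below. The point sizes, the minimum-size floor and the function name `tag_cloud_sizes` are assumptions made for the example; real tag clouds may use other scalings.

```python
def tag_cloud_sizes(tag_counts, max_pt=32, min_pt=8):
    """Map each tag to a font size proportional to its popularity.

    The most popular tag is drawn at `max_pt`; the `min_pt` floor is an
    assumption added so that rare tags remain legible.
    Counts are assumed to be positive integers.
    """
    if not tag_counts:
        return {}
    top = max(tag_counts.values())
    return {
        tag: max(min_pt, round(max_pt * count / top))
        for tag, count in tag_counts.items()
    }

sizes = tag_cloud_sizes({"python": 40, "web": 20, "rare": 1})
```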
In del.icio.us, when a user visits the page containing all the bookmarks tagged with a given tag, a list of related tags to that selected one is shown inside a sidebar. The related tags might be those which are found to frequently be applied together with the given tag.
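One plausible way of finding tags which are frequently applied together with a given tag is to count co-occurrences across bookmarks. The sketch below assumes that each bookmark is represented simply as a set of tags; the function name `related_tags` is hypothetical, and the sketch does not purport to describe how del.icio.us itself computes its related tags.

```python
from collections import Counter

def related_tags(tag_sets, given_tag, top_n=3):
    """Return the tags most often applied together with `given_tag`."""
    co_counts = Counter()
    for tags in tag_sets:
        if given_tag in tags:
            # Count every other tag that shares a bookmark with given_tag.
            co_counts.update(t for t in tags if t != given_tag)
    # Most frequent first; ties are broken alphabetically.
    ordered = sorted(co_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ordered[:top_n]]

tag_sets = [
    {"python", "tutorial", "programming"},
    {"python", "programming"},
    {"python", "web"},
    {"cooking", "tutorial"},
]
related = related_tags(tag_sets, "python")
```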
A problem arises however in that some users use tags which are idiosyncratic to themselves or are unique to a group to which they belong, which group forms only a small fraction of the group of people tagging the resources in the system.
Z. Xu, Y. Fu, J. Mao and D. Su presented a paper entitled “Towards the Semantic Web: Collaborative Tag Suggestions”, in Proceedings of the Collaborative Web Tagging Workshop at the WWW 2006, Edinburgh, Scotland, 2006. In that paper they point out the desirability of the set of tags applied to an object including tags of various types. The paper refers to these types as ‘facets’ and lists ‘content-based tags’, ‘context-based tags’, ‘attribute tags’ and ‘subjective tags’ as examples of ‘facets’.
According to a first aspect of the present invention, there is provided a method of electronic resource annotation comprising:
receiving a plurality of groups of tags;
selecting, on the basis of one or more tags received from a user and said groups of tags, one or more groups of tags under-represented in the tags received from said user; and
proposing tags from said one or more under-represented groups to said user as said user applies tags to a resource.
By arranging tags into groups of tags where it is desirable that the set of tags applied to a resource includes tags from each of the groups, monitoring tags input by the user, finding groups of tags which are under-represented in the tags so far entered by the user in relation to the resource, and proposing to the user tags from those under-represented groups, more coherent or descriptive sets of tags for resources are gathered from users. Where the resources are services in a distributed computer system then a more rapid identification of a suitable service or substitution of one service for another is enabled. Where the resources are documents or other items of electronic media, then a more rapid retrieval of an appropriate document or media article is enabled.
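The selection and proposal steps described above might be sketched as follows. The measure of under-representation used here (the group or groups contributing the fewest of the tags entered so far), the example group names and the function names are all assumptions made for illustration; other measures could equally be used.

```python
def under_represented_groups(tag_groups, entered_tags):
    """Return the names of the groups contributing the fewest entered tags.

    `tag_groups` maps a group name to a set of tags. Treating the
    lowest count as "under-represented" is an assumption of this sketch.
    """
    if not tag_groups:
        return []
    entered = set(entered_tags)
    counts = {name: len(tags & entered) for name, tags in tag_groups.items()}
    lowest = min(counts.values())
    return [name for name, count in counts.items() if count == lowest]

def propose_tags(tag_groups, entered_tags, per_group=2):
    """Propose tags, not yet entered, from each under-represented group."""
    entered = set(entered_tags)
    proposals = []
    for name in under_represented_groups(tag_groups, entered):
        candidates = sorted(tag_groups[name] - entered)
        proposals.extend(candidates[:per_group])
    return proposals

# Hypothetical groups, loosely modelled on tag types of different kinds.
tag_groups = {
    "content": {"tagging", "search", "bookmark"},
    "context": {"work", "research"},
    "subjective": {"useful", "funny"},
}
first = propose_tags(tag_groups, ["tagging", "search", "work"])
second = propose_tags(tag_groups, ["tagging", "search", "work", "funny"])
```

In this example the user has so far entered no subjective tags, so the first call proposes only tags from that group. Once the user accepts one of them, calling the function again with the enlarged tag list shifts the proposals towards whichever groups are then least represented.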
Preferably, said groups of tags comprise a group of tags often used by said user, and one or more groups of tags often used by respective groups of users, said selection identifying one or more groups of users whose tagging behaviour differs from that of the user, said proposal proposing tags to said user favoured by said one or more groups of users with different tagging behaviour as said user applies tags to a resource.
By suggesting tags representative of tags applied by groups whose tagging behaviour diverges from a user's individual tagging behaviour, a more coherent set of tags for describing resources in a system is provided. In addition, the balancing of tags typically used by different groups of users allows, for example, the user's personal/idiosyncratic tags to be included to some degree in the suggested tags but allows those to be counteracted by collectively popular tags which tend to make the tag descriptions applied by users in general to the resources more globally coherent.
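Purely as an illustrative sketch, a group of users whose tagging behaviour differs from that of the user could be identified by comparing the user's frequently used tags with each group's frequently used tags. The use of Jaccard similarity as the measure of divergence, the example data and the function names below are assumptions made for the example and are not prescribed by the foregoing.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def divergent_group_proposals(user_tags, group_tag_counts, top_n=2):
    """Pick the user group least similar to the user and propose its tags.

    `group_tag_counts` maps a group name to a {tag: usage count} dict.
    Returns (group name, tags favoured by that group that the user does
    not already use), or (None, []) when no groups are given.
    """
    if not group_tag_counts:
        return None, []
    user_set = set(user_tags)
    # Lowest similarity first; ties are broken by group name.
    group = min(
        group_tag_counts,
        key=lambda g: (jaccard(user_set, set(group_tag_counts[g])), g),
    )
    favoured = sorted(group_tag_counts[group].items(),
                      key=lambda kv: (-kv[1], kv[0]))
    return group, [t for t, _ in favoured if t not in user_set][:top_n]

user_tags = ["todo", "cool", "python"]
group_tag_counts = {
    "developers": {"python": 30, "programming": 25, "tutorial": 10},
    "designers": {"css": 20, "layout": 15, "typography": 5},
}
group, proposals = divergent_group_proposals(user_tags, group_tag_counts)
```

Here the hypothetical "designers" group shares no tags with the user, so its most used tags are the ones proposed.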
Preferably, said method further comprises recognising user selection of said one or more proposed further tags, and repeating said selection and proposal steps.
In this way the list of suggested tags can be updated each time the user enters another tag to be applied to the resource.
According to a second aspect of the present invention, there is provided a distributed system comprising one or more user terminals, an electronic resource store, a resource label store for storing, for each of said electronic resources, labels applied by users to said electronic resource, and communications links between said user terminal and said electronic resource store and between said user terminal and said resource label store;
said distributed system further comprising a label group store which stores groups of labels of different types;
wherein each of said user terminals is arranged in operation to:
enable said user to select an electronic resource;
in response to said selection, to display said selected electronic resource on a display of the user terminal;
to receive via a user interface provided by the user terminal, textual labels which the user considers appropriate to said selected electronic resource; and
to send said textual labels together with an indication of said resource to said resource label store to enable said store to be updated;
said distributed system being arranged in operation to respond to a user selection of an electronic resource by identifying one or more label groups under-represented in labels input by said user, and to select labels from said one or more under-represented groups and to send said labels to said user terminal;
said user terminal being further arranged in operation to present said user with said sent labels as proposals for labels to be applied to the selected resource.