The present invention relates to electronic resource annotation. It has particular utility when applied in electronic information retrieval, whether that information be in the form of documents or photos, or a description of a software component in a distributed system.
The dominant electronic information retrieval system in the world today is the World Wide Web. The largely unstructured nature of the Web means that the primary method of identifying a web-page containing the information which a user requires is to use a search engine. Search engines normally generate full-text indices which can be used to quickly identify web pages which contain all the words included in the user's search query. Page-ranking algorithms are then used to present the most relevant of those web-pages to the user.
Whilst this represents an effective method of retrieving electronic information relevant to a query, the only stage at which human intelligence is exploited is in the page-ranking algorithm (which captures humans' recognition of the worth of a site by counting the number of web-pages which link to the site in question). The creation of the full-text index is purely automatic.
It is hoped that ‘tagging’ systems will improve search results by allowing users to decide which labels or keywords should be attributed to a resource.
When a user finds a web-page which contains useful information he can save the address (URL) of the webpage on the computer which he is using to browse the Web. This is the familiar ‘bookmarking’ process. The ‘bookmarking’ interface enables a user to store bookmarks in a hierarchical folder system. Hence, the user is able to navigate to a useful page by drilling down to a relevant folder in the hierarchical folder system.
So-called social bookmarking is a development of this idea in which a user can upload the bookmarks stored on their own computer to a server computer. That server computer then offers the bookmark information to others.
Some such sites offer users the ability to add annotations to the shared bookmarks. These annotations might be user ratings for the web-page or keywords which the user has assigned to the web-page (the latter often being referred to as ‘tags’). An example of such a site is the website del.icio.us. The web-site del.icio.us allows users to see a list of sites which users have tagged with a given word. It is trivial to rank those sites by the number of users who have given a web-page the same tag. This gives some idea of users' perception of the quality of the webpage and also its relevance to that tag.
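By way of illustration only, the ranking described above might be sketched as follows. The (user, URL, tag) triple layout, the function name rank_pages_for_tag and the sample data are assumptions made for this example and are not taken from del.icio.us.

```python
from collections import defaultdict

def rank_pages_for_tag(bookmarks, tag):
    """Rank URLs by the number of distinct users who applied `tag`.

    `bookmarks` is an iterable of (user, url, tag) triples; this layout
    is a hypothetical one chosen for the example.
    """
    users_per_url = defaultdict(set)
    for user, url, applied_tag in bookmarks:
        if applied_tag == tag:
            # A set ensures each user is counted once per URL.
            users_per_url[url].add(user)
    # Most users first; ties are broken alphabetically by URL.
    return sorted(
        ((url, len(users)) for url, users in users_per_url.items()),
        key=lambda item: (-item[1], item[0]),
    )

bookmarks = [
    ("alice", "http://a.example", "python"),
    ("bob", "http://a.example", "python"),
    ("bob", "http://b.example", "python"),
    ("carol", "http://b.example", "cooking"),
    ("alice", "http://a.example", "python"),  # repeated tagging by one user
]
ranked = rank_pages_for_tag(bookmarks, "python")
```

Counting distinct users rather than raw tag applications prevents a single user from inflating the rank of a page by tagging it repeatedly.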
Unlike top-down centralized approaches, collaborative tagging systems (sometimes referred to as folksonomies) like del.icio.us provide users with the freedom to use tags of their choice and thus capture the way in which a community of users describes and categorises resources. The community of users is thus provided with a set of resources which are tagged in a way which allows them to quickly retrieve relevant resources.
Where a community of users includes users who describe and categorise resources in different ways, the above benefits are diluted. To overcome this, some systems suggest tags to the user which better fit with the way other members of the community of users have chosen to tag the resource.
A straightforward way of doing this is to present the user with tags which have proved popular amongst the community of users. A common way of providing a user with a visualisation of this is using tag clouds, visual representations where each tag is displayed with a font size which is proportional to its popularity. Second generation tag clouds integrate the notion of relationships among tags or their meaning, as seen in the paper entitled “Improving Tag-Clouds as Visual Information Retrieval Interfaces” presented by Y. Hassan-Montero and V. Herrero-Solana at the International Conference on Multidisciplinary Information Sciences and Technologies, in October 2006.
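A minimal sketch of the font sizing used in a simple tag cloud is given below. The point sizes, the minimum legible size and the function name tag_cloud_sizes are assumptions made for this example; practical tag clouds may use other scalings.

```python
def tag_cloud_sizes(tag_counts, max_pt=32, min_pt=8):
    """Map each tag to a font size proportional to its popularity.

    The most popular tag receives `max_pt`; sizes below `min_pt` are
    raised to `min_pt` so that rare tags stay legible (an assumption of
    this sketch, not a requirement of tag clouds in general).
    """
    if not tag_counts:
        return {}
    max_count = max(tag_counts.values())
    return {
        tag: max(min_pt, round(max_pt * count / max_count))
        for tag, count in tag_counts.items()
    }

sizes = tag_cloud_sizes({"web": 40, "design": 20, "css": 4})
```

In this example ‘web’ is rendered at 32 points, ‘design’ at 16 points, and ‘css’, whose proportional size would be about 3 points, is raised to the 8 point minimum.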
In del.icio.us, when a user visits the page containing all the bookmarks tagged with a given tag, a list of related tags to that selected one is shown inside a sidebar. The related tags might be those which are found to frequently be applied together with the given tag.
Another method of choosing tags to suggest to the user is to use thesauri like WordNet, Google Search and other engineered existing ontologies such as Dublin Core or Library of Congress Authorities. An example is seen in S. Hayman and N. Lothian, “Taxonomy-directed Folksonomies”, World Library and Information Congress: 73rd IFLA General Conference and Council, Durban, July 2007 where, as a user types a tag, a list of auto-complete suggestions is given, and when a user moves the cursor over one of those suggestions, narrower terms and broader terms are also offered. Such narrower terms and broader terms might well be based on an existing taxonomy.
Yet another method of choosing tags to suggest to the user is to calculate clusters of tags on the basis of the degree to which those tags tend to be used together by users (referred to in the art as the co-occurrence of tags), and then propose to the user tags which are in the same cluster as the tag the user has entered. An example of this is seen in the paper “Automated Tag Clustering: Improving Search and Exploration in the Tag Space”, by Begelmann et al, found in the proceedings of the 15th International World Wide Web Conference WWW2006.
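The notion of tag co-occurrence can be illustrated with the simplified sketch below. It merely counts how often pairs of tags are applied to the same resource and lists the tags which most often accompany a given tag; it does not reproduce the clustering algorithm of the cited paper. The function names and sample data are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(taggings):
    """Count how often each unordered pair of tags is applied together.

    `taggings` is an iterable of tag lists, one list per tagged resource.
    """
    counts = Counter()
    for tags in taggings:
        # Sorting gives every pair a canonical (a, b) order with a < b.
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return counts

def related_tags(counts, tag, limit=3):
    """Return the tags which most often co-occur with `tag`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == tag:
            scores[b] += n
        elif b == tag:
            scores[a] += n
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ordered[:limit]]

taggings = [
    ["python", "programming", "tutorial"],
    ["python", "programming"],
    ["python", "snake"],
    ["cooking", "tutorial"],
]
counts = cooccurrence_counts(taggings)
related = related_tags(counts, "python")
```

A clustering step would typically operate on counts of this kind, grouping tags whose mutual co-occurrence is high.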
The paper “Integrating Folksonomies with the Semantic Web”, by Lucia Specia et al (at pages 624-632 of the proceedings of the 4th European Semantic Web Conference 2007) goes further, and associates clusters of tags with concepts in ontologies, and thereby finds relationships between clusters of tags. The resulting structure can be used in query extension/disambiguation, visualization, and tag suggestion.
The paper “A Collaborative-Tagging System for Learning Resources Sharing”, by Wen-Tai Hsieh et al, at pages 1364-1368 of Current Developments in Technology-Assisted Education, vol. 2, also proposes that a concept hierarchy of tags should be constructed and used in refining searches and/or suggesting tags.
Each of the above three papers groups tags to form a taxonomy of tags.
The paper “An Approach to Support Web Service Classification and Annotation”, by Marcello Bruno et al, proposes automatically classifying web services to specific domains based on annotations applied to those web services by their authors and building a lattice of relationships between service annotations. The automatic classification is for use in discovering web-services for use in building software applications from web-service components.
The paper “Improved Annotation of the Blogosphere via Autotagging and Hierarchical Clustering” proposes the automatic generation of a ‘hierarchy of tags’ which, despite its name, has a global cluster that contains all of the articles used in generating the hierarchy, which global cluster is then sub-divided into sub-clusters containing subsets of all the articles. The automatically generated ‘hierarchy of tags’ can then be used to suggest synonymous tags, as well as more specific and more general tags.
The present inventors have realised that the classification of resources (documents, services etc.) into a taxonomy or other classification scheme can be further improved.
According to a first aspect of the present invention, there is provided a method of electronic resource annotation comprising:
receiving, for each of a plurality of categories of resource, a category tag list of tags often applied to resources in that category;
receiving, from a user who has reviewed the resource, one or more tags the user attributes to that resource;
calculating on the basis of said one or more tags received from said user and said category tag lists, a degree of membership of the resource to each of said plurality of categories;
selecting two or more candidate categories to which the resource has the highest degree of membership; and
proposing further tags in the category tag lists of said selected candidate categories to said user.
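The steps of the first aspect might, purely by way of example, be sketched as follows. The passage above does not prescribe how the degree of membership is calculated; this sketch assumes it to be the fraction of the user's tags which appear in a category's tag list. The function names, category names and tag lists are likewise assumptions made for illustration.

```python
def degree_of_membership(user_tags, category_tags):
    """Fraction of the user's tags found in the category tag list.

    This particular measure is an assumption of the sketch.
    """
    user_tags = set(user_tags)
    if not user_tags:
        return 0.0
    return len(user_tags & set(category_tags)) / len(user_tags)

def propose_tags(user_tags, category_tag_lists, num_candidates=2):
    """Select the top candidate categories and propose their other tags."""
    memberships = {
        category: degree_of_membership(user_tags, tags)
        for category, tags in category_tag_lists.items()
    }
    # Highest membership first; ties are broken by category name.
    candidates = sorted(memberships, key=lambda c: (-memberships[c], c))
    candidates = candidates[:num_candidates]
    already_applied = set(user_tags)
    proposals = {
        c: [t for t in category_tag_lists[c] if t not in already_applied]
        for c in candidates
    }
    return memberships, candidates, proposals

category_tag_lists = {
    "animal/cat": ["jaguar", "cat", "wildlife", "predator"],
    "product/car": ["jaguar", "car", "engine", "luxury"],
    "food/recipe": ["recipe", "dinner", "vegetarian"],
}
memberships, candidates, proposals = propose_tags(["jaguar"], category_tag_lists)
```

With the single tag ‘jaguar’, the two categories whose tag lists contain that tag are selected as candidates, and the remaining tags in their lists are proposed to the user.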
By allowing a user to apply one or more tags to a resource and then using category tag lists built from tags applied by users to resources to calculate a degree of membership of the resource to each of a plurality of candidate categories to which the resource might belong, selecting the two or more candidate categories to which the resource has the highest degree of membership, and then proposing tags from the category tag lists of the selected two or more candidate categories, a method of encouraging a user to use tags which reduce ambiguity as to which category a resource belongs is provided. This speeds up information retrieval and makes it more accurate.
Preferably, said method further comprises recognising user selection of said one or more proposed further tags, and repeating said selection and proposal steps.
In this way the list of suggested tags can be updated each time the user enters another tag to be applied to the resource.
Preferably, some of said proposed tags are appropriate to one candidate category and one or more other proposed tags are appropriate to another category.
This allows a user to resolve an ambiguity as to which of two categories a resource belongs by choosing a tag appropriate to one of the two or more candidate categories. For example, if a user enters the tag ‘jaguar’, and the system finds that ‘jaguar’ might belong to category animal/cat or product/car, then the system might propose ‘car’ and ‘cat’ as suggested tags. It will be seen how, by selecting one or the other of those tags, the user enters tags which better characterise the resource. Furthermore, when combined with the updating of the suggested tags in response to each of the user's tag selections, it will be seen how the user will be encouraged to enter tags which resolve successive ambiguities and thus provide a useful set of tags for characterising the resource.
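The ‘jaguar’ example can be traced with the short sketch below, which re-ranks the candidate categories after each tag the user selects. The scoring rule (the fraction of the user's tags present in a category's tag list), the function name rank_categories and the tag lists are assumptions made for this illustration.

```python
def rank_categories(user_tags, category_tag_lists):
    """Return (category, score) pairs, best first.

    The score is the fraction of the user's tags found in the category
    tag list; this measure is an assumption of the sketch.
    """
    chosen = set(user_tags)
    scored = [
        (category, len(chosen & set(tags)) / len(chosen))
        for category, tags in category_tag_lists.items()
    ]
    return sorted(scored, key=lambda item: (-item[1], item[0]))

category_tag_lists = {
    "animal/cat": ["jaguar", "cat", "wildlife"],
    "product/car": ["jaguar", "car", "engine"],
}

tags = []
history = []
for selected in ["jaguar", "car"]:  # the user's successive selections
    tags.append(selected)
    history.append(rank_categories(tags, category_tag_lists))
```

After ‘jaguar’ alone both categories score 1.0 and the ambiguity remains; once the user also selects ‘car’, product/car scores 1.0 against 0.5 for animal/cat, and the ambiguity is resolved.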
Preferably, proposing further tags involves emphasizing tags associated with candidate categories judged to be more probable candidate categories given one or more tags provided by the user.
This differs from the way in which tag clouds emphasize more common tags used by users to tag resources in general. Instead, in this case, tags which help determine which candidate category the resource should be placed into, are emphasized, i.e. while selecting the appropriate tags for a resource the user can see which category it will most probably be assigned to. If the user wants the resource to be placed into another category he/she can choose to select different tags which are more associated with the category in question.
This has the advantage of giving the user feedback which seeks confirmation that the candidate category considered most likely by the system is in fact an appropriate category for the resource.
In preferred embodiments, said emphasis is achieved by presenting the user with more tags associated with said selected candidate categories than tags associated with other (unselected) categories. Emphasis could also be created by displaying tags associated with the more likely categories in a bigger font, or in a different colour or, more generally, controlling the displaying to provide some form of visual emphasis to tags associated with the selected candidate categories.
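One possible way of realising the first form of emphasis is sketched below, in which each selected candidate category contributes more tags to the suggestion list than each unselected category. The quotas of three and one, the function name build_suggestions and the sample tag lists are assumptions made for this example.

```python
def build_suggestions(category_tag_lists, selected, per_selected=3, per_other=1):
    """Build a suggestion list which favours the selected categories.

    Each selected candidate category contributes up to `per_selected`
    tags; every other category contributes up to `per_other` tags.
    Both quotas are assumed values chosen for illustration.
    """
    suggestions = []
    for category, tags in category_tag_lists.items():
        quota = per_selected if category in selected else per_other
        for tag in tags[:quota]:
            if tag not in suggestions:  # avoid suggesting a tag twice
                suggestions.append(tag)
    return suggestions

category_tag_lists = {
    "animal/cat": ["cat", "wildlife", "predator", "zoo"],
    "product/car": ["car", "engine", "luxury", "dealer"],
    "food/recipe": ["recipe", "dinner"],
}
suggestions = build_suggestions(
    category_tag_lists, selected={"animal/cat", "product/car"}
)
```

Here six of the seven suggested tags come from the two selected candidate categories, while the unselected category remains represented by a single tag should the user wish to steer the resource towards it.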
According to a second aspect of the invention, there is provided a distributed system comprising one or more user terminals, an electronic resource store, a resource label store for storing, for each of said electronic resources, labels applied by users to said electronic resource, and communications links between said user terminal and said electronic resource data store and between said user terminal and said resource labelling store;
said distributed system further comprising a resource categorisation store which stores, for each resource, an indication of a category to which said resource is deemed to belong;
wherein each of said user terminals is arranged in operation to:
enable said user to select an electronic resource;
in response to said selection, to display said selected electronic resource on a display of the user terminal;
to receive via a user interface provided by the user terminal, textual labels which the user considers appropriate to said selected electronic resource; and to send said textual labels together with an indication of said resource to said resource label store to enable said store to be updated;
said distributed system being arranged in operation to respond to a user selection of an electronic resource by identifying one or more candidate categories for said resource using the information stored in said resource categorisation store, and to select labels appropriate to said one or more candidate categories and to send said labels to said user terminal;
said user terminal being further arranged in operation to present said user with said labels as proposals for labels to be applied to the selected resource.