The present invention relates to a distributed system. It has particular utility in distributed systems built in accordance with a service-oriented architecture, and also in distributed information storage and retrieval systems.
The dominant electronic information retrieval system in the world today is the World Wide Web. The largely unstructured nature of the Web means that the primary method of identifying a web-page containing the information which a user requires is to use a search engine. Search engines normally generate full-text indices which can be used to quickly identify web-pages which contain all the words included in the user's search query. Page-ranking algorithms are then used to present the most relevant of those web-pages to the user. Some search engines, for example clusty.com, cluster the results.
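A full-text index of the kind described above is commonly built as an inverted index, which maps each word to the set of pages containing it. A query for pages containing all of the user's words is then an intersection of those sets. The following is a minimal Python sketch under that assumption. The function names and example URLs are hypothetical, and real search engines add tokenisation, stemming and page-ranking on top of this.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lower-cased word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search_all_words(index, query):
    """Return the pages that contain every word of the query (a conjunctive match)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())  # keep only pages that also contain this word
    return result


pages = {
    "http://example.com/a": "distributed systems and web services",
    "http://example.com/b": "web search engines",
    "http://example.com/c": "distributed web caches",
}
index = build_index(pages)
print(sorted(search_all_words(index, "distributed web")))
```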
When a user finds a web-page which contains useful information, he can save the address (URL) of the web-page on the computer which he is using to browse the Web. This is the familiar ‘bookmarking’ process. The ‘bookmarking’ interface enables a user to store bookmarks in a hierarchical folder system, so the user can navigate to a useful page by drilling down to the relevant folder in that system.
U.S. Pat. No. 7,167,901 discloses a ‘bookmarking’ interface in which the user's web browser automatically generates records for each bookmarked web-page which include keywords describing the content of that web-page. In addition, the user is provided with an interface which allows him to view the keywords associated with a bookmarked web-page and to add further keywords for association with that web-page.
The above US patent contemplates that one user might send another user his bookmark file. So-called social bookmarking is a development of this idea in which many users upload the bookmarks stored on their own computers to a server computer. That server computer then offers the bookmark information to those users and often to other users too.
Some such sites offer users the ability to interact with the server computer to add annotations to the shared bookmarks. These annotations might be user ratings for the web-page or keywords which the user has assigned to the web-page (the latter often being referred to as ‘tags’). An example of such a site is the web-site del.icio.us, which allows users to see a list of the web-pages that users have tagged with a given word. It is trivial to rank those web-pages by the number of users who have given a web-page the same tag. This gives some indication of users' perception of the quality of the web-page and of its relevance to that tag.
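The ranking step described above can be sketched as counting, for a given tag, the distinct users who applied it to each web-page. This minimal Python sketch assumes annotations are available as (user, URL, tag) triples; that representation, the function name and the example data are hypothetical and are not taken from del.icio.us.

```python
from collections import Counter


def rank_by_tag(annotations, tag):
    """Rank URLs by how many distinct users applied `tag` to them.

    `annotations` is an iterable of (user, url, tag) triples. A user who
    applies the same tag to the same URL more than once is counted once.
    """
    taggers = {(user, url) for user, url, t in annotations if t == tag}
    counts = Counter(url for _user, url in taggers)
    # Highest count first; ties broken alphabetically by URL for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


annotations = [
    ("alice", "http://example.com/a", "python"),
    ("bob", "http://example.com/a", "python"),
    ("alice", "http://example.com/a", "python"),  # duplicate, counted once
    ("carol", "http://example.com/b", "python"),
    ("bob", "http://example.com/b", "tutorial"),
]
print(rank_by_tag(annotations, "python"))
```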
The above-mentioned U.S. Pat. No. 7,167,901 envisages a stand-alone system where the browser program can provide the user with a list of bookmarked web-pages associated (either automatically or by the user) with a user-specified keyword.
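The stand-alone arrangement described above, in which each bookmark record carries automatically generated keywords plus any keywords the user has added, and the browser lists the bookmarks associated with a user-specified keyword, might be modelled as follows. This is a hypothetical Python sketch; the class and method names are illustrative and are not taken from that patent.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    url: str
    auto_keywords: set = field(default_factory=set)  # generated by the browser
    user_keywords: set = field(default_factory=set)  # added by the user

    def keywords(self):
        return self.auto_keywords | self.user_keywords


class BookmarkStore:
    """Hypothetical local bookmark store with keyword lookup."""

    def __init__(self):
        self._bookmarks = {}

    def add(self, url, auto_keywords):
        self._bookmarks[url] = Bookmark(url, set(auto_keywords))

    def add_user_keyword(self, url, keyword):
        self._bookmarks[url].user_keywords.add(keyword)

    def find(self, keyword):
        """List bookmarked URLs associated, automatically or by the user, with `keyword`."""
        return sorted(b.url for b in self._bookmarks.values() if keyword in b.keywords())


store = BookmarkStore()
store.add("http://example.com/a", ["python", "tutorial"])
store.add("http://example.com/b", ["java"])
store.add_user_keyword("http://example.com/b", "tutorial")
print(store.find("tutorial"))
```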
Del.icio.us users can organise tags into user-defined clusters. Flickr (www.flickr.com) sometimes presents its search results in the form of clusters. Users can then identify which cluster is likely to contain results they are interested in and refine their search to present only results in that cluster.
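Refining a search to one cluster can be sketched as grouping the results by user-defined tag clusters and then keeping only the cluster the user selects. In this hypothetical Python sketch the clusters are supplied by the user, as with del.icio.us tag clusters; it does not describe how Flickr or del.icio.us actually compute clusters, and the tags and URLs are invented.

```python
def cluster_results(results, tag_clusters):
    """Group search results by user-defined tag clusters.

    results: dict mapping a URL to the set of tags attached to it.
    tag_clusters: dict mapping a cluster name to the set of tags it groups.
    A result is placed in every cluster that shares at least one tag with it.
    """
    clustered = {name: [] for name in tag_clusters}
    for url, tags in sorted(results.items()):
        for name, cluster_tags in tag_clusters.items():
            if tags & cluster_tags:
                clustered[name].append(url)
    return clustered


results = {
    "http://example.com/p1": {"jaguar", "wildlife"},
    "http://example.com/p2": {"jaguar", "automobile"},
    "http://example.com/p3": {"jaguar", "wildlife", "zoo"},
}
tag_clusters = {"animals": {"wildlife", "zoo"}, "cars": {"automobile", "racing"}}
clusters = cluster_results(results, tag_clusters)
# Refining the search means presenting only the cluster the user picked.
print(clusters["animals"])
```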
Whilst tagging of most types of information (e.g. web pages, photographs, videos) is well known, there is little literature about tagging of software components (e.g. Web Services) for use in building distributed applications.
A number of companies specialise in software which introduces structure into a mass of unstructured documents by categorizing those documents on the basis of keywords extracted from those documents. The companies in this field include Autonomy Inc (www.autonomy.com), GammaSite Inc (www.gammasite.com), and Inxight Software Inc (www.inxight.com).
A customer of these companies can use the software to categorize unstructured documents, and thus expedite the retrieval of information (since the search can be limited to the category in which the customer is interested).
The present inventors have seen how tagging and automatic structure generation can be usefully combined and applied to distributed applications in order to improve the performance of systems running distributed applications.
According to a first aspect of the present invention, there is provided a distributed system comprising:
one or more computers arranged in operation to:
i) receive digital resource identifiers and words attributed to the digital resources, and to automatically generate a classification of said digital resources on the basis of said attributed words;
ii) present a human user with a graphical user interface enabling modification of said automatically generated classification;
iii) modify said classification in accordance with user commands received via said graphical user interface; and
iv) utilise said modified classification in identifying one or more digital resources.
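Steps i) to iv) can be illustrated with a small Python sketch in which the digital resources are web-service identifiers. The specification does not prescribe a classification algorithm; the heuristic used here (place each resource under its most widely used word) is a deliberately simple, hypothetical stand-in for the keyword-based categorization tools mentioned earlier. The graphical user interface of step ii) is represented only by the functions its commands would call, and all names are hypothetical.

```python
from collections import Counter


def auto_classify(attributed):
    """Step i): group resources automatically from their attributed words.

    attributed: dict mapping a resource identifier to its set of words.
    Each resource goes in the group named after its word that is used most
    often across the whole collection (ties broken alphabetically).
    """
    freq = Counter(word for words in attributed.values() for word in words)
    groups = {}
    for resource, words in attributed.items():
        if not words:
            continue  # nothing to classify on
        label = min(words, key=lambda w: (-freq[w], w))
        groups.setdefault(label, set()).add(resource)
    return groups


def rename_group(groups, old, new):
    """Step iii): apply a user's 'rename group' command issued from the GUI."""
    groups.setdefault(new, set()).update(groups.pop(old))


def move_resource(groups, resource, target):
    """Step iii): apply a user's 'move resource to group' command."""
    for members in groups.values():
        members.discard(resource)
    groups.setdefault(target, set()).add(resource)
    for name in [n for n, members in groups.items() if not members]:
        del groups[name]  # drop any group emptied by the move


def identify(groups, group_name):
    """Step iv): use the modified classification to identify resources."""
    return sorted(groups.get(group_name, ()))


attributed = {
    "svc:weather-1": {"weather", "forecast"},
    "svc:weather-2": {"weather", "temperature"},
    "svc:maps-1": {"maps", "geocoding"},
    "svc:geo-1": {"geocoding", "location"},
}
groups = auto_classify(attributed)
rename_group(groups, "geocoding", "location-services")
move_resource(groups, "svc:weather-2", "climate")
print(identify(groups, "location-services"))
```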
By automatically organising digital resources into groups based on keywords attributed to those digital resources, then presenting a user with an interface enabling modification of those groups, and subsequently utilising the modified grouping in finding a digital resource for a user, the system improves the speed with which a digital resource is retrieved.
Where the digital resources are documents, this results in the speed of retrieval of information relevant to the user's query being improved. Where the digital resources are distributed software components, this enables the rapid and effective location of a suitable component, and the rapid substitution of another component in the event that a first-selected component is unavailable.
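The substitution behaviour described above can be sketched as trying each component in the relevant group of the classification in turn and selecting the first one that is available. In this hypothetical Python sketch, the group contents and the `is_available` check (which in practice might be a health probe or a service-registry lookup) are assumptions made for illustration.

```python
from typing import Callable, Iterable, Optional


def select_component(
    candidates: Iterable[str], is_available: Callable[[str], bool]
) -> Optional[str]:
    """Return the first available component, or None if none is available."""
    for component in candidates:
        if is_available(component):
            return component
    return None


# One group taken from a (possibly user-modified) classification of web services.
weather_group = ["svc:weather-1", "svc:weather-2"]
unavailable = {"svc:weather-1"}  # stand-in for a failed availability check

chosen = select_component(weather_group, lambda c: c not in unavailable)
print(chosen)  # svc:weather-2
```

Because the candidates come from a group the user has already checked and corrected, a substitute chosen this way is likely to offer comparable functionality to the component it replaces.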