1. Field of the Invention
The present invention relates to network-based communication and information discovery and pertains in particular to network-based storage and management of network-accessible resources.
2. Description of the Related Art
In recent years, web-based systems such as enterprise information portals have gained importance in many companies. Enterprise information portals integrate, as a single point of access, various applications and processes into one homogeneous user interface. Today, such systems include a huge amount of content. They are no longer exclusively maintained by an information technology (IT) department. Instead, Web 2.0 techniques are increasingly used, allowing user-generated content to be added. These systems grow quickly and in a less coordinated way, as different users possess different knowledge and expertise and follow different mental models.
This continuous growth makes access to truly relevant information difficult. Users need to find task- and role-specific information quickly, but face information overload and often feel lost in hyperspace. Thus, users often miss out on resources that are potentially relevant to their tasks, simply because they never come across them. On the one hand, users obtain too much information that is not relevant to their current task; on the other hand, it becomes cumbersome to find the right information, and they do not obtain all the information that would be relevant.
Tagging is an emerging technology that allows users (single users in the case of private tags, or user communities in the case of public/collaborative tags) to structure or categorize content autonomously, easing and “personalizing” navigation through such large, complex information spaces. The recent popularity of collaboration techniques on the Internet, particularly tagging and rating, provides new means both for semantically describing Portal content and for reasoning about users' interests, preferences and contexts.
In this context, tagging is the process of assigning keywords (or metadata) to resources. A tag itself is “some” metadata associated with a resource. Tags themselves are non-hierarchical keywords taken from an uncontrolled vocabulary. Further in this context, a resource is a uniquely identifiable (addressable) entity. In other words, a tag is a (relevant) keyword or term associated with or assigned to a piece of information (a picture, a geographic map, a blog entry, a video clip, etc.), thus describing the item and enabling keyword-based classification and search of information.
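The relationship between tags and resources described above can be pictured as a simple two-way index. The following is a minimal sketch only, not a description of any particular prior art system; the class name `TagIndex`, its methods, and the example URLs are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


class TagIndex:
    """Toy in-memory index linking free-form tags to resource identifiers."""

    def __init__(self):
        # Tags are flat (non-hierarchical) strings from an uncontrolled vocabulary.
        self._resources_by_tag = defaultdict(set)
        self._tags_by_resource = defaultdict(set)

    def tag(self, resource_id, keyword):
        """Assign a keyword to a uniquely identifiable resource."""
        self._resources_by_tag[keyword].add(resource_id)
        self._tags_by_resource[resource_id].add(keyword)

    def resources_for(self, keyword):
        """Return a copy of the set of resources carrying the keyword."""
        return set(self._resources_by_tag.get(keyword, set()))

    def tags_for(self, resource_id):
        """Return a copy of the set of keywords assigned to the resource."""
        return set(self._tags_by_resource.get(resource_id, set()))


# Hypothetical example data.
index = TagIndex()
index.tag("http://example.com/photos/1", "apple")
index.tag("http://example.com/photos/1", "fruit")
index.tag("http://example.com/blog/42", "apple")
```

In this sketch, looking up the tag "apple" returns both example resources, which is the keyword-based classification and search that tagging enables.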
Tags are usually chosen informally and personally by the item's author/creator or by its consumers/viewers/community. Tags are typically used for resources such as computer files, web pages, digital images, and internet bookmarks (both in social bookmarking services and in the current generation of web browsers; see Flock). For this reason, “tagging” has become associated with other Web 2.0 technologies.
Tags can add valuable meta information and even lightweight semantics to web resources. Tag clouds are the visual depiction of the tags available in the system. Rating is the evaluation or assessment of something in terms of quality (as with a critic rating a novel), quantity (as with an athlete being rated by his or her statistics), or some combination of both. That is, it is the process of assigning (e.g. numeric) “values” to resources, indicating how much people “like” them. A rating itself is “some value” associated with a resource. Ratings themselves are chosen from an interval of possible “values”, where one end of the interval usually refers to “dislike” and the other to “like”.
FIG. 1 illustrates the problem by showing the most basic structural components of a prior art hardware and software environment used for a prior art tagging-based method of searching for content. As shown in FIG. 1, a web client 10 (one of a large plurality of such clients) cooperates with a web server 12 during the user's search for selected content. Tags 14 are used for characterizing the content. Many resources 16A . . . 16N are available to be accessed by the searching user through the client 10 and its respective web browser. The user finds the best-suited content only by accident, because the tags 14 are often not selective enough for the searching user.
Many of the problems that modern tagging systems deal with are related to synonymy and polysemy. Synonymy describes the fact that multiple tags can have the same meaning, either because they are morphological variations (apple vs. apples) or because they are semantically similar (baby vs. infant). Polysemy describes tags that can have multiple meanings (e.g., apple can be the fruit or anything about the company Apple).
Today, systems try to overcome these problems by applying stemming and normalization algorithms, which most often solve only the problem of morphological variations.
Modern systems sometimes also leverage thesauri (e.g., WordNet) to address the issue of semantically similar terms.
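The two mitigation steps just described can be sketched as follows. This is an illustrative simplification under stated assumptions: the plural-stripping rule is deliberately naive and stands in for a real stemming algorithm, and the `SYNONYMS` table is a hypothetical hand-written stand-in for a thesaurus such as WordNet. The function names are likewise invented for this example.

```python
def normalize(tag):
    """Lowercase, trim, and strip a trailing plural 's' (naive stand-in for stemming)."""
    t = tag.strip().lower()
    if len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        t = t[:-1]
    return t


# Hypothetical hand-written thesaurus mapping a term to a preferred synonym.
SYNONYMS = {"infant": "baby"}


def canonical(tag):
    """Normalize the tag, then map it to its preferred synonym if one is listed."""
    t = normalize(tag)
    return SYNONYMS.get(t, t)


# Morphological variation is handled by normalization alone.
same_after_normalize = normalize("Apples") == normalize("apple")

# Semantic similarity is not: it needs the thesaurus lookup.
differ_after_normalize = normalize("baby") != normalize("infant")
same_after_thesaurus = canonical("baby") == canonical("infant")
```

Note that neither step addresses polysemy: in this sketch, a tag "Apple" meaning the fruit and a tag "Apple" meaning the company both map to the same string "apple".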
The drawbacks of existing prior art can be summarized as follows:
Users can search for tags in prior art systems. What these kinds of searches lack is an “a priori indicator” of whether the search is effective at all and will return results that are of high value to the searching person. A user searching for TAG_A has no indication of whether this search makes sense, nor of the fact that searching for TAG_A, TAG_B and TAG_C together would be very effective. Even if the user knew the latter, in prior art systems the user has to assemble the search over and over again by typing these tags or clicking multiple times, which becomes more annoying the more tags are needed for the search. Further, prior art search methods are language-dependent, and polysemy can seldom be detected by prior art search and tagging methods.
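The difference between a single-tag search and a combined search can be pictured as set intersection over a tag index. The sketch below is illustrative only; the function `search_all`, the resource identifiers and the tag assignments are hypothetical example data, not taken from any actual system.

```python
def search_all(resources_by_tag, tags):
    """Return the resources that carry every tag in `tags` (conjunctive search)."""
    sets = [resources_by_tag.get(t, set()) for t in tags]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical tag assignments.
resources_by_tag = {
    "TAG_A": {"r1", "r2", "r3", "r4", "r5"},
    "TAG_B": {"r2", "r3", "r9"},
    "TAG_C": {"r3", "r7"},
}

# Searching for TAG_A alone returns many candidates.
broad = search_all(resources_by_tag, ["TAG_A"])

# Combining all three tags narrows the result to a single resource.
narrow = search_all(resources_by_tag, ["TAG_A", "TAG_B", "TAG_C"])
```

With this example data, the broad search yields five resources while the combined search yields only "r3". The user, however, learns this only after running both searches, and has to re-enter the full tag combination each time.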