One aspect of knowledge management is knowledge re-use and sharing: the ability to find knowledge contained within an organization (or indeed within a larger structure, e.g. information available over the World Wide Web (WWW)) so as to avoid needless duplication of effort. A key part of knowledge re-use and sharing is the organization of information objects or electronic documents (e.g. Word documents, Intranet pages, PowerPoint slide sets) so that objects relevant to the user's current information needs can be more easily retrieved. This is often done by means of a taxonomy or ontology. A taxonomy or an ontology may be considered as a hierarchical set of categories into which information objects may be classified, making it easier for the user to find those documents relevant to a particular query. Such taxonomies and ontologies are typically based on formal mathematical languages, allowing reasoning over the knowledge structure and an unambiguous, machine-processable representation of information.
Recently, trends towards less formal, massively collaborative, lightweight Web 2.0 tools (such as flickr and del.icio.us) have started to attract increasing attention from the enterprise world. In these schemes, in contrast to the conventional taxonomic/ontological approach, users are not required to classify information objects against a pre-defined corporate knowledge organization scheme but instead are free to define their own topics to associate with information objects (such topics are known as ‘tags’). Where multiple users tag a set of information objects, a folksonomy emerges. A folksonomy can thus be defined as a system of classification derived from the practice and method of collaboratively creating and managing tags to annotate and categorize content.
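By way of illustration only, the emergence of a folksonomy from individual tagging actions can be sketched as follows. The sketch assumes that each tagging action is recorded as a (user, information object, tag) triple; the user names, object names, tags and the function name `build_folksonomy` are hypothetical and are not taken from any particular tagging tool.

```python
from collections import Counter, defaultdict

# Hypothetical tagging actions: (user, information object, tag).
assignments = [
    ("alice", "report.doc", "budget"),
    ("bob", "report.doc", "finance"),
    ("bob", "report.doc", "budget"),
    ("carol", "slides.ppt", "finance"),
]

def build_folksonomy(assignments):
    """Aggregate individual tagging actions into two shared indexes."""
    tags_by_object = defaultdict(Counter)  # object -> how often each tag was applied
    objects_by_tag = defaultdict(set)      # tag -> objects carrying that tag
    for _user, obj, tag in assignments:
        tags_by_object[obj][tag] += 1
        objects_by_tag[tag].add(obj)
    return tags_by_object, objects_by_tag

tags_by_object, objects_by_tag = build_folksonomy(assignments)
```

No pre-defined classification scheme appears anywhere in this sketch; the only structure is whatever results from aggregating the users' own tags.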
The advantage of informal collaborative tagging of information objects is that, as experience shows, users find it a very natural and low-cost (i.e. easy and convenient) way to categorize information. Typical enterprise knowledge management systems are based on repositories where information is stored and classified against a pre-defined classification scheme (taxonomy or ontology) for later retrieval. Whilst useful, such systems can be time-consuming to use, since they require knowledge of the classification scheme. Furthermore, such formal, centralized systems tend to change only slowly, and so fail to keep pace with the often fast-moving changes in the users' conceptualization of a domain as new concepts emerge and as the relationships between existing concepts shift. Often, therefore, the formal knowledge structure ceases to be fit for purpose, since its categorization scheme becomes outdated and no longer in tune with the user community's view of the domain.
The disadvantage of the folksonomic approach is that the benefits of the ontological or taxonomic approach are lost. These include the ability to perform ontological reasoning over the information objects and the inherently shared nature of the more formal approach: whereas in the formal approach a single category will be used to represent a given concept, in the tagging approach users are free to use whatever tags they wish and may use multiple tags to represent essentially the same concept. Furthermore, the hierarchical structure of a taxonomy or ontology is lost. This hierarchical structure may be useful both for performing formal automated reasoning to enhance the operation of a search engine and, more generally, for helping a user to conceptualize the repository as a whole and to identify various “routes” by which to “navigate” through it.
A bottom-up approach has been developed for generating ontologies by automatically processing a set of documents. This approach is based on foundational work by Rudolf Wille and his former student Bernhard Ganter concerning Formal Concept Analysis (FCA). FCA may be described as a principled way of automatically deriving an ontology from a collection of objects and their properties. For example, a paper by Paolo Ceravolo et al. entitled “Bottom-Up Extraction and Trust-Based Refinement of Ontology Metadata” describes an approach for automatically processing XML documents (or “objects”) and deriving from these hierarchical information which is combined with an “upper ontology” to generate an enhanced ontology. Feedback from users is then used to ascertain the correctness of the automatically generated ontology, using a form of trust score assigned by users to the decisions made by the automatic tool. (N.B. the above discussed paper by Ceravolo et al. uses the term “tag” in a non-standard way to refer to non-leaf nodes within a tree representation of an XML document; in the present document the term tag is used in a more standard manner to refer to a classification term assigned to an electronic document by a user.)
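The basic idea of Formal Concept Analysis, deriving a hierarchy from a collection of objects and their properties, can be sketched as follows. This is a minimal brute-force illustration under stated assumptions, not the method of Ceravolo et al. and not an efficient FCA algorithm: the context (documents `doc1` to `doc3` and their attributes) and all function names are hypothetical, and the enumeration is exponential in the number of objects, so it is suitable only for very small examples. A formal concept is taken here to be a pair (extent, intent) in which the extent is exactly the set of objects sharing every attribute in the intent, and the intent is exactly the set of attributes shared by every object in the extent.

```python
from itertools import combinations

# Hypothetical formal context: each object maps to its set of attributes.
context = {
    "doc1": {"xml", "finance"},
    "doc2": {"xml", "finance", "report"},
    "doc3": {"xml"},
}

def common_attributes(objects, context, all_attributes):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(all_attributes)
    for obj in objects:
        shared &= context[obj]
    return shared

def objects_having(attributes, context):
    """Objects that possess every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    all_attributes = set().union(*context.values())
    names = list(context)
    concepts = set()
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_attributes(subset, context, all_attributes)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

def is_subconcept(lower, upper):
    """A concept is more specific when its extent is contained in the other's."""
    return lower[0] <= upper[0]

concepts = formal_concepts(context)
```

Ordering the resulting concepts with `is_subconcept` gives the hierarchy: in this small context, the concept with intent {xml} is the most general, and the concept with intent {xml, finance, report} is the most specific.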