The present invention relates generally to information management and, more particularly, to management of information in electronic documents including electronic mail ("e-mail").
More and more applications have the requirement to store information about the information they handle. Such "information about information" is commonly known as metadata. Metadata can be as simple as the size of a file or as complex as the entire engineering history of a computer-aided design component.
There are several ways in which current systems represent metadata. Attributes, or properties, are a common way of representing metadata. The other common approach is to use conventional database management systems to store the metadata as structured data. More recently, so-called semi-structured data techniques (typically based on the standard Extensible Markup Language, XML) have been applied. We now discuss each in turn.
Direct system support for attribute-structured metadata is not new. File systems have traditionally managed and exposed file and directory metadata. Some recent consumer computer products, such as the BeOS file system (BeFS) and the Windows 2000 file system (formerly NTFS 5.0), support arbitrary attributes assigned by user-written applications. Directory servers (e.g., LDAP servers) support attributes for entries like employees, printers, offices, etc. WebDAV, an extension to HTTP, allows arbitrary properties to be associated with HTTP resources and these properties to be stored at a web server and later retrieved through queries. More document-specific application infrastructure software, such as search engines (e.g., Verity) and corporate portals (e.g., Plumtree), allow attributes to be associated with the documents they handle. Finally, description logics, which have been used as knowledge inference tools, have relied on attributes as the basis to make inferences about concepts and particular instances of concepts.
Database systems have traditionally provided support to manage an application's structured information. The primitives to handle the information and the degree of structure provided vary somewhat between the two most common types of database systems, relational database management systems (RDBMS), and object-oriented database management systems (OODBMS). However, both RDBMSs and OODBMSs are generally used to build applications where the structure of the tuples or objects needed for the application is well understood before the application is deployed. The structure, or schema, is defined through some data definition mechanism and then strongly enforced at run-time.
More recently, semi-structured database management systems have begun to support persistent storage for information that has some identifiable, but possibly deeply-nested, structure which does not necessarily need to be defined in advance. A substantial part of the effort in developing semi-structured databases has gone into dealing with queries over the evolving, deeply-nested structures. One particular application of XML databases is for documents, whose structure is defined by a DTD (document type definition). Another application seems to be in support of website definition, where the evolving structure of the database results from changes made by website managers to provide different ways for navigating the site.
A few examples of where costs and difficulties reside in current systems:
Plumtree's deployment manual says: define all attributes before populating the portal, since adding one new property definition requires the whole portal to be re-indexed.
Verity also needs to re-index every document when a new attribute is added to a server; the re-indexing is not even limited to particular document types.
For most of these systems, changes in the types, names or number of attributes managed by the system have to be introduced by system administrators, not application developers. In fact, schema changes in an RDBMS are costly and not performed often in practice. Object-oriented databases require support for type/class evolution of stored objects, which is uniformly poorly supported. More document-centric infrastructure systems, like those provided by Plumtree and Verity, also allow only system administrators to add attributes to the system (and then incur the re-indexing costs described above).
WebDAV, on the other hand, supports a much more flexible and dynamic model of how attributes can be attached to HTTP resources; however, it forces a partition of the property space that will not support fluid integration over time. For example, if a number of attributes exist on a resource, e.g., author and subject, a new application wishing to reuse those attributes, in addition to an attribute named distributionList, to create an "electronic update" service cannot simply combine these three attributes into UpdateService (author, subject and distributionList) in a way that reuses exactly the pre-existing fields and leaves other applications unaffected.
XML databases support easy addition of new branches in the tree structure (or new links in the graph). For example, adding a new element to a particular person element, within a particular office element, in a particular organization element, where the new element represents a "temporary replacement" is supported by XML databases. However, XML databases do not support recasting of elements for a different purpose very effectively. For example, if you want to re-use some elements and add new ones into a group element, this can require quite a bit of work: first, each of the elements to be reused needs to have an object identifier explicitly stated in the structure (if the creators did not add one, you may be out of luck); second, the re-used elements are actually of a different kind than the newly created ones. That said, there is a distinction between the original semi-structured data work and the actual XML implementations.
While RDBs and OODBs enforce strong typing of each object and attribute, WebDAV and XML databases do not, which requires type checking by the application each time a value is retrieved with a query. Therefore, the flexibility of WebDAV and XML databases comes at a cost of constant correctness checking when developing applications against them.
Relational databases are able to handle individual items (in relational database design terms, entities that are represented by unique keys), but the existence of such items independent of the relationships specified through additional attributes in tuples, including the unique key, is not as useful. However, because of the networked nature of today's intranets and the Internet, more and more items have some representation external to the system storing meta-information, and the item can be quite meaningful to the content of the database regardless of how many or how few attributes are specified for it. OODBs, semi-structured DBs, and directory servers are very similar to RDBs in this way. Primitive support to access external information related to the item if needed is important, particularly for document-centric applications, and is not inherently supported by many of these systems.
WebDAV, Verity and Plumtree, because of their more Web- or document-centric nature, do provide support for independent items that may exist completely independently from their attributes. For example, web pages in WebDAV have meaning independent of their attributes. However, these systems provide much less support for application development. They provide limited support for maintaining consistency of the attributes and they provide very little support for attribute evolution.
A system, method and article of manufacture are provided for managing data items. One or more roles are defined, with each role comprising a set of attributes. The roles may then be associated with zero, one, or more data items. The data item may be assigned a value for each of the attributes of the role(s) that have been associated with the data item.
In one aspect of the present invention, the set of attributes of a role may comprise one or more attributes. In an additional aspect of the present invention, roles may have one or more attributes in common with each other.
The data item may also have one or more attributes not included in the set of attributes of the role(s) associated with the data item. In another aspect of the present invention, the data item may have at least one attribute associated with it prior to the associating of the one or more roles to the data item. Also, the associating of one or more roles to the data item may further include the making of a determination as to whether the data item has one or more attributes in common with the set of attributes of the role.
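The relationship between roles, attributes and data items summarized above can be illustrated with a minimal Python sketch. It is not the claimed implementation; the class names Role and DataItem, the method names associate and assign, and the sample role CatalogEntry and attribute pageCount are hypothetical and chosen only for illustration. The sketch shows two roles sharing attributes, a data item that already carries attributes before a role is associated with it, and the determination of which attributes the item has in common with the role.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet, Set


@dataclass(frozen=True)
class Role:
    """A named set of attributes (hypothetical class, for illustration only)."""
    name: str
    attributes: FrozenSet[str]


@dataclass
class DataItem:
    """A data item holding attribute values and zero or more associated roles."""
    item_id: str
    values: Dict[str, Any] = field(default_factory=dict)
    roles: Dict[str, Role] = field(default_factory=dict)

    def associate(self, role: Role) -> Set[str]:
        """Associate a role and return the role attributes the item already had."""
        shared = set(role.attributes) & set(self.values)
        self.roles[role.name] = role
        return shared

    def assign(self, role_name: str, attribute: str, value: Any) -> None:
        """Assign a value to an attribute of a role already associated with the item."""
        role = self.roles[role_name]
        if attribute not in role.attributes:
            raise KeyError(f"{attribute!r} is not an attribute of role {role_name!r}")
        self.values[attribute] = value


# Two roles that have the attributes author and subject in common.
update_service = Role("UpdateService", frozenset({"author", "subject", "distributionList"}))
catalog_entry = Role("CatalogEntry", frozenset({"author", "subject"}))

# The item has attributes before any role is associated; pageCount belongs to no role.
memo = DataItem("memo-1", values={"author": "A. Writer", "pageCount": 3})
already_present = memo.associate(update_service)
memo.associate(catalog_entry)
memo.assign("UpdateService", "distributionList", ["team@example.com"])
```

In this sketch a value for a shared attribute such as author is stored once on the item and is visible through every role that includes it, which is one possible way to let a new application reuse pre-existing attributes without disturbing other applications.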
In one embodiment of the present invention, the addition of one or more attributes to the data item may be permitted subsequent to the assigning of the role to the data item. In a further embodiment, the creating of an alias for at least one of the attributes of the set of attributes of the role may be permitted. In another embodiment of the present invention, the values assigned to the data item may be stored in a database. In such an embodiment, the values of all of the attributes of a role may be retrieved from the database upon receipt of a query for one of the attributes of the respective role.
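The database embodiment just described can be sketched as follows, using Python's standard sqlite3 module with an in-memory database. This is only one possible arrangement under stated assumptions: the table names role_attribute, attribute_alias and item_value, the function names open_store, resolve and fetch_role_values, and the alias recipients are all hypothetical. The sketch shows an alias resolving to a role attribute, and a query for one attribute of a role returning the values of all attributes of that role.

```python
import sqlite3
from typing import Dict, Optional


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with an assumed three-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE role_attribute (role TEXT, attribute TEXT,
                                     PRIMARY KEY (role, attribute));
        CREATE TABLE attribute_alias (alias TEXT PRIMARY KEY, attribute TEXT);
        CREATE TABLE item_value (item_id TEXT, attribute TEXT, value TEXT,
                                 PRIMARY KEY (item_id, attribute));
    """)
    return conn


def resolve(conn: sqlite3.Connection, name: str) -> str:
    """Map an alias to its attribute name; names that are not aliases pass through."""
    row = conn.execute(
        "SELECT attribute FROM attribute_alias WHERE alias = ?", (name,)
    ).fetchone()
    return row[0] if row else name


def fetch_role_values(conn: sqlite3.Connection, item_id: str, role: str,
                      name: str) -> Dict[str, Optional[str]]:
    """Given a query for one attribute (or alias) of a role, return all role values."""
    attribute = resolve(conn, name)
    member = conn.execute(
        "SELECT 1 FROM role_attribute WHERE role = ? AND attribute = ?",
        (role, attribute),
    ).fetchone()
    if member is None:
        raise KeyError(f"{name!r} is not an attribute of role {role!r}")
    # LEFT JOIN keeps role attributes that have no value yet (reported as None).
    rows = conn.execute(
        """SELECT ra.attribute, iv.value
           FROM role_attribute AS ra
           LEFT JOIN item_value AS iv
             ON iv.attribute = ra.attribute AND iv.item_id = ?
           WHERE ra.role = ?""",
        (item_id, role),
    ).fetchall()
    return dict(rows)


conn = open_store()
conn.executemany("INSERT INTO role_attribute VALUES (?, ?)", [
    ("UpdateService", "author"),
    ("UpdateService", "subject"),
    ("UpdateService", "distributionList"),
])
conn.execute("INSERT INTO attribute_alias VALUES (?, ?)",
             ("recipients", "distributionList"))
conn.executemany("INSERT INTO item_value VALUES (?, ?, ?)", [
    ("memo-1", "author", "A. Writer"),
    ("memo-1", "subject", "Q3 plan"),
    ("memo-1", "distributionList", "team@example.com"),
])

# Querying through the alias returns every attribute of the UpdateService role.
result = fetch_role_values(conn, "memo-1", "UpdateService", "recipients")
```

Fetching the whole role in one query is a design choice of this sketch; it reflects the idea that an application asking for one attribute of a role is likely to need the others as well.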