1. Field of the Invention
The present invention relates generally to enterprise content management systems, and, more particularly, to enterprise content management network-attached systems.
2. Description of the Related Art
Managing electronic files and data is a fact of life for modern business operations. Businesses are becoming increasingly dependent on electronic content creation and communication tools, such as word processors, web applications, email, image and video applications, and various databases, to conduct their business operations.
One consequence of this growing dependence on electronic information and communication tools is a rapid proliferation of digital data. It is not uncommon today to find several billion bytes (gigabytes) of data on home computers used for personal purposes. In the business world, several trillion bytes (terabytes) of data is not unusual, and some of the largest enterprises routinely maintain more than a petabyte (a million billion bytes) of data. Modern enterprises therefore need systems that can store and manage very large amounts of data. At present, such needs are met by interconnected storage systems such as Network Attached Storage (NAS) and Storage Area Networks (SAN). NAS systems are storage systems connected over existing computer network technologies such as Ethernet or IP networks. SAN systems connect storage systems over dedicated Fibre Channel network connections. Currently, vendors such as EMC and Network Appliance offer various NAS and SAN solutions.
However, the ability to store very large amounts of data creates yet another problem, as the sheer quantity of data gives rise to the proverbial “needle in the haystack” problem. As the amount of stored data grows, finding and retrieving the exact data and information needed for a particular purpose becomes increasingly challenging. It is often said that the success of modern enterprises depends on finding the right information for the right person at the right time.
The problem of finding “a needle in the haystack” is compounded by the existence of dissimilar data types that are managed by different applications with different access and/or search methodologies. Some files are managed through conventional file systems, emails are organized and accessed with email applications, and databases are managed with their own database application programs. With conventional tools, therefore, it is often necessary to search several places with different applications to find the right set of information for a given task. In addition, access privilege and security mechanisms typically differ considerably across content management tools. These problems make the task of “finding the right information for the right person at the right time” difficult for modern enterprises.
To address these problems, various Enterprise Content Management (ECM) systems have been introduced. A key component of any content management system is metadata. Metadata is “data about data,” that is, data or information about a particular item of data. For example, metadata about a document file may include the author, the creation time and date, the last modified time and date, keywords describing the content of the document, the file type, the application associated with the file, and access privileges for various users. Content management systems create and manage metadata for each item of data or information entered into the system and maintain a database or repository that contains the metadata. Users of a content management system can therefore locate desired information by searching for relevant attributes or keywords in the metadata database. Metadata also allows data items to be organized, sorted, and selectively presented based on the relevant attributes stored in the associated metadata.
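To make the metadata idea concrete, the following Python sketch models a metadata record carrying the attributes listed above and a repository that searches and sorts items by keyword and attribute while honoring per-user access privileges. It is a minimal illustration under stated assumptions: the class names, field names, and the `"read"` privilege string are hypothetical and do not reflect the schema of any particular content management product.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Metadata:
    """Hypothetical metadata record for one content item ("data about data")."""
    content_id: str
    author: str
    created: datetime
    modified: datetime
    file_type: str
    application: str
    keywords: set[str] = field(default_factory=set)
    # Access privileges per user, e.g. {"alice": {"read", "write"}}.
    access: dict[str, set[str]] = field(default_factory=dict)


class MetadataRepository:
    """In-memory stand-in for the metadata database kept by a content management system."""

    def __init__(self) -> None:
        self._records: dict[str, Metadata] = {}

    def add(self, record: Metadata) -> None:
        self._records[record.content_id] = record

    def search(self, user: str, keyword: str | None = None,
               order_by: str = "content_id", **attrs: object) -> list[str]:
        """Return IDs of items `user` may read that carry `keyword` and match the
        given attribute values, sorted by the `order_by` attribute."""
        hits = []
        for rec in self._records.values():
            if "read" not in rec.access.get(user, set()):
                continue  # privileges stored in the metadata gate every lookup
            if keyword is not None and keyword not in rec.keywords:
                continue
            if any(getattr(rec, name) != value for name, value in attrs.items()):
                continue
            hits.append(rec)
        hits.sort(key=lambda rec: getattr(rec, order_by))
        return [rec.content_id for rec in hits]
```

The linear scan only keeps the sketch short; a real repository would hold these records in an indexed database so that keyword and attribute lookups do not touch every item.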
Metadata can, however, be used for much more than organizing, searching, and retrieving information. Since metadata are simply another type of data, any information about the content data can be stored in the associated metadata. Stated generally, metadata can contain “intelligence” about the associated item of data. This intelligence can include the applications associated with the data, the operations that can be performed on the data, the history of processing performed on the data, and the processing that should be performed next. Metadata can therefore encode the entire life-cycle specification for the associated content data. Building on this capability, content management systems can be extended beyond organizing and retrieving information to managing the business processes performed on the documents, information, or data. Systems with such capabilities are called Business Process Management (BPM) or Work Flow Management (WFM) systems, and the metadata in this context are also called business process data. At present, vendors such as FileNet, IBM, Documentum, and Vignette provide various content management and/or business process management products or services.
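The following Python sketch illustrates, as one possible approach, how metadata might encode a life-cycle specification: each item's process metadata records its associated application, its current state, the operations permitted next, and a processing history. The states and operations in `LIFECYCLE` (for example `"review"` and `"approve"`) and the `ProcessMetadata` class are hypothetical; they stand in for whatever business process an actual BPM or WFM system would define.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical life-cycle specification: for each state, the operations that
# may be performed and the state each operation leads to.
LIFECYCLE: dict[str, dict[str, str]] = {
    "received": {"review": "reviewed"},
    "reviewed": {"approve": "approved", "reject": "received"},
    "approved": {"archive": "archived"},
    "archived": {},
}


@dataclass
class ProcessMetadata:
    """Business process data attached to one content item."""
    content_id: str
    application: str                      # application associated with the data
    state: str = "received"
    # Processing history as (from_state, operation, to_state) entries.
    history: list[tuple[str, str, str]] = field(default_factory=list)

    def allowed_operations(self) -> list[str]:
        """Operations that may be performed next, per the life-cycle spec."""
        return sorted(LIFECYCLE[self.state])

    def apply(self, operation: str) -> None:
        """Perform an operation if the spec allows it and record it in the history."""
        transitions = LIFECYCLE[self.state]
        if operation not in transitions:
            raise ValueError(f"{operation!r} not allowed in state {self.state!r}")
        new_state = transitions[operation]
        self.history.append((self.state, operation, new_state))
        self.state = new_state
```

Because the permitted next steps are derived from the metadata itself, the same content item can be routed through the process without the storage layer knowing anything about the workflow.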
One particular area in which content management systems are heavily relied upon is so-called “fixed content” systems. In some business environments, enterprises are required to maintain data or records that cannot be modified, owing to the nature of the business or to government regulatory requirements. Examples include medical records and diagnostic medical images (such as X-ray and MRI images) in the healthcare and insurance industries, accounting records and corporate documents for corporations, and securities transaction records for brokerage firms and investment banks. For hospitals and healthcare providers, it is important to maintain X-ray and MRI images that cannot be modified, to ensure accurate diagnoses and accurate medical histories. Government regulations such as Sarbanes-Oxley, HIPAA (Health Insurance Portability and Accountability Act), and SEC Rule 17a-4 require corporations, insurance companies, and securities brokers to maintain records that are guaranteed against modification for an extended period of time. For these types of records and data, the storage and content management problems are particularly acute: because the records can be neither deleted nor modified and must be kept for an extended period of time, the overall volume of data inevitably grows. At present, vendors such as FileNet, EMC, and Network Appliance offer various fixed content storage systems, while FileNet, IBM, and Documentum provide content management solutions for enterprise record management needs and compliance requirements.
Despite the multitude of existing products and systems, there remains room for improvement in currently available enterprise information management solutions. One shortcoming of existing solutions is the lack of integration among the various components of enterprise content management and storage systems. For example, as shown in FIG. 1, an enterprise content management system typically comprises several separate component systems: namely, content server cluster (110), content data storage (112), content metadata server cluster (120), content metadata storage (122), business process server cluster (130), and business process data storage (132). An enterprise content management system is typically deployed in a local area network (140) environment and accessed from workstations (150) or personal computers (160). With existing solutions, these component systems are usually provided by different vendors and are installed, maintained, and operated separately. In addition, the applications or software that provide the content management and business process management functions operate as distinct components separate from the storage systems.
Since there is currently no single integrated solution that provides the entire spectrum of process management, content management, and storage capabilities, enterprises must integrate the various components of the enterprise information management system manually, often using components from different vendors. This lack of integration causes several serious problems.
The first is difficulty in system administration, including installation, configuration, and upgrade management. Since the various components of enterprise information management systems are not aware of each other, they must be installed and configured separately. More often than not, their installation and configuration procedures and tools are quite dissimilar, requiring operators and administrators to learn and remember different procedures and methods for each. Furthermore, since the configuration information for the entire system is not (and cannot be) maintained by any single component, the system-wide configuration data must be maintained manually through a process that is external to the system, making the management of system-wide configuration information an error-prone, complex, and difficult process.
In addition, the various components of the enterprise system tend to be upgraded independently of one another, as they are produced by different companies with unrelated upgrade and product release schedules. This leads to difficult problems of upgrade timing, upgrade synchronization, and system-wide version maintenance and management. In particular, system-wide version information must be maintained, because interactions between different versions of software and/or hardware can sometimes cause unexpected problems that are difficult to track down. However, as with system-wide configuration information, and for exactly the same reasons, system-wide version information must be maintained external to the system, leading to similar system administration difficulties. Thus, there exists a need in the field for an integrated enterprise content management system with integrated procedures and tools for installation, configuration, upgrade, and version management.
The second type of problem that arises from mixing and matching components from different vendors is dissimilar and incongruent semantics among the various system components. Because information systems deal with intangible objects, the design and architecture of an information system are inherently based on abstract concepts. Thus, the “ontology” of an information system (what it is and what it does) is a direct result of the design principles, conceptual building blocks (“primitives”), and architectural framework employed by the system designers and architects. Naturally, there are competing design principles and paradigms, and designers and architects of information systems do not think alike. As a result, information systems from different design teams, e.g., from different vendors, tend to look and operate quite differently from each other. Mixing and matching components from different vendors therefore often involves translating and mapping dissimilar objects and concepts across system boundaries. For example, most enterprise content storage systems offer primitives for storing content data and associated content metadata. However, the semantics of these primitives may not be entirely consistent with the semantics of the enterprise content management or business process management systems that operate on top of the storage primitive layers. Dissimilar and incongruent semantics across system boundaries can sometimes lead to fundamental system problems with adverse consequences. System-wide instability can often be traced to inherent instability in system integration caused by inconsistent and incongruent semantics across component boundaries.
Furthermore, dissimilar and inconsistent semantics can result in difficulties in system administration, as it is difficult to handle several different semantics and conceptual frameworks at the same time. Consistent semantics, conceptual framework, and design paradigms across the system hierarchy are essential and fundamental requirements for a stable and robust enterprise content management system. Thus, there exists a need in the field for an integrated enterprise content management system that presents a unified paradigm across the system hierarchy with unified and consistent semantics and conceptual framework.
Some enterprises have attempted to address the problem of dissimilar semantics by employing so-called “Content Bridge” technologies. Content bridges, such as VeniceBridge from Venetica, provide tools to integrate disparate systems by mapping and translating dissimilar logical units, data dictionaries, metaphors, and taxonomies into a single, consistent framework. Although content bridges provide useful tools that can improve the quality of system integration, several problems remain. The first is that some inconsistencies across disparate systems simply cannot be resolved. The second, and more serious, problem is that content bridges do not and cannot address the issues of replication and disaster recovery, since content bridges focus only on the conceptual problem of integrating the dissimilar semantics of disparate systems.
For enterprise systems comprising disparate components from different vendors, replication and disaster recovery must be performed separately for each component system. However, it is critical that replication and recovery transactions be synchronized across the entire hierarchy of system data layers. From the low-level storage data of NAS and/or SAN storage systems to the content data, business process data, associated metadata, and the database that maintains the association information, the entire data set must be perfectly synchronized for the system to be operational and for replication and recovery to be effective. If these subcomponents become “out of sync” with each other, the entire data set may become meaningless, rendering the enterprise system useless. Nevertheless, existing products and technologies do not provide mechanisms to synchronize replication and recovery with external systems. Currently, therefore, system-wide synchronization must be performed manually, and when the replication and recovery transactions become “out of sync,” they must be reconciled through a manual process. These processes are difficult, frustrating, and error-prone.
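As a rough illustration of what “in sync” means here, the Python sketch below checks a replica made of several data layers (for example content, metadata, and business process data) against the association database. It rests on hypothetical assumptions that come from no existing product: every layer is tagged with a replication checkpoint number, and all layers are keyed by a shared item identifier. Under those assumptions, a replica is consistent only when all layers share one checkpoint and every layer holds exactly the items the association database refers to.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LayerSnapshot:
    """Replicated state of one data layer (hypothetical representation)."""
    sequence: int          # hypothetical replication checkpoint number
    ids: frozenset[str]    # identifiers of the items present in this layer


def check_replica(layers: dict[str, LayerSnapshot],
                  association_ids: set[str]) -> list[str]:
    """Return a list of synchronization problems; an empty list means the
    replica is internally consistent."""
    problems: list[str] = []
    checkpoints = {snap.sequence for snap in layers.values()}
    if len(checkpoints) > 1:
        problems.append(f"layers at different checkpoints: {sorted(checkpoints)}")
    for name, snap in sorted(layers.items()):
        missing = association_ids - snap.ids   # referenced but absent
        orphans = snap.ids - association_ids   # present but unreferenced
        if missing:
            problems.append(f"{name}: missing {sorted(missing)}")
        if orphans:
            problems.append(f"{name}: orphaned {sorted(orphans)}")
    return problems
```

Such a check only detects divergence between layers; it does not reconcile it, which is the manual step the paragraph above describes.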
The problem of replication and recovery synchronization is especially acute for the “fixed content” systems discussed above. Because the content data in fixed content systems must be guaranteed against change while the metadata and association database must remain dynamic (i.e., modifiable), the entire data set cannot be stored on the same storage system and backed up together by brute-force “imaging” of the entire system. Thus, for currently available fixed content systems, synchronization of replication and recovery must be done manually, with all of the accompanying problems described above.
For enterprise systems, the importance of replication and disaster recovery cannot be overstated. Because many modern enterprises depend critically on their enterprise information systems, the enterprise content management systems of those organizations must be mission-critical, high-availability systems. For these systems, system-wide replication and disaster recovery are essential to providing high availability. Even when high availability is not required, replication and disaster recovery are important for business continuity protection and are essential to ensuring the reliability and robustness of enterprise systems. Thus, there exists a need in the field for an integrated enterprise content management system that provides synchronized replication and recovery of all of its components, including the content management, business process management, database, and content storage management systems.
Another problem that comes from integrating disparate components from different vendors is performance degradation. Typically, integration is achieved by utilizing vendor-provided interface layers. Alternatively, components may be integrated employing third-party integration tools such as the content bridges described above. In either case, performance degradation is inevitable whenever interface layers are introduced to a system, because the system needs to perform extra steps in order to process data through the interfaces or bridges.
It can be seen, then, that there is a need for an integrated enterprise content management network-attached system that provides a unified approach to the entire range of enterprise content management functions, from storage management to content and business process management.