This disclosure relates to a system and method for implementing a computer data structure, and in particular a content management system implemented on log-based data storage.
A content management system (CMS) is a type of computer software that is frequently used for editing, storing, controlling, versioning, and publishing content such as web sites, news articles, operators' manuals, technical manuals, sales guides, and marketing brochures. Examples of common CMS software include Documentum by EMC Corporation, Alfresco by Alfresco Corporation, SharePoint by Microsoft Corporation, and FileNet by IBM. The types of files managed by CMS software may vary greatly and may include text files, source code, image media, audio files, and other electronic documents.
CMSs are usually implemented on relational databases. A relational database is a common form of data storage that is usually implemented with data tables and associated indices. In addition, many relational databases record data operations in transaction logs to allow for recovery and rollback in case of failure. However, these standard database components may suffer from poor performance due to inherent inefficiencies. For example, the B-tree data structure commonly used as the lookup index is known to waste space (some implementations require up to 30% empty space). In addition, a search in a B-tree-based index requires time logarithmic in the number of indexed entries. Moreover, write operations in these databases usually require many disk seek operations to overwrite existing data. As such, time-consuming searches may need to be performed for both write and read operations. Finally, almost all relational databases use proprietary file formats, making tasks such as backup, integration, and maintenance difficult and expensive.
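To illustrate why a B-tree lookup grows logarithmically, the following minimal sketch computes the worst-case height of a B-tree holding n keys. It assumes the textbook B-tree definition with minimum degree t, in which every non-root node holds at least t-1 keys. That definition gives the bound n >= 2*t**h - 1 for a tree of height h, with the root at height 0. The function names are hypothetical and are not part of this disclosure. A lookup descends from the root to a leaf, so it visits at most h + 1 nodes.

```python
def max_btree_height(n: int, t: int) -> int:
    """Largest height h a B-tree of minimum degree t can have with n keys.

    Assumes the textbook bound n >= 2 * t**h - 1 (root at height 0), so the
    height grows roughly as log_t(n).
    """
    if n < 1:
        raise ValueError("a non-empty tree needs at least one key")
    if t < 2:
        raise ValueError("minimum degree t must be at least 2")
    h = 0
    # Increase h while a tree of height h + 1 could still hold only n keys.
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


def worst_case_node_visits(n: int, t: int) -> int:
    """Nodes visited on a root-to-leaf search: one per level."""
    return max_btree_height(n, t) + 1


if __name__ == "__main__":
    # With t = 100, a thousandfold increase in keys adds two levels.
    for n in (10**6, 10**9):
        print(n, worst_case_node_visits(n, 100))
```

Each level visited may require a separate disk read when the node is not cached. For this reason, the depth of the index matters for lookup latency even though it grows slowly.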