Enterprise Content Management (ECM) systems face a common challenge in handling growing data volumes within a cloud computing environment. Distributed object stores and file systems underpin content object storage. Many ECM systems are designed to leverage a cluster of commodity hardware. Each node in the cluster, which may be referred to as a storage node, is a commodity server with many locally attached hard disks. The cluster scales horizontally as more storage nodes are added.
Based on an analysis of the content objects typically stored in a content management system, a determination was made that a high percentage of the content objects are relatively small in size. This translates to a large number of small files (e.g., several kilobytes or less per file) for the content objects. However, such a large number of small files may limit the scalability of the object storage, because handling file Input/Output (I/O) for so many small files incurs substantial overhead.
Transactional ECM systems store small content objects in the underlying file systems, where the overhead of the index node associated with each file is the root of the small object problem. An index node is a data structure used to represent a file system object. Because each file has its own index node, a large number of small files results in a large number of index nodes, and the associated overhead grows accordingly. Some archival ECM systems store only aggregated larger files, batch-packed from the small files outside of the repositories, which makes the archival ECM systems read-only and non-transactional.
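As an illustration only, the batch-packing idea mentioned above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular ECM system: the function names pack_objects, read_object, and demo, the container file name aggregate.bin, and the in-memory offset index are all hypothetical. It shows that many small content objects can share one file (and therefore one index node) when each object is located by an offset and a length.

```python
import os
import tempfile


def pack_objects(objects, container_path):
    """Write every payload into one container file.

    Returns a hypothetical in-memory index: {name: (offset, length)}.
    """
    index = {}
    with open(container_path, "wb") as container:
        for name, payload in objects.items():
            offset = container.tell()          # where this object starts
            container.write(payload)
            index[name] = (offset, len(payload))
    return index


def read_object(container_path, index, name):
    """Read one object back from the container using its offset and length."""
    offset, length = index[name]
    with open(container_path, "rb") as container:
        container.seek(offset)
        return container.read(length)


def demo():
    # 1000 small objects (assumed sample data, a few bytes each).
    objects = {f"doc-{i}": f"content {i}".encode() for i in range(1000)}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "aggregate.bin")
        index = pack_objects(objects, path)
        files_on_disk = len(os.listdir(tmp))   # one file instead of 1000
        sample = read_object(path, index, "doc-42")
    return files_on_disk, len(index), sample
```

The sketch only appends objects and never updates or deletes them in place, which is consistent with the read-only, non-transactional character of the archival approach described above.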