Some modern file systems use objects to store file data and other internal file system structures (“metadata”). A file is broken up into many small objects, perhaps as small as 4 KB (2^12 bytes). For a file system that spans 64 TB (2^46 bytes), for example, this results in over 2^(46−12)=2^34, or roughly 17 billion, objects to keep track of.
In this context, an object is a sequence of binary data with an object name, often a GUID (globally unique ID) or a cryptographic hash of the content, although other naming conventions are possible as long as each unique object has a unique name. Object names are usually fixed-length binary strings intended for use by programs, as opposed to people. Object sizes are arbitrary, but in practice they are typically powers of 2 and range from 512 bytes (2^9) up to 1 MB (2^20). Objects in this context should not be confused with objects as used in programming languages such as Java and C++.
An index (sometimes referred to as a dictionary or catalog) of all the objects is needed by the file system. Each record in the index may contain the object name, length, location and other miscellaneous information. The index may have as its primary key the object name, the object's location, or possibly both. A record is on the order of a few tens of bytes, 32 bytes being one example.
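As an illustration of how such a record could fit in 32 bytes, the following minimal sketch packs a name, a length, and a location into a fixed-size binary record. The field widths (a 20-byte name such as a SHA-1 digest, a 4-byte length, and an 8-byte location) and the names `RECORD`, `pack_record`, and `unpack_record` are assumptions chosen for the example, not a layout given in the text.

```python
import hashlib
import struct

# Hypothetical layout (an assumption, not from the text):
#   20-byte object name, 4-byte length, 8-byte location, little-endian,
#   with no padding, for a total of 32 bytes.
RECORD = struct.Struct("<20sIQ")


def pack_record(name: bytes, length: int, location: int) -> bytes:
    """Serialize one index record into its fixed-size binary form."""
    return RECORD.pack(name, length, location)


def unpack_record(raw: bytes) -> tuple:
    """Parse a binary record back into (name, length, location)."""
    return RECORD.unpack(raw)


# Example: a 4 KB object named by the SHA-1 hash of its content.
data = b"\x00" * 4096
name = hashlib.sha1(data).digest()
raw = pack_record(name, len(data), 0x1000)
```

A fixed-size record keeps the index size a simple product of record size and object count, which is the property the sizing estimates in this section rely on.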
Operations on this index include adding an entry, looking up an entry, making modifications to the entry, and deleting an entry. These are all typical operations performed on any index.
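The four operations can be sketched against a purely in-memory index keyed by object name. This is only an interface illustration under the assumption that everything fits in a Python dictionary; the class name `ObjectIndex` and its method names are hypothetical, and the sketch does not address the size or rate constraints of a real index.

```python
class ObjectIndex:
    """Toy in-memory index keyed by object name (illustrative only)."""

    def __init__(self):
        self._records = {}

    def add(self, name: bytes, length: int, location: int) -> None:
        """Add an entry; each unique object must have a unique name."""
        if name in self._records:
            raise KeyError("duplicate object name")
        self._records[name] = {"length": length, "location": location}

    def lookup(self, name: bytes):
        """Return a copy of the record for name, or None if absent."""
        record = self._records.get(name)
        return dict(record) if record is not None else None

    def modify(self, name: bytes, **fields) -> None:
        """Update fields of an existing entry (e.g. a new location)."""
        self._records[name].update(fields)

    def delete(self, name: bytes) -> None:
        """Remove an entry; raises KeyError if it does not exist."""
        del self._records[name]


index = ObjectIndex()
key = b"\x01" * 20
index.add(key, length=4096, location=0)
index.modify(key, location=8192)
found = index.lookup(key)
index.delete(key)
missing = index.lookup(key)
```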
Because these file systems work with objects, an indexing solution faces two challenges that are not easily met if the file system is to achieve acceptable performance:
1) The number of entries in the index can be very large. In the example above, if each index entry is 32 (2^5) bytes, then the index takes 2^(5+34)=2^39 bytes, or 512 GB, of memory. An index of this size does not fit cost-effectively in current memory technologies.
2) The operation rate against the index is high. A commercially viable storage system may need to perform at, say, 256 MB/sec (2^28 bytes/second). At 4 KB object sizes, that is 2^(28−12)=2^16, or roughly 65 thousand, operations per second. Given that file systems typically generate and reference other data (objects) internally, the index operation rate can easily exceed 100 thousand operations/second. As a point of comparison, a current state-of-the-art disk can do at best 400 operations per second.
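The sizing arithmetic in this example can be checked with a few lines of code. The constants are the figures quoted in the text; the variable names are illustrative, and the final disk count is a back-of-the-envelope extrapolation from the 400 operations/second figure rather than a number stated in the text.

```python
import math

FS_BYTES = 2**46          # 64 TB file system
OBJECT_BYTES = 2**12      # 4 KB objects
RECORD_BYTES = 2**5       # 32-byte index records
THROUGHPUT_BYTES = 2**28  # 256 MB/sec target data rate
DISK_OPS_PER_SEC = 400    # best-case rate quoted for a single disk

# Number of objects the index must track: 2^(46-12) = 2^34.
object_count = FS_BYTES // OBJECT_BYTES

# Total index size: 2^(34+5) = 2^39 bytes, i.e. 512 GB.
index_bytes = object_count * RECORD_BYTES

# Index operations per second implied by the data rate alone: 2^16.
ops_per_sec = THROUGHPUT_BYTES // OBJECT_BYTES

# Disks needed to sustain that rate if every operation hit a disk
# (an extrapolation for illustration only).
disks_needed = math.ceil(ops_per_sec / DISK_OPS_PER_SEC)

print(object_count, index_bytes // 2**30, ops_per_sec, disks_needed)
```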
Achieving the necessary performance and capacity levels is not practical using DRAM or disk technology alone. DRAM is fast enough but not dense enough. Disks have the density but not the performance. Scaling either one to reach the desired characteristics is too expensive.
Object names are often uniform in both their distribution and access patterns, so typical caching schemes, which depend on spatial and temporal locality, have limited effect. Thus, the indexing problem is difficult in terms of both size and operation rate.
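The limited benefit of caching under uniform access can be illustrated with a small simulation. The sketch below assumes an LRU cache and keys drawn uniformly at random, which is a simplification of real access patterns; the function name `lru_hit_rate` and the parameter values are hypothetical. Under these assumptions, the hit rate settles near the ratio of cache size to key space, so a cache holding 10% of the keys serves only about 10% of the lookups.

```python
import random
from collections import OrderedDict


def lru_hit_rate(key_space: int, cache_size: int, accesses: int,
                 seed: int = 0) -> float:
    """Simulate an LRU cache under uniformly random key accesses."""
    rng = random.Random(seed)
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for _ in range(accesses):
        key = rng.randrange(key_space)
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = None
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return hits / accesses


# A cache covering 10% of the key space, under uniform access.
rate = lru_hit_rate(key_space=10_000, cache_size=1_000, accesses=50_000)
print(f"hit rate: {rate:.3f}")
```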