Bitmap-based file systems such as GFS2, ext2 and ext3 tend to have an on-disk structure in which the disk is divided into sections, each with its own set of metadata recording the block allocations within that section. In GFS2 such a section of disk is called a resource group; in ext2/3 it is called a block group. Each block within a section is represented by one or more bits in that section's allocation bitmap. The reason for partitioning the disk into sections is to help reduce fragmentation: blocks for an inode are allocated from the same section as the other blocks of that inode, provided no other constraint comes into play (such as the inode being larger than a single section of disk, or a particular section being full).
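The relationship between a block number, its section and its position in that section's bitmap can be sketched as follows. This is a minimal illustration only: the section size, the function names and the "nearest section with free space" fallback are assumptions made for the example and are not taken from GFS2 or ext2/3.

```python
# Hypothetical geometry: an illustrative value, not a GFS2 or ext2/3 default.
BLOCKS_PER_SECTION = 1024


def section_of(block):
    """Index of the section (resource group / block group) holding a block."""
    return block // BLOCKS_PER_SECTION


def index_in_section(block):
    """Position of the block within its section's allocation bitmap."""
    return block % BLOCKS_PER_SECTION


def pick_section(preferred, free_counts):
    """Choose a section to allocate from.

    Prefers the section that already holds the inode's other blocks and,
    if that section is full, falls back to the nearest section with free
    blocks (an assumed policy, chosen only to keep the example short).
    Returns None when every section is full.
    """
    n = len(free_counts)
    for distance in range(n):
        for candidate in (preferred + distance, preferred - distance):
            if 0 <= candidate < n and free_counts[candidate] > 0:
                return candidate
    return None
```

For example, with three sections of which the middle one is full, `pick_section(1, [5, 0, 3])` falls back to a neighbouring section rather than the preferred one.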
In GFS2 the partitioning plays an additional role: each section of disk is given a unique lock, which allows one node in the cluster exclusive access to that part of the disk. By making the number of sections much greater than the number of nodes in the cluster, the likelihood of two nodes wanting to allocate from the same section at the same time is minimized. The GFS2 allocation bitmap uses two bits per block, which encode the following states: 1) free block; 2) allocated block; 3) allocated inode (in use, n_link>=1); and 4) unlinked, but still open inode (in use, n_link==0).
In addition, each inode in the file system has its own set of metadata which completely defines the blocks in use by that inode. In the GFS2 case that metadata takes the form of an equal-height tree of block pointers. However, it takes a lot of disk space to encode a large tree. Since inodes tend to be written to disk incrementally over time, the metadata blocks often end up scattered over the disk, with the penalty of a disk seek between each block access. We also want to support file system shrink efficiently. This requires identifying all the inodes that use blocks in the last N sections of disk (where N>=1) and then moving those blocks into free space towards the start of the disk, so that the last N sections are free before the block device is shrunk. The current metadata makes this very difficult, because identifying which inodes are using allocated blocks in the last N sections of disk would potentially require scanning the metadata trees of all inodes in the file system.
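The cost of the shrink case can be illustrated with a deliberately simplified model. Here each inode's metadata tree is represented as nested Python lists whose leaves are data block numbers; the blocks occupied by the tree's own metadata are ignored, and all names and parameters are hypothetical. The point of the sketch is that the scan has to visit every pointer of every inode, because nothing in the per-section metadata says which inode owns a given block.

```python
def blocks_in_tree(node):
    """Yield every block number referenced by a (simplified) metadata tree."""
    if isinstance(node, int):
        yield node
    else:
        for child in node:
            yield from blocks_in_tree(child)


def blocks_to_relocate(trees, total_sections, n, blocks_per_section):
    """Find the blocks that live in the last n sections of disk.

    trees maps an inode number to its metadata tree. Returns a pair
    (moves, visited): moves maps each affected inode to the blocks it
    must relocate, and visited counts the pointers examined.
    """
    first_doomed = (total_sections - n) * blocks_per_section
    moves = {}
    visited = 0
    for inode, tree in trees.items():
        for block in blocks_in_tree(tree):
            visited += 1
            if block >= first_doomed:
                moves.setdefault(inode, []).append(block)
    return moves, visited
```

Even when only one inode has blocks in the region being removed, `visited` equals the total number of pointers in the file system, which is what makes the operation expensive with this metadata layout.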
Further, the file system has two ways to indicate that a particular block is allocated: the allocation bitmaps, and a pointer in an inode's metadata tree. If the file system becomes corrupted, the two can therefore end up holding conflicting information, which then has to be resolved by a file system checker. Such checkers are very slow on large file systems.
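The kind of cross-check such a checker has to perform can be sketched as follows, again under simplifying assumptions: the bitmap is reduced to the set of block numbers it marks as allocated, each inode's tree is flattened to a list of block numbers, and the function and key names are invented for the example. It is not a description of how any particular checker is implemented.

```python
def cross_check(allocated_in_bitmap, pointers_by_inode):
    """Compare the bitmap's view of allocation with the inodes' view.

    allocated_in_bitmap: set of blocks the bitmap marks as allocated.
    pointers_by_inode: maps an inode number to the blocks it points to.
    """
    owners = {}
    for inode, blocks in pointers_by_inode.items():
        for block in blocks:
            owners.setdefault(block, []).append(inode)

    referenced = set(owners)
    return {
        # Marked allocated, but no inode points at the block.
        "unreferenced": sorted(allocated_in_bitmap - referenced),
        # An inode points at the block, but the bitmap says it is free.
        "marked_free": sorted(referenced - allocated_in_bitmap),
        # More than one inode claims the same block.
        "multiply_owned": {b: o for b, o in owners.items() if len(o) > 1},
    }
```

Building `owners` requires reading every pointer of every inode, which is one reason checking time grows with the size of the file system.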