The present invention relates generally to storage systems and, more particularly, to a system and method for providing a highly available search index with storage node addition and removal in a replicated object storage system.
The amount of digital content is growing at an exponential rate and requires substantial storage systems to store and manage it. Much of this content is fixed (unchanging) and can be stored on a lower-cost storage system. As more fixed content is stored, it becomes increasingly important to be able to locate content based on content metadata criteria. A large-scale search index engine can be implemented by horizontally partitioning the index content so that each node contains part of the index. This methodology is called sharding. The shards are distributed one per node among the nodes participating in the index database. This has the benefit of distributing the load of a very large index across multiple nodes.
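By way of illustration only, horizontal partitioning of a metadata index can be sketched as follows. The class name `ShardedIndex`, the helper `shard_for`, the hash-modulo routing rule, and the sample metadata are assumptions made for this sketch and are not taken from any particular system; each in-memory dictionary stands in for the shard held by one node.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number using a stable hash (assumed routing rule)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedIndex:
    """Hypothetical index with one shard per node; each shard is a dict."""

    def __init__(self, num_nodes: int) -> None:
        self.shards = [dict() for _ in range(num_nodes)]

    def add(self, object_id: str, metadata: dict) -> None:
        # Each record is stored in exactly one shard.
        shard = self.shards[shard_for(object_id, len(self.shards))]
        shard[object_id] = metadata

    def search(self, field: str, value: str) -> list:
        # A metadata query fans out to every shard and merges the results.
        hits = []
        for shard in self.shards:
            hits.extend(oid for oid, meta in shard.items()
                        if meta.get(field) == value)
        return sorted(hits)


index = ShardedIndex(num_nodes=4)
for i in range(1000):
    owner = "alice" if i % 2 == 0 else "bob"
    index.add(f"object-{i}", {"owner": owner})

shard_sizes = [len(shard) for shard in index.shards]
alice_hits = index.search("owner", "alice")
```

In this sketch each node holds only a portion of the records, so both storage and query work are spread across the nodes rather than concentrated on one.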
When a cluster node is added to or removed from the cluster environment, or a shard becomes unavailable, the full index must be rebuilt to redistribute the index records across the new number of shards, so that the hashing algorithm used to identify the shard for specific index content remains valid. Re-indexing can take a long time and leave the index unavailable for the duration. Additionally, when a shard of the index becomes unavailable, either because a node is down or because an individual shard is corrupted, the full index must be restored from a backup (or replicated) copy, or the index must be deleted and regenerated. Completing the index recovery can, again, take a very long time.
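The cost of changing the number of shards can be illustrated with a minimal sketch. It assumes that a record's shard is chosen as a stable hash of its identifier modulo the shard count; the text above says only that a hashing algorithm is used, so this particular rule, the helper names `shard_for` and `moved_fraction`, and the sample identifiers are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Assumed routing rule: stable hash of the key, modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(keys: list, old_count: int, new_count: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(1 for key in keys
                if shard_for(key, old_count) != shard_for(key, new_count))
    return moved / len(keys)


keys = [f"object-{i}" for i in range(10000)]

# Growing a four-node cluster to five nodes under modulo routing.
fraction_moved = moved_fraction(keys, old_count=4, new_count=5)
print(f"records that must move: {fraction_moved:.0%}")
```

Under this assumed rule, a record keeps its shard only when its hash leaves the same remainder modulo 4 and modulo 5, which holds for about one hash value in five. Roughly four out of five records therefore change shards when a single node is added, which is why the index is rebuilt rather than patched in place.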