Document management systems often employ a search engine to allow for fast retrieval of information. A search engine can search the metadata and text of documents in a search index to determine which documents match search criteria without having to parse the documents themselves.
As the volume of information committed to a search system increases, a need arises to have multiple search systems sharing responsibility for managing the search index. The index needs to be split into smaller components, called partitions. Each partition has a capacity limit, based on resources such as available memory, disk space or other capacity constraints.
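As a minimal sketch of the idea of a partition bounded by resource limits, the following models a partition whose fullness is governed by its most constrained resource. The class name `Partition`, its fields, and the threshold parameter are hypothetical and chosen only for illustration; they do not describe any particular search system.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Hypothetical model of one index partition and its capacity limits."""
    name: str
    memory_used_mb: int
    memory_limit_mb: int
    disk_used_mb: int
    disk_limit_mb: int

    def fill_ratio(self) -> float:
        # Assumption: the most constrained resource determines how full
        # the partition is considered to be.
        return max(
            self.memory_used_mb / self.memory_limit_mb,
            self.disk_used_mb / self.disk_limit_mb,
        )

    def is_full(self, threshold: float = 1.0) -> bool:
        # A partition is deemed full once its fill ratio reaches the threshold.
        return self.fill_ratio() >= threshold


if __name__ == "__main__":
    p = Partition("partition-1", 512, 1024, 900, 1000)
    print(p.name, p.fill_ratio(), p.is_full(threshold=0.9))
```

Taking the maximum over resources is one possible design choice; a real system might weigh memory and disk differently or track additional constraints.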
As partitions are filled with new data from indexing operations, they approach a point at which they are deemed to be full. In a traditional system, the administrators need to monitor the sizes of the partitions and make configuration changes as the partitions increase or decrease in size. This creates a system management burden and can even result in a partition becoming inoperable if configuration changes are not made in a timely manner.
One solution is for the administrator of the system to check the conditions of the partitions on a regular basis. This introduces the prospect of user error, and is problematic if the administrators are not available.
Another solution is for external automated applications to regularly check the status of the partitions and notify the administrators that action should be taken based upon configuration rules. The disadvantages are that external programs must be created to monitor the partitions and that errors can still occur if the administrator cannot react to the notifications in a timely manner.
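The external monitoring approach can be illustrated with a short sketch. The function name `check_partitions`, the mapping of partition names to fill ratios, and the `warn_at` rule are all assumptions made for this example. The sketch only produces notification messages; acting on them is still left to an administrator, which is the weakness noted above.

```python
from typing import Dict, List


def check_partitions(fill_ratios: Dict[str, float], warn_at: float = 0.8) -> List[str]:
    """Return one notification per partition at or above the warning level.

    fill_ratios maps a (hypothetical) partition name to the fraction of its
    capacity in use; warn_at stands in for a configuration rule.
    """
    notices = []
    for name, ratio in sorted(fill_ratios.items()):
        if ratio >= warn_at:
            notices.append(f"partition {name} is {ratio:.0%} full; action required")
    return notices


if __name__ == "__main__":
    # An external scheduler (for example, a periodic job) would call this
    # and forward the messages to the administrators.
    for notice in check_partitions({"a": 0.50, "b": 0.85, "c": 0.97}):
        print(notice)
```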
If the search system provides suitable integration points, it may also be possible for an external system to monitor the status of partitions and make configuration changes automatically. This places a burden on external technology to anticipate the internal behavior of search partitions.
None of these solutions, however, provides for determining when a partition is too full and moving appropriate data from the full partition to one with available space.
Some existing implementations are capable of moving data to other partitions. However, these solutions move data inefficiently, have only one mode of operation, and only move data once extreme limits have been exceeded. Consequently, there is room for innovation and improvement.