Since the 1960s, the computer hardware and software industries have provided a relentless and spectacular increase in the capabilities and functionalities of computer-based data processing systems. For example, contemporary office workers are typically equipped with modern personal computers (“PCs”) that surpass, in processor speeds, memory sizes, and mass-storage capacities, supercomputers of only 20 years ago. Networking technologies allow PCs to be interlinked with one another and with powerful servers and other computational resources to provide extremely high-bandwidth interconnection between computer users, access by users to vast computational resources, and immense capacities for data storage and retrieval. Today, large and complex business organizations can easily implement highly interconnected, paperless work environments using relatively inexpensive, commercially available computer hardware and software products. However, as the capabilities of computer hardware and software have increased, the amount of data that is generated and computationally managed in business, commercial, and even home environments has rapidly increased, and the rate of increase in data generation is itself increasing. Computer users may receive hundreds of emails each day, many including photographs, video clips, and complex, multi-media documents. Moreover, many computer users routinely generate large numbers of text documents, multi-media presentations, and other types of data. Much of this data needs to be managed and stored for subsequent retrieval. Recent legislation mandates, for example, reliable storage of emails and other electronic communications generated and received in certain business environments for lengthy periods of time, spanning decades.
Although it is possible to purchase ever-larger mass-storage devices and ever-increasing numbers of servers to manage backup and archiving of electronic data on the mass-storage devices, the expense, management overhead, and administrative overhead of storing and managing the large amounts of electronic data may quickly reach a point of commercial and economic impracticality.
One solution to the above-mentioned problems is a new class of distributed data-storage systems. In these systems, a data object is a single routable data entity. An application-level data object may consist of one or more data objects. In certain of these distributed data-storage systems, compression-enhancing data-object routing is used to distribute data objects to component data-storage systems of a distributed data-storage system. Compression-enhancing data-object routing may involve computing a similarity key for each data object in order to, over time, route each group of similar data objects to a single component data-storage system. While compression-enhancing data-object routing techniques work well for relatively static distributed data-storage systems, they may be less satisfactory in certain dynamic, distributed data-storage systems in which component data-storage systems become unavailable and new component data-storage systems are added to the distributed data-storage system. For this reason, computer users, business and research organizations, vendors of computer systems and computer software, and various governmental organizations have all recognized the need for improved data-object routing in dynamic, distributed data-storage systems.
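By way of illustration only, the similarity-key routing described above may be sketched as follows. The text does not specify how a similarity key is computed or how a key is mapped to a component data-storage system, so everything in the sketch is an assumption made for the example: the function names `shingle_hashes`, `similarity_key`, and `route`, the use of a minimum over hashed 8-byte windows as the key, and the modulo mapping from key to component.

```python
import hashlib


def shingle_hashes(data: bytes, width: int = 8) -> set:
    """Hash every overlapping window of `width` bytes (assumed scheme)."""
    if len(data) <= width:
        # Short objects contribute a single hash of their whole content.
        return {int.from_bytes(hashlib.sha256(data).digest()[:8], "big")}
    return {
        int.from_bytes(hashlib.sha256(data[i:i + width]).digest()[:8], "big")
        for i in range(len(data) - width + 1)
    }


def similarity_key(data: bytes, width: int = 8) -> int:
    """Assumed similarity key: the smallest window hash of the object.

    Objects that share most of their windows are likely, though not
    guaranteed, to share the same minimum and therefore the same key.
    """
    return min(shingle_hashes(data, width))


def route(data: bytes, num_stores: int) -> int:
    """Map a data object to a component store index (assumed modulo mapping)."""
    return similarity_key(data) % num_stores
```

Under these assumptions, `route(obj, 4)` always returns the same index for identical content, and objects with the same similarity key are collocated on one component. The modulo mapping also shows why such a scheme may be less satisfactory in a dynamic system: changing `num_stores` when a component is added or becomes unavailable changes the index computed for most keys, so previously collocated groups of similar data objects may no longer be routed to the component that already holds them.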