Distributed computing and distributed database environments provide a structure for handling the challenges of “big data,” that is, large volumes of data that must be processed, usually within short periods of time. Distribution allows additional computational and storage resources to be applied to a big-data problem and enables redundancy that minimizes downtime of key capabilities. However, distributing processing and/or storage comes at a cost: transferring data between systems consumes computational and backhaul (network) resources. As a result, a large-scale distributed design may be limited by these distribution costs.