Many emerging applications need to query large, structured, highly distributed datasets. One example is the management of a wide-area network comprising a large number of computers, where each computer holds a subset of the data and may regularly update its locally stored data. In such systems, the total amount of data may be very large (e.g., many terabytes distributed over 100,000+ geographically dispersed computers), and a significant fraction of the computers (and hence of the data) may be unavailable at any given time. Current solutions enable the dataset to be queried by replicating the data at one or more central nodes and then querying the consolidated copy. However, this approach is infeasible for very large datasets because transferring the data to the central node(s) creates huge network overheads.