Providing good quality of service (QoS) to customers is an ultimate objective of any distributed system or network. QoS metrics are typically expressed as end-to-end performance characteristics such as network delay, connection bandwidth, or the round-trip time (RTT) of a transaction (e.g., a Web transaction) between a source (e.g., a client) and a destination (e.g., a server). Accurate estimates of end-to-end performance are vital to optimizing system performance objectives. For example, in content distribution systems (CDS) one wishes to route download requests to the servers (mirror sites) with the highest expected bandwidth. In overlay routing, and in distributed hash table (DHT) construction, one is interested in finding the lowest-latency routes. A common objective in various kinds of distributed systems is to minimize violations of service-level agreements (SLAs), which typically stipulate penalties if a certain percentage of requests exceed a threshold delay or transaction RTT. In all such applications, it is essential to have accurate information about the end-to-end performance between the various end points in the distributed system.
However, estimating end-to-end performance by exhaustive pairwise measurement is infeasible in large networks, and such measurements cannot be kept up to date in highly dynamic environments. A natural alternative is therefore to estimate unobserved end-to-end performance from a, preferably much smaller, set of available measurements. For example, predicting network latencies has been an active area of research in the past few years, and a variety of approaches have been proposed. A common approach is to embed the network hosts into a low-dimensional Euclidean space based on previously obtained measurements to a set of landmark nodes. Another approach, called Vivaldi, relies on an analogy with a network of physical springs, and tries to position the hosts so that the potential energy of the spring system is minimized. Finally, matrix-factorization approaches based on Singular Value Decomposition (SVD) and Non-negative Matrix Factorization (NMF) have recently been proposed. While the approaches listed above have performed well in interesting scenarios, they face some potentially significant practical limitations. For example, the Euclidean distance properties (symmetry and the triangle inequality) assumed by several approaches may often be violated in practice, as observed in various studies. Further, many current network distance prediction techniques, even those that avoid the Euclidean assumption, rest on another strong assumption: that for a given set of landmark nodes, all pairwise measurements among the landmark nodes, and between the hosts and the landmark nodes, are available. This assumption may not always be realistic, particularly for end-to-end performance measures that are either costly or impossible to obtain on demand, for example, when obtaining them would force peers in a content-distribution system to upload files to, or download files from, all other nodes.
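To make the concern about Euclidean assumptions concrete, the following minimal sketch counts how often a small delay matrix breaks symmetry and the triangle inequality, skipping pairs that were never measured. The function name `euclidean_violations`, the use of `None` for a missing measurement, and the sample values are all hypothetical and chosen only for illustration; they are not taken from any of the approaches mentioned above.

```python
from itertools import permutations


def euclidean_violations(rtt):
    """Count asymmetric pairs and triangle-inequality violations.

    rtt[i][j] is the measured delay from host i to host j (illustrative data);
    None marks a pair for which no measurement is available.
    Returns (asymmetric_pairs, triangle_violations).
    """
    n = len(rtt)

    # Symmetry: compare each unordered pair once, if both directions were measured.
    asymmetric = 0
    for i in range(n):
        for j in range(i + 1, n):
            a, b = rtt[i][j], rtt[j][i]
            if a is not None and b is not None and a != b:
                asymmetric += 1

    # Triangle inequality: the direct delay i -> j should not exceed
    # the delay of the detour i -> k -> j for any third host k.
    triangle = 0
    for i, j, k in permutations(range(n), 3):
        direct, leg1, leg2 = rtt[i][j], rtt[i][k], rtt[k][j]
        if direct is None or leg1 is None or leg2 is None:
            continue  # cannot check a triple with a missing measurement
        if direct > leg1 + leg2:
            triangle += 1

    return asymmetric, triangle


# Hypothetical delays in milliseconds between three hosts.
sample_rtt = [
    [0, 10, 100],
    [12, 0, 20],
    [100, 20, 0],
]
print(euclidean_violations(sample_rtt))
```

In this made-up matrix, the direct path between hosts 0 and 2 is slower than the detour through host 1 in both directions, and the delays between hosts 0 and 1 differ by direction, so no placement of the three hosts in a Euclidean space can reproduce all of the measured values exactly.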
Moreover, it is often observed that the predictive accuracy of collaborative prediction from very sparse data can improve dramatically when more samples become available. However, excessive sampling can be costly. A user may become annoyed if she is asked to rate many products, or a network may become congested if too many measurements are performed. Additionally, suggesting a product to buy or a server to download from has a high cost if the user does not like the product or the download bandwidth turns out to be low. Therefore, there is a need for cost-efficient active sampling that best improves the performance prediction accuracy while minimizing the sampling costs.