The following relates to the statistical sampling arts and related arts, and to arts employing statistical sampling.
A diverse range of problems can be formulated in terms of sampling of a space or domain (represented herein without loss of generality as X, a sample of which may be denoted as x) in accordance with a target distribution, which is represented herein without loss of generality as p(x) and which may or may not be normalized. For example, many structured learning problems can be naturally cast in terms of decision sequences that describe a structured object x, which is associated with an unnormalized probability or distribution p(x). In some such learning problems, p(x) is the exponential of a real-valued “energy” function, which is analogous to the “value” function in optimization problems. In such situations, both inference and learning give a central role to procedures that are able to produce samples x that follow the normalized probability distribution associated with p(x).
A known approach for performing such sampling is a technique called rejection sampling. In this approach, sampling is performed in accordance with a proposal distribution, which is represented herein without loss of generality as q̄(x), where the overbar indicates a normalized distribution. Rejection sampling is premised upon finding an upper bound β for the “density ratios” ρ(x) ≡ p(x)/q̄(x) over the whole of the sampling space X, sampling x ~ q̄ (that is, sampling x according to q̄), and accepting x with probability p(x)/(β q̄(x)). The average acceptance rate γ is equal to p(X)/β (which is always ≤ 1), where p(X) is the total measure of X relative to p (the partition function of p). Efficient sampling is obtained by employing a proposal distribution q̄(x) that is a good approximation of the normalized target distribution p̄(x), and so the choice of q̄(x) is of substantial concern.
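By way of non-limiting illustration, the rejection sampling procedure described above can be sketched as follows. The toy three-element space, the weights, the uniform proposal, and all function and variable names are assumptions chosen for the example and are not taken from any particular implementation. The sketch draws from the proposal, accepts with probability p(x)/(β q̄(x)), and compares the observed acceptance rate with p(X)/β.

```python
import random

def rejection_sample(p, q_sample, q_prob, beta, rng):
    """Draw one x distributed proportionally to p(x).

    Returns (x, tries), where tries is the number of proposals drawn,
    including the accepted one.
    """
    tries = 0
    while True:
        tries += 1
        x = q_sample(rng)                      # x ~ q
        # Accept with probability p(x) / (beta * q(x)); this is <= 1
        # because beta bounds the density ratio p(x)/q(x) from above.
        if rng.random() < p(x) / (beta * q_prob(x)):
            return x, tries

# Hypothetical toy problem: unnormalized target on a three-element space.
weights = {"a": 1.0, "b": 3.0, "c": 6.0}       # p(x); p(X) = 10
support = list(weights)

def p(x):
    return weights[x]

def q_prob(x):
    return 1.0 / len(support)                  # normalized uniform proposal

def q_sample(rng):
    return rng.choice(support)

# Upper bound on the density ratios p(x)/q(x) over the whole space.
beta = max(p(x) / q_prob(x) for x in support)
expected_acceptance = sum(weights.values()) / beta   # p(X) / beta

rng = random.Random(0)
n = 20000
counts = {x: 0 for x in support}
total_tries = 0
for _ in range(n):
    x, tries = rejection_sample(p, q_sample, q_prob, beta, rng)
    counts[x] += 1
    total_tries += tries

empirical_acceptance = n / total_tries
print("expected acceptance rate:", expected_acceptance)
print("empirical acceptance rate:", empirical_acceptance)
print("empirical frequencies:", {x: counts[x] / n for x in support})
```

In this sketch the uniform proposal is a poor match for the target, so roughly four in nine proposals are expected to be rejected; a proposal closer to the normalized target would permit a smaller β and hence a higher acceptance rate.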
However, in practice it can be difficult to obtain a proposal distribution q̄(x) that closely approximates the normalized target distribution p̄(x). If q̄(x) is a poor approximation of p̄(x), then the bound β must be large relative to p(X) and many samples drawn in accordance with the proposal distribution are rejected, leading to poor sampling efficiency.
In adaptive rejection sampling (ARS), the rejected samples are used to improve the proposal distribution. ARS assumes that the target distribution p(x) is log-concave, that is, that the logarithm of p(x) is concave, in which case a tangent line at any given point on the logarithm of the target distribution is guaranteed to define an upper bound. This concavity aspect is used in ARS to refine the proposal distribution q(x) based on rejected samples. See Gilks et al., “Adaptive Rejection Sampling for Gibbs Sampling”, App. Statist. vol. 41 pages 337-48 (1992).
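By way of non-limiting illustration, the tangent-line bound underlying ARS can be sketched as follows. The sketch is not the complete ARS procedure of Gilks et al.; it shows only the envelope step, under the assumption that the tangents are taken on the logarithm of the target. The example target (an unnormalized standard Gaussian), the abscissae, and all names are hypothetical. The minimum over the tangent lines forms a piecewise linear upper bound on log p(x), and adding the location of a rejected sample as a new tangent point tightens that bound.

```python
def log_p(x):
    # Hypothetical log-concave target: unnormalized standard Gaussian.
    return -0.5 * x * x

def dlog_p(x):
    # Derivative of log_p, used to build tangent lines.
    return -x

def upper_hull(x, abscissae):
    """Piecewise linear upper bound on log_p at x.

    Each abscissa a contributes the tangent line
    log_p(a) + dlog_p(a) * (x - a); because log_p is concave, every
    tangent lies on or above log_p, and so does their minimum.
    """
    return min(log_p(a) + dlog_p(a) * (x - a) for a in abscissae)

# Evaluation grid between the outermost tangent points.
grid = [i / 100 for i in range(-100, 101)]

def max_gap(abscissae):
    # Largest distance between the hull and log_p on the grid.
    return max(upper_hull(x, abscissae) - log_p(x) for x in grid)

coarse = [-1.0, 1.0]
# Suppose a proposal at x = 0.0 was rejected; ARS-style refinement adds
# that point as a new tangent location.
refined = sorted(coarse + [0.0])

gap_coarse = max_gap(coarse)
gap_refined = max_gap(refined)
print("max gap with two tangents:", gap_coarse)
print("max gap with three tangents:", gap_refined)
```

A tighter hull corresponds to a proposal that follows the target more closely, which is why the acceptance rate improves as rejected samples accumulate.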
ARS is applicable to log-concave distributions, in which the logarithm of the target density function p(x) is concave. See also Görür et al., “Concave Convex Adaptive Rejection Sampling”, Technical Report, Gatsby Computational Neuroscience Unit (2008) (hereinafter “Görür et al.”). Görür et al. set forth an improved ARS that is applicable to distributions whose log densities can be expressed as a sum of concave and convex functions, which expands the scope of applicability of ARS. Nonetheless, even with this improvement the ARS technique is generally limited to a target distribution p(x) that is continuous in one dimension. This is a consequence of ARS relying upon piecewise linear upper bounds that are refined based on rejected samples and that are assured of being upper bounds on account of the continuous curvature between the end points. ARS techniques are therefore difficult or impossible to adapt to sampling of a complex target distribution p(x), such as a multi-dimensional target distribution, a discrete target distribution, a discrete multi-dimensional target distribution, a highly discontinuous target distribution, and so forth.