The following relates to the sampling arts, to the optimization arts, and to applications of sampling such as sampling of hidden Markov models (HMMs), natural language processing (NLP) systems employing probabilistic context free grammars (PCFGs) augmented by constraints, and so forth.
A diverse range of problems can be formulated in terms of sampling of a space or domain (represented herein without loss of generality as X, a sample of which may be denoted as x) in accordance with a target distribution, which is represented herein without loss of generality as p(x), and which may or may not be normalized. A known approach for performing such sampling is a technique called rejection sampling. In this approach, sampling is performed in accordance with a proposal distribution, which is represented herein without loss of generality as q̄(x), where the overbar indicates a normalized distribution. Rejection sampling is premised upon finding an upper bound β for the “density ratios” ρ(x)≡p(x)/q̄(x) over the whole of the sampling space X, sampling x~q̄ (that is, sampling x according to q̄), and accepting x with probability p(x)/(βq̄(x)). The average acceptance rate γ is equal to p(X)/β (which is always ≤1), where p(X) is the total measure of X relative to p (the partition function of p). Efficient sampling is obtained by employing a proposal distribution q̄(x) that is a good approximation of the normalized target distribution p̄(x), and so the choice of q̄(x) is of substantial concern.
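The rejection sampling procedure described above can be illustrated by a minimal sketch. The target p (an unnormalized Gaussian shape restricted to [-5, 5]), the uniform proposal, the bound β = 10, and all function names are assumptions chosen for illustration only; they are not taken from any particular system.

```python
import math
import random


def rejection_sample(p, q_sample, q_density, beta, n, rng):
    """Draw n samples distributed proportionally to p.

    p         : target, possibly unnormalized
    q_sample  : draws one x from the normalized proposal
    q_density : evaluates the normalized proposal density at x
    beta      : upper bound on p(x) / q_density(x) over the whole space
    Returns the accepted samples and the observed acceptance rate.
    """
    accepted = []
    proposals = 0
    while len(accepted) < n:
        x = q_sample(rng)
        proposals += 1
        # Accept x with probability p(x) / (beta * q(x)).
        if rng.random() * beta * q_density(x) < p(x):
            accepted.append(x)
    return accepted, n / proposals


# Illustrative (assumed) target: unnormalized p(x) = exp(-x^2 / 2) on [-5, 5].
def p(x):
    return math.exp(-0.5 * x * x)


# Illustrative (assumed) proposal: uniform on [-5, 5], so its density is 1/10.
def q_density(x):
    return 0.1


def q_sample(rng):
    return rng.uniform(-5.0, 5.0)


# The density ratio p(x) / q(x) peaks at x = 0 with value 1 / 0.1 = 10.
beta = 10.0

rng = random.Random(0)
samples, rate = rejection_sample(p, q_sample, q_density, beta, 20000, rng)
print("observed acceptance rate:", rate)
print("predicted p(X) / beta   :", math.sqrt(2.0 * math.pi) / beta)
```

In this sketch the predicted acceptance rate p(X)/β is roughly 0.25, so about three of every four proposals are discarded; a proposal shaped more like the target would admit a smaller β and reject fewer samples.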
However, in practice it can be difficult to obtain a proposal distribution q̄(x) that closely approximates p̄(x). If q̄(x) is a poor approximation of p̄(x), then many samples drawn in accordance with the proposal distribution q̄(x) are rejected, leading to poor sampling efficiency.
In adaptive rejection sampling (ARS), the rejected samples are used to improve the proposal distribution. ARS assumes that the target distribution p(x) is log-concave, in which case a tangent line at any given point on the log of the target distribution is guaranteed to define an upper bound. This concavity aspect is used in ARS to refine the proposal distribution q̄(x) based on rejected samples. See Gilks et al., “Adaptive Rejection Sampling for Gibbs Sampling”, App. Statist. vol. 41 pages 337-48 (1992).
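The tangent-line bound can be written out as follows. In the formulation of Gilks et al., the concavity is that of the log density, so the sketch below works with h(x) = log p(x); the symbols h and x_i (the points at which tangents have been constructed so far) are notation introduced here for illustration.

```latex
\begin{align*}
h(x) &= \log p(x), \quad h \text{ concave and differentiable} \\
h(x) &\le h(x_i) + h'(x_i)\,(x - x_i) \quad \text{for all } x \text{ and every tangent point } x_i \\
p(x) &\le \min_{i} \exp\bigl( h(x_i) + h'(x_i)\,(x - x_i) \bigr)
\end{align*}
```

Each rejected sample can serve as a new tangent point x_i, which lowers the minimum on the right-hand side and so tightens the upper bound from which the proposal is built.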
Görür et al., “Concave Convex Adaptive Rejection Sampling”, Technical Report, Gatsby Computational Neuroscience Unit (2008) (hereinafter “Görür et al.”) discloses an improved ARS that is applicable to distributions whose log densities can be expressed as a sum of concave and convex functions, which expands the scope of applicability of ARS. Like conventional ARS, the approach of Görür et al. is generally limited to a target distribution p(x) that is continuous in one dimension. This is a consequence of reliance upon piecewise linear upper bounds that are refined based on rejected samples and that are assured of being upper bounds on account of the continuous curvature between the end points. Such techniques are difficult or impossible to adapt to more difficult problems in which the target distribution p(x) is multi-dimensional, and/or discrete, and/or highly discontinuous, and so forth.
Optimization is generally viewed as a problem that is separate and distinct from the sampling problem. Sampling endeavors to obtain a set of data points that is representative of (or, alternatively, in accordance with) a density function or distribution. In contrast, optimization endeavors to locate the maximum value of a function, which may or may not be a density function or distribution. The goal of optimization may be to find the highest value of the function, i.e. pmax(x0), or to find the spatial location in the space X of that maximum, i.e. to find the value x0.
Some functions may be optimized analytically, e.g. by finding the point where the derivative of the function goes to zero. More commonly, optimization employs iterative approaches, such as the gradient descent method or the Levenberg-Marquardt algorithm. In principle, sampling can be employed for optimization, for example by a Monte Carlo approach in which samples are acquired and used to estimate the maximum value. Such sampling approaches are approximate, and the error is generally expected to decrease as the sample size increases.
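A Monte Carlo approach to optimization of the kind mentioned above can be sketched as follows. The objective f (a single smooth bump with its maximum of 1 at x0 = 1), the uniform sampling over [-5, 5], and the function names are assumptions made for illustration; this is a plain random search, not a method taken from the references discussed here.

```python
import math
import random


def f(x):
    """Illustrative (assumed) objective: maximum value 1.0 at x0 = 1.0."""
    return math.exp(-((x - 1.0) ** 2))


def monte_carlo_max(func, low, high, n, seed=0):
    """Estimate the maximizer and maximum of func from n uniform samples."""
    rng = random.Random(seed)
    best_x, best_val = None, -math.inf
    for _ in range(n):
        x = rng.uniform(low, high)
        val = func(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


# The gap between the true maximum (1.0) and the estimate shrinks as n grows.
for n in (10, 100, 10000):
    x_hat, v_hat = monte_carlo_max(f, -5.0, 5.0, n)
    print(n, x_hat, 1.0 - v_hat)
```

The estimate can never exceed the true maximum, and it carries no guarantee of how close it is for a given sample size, which is the sense in which such sampling-based optimization is approximate.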