Classification of data involves identifying which of a set of predetermined classes a given data item is associated with. An example of data classification is the classification of images of handwritten digits according to which digit they represent. Probabilistic classification involves estimating probabilities of a given data items being associated with each of a set of predetermined classes. In recent years, significant developments in neural networks have resulted in increasingly accurate and efficient machine learning methods for classification of data. Alongside neural networks, computational methods based on Bayesian statistics have been developed and applied to classification problems. Gaussian process (GP) models are of particular interest due to the flexibility of GP priors, which allows for complex nonlinear latent functions to be learned. Compared with neural network models, GP models automatically result in well-calibrated uncertainties in class probability predictions. GP models are particularly suitable for low-data regimes, in which prediction uncertainties may be large, and must be modelled sensibly to give meaningful classification results.
Application of GP models to classification of data is typically computationally demanding due to non-Gaussian likelihoods that prevent integrals from being performed analytically. For large datasets, nave application of GP models to classification problems becomes intractable even for the most powerful computers. Methods have been developed to improve the tractability and scalability of GP models for classification.
In addition to the tractability issues mentioned above, most GP models rely on rudimentary and local metrics for generalisation (for example, Euclidian distances between data items), and accordingly do not adequately capture non-local generalisation structures. Recently, a method was proposed to incorporate convolutional structure into GP models (Convolutional Gaussian Processes, van der Wilk et al, 31st Conference on Neural Information Processing Systems, 2017). The proposed method results in translational invariance across input dimensions of the data to be classified, allowing for non-local generalisation.
Introducing a convolutional structure as described above improves the performance of GP models for classification, but does not adequately capture information depending on the structural relationship between input dimensions in high-dimensional data (for example, the relative locations of pixels within an image). As a result, existing GP models are typically outperformed in classification tasks by state-of-the-art neural networks which are capable of capturing such structural relationships.