The following relates to the machine learning arts and to applications of same such as multi-label classification, image denoising, and so forth.
In multi-view learning, an object can be described by two or more different feature sets. Each feature set corresponds to a “view” of the object.
By way of illustrative example, the object may be an electronic document, which may be described by a first feature set (first view) comprising a bag-of-words vector representing textual content of the document, and by a second feature set (second view) representing the document structure (its organization into books, sections, chapters, or so forth), and perhaps by a third feature set (third view) representing the images contained in the (illustrative multi-media) document, and so forth.
As another illustrative example, an object may be a three-dimensional human face, and the first view of the face may be a feature set describing a photograph of the face obtained for a certain pose and lighting condition, a second view of the face may be a feature set describing a photograph of the face obtained for a different pose and/or different lighting condition, and so forth.
As another illustrative example, an object may be a digitally recorded audio clip, and a first view may represent the digital recording characteristics (such as bit rate, sampling rate, or so forth), a second view may represent audio characteristics (such as frequency spectrum, dynamic range, or so forth), and a third view may represent metadata associated with the audio clip (such as a title or filename, creation date, and so forth).
As another illustrative example, an object may be the psychological profile of a person, and a first view may be results of a personality test, a second view may be results of a gambling addiction test, a third view may be results of a schizophrenia screening test, and so forth.
In a multi-view learning task, $V$ views of a set of $n$ objects can be represented in general fashion as a set of prediction matrices $\{X_k\}_{k=1}^{V}$ for the $V$ views, where in general the prediction matrix $X_k$ has a dimension $n$ corresponding to the $n$ objects and another dimension $d_k$ corresponding to the number of features characterizing the $k$-th view. Observations of the various views of objects obtained by experiments, tests, recording data available on certain objects, or by other means can similarly be represented in general as a set of incomplete observation matrices $\{Y_k\}_{k=1}^{V}$ where the observation matrix $Y_k$ analogously has a dimension $n$ corresponding to the $n$ objects and another dimension $d_k$ corresponding to the number of features characterizing the $k$-th view. The observation matrices $Y_k$ are generally incomplete in that only a small sub-set of the $n$ objects are actually observed, and/or not all views (or not all features of a given view) of an observed object may be available. By way of illustrative example, in the illustrative human face learning task, photographs of a given face may be available for only some poses, and/or for only some lighting conditions, and it may be desired to predict the features of one of the unavailable photographs of the face.
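By way of non-limiting illustration, the matrix representation described above may be sketched in Python using NumPy. The sketch assumes that each view is stored as an n-by-d_k array, that unobserved entries of an observation matrix are marked with NaN, and that a Boolean mask records which entries were observed. The function name `make_incomplete_views`, the parameter `observed_fraction`, and the randomly generated values are hypothetical and serve only to show the shapes involved; they are not taken from any particular embodiment.

```python
import numpy as np


def make_incomplete_views(n, dims, observed_fraction=0.3, seed=0):
    """Build synthetic views for n objects (hypothetical helper).

    dims lists the feature counts d_k, one per view.
    Returns (X, Y, M), each a list with one n x d_k array per view:
      X: synthetic stand-ins for the complete views,
      Y: incomplete observation matrices (NaN where unobserved),
      M: Boolean masks, True where an entry of Y is observed.
    """
    rng = np.random.default_rng(seed)
    X, Y, M = [], [], []
    for d_k in dims:
        X_k = rng.normal(size=(n, d_k))                  # placeholder data
        M_k = rng.random((n, d_k)) < observed_fraction   # which entries are observed
        Y_k = np.where(M_k, X_k, np.nan)                 # hide unobserved entries
        X.append(X_k)
        Y.append(Y_k)
        M.append(M_k)
    return X, Y, M


# Three views (V = 3) of five objects (n = 5), with d_1 = 4, d_2 = 3, d_3 = 2.
X, Y, M = make_incomplete_views(n=5, dims=[4, 3, 2])
for k, (Y_k, M_k) in enumerate(zip(Y, M), start=1):
    print(f"view {k}: shape {Y_k.shape}, "
          f"observed {int(M_k.sum())} of {M_k.size} entries")
```

In this sketch the mask is drawn entry by entry; an entire missing view of an object (for example, an unavailable photograph of a face) would instead correspond to a full row of a given mask being False.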
Disclosed herein are improved multi-view learning techniques that provide various advantages as disclosed herein.