A well known problem in real-world applications of machine learning is that expert labeling of large amounts of data for training a classifier (a function that maps sets of input features or attributes to classes) is prohibitively expensive. Often in practice, only a small amount of data is available and as a result the amount of labeled data is far from adequate. In this case, making an adequate estimation of the model parameters of a classifier is challenging.
Further, the underlying assumption in traditional machine learning algorithms is that instances are independent and identically distributed (“I.I.D.”). This assumption simplifies the underlying mathematics of statistical models, but in fact does not hold for many real world applications. Such models constructed under the I.I.D. assumption only leverage relationships between attributes (meaning a specification that defines a property) within instances (e.g., co-occurrence relationships), and do not model connections between the attributes in different instances. A well-known example is market basket analysis, which forms sets of items that are purchased together. In supervised learning, classification of a single instance of previously unseen data is thus possible because no additional context is needed to infer class membership. However, such a context-free approach does not exploit valuable information about relationships between instances in the dataset.