Machine learning techniques may be used to identify and/or contextually categorize items of interest depicted in digital media content and/or other digital information. For example, digital media content may comprise digital images. Items of interest may comprise objects detected in the images. Some techniques may include one-shot learning, zero-shot learning, open set recognition, visual-semantic embedding, and other techniques. One-shot learning techniques may be configured to learn object categories from one, or only few examples. The examples may be referred to as “training data.” To compensate for the lack of training data and to enable one-shot learning, knowledge may be obtained from other sources, for example, by similarity of features, semantic attributes, and/or other information. Zero-shot learning may be configured to recognize novel categories of detected object with no training data by obtaining knowledge from auxiliary categories. For example, zero-shot learning may explore the use of attribute-based semantic representations. An attribute vector prototype of each category must be pre-defined which may be very computationally expensive for a large-scale dataset. In some instances, semantic word vectors may be used to embed a given class name without human efforts; they can therefore serve as an alternative semantic representation. Open set recognition may identifying whether an image belongs to seen or un-seen categories. However, open set recognition may not be able to identify the descriptions of the un-seen categories. Visual-Semantic embedding may provide a mapping between visual features and semantic entities. The mapping may be directly learned by one or more of regressing visual features to the semantic space using one or more of Support Vector Regressors (SVR), neural network, and/or other techniques; projecting visual features and semantic entities into a common new space such as SJE, WSABIE, ALE, DeViSE, CCA, and/or other techniques; and/or by other methods. However, one or more techniques may be limited by the information with which they may be conditioned on.