The goal of feature learning is to represent an image numerically as a vector of floating-point numbers, so that visually and semantically similar images lie close together in the feature space. Feature representation is the cornerstone of many functions on social media, such as image search, auto-tagging, recognition, detection, and recommendation. Traditional feature learning relies on meticulously labeled classification datasets such as ImageNet. Learning from social media images, however, requires handling noisy, multi-faceted labels supplied by users. For example, on Behance, images are organized into projects owned by users, and the projects are further assigned to different fields and featured sites according to their styles and purposes. All of this information (image-project association, ownership, field, and site categories) can be regarded as labels for training, and each kind of label characterizes images from a different facet. However, these labels have very different structures and are often heavily corrupted by noise (for example, non-comparable taxonomies or syntax), which makes it difficult to apply conventional classification-based feature learning. Using features trained on classification datasets is also unsatisfactory because of the domain shift between those datasets and social media images.