Image annotation, which is used to label and manage large amounts of visual content, has progressed significantly over the last few decades. However, existing approaches to image annotation are limited in that they treat each label individually and draw on a limited vocabulary.
Existing approaches require selection from pre-defined concepts or calendar events, which together make up the vocabulary for image annotation. These concepts are usually organized in a flat structure. Because the size of a flat vocabulary is usually limited, the resulting annotations fail to provide a structured, comprehensive description of the images.
To enlarge the vocabulary, one existing approach mines the World Wide Web (web) to collect additional labels from online data. However, tags from web data are noisy and often improperly applied. Because these tags are assigned by a variety of users, the labels cannot be expected to be consistent.