The ease of authoring and uploading video or other content (e.g., images or audio) to the Internet creates a vast resource for computer vision research, particularly because Internet videos or other content are frequently associated with semantic tags that identify visual concepts appearing in the video or other content. However, because the tags are not spatially or temporally localized within the video or other content, such videos or other content cannot be directly exploited for training traditional supervised recognition systems.