At present, users may capture multimedia data such as videos and photos using a media capturing apparatus. For photos, face clustering technology may categorize multiple photos in which the same person appears into a photo album corresponding to that person. However, this technology is not available for clustering videos together with photos in which the same person appears. A user may categorize the videos manually, but this manual method is inefficient and lacks intelligence.