Visual-appearance-based person re-identification (re-id) has been an active research topic in the past few years and is likely to remain one for the foreseeable future. The re-id task involves assigning the same identifier to all occurrences of a particular individual captured in a series of images or videos, even when the occurrences are widely separated in time or space.
In existing technologies, given a probe image and a gallery set containing a list of persons-of-interest, an image search may be performed to return a ranked list of persons-of-interest, or a multi-class classifier may be trained on the gallery set. These approaches are mainly devoted to feature representation and distance metric learning, which are intended to be invariant to appearance variations caused by different camera views and by significant gaps in time and space. However, existing image retrieval frameworks cannot adequately handle intra-class variability and inter-class similarity unless heuristic constraints are imposed.
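The retrieval-style approach described above can be illustrated with a minimal sketch. The feature vectors, the identifiers `person_A` to `person_C`, the Euclidean distance, and the helper names `euclidean` and `rank_gallery` are all assumptions chosen for illustration; they are not taken from any particular existing system, which would typically use learned features and a learned metric.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_gallery(probe, gallery):
    """Return gallery identifiers ordered from nearest to farthest from the probe."""
    scored = [(euclidean(probe, feat), pid) for pid, feat in gallery.items()]
    scored.sort()  # ascending distance; ties broken by identifier
    return [pid for _, pid in scored]

# Hypothetical appearance features, one vector per person-of-interest.
gallery = {
    "person_A": [0.9, 0.1, 0.3],
    "person_B": [0.2, 0.8, 0.5],
    "person_C": [0.4, 0.4, 0.9],
}
probe = [0.85, 0.15, 0.35]

ranking = rank_gallery(probe, gallery)
print(ranking)
```

Each probe image is ranked on its own here, so nothing prevents two probe images of different persons from both ranking the same gallery identity first.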
Another approach involves modeling the re-id structure between a gallery set and a probe set, thereby inferring all the image labels in the probe set rather than labeling each probe image individually. Such re-id structures may be modeled as a bipartite graph or a Conditional Random Field (CRF). The structural construction of these models is either learned from large amounts of manually labeled image pairs (one gallery image and one probe image associated with the same person identifier), or handcrafted based on heuristics, e.g., edge topology.
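As a rough illustration of why joint inference over a bipartite structure can differ from labeling each probe image individually, the following sketch compares the two on a small cost matrix. The cost values, the one-to-one matching constraint, and the function names `independent_labels` and `joint_labels` are assumptions made for this example only; the exhaustive search is practical only for very small sets and stands in for whatever inference procedure a given bipartite-graph or CRF model would actually use.

```python
from itertools import permutations

def independent_labels(cost):
    """Label each probe with its cheapest gallery index, ignoring other probes."""
    return [min(range(len(row)), key=row.__getitem__) for row in cost]

def joint_labels(cost):
    """Label all probes together: the one-to-one assignment with minimum total cost.

    Assumes there are no more probes than gallery entries and that each
    gallery identity matches at most one probe (an illustrative constraint).
    """
    n_gallery = len(cost[0])
    best = min(
        permutations(range(n_gallery), len(cost)),
        key=lambda assign: sum(cost[i][j] for i, j in enumerate(assign)),
    )
    return list(best)

# cost[i][j]: hypothetical dissimilarity between probe i and gallery identity j.
cost = [
    [0.10, 0.20],
    [0.15, 0.90],
]

print(independent_labels(cost))  # both probes pick gallery identity 0
print(joint_labels(cost))        # joint matching resolves the conflict
```

In this toy case the independent labeling assigns both probes to the same gallery identity, while the joint labeling gives each probe a distinct identity at a lower total cost than the alternative matching.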
However, manually obtaining strong re-id structure priors is prohibitively expensive, and such priors are often unavailable in practice. Further, handcrafted structures ignore the inherent uncertainty of this statistical inference problem. In addition, there does not yet exist a principled approach to derive a common latent feature space for the re-id problem.
The disclosed method and system are directed to solve one or more problems set forth above and other problems.