When multiple video cameras capture a dynamic scene from different vantage points, it may be useful to correlate entities (e.g., inanimate objects or persons) among the video clips captured by the video cameras. For example, if two viewers of a soccer match each take a video of the match, determining that a particular player in the first video and a particular player in the second video are in fact the same player may be useful for a variety of applications (e.g., obtaining three-dimensional coordinates of the player's location).