In general, supervising a workplace or a public place requires multiple surveillance apparatuses, which are allocated among the various scenes so that every corner of the workplace or public place is clearly captured. In a conventional surveillance system, the videos captured by these surveillance apparatuses are usually displayed simultaneously on respective display screens. For example, a supervisor may monitor a workplace or a public place using nine display screens arranged in a 3 by 3 grid, each of which displays a video image of a respective scene.
However, having a single supervisor watch multiple display screens simultaneously is not the most efficient approach. For example, if the supervisor discovers a person of interest on a certain display screen and this person is moving toward a different location, the supervisor has to recall the number or position of the display screen associated with that location in order to turn to the proper display screen in real time. Such a conventional surveillance system therefore makes it inconvenient for the supervisor to follow the movement of the person of interest. Moreover, if the person of interest creates a danger during the few seconds in which the supervisor's attention is diverted, the supervisor misses important video images captured during that period and therefore cannot assess the situation in real time or determine a proper plan of action.