Immersive telepresence systems are equipped with a cluster of cameras to create a life-size view of meeting participants across a conference room. Each camera has a fixed field of view (FOV) and captures a pre-defined seating segment within the room. Together, the FOVs of the camera cluster cover adjacent, non-overlapping seating segments. When images from the camera cluster are displayed on abutting screens, the images appear as if taken from a single camera with a very wide FOV. To achieve this effect, the cameras must be carefully installed to ensure proper alignment, avoiding both noticeable image duplication (overlap) and dead zones (unrealistic gaps) between adjacent images. This alignment of the fields of view is done by manually adjusting the cameras, which is a tedious, time-consuming, and error-prone process. While the relatively large bezels of screens used today may provide some tolerance for perceivable misalignment between adjacent images, joining adjacent camera views without noticeable defects becomes increasingly difficult as screen bezels become thinner.