The efforts aimed at reconstructing three-dimensional dynamic scenes are especially significant in the fields of intelligent remote surveillance (Roth, P., Settgast, V., Widhalm, P., Lancelle, M., Birchbauer, J., Brandle, N., Havemann, S., Bischof, H.: Next-generation 3D visualization for visual surveillance. IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS), 2011, pp. 343-348), video communication and so-called augmented reality systems. If real events are available in realistic 4D, i.e. as a three-dimensional video flow varying in time, the observer is provided with a substantially richer visual experience than conventional video flows allow, because a reconstructed 4D scene can be watched from any viewpoint and can be altered virtually by the user. Constructing an interactive 4D video system, however, poses an extremely tough challenge, because it requires the automatic perception, processing and real-time presentation of the environment simultaneously.
A so-called 4D reconstruction studio is an environment fitted with advanced intelligent sensors; it uses several synchronised and calibrated high resolution video cameras and GPUs (Graphics Processing Units) to build a dynamic 3D, i.e. 4D, model, which provides a real-time video flow with an arbitrary viewpoint. A 4D reconstruction studio is described by way of example in the papers of Hapák, J., Jankó, Z., Chetverikov, D.: Real-time 4D reconstruction of human motion. Proc. 7th International Conference on Articulated Motion and Deformable Objects (AMDO 2012). Springer LNCS, vol. 7378, pp. 250-259 (2012) and Blajovici, C., Chetverikov, D., Jankó, Z.: 4D studio for future internet: Improving foreground-background segmentation. In: IEEE International Conference on Cognitive Infocommunications (CogInfoCom), pp. 559-564 (2012). The described 4D studio is able to record and efficiently display the model of a single moving person, but it is not adapted for covering and recording large scenes containing many moving persons and various background objects.
A paper (Kim, H., Guillemaut, J. Y., Takai, T., Sarim, M., Hilton, A.: Outdoor dynamic 3-D scene reconstruction. IEEE Trans. on Circuits and Systems for Video Technology, Vol. 22, pp. 1611-1622 (2012)) describes a portable stereo system adapted for the surveillance of outdoor scenes, which is able to record dynamic outdoor scenes and to reconstruct them spatially. In this system, the examined space or scene must be surrounded by several, typically 8 or 9, well-calibrated cameras prior to recording, and the reconstruction process is extremely computation intensive: processing roughly 10 seconds of footage takes several hours. In addition, full automation runs into difficulties because of the occlusion problems conventionally experienced in stereo reconstruction and because image features suitable for matching are locally missing.
The so-called ToF (Time-of-Flight) method, applied for example in so-called LIDAR (Light Detection and Ranging) devices, offers considerable advantages over conventional video flows in the field of automated scene analysis, because geometrical information can be obtained directly from the essentially 2.5-dimensional range data set provided by such a device. The 2.5-dimensional range data imply that LIDAR only provides information about the side of the examined objects facing the sensor. Furthermore, LIDAR measurements are much less sensitive to the climatic and lighting conditions during data acquisition than outdoor systems based on optical recordings.
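The direct geometric interpretation of ToF range data mentioned above can be illustrated with a minimal sketch: assuming a sensor that reports each measurement as a beam direction (azimuth and elevation angles) plus a measured range, the corresponding 3D point in the sensor frame follows from a simple spherical-to-Cartesian conversion. The function name and parameter conventions below are illustrative assumptions, not taken from any particular device interface.

```python
import math

def range_to_point(azimuth_deg, elevation_deg, range_m):
    """Convert one ToF range measurement (beam direction plus measured
    distance) into a 3D Cartesian point in the sensor's own frame.

    Convention assumed here: azimuth measured in the horizontal plane
    from the x-axis, elevation measured upward from that plane."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = range_m * math.cos(el) * math.cos(az)
    y = range_m * math.cos(el) * math.sin(az)
    z = range_m * math.sin(el)
    return (x, y, z)

# A horizontal beam straight ahead at 10 m maps to (10, 0, 0):
print(range_to_point(0.0, 0.0, 10.0))
```

No triangulation or feature matching is involved: each range sample yields a 3D point on the sensor-facing surface directly, which is why the data set is only 2.5-dimensional.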
High-speed Rotating Multi-Beam laser scanning devices, i.e. RMB LIDAR devices such as the Velodyne HDL-64E, are able to provide time series of spatial point clouds. However, a single rotation of the RMB LIDAR yields a sparse point cloud, and a significant drop in sampling density is experienced especially at larger distances from the sensor. In addition, a pattern of concentric circles is observable around the LIDAR device, where the distance between points within a circle is much smaller than the distance between points of neighbouring circles (see Behley, J., Steinhage, V., Cremers, A.: Performance of histogram descriptors for the classification of 3D laser range data in urban environments. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 4391-4398 (2012)). These characteristics lead to an inferior visual experience if only the raw point cloud series are displayed on a screen.
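The ring-pattern geometry described above can be sketched with simple trigonometry: assuming a flat ground plane and angular step sizes merely in the range of an HDL-64E-class device (the height and step values below are illustrative assumptions, not exact device specifications), the spacing of points within one ring and the gap to the neighbouring ring both follow from the beam's downward tilt.

```python
import math

# Assumed, illustrative parameters (not exact device specifications):
H = 2.0             # sensor height above a flat ground plane, metres
AZ_STEP_DEG = 0.17  # horizontal angular step between consecutive shots
EL_STEP_DEG = 0.4   # vertical angle between neighbouring laser beams

def ring_radius(down_angle_deg):
    """Radius of the circle a downward-tilted beam traces on the ground."""
    return H / math.tan(math.radians(down_angle_deg))

def spacings(down_angle_deg):
    """Return (within-ring, between-ring) point distances for one beam."""
    r = ring_radius(down_angle_deg)
    within = r * math.radians(AZ_STEP_DEG)  # arc length between shots
    between = r - ring_radius(down_angle_deg + EL_STEP_DEG)
    return within, between

for angle in (20.0, 5.0, 2.0):  # shallower tilt -> farther from sensor
    w, b = spacings(angle)
    print(f"tilt {angle:4.1f} deg: radius {ring_radius(angle):6.1f} m, "
          f"within-ring {w:.3f} m, between-ring {b:.3f} m")
```

Under these assumptions the within-ring spacing stays a small fraction of the between-ring gap, and both grow with the distance from the sensor, which reproduces the two effects noted in the cited paper: the concentric circle pattern and the density drop at larger distances.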
Systems and methods are known in which laser scanning and optically obtained recordings are applied simultaneously. Such systems and methods are described in US 2012/0081544 A1, WO 2011/120141 A1, US 2010/0125812 A1, WO 01/10138 A1, US 2010/0183192 A1 and US 2010/0315505 A1. Most of these systems make a laser and an optical recording of the same space in order to obtain a spatial reconstruction of as high a quality as possible.
The examination of vegetation by a LIDAR device is described in AU 2012227155 A1. LIDAR-based systems are described in US 2013/0016896 A1, DE 10 2009 046 597 A1 and U.S. Pat. No. 8,368,876 B1. Surveillance systems based on video cameras are described in US 2008/0152192 A1, US 2009/0232353 A1 and US 2010/0002074 A1.
An educational paper about a research project associated with this invention is Dániel Gergő Pintér's Új utakon a dinamikus várostervezés (New ways of dynamic city planning, Élet és Tudomány magazine, Volume LXVIII, No. 12, pp. 378-379 (2013)), which mentions that it is desirable to insert object shape models generated by a 4D studio into street view models, but the steps taken to insert such models are not presented in the paper.
In view of the known solutions, there is a demand for a method and system by which the three-dimensional model of a scene can be prepared substantially in real time in such a way that object shapes which are relevant from the aspect of the scene and located in the foreground—for example stationary or moving people, cars or other important shapes, or certain static objects—have a detailed texture, while the background parts less important from the aspect of the scene have a more schematic texture.