Computer vision is a field that includes methods for acquiring, processing, analyzing, and understanding images and, more generally, high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the form of decisions. One of the objectives of computer vision is to duplicate the abilities of human vision by electronically perceiving and understanding an image. This can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and machine learning theory. Computer vision has also been described as the venture of automating and integrating a wide range of processes and representations for visual perception.
Sub-domains of computer vision include scene reconstruction, event detection, video tracking, object recognition, machine learning, indexing, motion estimation, and image restoration.
Computer vision can be employed in road traffic surveillance, for example to estimate the dimensions of vehicles. Such estimates allow, e.g., automatic collection of fees, identification of oversize vehicles that exceed the allowable dimensions defined by law, or identification of vehicles that cannot enter certain areas such as tunnels, passages under bridges, etc.
U.S. Pat. No. 8,675,953 discloses an electronic device that determines a geometric scale of an object using two or more images of the object. During operation, the electronic device calculates the size of the object along a direction using multiple images of the object that were taken from different perspectives (such as different locations and/or orientations in an environment) along with associated imaging-device characteristics. For example, the size of the object may be calculated using the images, the associated focal lengths of a digital camera that acquired the images, and the law of cosines. Using the scale of the object, an image of the object may be appropriately scaled so that it can be combined with another image.
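The role of the law of cosines in such a calculation can be illustrated with a minimal sketch. It is not the procedure of the cited patent; it only shows the geometric step under stated assumptions: a pinhole camera, a focal length expressed in pixels, and already known distances from the camera to the two end points of the object. The function names `angle_between_pixels` and `size_from_two_rays` are hypothetical.

```python
import math


def angle_between_pixels(u1, u2, focal_px):
    """Angle (radians) between the viewing rays through two image columns.

    u1 and u2 are horizontal pixel offsets from the principal point and
    focal_px is the focal length in pixels (pinhole camera assumed).
    """
    return abs(math.atan2(u1, focal_px) - math.atan2(u2, focal_px))


def size_from_two_rays(dist_a, dist_b, angle_rad):
    """Law of cosines: length of the side opposite the camera.

    dist_a and dist_b are the assumed known distances from the camera to the
    two end points of the object; angle_rad is the angle between the rays.
    """
    return math.sqrt(dist_a ** 2 + dist_b ** 2
                     - 2.0 * dist_a * dist_b * math.cos(angle_rad))


# Illustrative values only: end points seen 100 px left and right of the
# image centre with a 100 px focal length, at distances of 3 m and 4 m.
angle = angle_between_pixels(100.0, -100.0, 100.0)
size = size_from_two_rays(3.0, 4.0, angle)
```

The focal length enters only through the angle between the viewing rays; the metric size additionally requires the two distances, which must come from elsewhere, for example from the second image taken from a different perspective.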
A publication “Vehicle Size and Orientation Estimation Using Geometric Fitting” (Christina Carlsson, Department of Electrical Engineering, Linköpings universitet, Linköping, Sweden (ISBN 91-7219-790-0)) discloses a vehicle size and orientation estimation process based on scanning laser radar data.
Active Appearance Model (AAM) is a technique based on matching a deformable model to an image of an object. Originally it was developed for face detection, but it has been shown that the technique is useful for various kinds of objects. The AAM consists of two parts: shape and appearance (texture). The shape is defined by a set of points which are grouped into multiple closed polygons, while the appearance (texture) consists of all pixels that lie inside the defined shape.
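The two-part structure described above can be sketched as a simple data layout. This is an illustration only, not an AAM implementation: it omits the statistical model and the fitting procedure, and the names `Shape`, `point_in_polygon` and `appearance_pixels` are hypothetical. Pixels are assumed to be sampled at their centres and tested with an even-odd ray-casting rule.

```python
from dataclasses import dataclass


@dataclass
class Shape:
    points: list    # landmark points as (x, y) tuples
    polygons: list  # each polygon is a list of indices into `points`


def point_in_polygon(x, y, vertices):
    """Even-odd ray-casting test for a point against a closed polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def appearance_pixels(shape, width, height):
    """Return the (column, row) pixels whose centres lie inside the shape."""
    pixels = []
    for row in range(height):
        for col in range(width):
            cx, cy = col + 0.5, row + 0.5
            for poly in shape.polygons:
                vertices = [shape.points[i] for i in poly]
                if point_in_polygon(cx, cy, vertices):
                    pixels.append((col, row))
                    break
    return pixels


# A single square polygon on a 4 x 4 pixel grid (illustrative values).
square = Shape(points=[(1, 1), (3, 1), (3, 3), (1, 3)], polygons=[[0, 1, 2, 3]])
texture_support = appearance_pixels(square, 4, 4)
```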
A 3DMM (3D Morphable Model) is described in detail, e.g., in a publication by V. Blanz and T. Vetter “Face recognition based on fitting a 3D morphable model” (Pattern Analysis and Machine Intelligence, IEEE Transactions on, 25(9):1063-1074, 2003). 3DMM is based on AAM, wherein 3DMM is described as a set of three-dimensional vertices which compose a 3D shape representing the object, and an associated appearance texture.
There are known publications disclosing methods of matching 3DMM models to an image, such as:
S. Romdhani and T. Vetter “Efficient, robust and accurate fitting of a 3D morphable model.” (In Computer Vision. Proceedings. 9th IEEE International Conference on, pages 59-66. IEEE, 2003),
J. T. Rodriguez “3D Face Modelling for 2D+3D Face Recognition” (PhD thesis, Surrey University, Guildford, UK, 2007).
There is also disclosed a method for matching a 3DMM model to multiple images simultaneously, in the publication R. T. A. van Rootseler, L. J. Spreeuwers, R. N. J. Veldhuis “Application of 3D Morphable Models to faces in video images” (as published on the Internet at: http://doc.utwente.nl/77273/1 NanRootseler-WICSP05.pdf).
Known methods for determining the metric dimensions of objects require either calibrated cameras, which allow precise dimension measurement, or laser scanners. Without such calibration, the absolute scale cannot be recovered from the images alone. For example, an image registered by two cameras aligned in parallel at a distance of 2 m from each other and observing an object having a size of 1 m at a distance of 1 m will be the same as an image registered by two cameras aligned in parallel at a distance of 20 m from each other and observing an object having a size of 10 m at a distance of 10 m. This is the problem of the scale of the camera system.
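The scale problem in the example above can be checked numerically. The following is a minimal sketch under stated assumptions: ideal pinhole cameras with parallel optical axes, a unit focal length, and an object modelled as a horizontal segment centred in front of the midpoint of the camera pair. The function names `project` and `stereo_images` are hypothetical.

```python
def project(point, cam_x, focal=1.0):
    """Pinhole projection for a camera at (cam_x, 0, 0) looking along +Z."""
    x, y, z = point
    return (focal * (x - cam_x) / z, focal * y / z)


def stereo_images(baseline, distance, size, focal=1.0):
    """Image coordinates of the two object end points in both cameras."""
    # Object: horizontal segment of length `size` at depth `distance`.
    end_points = [(-size / 2.0, 0.0, distance), (size / 2.0, 0.0, distance)]
    # Two parallel cameras placed symmetrically, `baseline` apart.
    cameras = [-baseline / 2.0, baseline / 2.0]
    return [[project(p, c, focal) for p in end_points] for c in cameras]


# Cameras 2 m apart, object of 1 m at 1 m, versus
# cameras 20 m apart, object of 10 m at 10 m.
small_scene = stereo_images(baseline=2.0, distance=1.0, size=1.0)
large_scene = stereo_images(baseline=20.0, distance=10.0, size=10.0)
same_images = small_scene == large_scene
```

Multiplying the baseline, the object distance and the object size by the same factor leaves every projected coordinate unchanged, so the two scenes cannot be told apart from the images unless the camera spacing is known.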
It would be advantageous to present a cost-efficient and resource-efficient system for object dimension estimation that would at the same time require neither precise synchronization of the cameras nor knowledge of the external parameters of the cameras (i.e. the relative positioning of the cameras with respect to each other).