Building a 3D database for modelling an urban area generally involves a phase of manual or automatic extraction of the built-up structures from one or more aerial or satellite images of the area to be modelled. The images used for this modelling can come from various observation sensors: a conventional camera, an image sensor on board an aircraft, or an observation satellite. Furthermore, the spectral domain of acquisition can vary: visible light, infrared, multispectral or hyperspectral, radar, or lidar, for example. Moreover, the images are georeferenced, that is to say they are associated with metadata comprising a correspondence function between the observed terrestrial surface, referred to as the "terrain space", and the pixels of the image, referred to as the "image space". This function, denoted f_{θ1, …, θn}(X, Y, Z), models the physical characteristics of the sensor used for image acquisition; it thus matches the geographical coordinates of any point of the observed area with the corresponding pixel of the image. In other words, it is a parametric model whose parameters θ1, …, θn comprise at least the physical characteristics of the sensor (size of the receptor arrays, focal length where relevant, etc.) and the position and orientation of the sensor at the moment of acquisition: the function is then called a physical model of the image acquisition.
Examples of such models are:
- the conical model, representing conventional image acquisition by a camera with a focal plane: it corresponds to a matrix of receptors (CCD, for Charge-Coupled Device, or CMOS, for Complementary Metal-Oxide-Semiconductor);
- the "pushbroom" model, in which the receptors are arranged along a one-dimensional strip;
- the "whiskbroom" model, in which the receptor is reduced to a single cell whose rapid motion scans the scene to form an image;
- the "SAR" (Synthetic Aperture Radar) model, resulting from the mathematical processing, by the SAR processor, of the electromagnetic echoes of an incident radar signal in a given frequency band.
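The conical model lends itself to a short illustration of what a physical correspondence function f(X, Y, Z) computes: its parameters are the sensor characteristics (focal length, principal point) plus the sensor position and orientation at capture time. The following is a minimal sketch, assuming an ideal pinhole camera without lens distortion, a local Cartesian terrain frame, a focal length expressed in pixels, and one particular omega/phi/kappa rotation convention; the class `ConicalModel` and all field names are hypothetical and do not come from any specific sensor's metadata.

```python
import math
from dataclasses import dataclass

Matrix3 = list[list[float]]
Vec3 = tuple[float, float, float]


def mat_mul(a: Matrix3, b: Matrix3) -> Matrix3:
    """3x3 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def mat_vec(m: Matrix3, v: Vec3) -> Vec3:
    """3x3 matrix applied to a 3-vector."""
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def rotation(omega: float, phi: float, kappa: float) -> Matrix3:
    """Terrain-to-camera rotation Rz(kappa) @ Ry(phi) @ Rx(omega).
    ASSUMPTION: this angle convention is illustrative; real sensors document their own."""
    co, so = math.cos(omega), math.sin(omega)
    cp, sp = math.cos(phi), math.sin(phi)
    ck, sk = math.cos(kappa), math.sin(kappa)
    rx = [[1.0, 0.0, 0.0], [0.0, co, -so], [0.0, so, co]]
    ry = [[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]]
    rz = [[ck, -sk, 0.0], [sk, ck, 0.0], [0.0, 0.0, 1.0]]
    return mat_mul(rz, mat_mul(ry, rx))


@dataclass
class ConicalModel:
    """Hypothetical parameter set theta_1..theta_n of an ideal pinhole (conical) model."""
    focal_px: float   # focal length expressed in pixels
    c0: float         # column of the principal point
    r0: float         # row of the principal point
    position: Vec3    # sensor position (X, Y, Z) at capture time
    omega: float      # orientation angles, in radians
    phi: float
    kappa: float

    def ground_to_image(self, X: float, Y: float, Z: float) -> tuple[float, float]:
        """Map a terrain point (X, Y, Z) to image coordinates (column, row)."""
        R = rotation(self.omega, self.phi, self.kappa)
        sx, sy, sz = self.position
        # Express the sensor-to-point vector in the camera frame.
        dx, dy, dz = mat_vec(R, (X - sx, Y - sy, Z - sz))
        if dz <= 0:
            raise ValueError("terrain point lies behind the sensor")
        # Perspective division: the camera looks along +z, columns follow +x, rows follow +y.
        return self.c0 + self.focal_px * dx / dz, self.r0 + self.focal_px * dy / dz
```

With `omega = math.pi` the camera axis points straight down: a sensor 1000 m above the origin, with a 1000-pixel focal length and its principal point at (500, 500), maps the ground point (100, 0, 0) to column 600.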
In other cases, this function may be a very general mathematical function which, most of the time, has the properties of a universal approximator and whose parameters have no particular physical meaning. One then speaks of a replacement model. Common examples of such models are:
- polynomial models;
- RPC (Rational Polynomial Coefficients) models, which are ratios of polynomials;
- "grid" models, in which the function is piecewise linear, obtained by interpolating the values at the nodes of a grid.
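As an illustration of a replacement model, the following is a minimal sketch of evaluating an RPC-style function: the ground coordinates are normalized with offsets and scales, and each image coordinate is a ratio of two cubic polynomials (20 terms each) in the normalized latitude, longitude and height. The term ordering in `EXPONENTS`, the class `RationalPolynomialModel` and its field names are assumptions made for this sketch; real RPC metadata define their own fixed term order, which would have to be mapped onto this one.

```python
from dataclasses import dataclass
from itertools import product

# Exponents (i, j, k) of the 20 monomials P^i * L^j * H^k with i + j + k <= 3.
# ASSUMPTION: this ordering is specific to this sketch, not a standardized RPC order.
EXPONENTS = [(i, j, k) for i, j, k in product(range(4), repeat=3) if i + j + k <= 3]


def cubic_poly(coeffs: list[float], P: float, L: float, H: float) -> float:
    """Evaluate sum of coeffs[t] * P^i * L^j * H^k over the 20 cubic monomials."""
    return sum(c * P**i * L**j * H**k for c, (i, j, k) in zip(coeffs, EXPONENTS))


@dataclass
class RationalPolynomialModel:
    """Hypothetical RPC-style replacement model: no coefficient has a physical meaning."""
    lat_off: float
    lat_scale: float
    lon_off: float
    lon_scale: float
    h_off: float
    h_scale: float
    row_off: float
    row_scale: float
    col_off: float
    col_scale: float
    row_num: list[float]   # 20 coefficients each
    row_den: list[float]
    col_num: list[float]
    col_den: list[float]

    def ground_to_image(self, lat: float, lon: float, h: float) -> tuple[float, float]:
        """Map (latitude, longitude, height) to image coordinates (column, row)."""
        # Normalize the ground coordinates (typically to roughly [-1, 1]).
        P = (lat - self.lat_off) / self.lat_scale
        L = (lon - self.lon_off) / self.lon_scale
        H = (h - self.h_off) / self.h_scale
        # Each normalized image coordinate is a ratio of two cubic polynomials.
        row_n = cubic_poly(self.row_num, P, L, H) / cubic_poly(self.row_den, P, L, H)
        col_n = cubic_poly(self.col_num, P, L, H) / cubic_poly(self.col_den, P, L, H)
        # De-normalize to pixel coordinates.
        return col_n * self.col_scale + self.col_off, row_n * self.row_scale + self.row_off
```

Because the fit is purely mathematical, the same structure can approximate very different sensors; the price is that the coefficients cannot be interpreted as focal length, position or attitude, unlike those of a physical model such as the conical model.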
Building extraction methods can be divided into two main categories. In stereoscopy, several georeferenced images taken from different viewpoints are used jointly to extract the relief. Monoscopy relies, on the one hand, on a single image acquired with a parallax angle, which makes it possible to reconstruct the height of the objects represented, and, on the other hand, on the altitude information for the object considered, which is generally provided by a model M of the terrestrial surface giving, for any point P of the terrain, the altimetric coordinate Z as a function of the planimetric coordinates X and Y. It is expressed as follows: Z = M(X, Y),
where Z is the altimetric coordinate of a point P of the terrain, and X and Y are the planimetric coordinates of this point P. The model M of the terrestrial surface is, for example, a digital surface model (DSM) or a digital elevation model (DEM), these two models giving relief information that includes what stands above the ground. As a variant, it is a digital terrain model (DTM), which gives relief information for the bare ground.
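To make Z = M(X, Y) concrete, here is a minimal sketch assuming that M is stored as a regular, north-up grid of node heights, as is common for DSM or DTM rasters, and that heights between nodes are obtained by bilinear interpolation. Bilinear interpolation is one common choice, not necessarily the one intended here; the class `GridSurfaceModel` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GridSurfaceModel:
    """Hypothetical regular-grid model M of the terrestrial surface: Z = M(X, Y)."""
    x0: float              # X coordinate of the first (westernmost) column of nodes
    y0: float              # Y coordinate of the first (northernmost) row of nodes
    step: float            # node spacing, identical in X and Y
    z: list[list[float]]   # z[row][col]; rows run southward; at least 2 x 2 nodes

    def __call__(self, X: float, Y: float) -> float:
        u = (X - self.x0) / self.step   # fractional column index
        v = (self.y0 - Y) / self.step   # fractional row index (north-up grid)
        n_rows, n_cols = len(self.z), len(self.z[0])
        if not (0.0 <= u <= n_cols - 1 and 0.0 <= v <= n_rows - 1):
            raise ValueError("(X, Y) lies outside the model extent")
        # North-west node of the enclosing cell, clamped so the last row/column still has a cell.
        c = min(int(u), n_cols - 2)
        r = min(int(v), n_rows - 2)
        du, dv = u - c, v - r
        # Bilinear blend of the four surrounding node heights.
        return ((1 - du) * (1 - dv) * self.z[r][c]
                + du * (1 - dv) * self.z[r][c + 1]
                + (1 - du) * dv * self.z[r + 1][c]
                + du * dv * self.z[r + 1][c + 1])
```

For example, `GridSurfaceModel(0.0, 10.0, 10.0, [[10.0, 20.0], [30.0, 40.0]])(5.0, 5.0)` returns 25.0, the average of the four node heights at the centre of the cell. Whether the result is an above-ground height (DSM) or a bare-ground height (DTM) depends only on what the grid contains.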
Monoscopic extraction may be carried out manually, automatically, or by a combination of manual actions and automated methods. In manual extraction, an operator digitizes the buildings directly on a display, for example a screen, using a mouse or a similar pointing device. As a general rule, this yields good-quality models on optical images, but it is tedious and expensive. Moreover, certain types of images, such as SAR images, are very difficult for a human operator to interpret, which complicates the digitizing of buildings, requires operators specifically trained for this type of image, and consequently increases modelling costs.
Several automatic techniques have been proposed to speed up extraction, notably the method described in U.S. Pat. No. 7,733,342. This method uses the shadows cast by buildings to deduce their height. However, it has several limitations. First, it determines only the height of a building and therefore does not solve the problem of determining its footprint. Second, it depends on the acquisition conditions, in particular on sufficient illumination to create shadows. Finally, it works only with optical images and therefore does not allow extraction from SAR images.
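The general principle of deducing a height from a cast shadow can be illustrated with elementary trigonometry. This is a generic sketch, not the method of U.S. Pat. No. 7,733,342: it assumes flat horizontal ground, a vertical object, a known sun elevation angle, and a shadow length already measured in ground units along the sun azimuth on a nadir or orthorectified image. The function name `height_from_shadow` is hypothetical.

```python
import math


def height_from_shadow(shadow_length_m: float, sun_elevation_deg: float) -> float:
    """Height of a vertical object from the length of its shadow.

    ASSUMPTIONS (illustrative only): flat horizontal ground, vertical object,
    shadow length measured in metres along the sun azimuth on a nadir view.
    """
    if not 0.0 < sun_elevation_deg < 90.0:
        raise ValueError("sun elevation must be strictly between 0 and 90 degrees")
    if shadow_length_m < 0.0:
        raise ValueError("shadow length must be non-negative")
    # A sun ray grazing the top of the object reaches the ground at distance
    # h / tan(elevation), hence h = shadow_length * tan(elevation).
    return shadow_length_m * math.tan(math.radians(sun_elevation_deg))
```

The assumptions also make the limitations noted above tangible: without sufficient illumination there is no shadow to measure, and the relation yields a height only, never a footprint.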
Other techniques rely on first extracting elementary primitives such as line segments. Once a set of segments has been built, object-reconstruction algorithms associate the segments with one another so as to form the buildings. However, these techniques are error-prone, notably because of imperfect contour detection. They require complex parametrization involving numerous thresholds that must be set before the algorithm is run. Furthermore, a set of parameters that gives satisfactory results on one image may prove totally unsuited to other images, a problem that is even more acute for SAR images. Consequently, these segmentation-based algorithms lack robustness.
A method disclosed by Dominik Brunner et al., entitled "Earthquake Damage Assessment of Buildings Using VHR Optical and SAR Imagery", published in IEEE Transactions on Geoscience and Remote Sensing, vol. 48, no. 5, May 2010, proposes extracting buildings from a SAR image. It determines the height of a building using both the SAR image and a previously known building model. However, this method does not automatically determine all the parameters characterizing a building in the image, but only a single parameter (the height), starting from a model that has already been extracted beforehand.