Internet services offering interactive navigation through panoramic images have met with great success over the past few years. With these services, the user can, in particular, access and retrieve visual information about famous sites, virtually visit towns or museums, move about virtually inside buildings, etc. These services are based on a technique known as IBR (Image Based Rendering) that, instead of using complex 3D modelling tools to construct graphic models, takes images from the real world as textures and applies them to basic 3D objects, such as cylinders, cubes or spheres, in which the user can navigate. The main disadvantage of the IBR technique is that it only enables the reproduction of static scenes.
A few years ago, a technique known as VBR (Video Based Rendering) also appeared, in which the textures applied to the 3D object are video streams. According to this technique, several synchronised video streams are generated, for example via a panoramic camera, to capture the entire dynamic scene from several viewpoints. These streams are then processed and applied to the 3D object. The user can then navigate in this dynamic scene with the sensation of being immersed in it and of being at the heart of the action.
In the context of interactive navigation via panoramic images, a document is known entitled “Efficient Representation and interactive streaming of high-resolution panoramic views” by Carten Grünheit, Aljoscha Smolic and Thomas Wiegand, ICIP 2002. This document describes a procedure in which video streams of the panoramic scene are transmitted from a server to a user terminal according to navigation commands coming from the user. The video stream of the panoramic scene is a sequence of temporally successive panoramic images. In order to facilitate its transmission, this panoramic video sequence is divided into a plurality of non-overlapping video portions of pre-defined size forming a mosaic. These video portions are called video patches in the document cited above and in the remainder of the present description. Each video patch corresponds to a spatial zone of the panoramic video sequence and has a specific spatial position in the panoramic view. A video patch can comprise a fixed or variable number of temporally successive images. This division of the panoramic video sequence into video patches is shown in FIG. 1. A panoramic video sequence 10 is divided into a plurality of video patches 11. Each video patch is coded separately in the server. The part of the video sequence referenced 12 represents the part of the sequence that is displayed on the screen of the user terminal. This zone is defined by the navigation commands transmitted by the user terminal and is called the visibility zone hereafter in the description.
FIG. 2 shows an example of the visibility zone 12 overlapping several video patches, referenced 11a to 11d. In this case, for the user terminal to be able to display the part 12, the server must transmit at least these four video patches to it.
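By way of illustration only, the selection of the video patches overlapped by a visibility zone can be sketched as follows. The sketch assumes a regular grid of equally sized patches, integer pixel coordinates and no horizontal wrap-around of the panorama; the function name patches_in_zone and its parameters are hypothetical and do not come from the cited document.

```python
from typing import List, Tuple


def patches_in_zone(
    zone: Tuple[int, int, int, int],
    patch_w: int,
    patch_h: int,
    cols: int,
    rows: int,
) -> List[Tuple[int, int]]:
    """Return the (column, row) indices of the patches overlapped by a zone.

    zone is (x, y, width, height) in panorama pixels. A regular grid of
    cols x rows patches is assumed (illustrative assumption only).
    """
    x, y, w, h = zone
    # Index of the first and last patch touched along each axis,
    # clamped to the grid so a zone at the border stays valid.
    first_col = max(0, x // patch_w)
    last_col = min(cols - 1, (x + w - 1) // patch_w)
    first_row = max(0, y // patch_h)
    last_row = min(rows - 1, (y + h - 1) // patch_h)
    return [
        (c, r)
        for r in range(first_row, last_row + 1)
        for c in range(first_col, last_col + 1)
    ]


# A 100 x 100 zone straddling the corner shared by four 256 x 256 patches.
visible = patches_in_zone((200, 200, 100, 100), 256, 256, cols=8, rows=4)
```

In this hypothetical layout the zone straddles a corner shared by four patches, so four patches must be transmitted, as in the situation of FIG. 2.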
For the user to have the impression of being immersed in the scene, the display of the video patches on the screen of the user terminal must be rapid and fluid. For this purpose, a pre-fetching process for video patches is defined in the document cited above. This process consists in triggering, at the server, the transmission of video patches even before they are contained in the visibility zone 12. The client asks the server to transmit the video patches that correspond to a certain zone of the panoramic view and whose presentation timestamp is later than a given presentation timestamp, this presentation timestamp being estimated according to the current timestamp and the maximum time necessary to receive the patches. The process also involves loading into the user terminal not only the video patches belonging to the current visibility zone but also those that are likely to belong to it if the visibility zone moves in any direction following a user navigation command.
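As a purely illustrative sketch, the estimation of the requested presentation timestamp can be written as below. The cited document only states that the timestamp depends on the current timestamp and on the maximum time necessary to receive the patches; taking their simple sum is an assumption made here, and the function names and the request fields are hypothetical.

```python
from typing import Dict, Tuple


def earliest_requested_pts(current_pts_ms: int, max_delivery_ms: int) -> int:
    """Presentation timestamp from which patches are requested.

    Assumption: the estimate is the current timestamp plus the maximum
    time needed to receive the patches (a simple sum, for illustration).
    """
    return current_pts_ms + max_delivery_ms


def build_prefetch_request(
    zone: Tuple[int, int, int, int],
    current_pts_ms: int,
    max_delivery_ms: int,
) -> Dict[str, object]:
    """Build a hypothetical request: a zone plus the earliest useful timestamp."""
    return {
        "zone": zone,
        "from_pts_ms": earliest_requested_pts(current_pts_ms, max_delivery_ms),
    }


request = build_prefetch_request((0, 0, 512, 256), 10_000, 400)
```

Patches with an earlier timestamp would arrive too late to be displayed, which is why they are excluded from the request.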
For this purpose, a zone 13, called a detection zone, is defined for each video patch 11, said zone 13 surrounding the image portion of the video patch 11 as shown for example in FIG. 3. In this example, the boundary of the image portion of the video patch and the boundary of its detection zone are separated by a distance d over the entire perimeter of the image portion. Once the visibility zone 12 enters the detection zone 13, a request is sent to the server asking it to transmit the video patch 11 to the user terminal.
In the example of FIG. 4, the part 12 overlaps the detection zones, referenced 13a to 13f, of six video patches 11a to 11f. The video patches 11a to 11f are thus pre-fetched into the user terminal at its request, while only the video patches 11a to 11d are contained, in part or entirely, in the current visibility zone 12.
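The detection-zone test described above can be sketched as follows, under stated assumptions: rectangles are axis-aligned, coordinates are integer pixels, and the panorama does not wrap around. The Rect class, the patches_to_prefetch function and the 3 x 2 grid used in the example are hypothetical and are not taken from the figures.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    def expanded(self, d: int) -> "Rect":
        """Detection zone: the patch rectangle grown by d on every side."""
        return Rect(self.x - d, self.y - d, self.w + 2 * d, self.h + 2 * d)

    def intersects(self, other: "Rect") -> bool:
        """True if the two rectangles share a non-empty area."""
        return (
            self.x < other.x + other.w
            and other.x < self.x + self.w
            and self.y < other.y + other.h
            and other.y < self.y + self.h
        )


def patches_to_prefetch(visibility: Rect, patches: Dict[str, Rect], d: int) -> List[str]:
    """Names of the patches whose detection zone is entered by the visibility zone."""
    return sorted(
        name for name, rect in patches.items()
        if rect.expanded(d).intersects(visibility)
    )


# Hypothetical 3 x 2 grid of 100 x 100 patches.
grid = {
    f"p{col}{row}": Rect(col * 100, row * 100, 100, 100)
    for row in range(2)
    for col in range(3)
}
zone = Rect(50, 50, 130, 100)
visible_now = patches_to_prefetch(zone, grid, d=0)   # patches actually displayed
prefetched = patches_to_prefetch(zone, grid, d=30)   # patches requested
```

With d = 0 only the four patches actually overlapped by the visibility zone are selected; with d = 30 the two patches of the third column are requested as well, which mirrors the six-versus-four situation described for FIG. 4.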
The pre-fetching of video patches into the user terminal makes it possible to overcome, at least in part, the latency between the transmission of the request and the reception of the video patches by the user terminal.
The size of the detection zone 13 is made large enough for the request to be transmitted to the server sufficiently early to obtain good navigation speed, without being so large that too many video patches are pre-fetched into the user terminal (knowing that some of them will never be displayed on the screen afterwards), which would penalise the fluidity of the navigation. The size of the detection zone must therefore be defined so as to obtain a good compromise between the fluidity and the speed of the navigation. The navigation fluidity depends on the loading capacity of the user terminal processor, which diminishes with the size of the detection zone (more data to be downloaded). Conversely, the navigation speed is proportional to the size of the detection zone (more time to load the data). This compromise is not, however, always easy to determine.
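The proportionality between navigation speed and detection-zone size can be illustrated with a rough calculation. The model below is an assumption made for illustration, not a formula from the cited document: a patch is taken to arrive in time if the visibility zone needs at least the request-to-reception latency to cross the margin d. The function name and the numerical values are hypothetical.

```python
def max_navigation_speed(d_pixels: float, latency_s: float) -> float:
    """Highest displacement speed (pixels per second) of the visibility zone
    for which a patch requested on entering its detection zone is still
    received before it becomes visible.

    Assumed model: the zone must take at least latency_s to cross the
    margin d_pixels, hence speed <= d_pixels / latency_s.
    """
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return d_pixels / latency_s


# Doubling the margin doubles the speed that can be sustained.
speed_small_margin = max_navigation_speed(64, 0.5)
speed_large_margin = max_navigation_speed(128, 0.5)
```

Under this model a larger margin allows faster navigation, at the cost of a larger number of pre-fetched patches to download and decode.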
Moreover, the fluidity and the speed of navigation can be affected by the dimensioning of the direct channel of the network transporting the video patches and/or of the return channel transporting the requests.
One aim of the present invention is to propose a method for interactive navigation in a panoramic video sequence that makes it possible to overcome, at least partially, the disadvantages cited above.