With the continuous development and improvement of virtual reality (VR) technologies, an increasing number of applications for viewing a VR video, such as a VR video with a 360-degree viewport, are presented to users. In a VR video viewing process, a user may change a viewport (English: field of view, FOV) at any time. Each viewport corresponds to a video bitstream of a spatial object, and when the viewport changes, the VR video image presented in the viewport of the user should also change correspondingly.
In a prior-art VR video preparation phase, a server divides a VR panoramic video into a plurality of bitstreams corresponding to a plurality of fixed spatial objects, encodes the bitstream corresponding to each spatial object, and transmits the bitstreams to a VR terminal. Each fixed spatial object corresponds to a set of dynamic adaptive streaming over hypertext transfer protocol (English: dynamic adaptive streaming over HTTP, DASH) bitstreams. When a user changes the field of view, the terminal selects, based on the new spatial object obtained after the user changes the field of view, one or more fixed spatial objects in the video that cover the new spatial object, decodes the bitstreams of the one or more fixed spatial objects, and presents, based on the new spatial object, the corresponding video content.

In the prior art, when quality needs to be ensured, the amount of data transmitted between the server and the terminal is excessively large and consequently cannot be supported by the network. In addition, a video with the maximum resolution imposes the strictest requirement on the decoding capability of the terminal, and consequently applicability is low. When the available bandwidth is limited, if a relatively high compression rate is used for encoding and transmission, the user's viewing experience cannot be ensured. Conversely, when the available bandwidth is limited, if only the content in the viewport of the user is transmitted, then when the user changes the field of view, no content can be viewed, because the existing network latency prevents real-time delivery. Consequently, the subjective viewing quality and timeliness experienced by the user are severely affected, and applicability is low.
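The prior-art tile-selection step described above, in which the terminal picks the fixed spatial objects that cover the new spatial object after a field-of-view change, can be sketched as follows. This is a minimal illustration only, assuming an equirectangular panorama divided into a regular yaw/pitch grid; the `SpatialObject` type, grid layout, and overlap test are hypothetical and are not specified by the description above.

```python
from dataclasses import dataclass

@dataclass
class SpatialObject:
    # A fixed spatial object: a yaw/pitch rectangle of the panorama, in degrees.
    yaw_min: float
    yaw_max: float
    pitch_min: float
    pitch_max: float

def make_grid(cols: int, rows: int) -> list[SpatialObject]:
    # Divide the 360 x 180 degree panorama into cols x rows fixed spatial objects,
    # each corresponding to one separately encoded DASH bitstream set.
    w, h = 360.0 / cols, 180.0 / rows
    return [
        SpatialObject(c * w, (c + 1) * w, r * h - 90.0, (r + 1) * h - 90.0)
        for r in range(rows)
        for c in range(cols)
    ]

def overlaps(tile: SpatialObject, yaw: float, pitch: float,
             fov_h: float, fov_v: float) -> bool:
    # True if the tile intersects the viewport centred at (yaw, pitch).
    y0, y1 = (yaw - fov_h / 2) % 360.0, (yaw + fov_h / 2) % 360.0
    p0, p1 = pitch - fov_v / 2, pitch + fov_v / 2
    pitch_ok = tile.pitch_max > p0 and tile.pitch_min < p1
    if y0 <= y1:
        yaw_ok = tile.yaw_max > y0 and tile.yaw_min < y1
    else:  # viewport wraps across the 0/360-degree seam
        yaw_ok = tile.yaw_max > y0 or tile.yaw_min < y1
    return pitch_ok and yaw_ok

def select_tiles(grid: list[SpatialObject], yaw: float, pitch: float,
                 fov_h: float = 90.0, fov_v: float = 90.0) -> list[SpatialObject]:
    # After a field-of-view change, pick the fixed spatial objects whose
    # bitstreams the terminal must request and decode.
    return [t for t in grid if overlaps(t, yaw, pitch, fov_h, fov_v)]
```

For example, with a 4 x 2 grid and a 90 x 90 degree viewport centred at yaw 0, pitch 0, the selection wraps across the seam and covers tiles at both the left and right edges of the panorama; this illustrates why a field-of-view change forces the terminal to fetch a new set of fixed-spatial-object bitstreams.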