Path tracing is a computer graphics method for realistic rendering of three-dimensional scenes, based on global illumination. Global illumination takes into account not only the light that arrives directly from a light source, but also the light from the same source that reaches a surface after being reflected by other surfaces in the scene, whether those surfaces are reflective or not (indirect illumination).
Fundamentally, global illumination integrates over all the luminance arriving at a single point on the surface of a rendered object. This luminance is then reduced by a surface reflectance function, the bidirectional reflectance distribution function (BRDF), to determine how much of it will go toward the viewpoint camera. This integration procedure is repeated for every pixel in the output image. When combined with physically accurate models of surfaces, accurate models of real light sources, and optically correct cameras, path tracing can produce still images that are indistinguishable from photographs.
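The integration described above can be illustrated with a minimal Monte Carlo sketch. It is not taken from the disclosure: it assumes a Lambertian (perfectly diffuse) BRDF, uniform sampling of the hemisphere, and incoming light that depends only on the angle to the surface normal. The names `sample_uniform_hemisphere`, `estimate_reflected_light`, `albedo`, and `incoming` are hypothetical.

```python
import math
import random


def sample_uniform_hemisphere(rng):
    """Return (cos_theta, pdf) for a direction drawn uniformly over the hemisphere."""
    # For uniform hemisphere sampling, cos(theta) is uniform on [0, 1)
    # and the density with respect to solid angle is 1 / (2 * pi).
    cos_theta = rng.random()
    return cos_theta, 1.0 / (2.0 * math.pi)


def estimate_reflected_light(albedo, incoming, num_samples, seed=0):
    """Monte Carlo estimate of the light reflected from one surface point.

    albedo   -- diffuse reflectance in [0, 1] (assumed Lambertian surface)
    incoming -- function of cos_theta giving the incoming light (assumption:
                the light does not depend on the azimuth angle)
    """
    rng = random.Random(seed)
    brdf = albedo / math.pi  # Lambertian BRDF is constant over directions
    total = 0.0
    for _ in range(num_samples):
        cos_theta, pdf = sample_uniform_hemisphere(rng)
        # Integrand (BRDF * incoming light * cosine term) divided by the pdf.
        total += brdf * incoming(cos_theta) * cos_theta / pdf
    return total / num_samples


if __name__ == "__main__":
    # Under constant incoming light of 1.0 the exact answer equals the albedo.
    print(estimate_reflected_light(0.8, lambda c: 1.0, 50_000))
```

With constant incoming light the integral has a closed form equal to the albedo, which gives a simple check on the estimator. A path tracer repeats this kind of estimate recursively, because the incoming light at one point is itself the reflected light of another point.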
Path tracing naturally simulates many effects that have to be specifically added to other methods (conventional ray tracing or scanline rendering), such as soft shadows, depth of field, motion blur, caustics, ambient occlusion, and indirect lighting.
Path tracing is a computationally intensive algorithm. The basic and most time-consuming task in path tracing is locating the intersection points between millions of rays and millions of polygons. In the prior art this is done by massive traversals of acceleration structures and by resolving intersection tests. Traversals typically take 60% to 70% of the rendering time. In addition, the need to modify or reconstruct the acceleration structures before each dynamic frame limits performance.
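To make the cost of this task concrete, the following sketch shows a single ray-polygon intersection test and a brute-force search for the closest hit. It uses the well-known Möller-Trumbore ray-triangle test, which is an assumption for illustration and not a method named in this disclosure; the helper names are hypothetical. The brute-force loop performs one test per triangle per ray, which is the work that acceleration structures exist to avoid.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle_intersect(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the distance t along the ray to the triangle, or None on a miss."""
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle plane
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, qvec) * inv_det
    return t if t > eps else None  # ignore hits behind the ray origin


def closest_hit(origin, direction, triangles):
    """Brute force: test every triangle and keep the nearest hit."""
    best = None
    for index, (v0, v1, v2) in enumerate(triangles):
        t = ray_triangle_intersect(origin, direction, v0, v1, v2)
        if t is not None and (best is None or t < best[0]):
            best = (t, index)
    return best  # (distance, triangle index) or None
```

For example, a ray starting at (0.25, 0.25, 1) and pointing along the negative z axis meets the triangle with vertices (0, 0, 0), (1, 0, 0), (0, 1, 0) at distance 1.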
Fortunately, path tracing is quite easy to parallelize, because the contribution of each ray to the final image can be computed independently of other rays. There are two main parallelization approaches in the prior art: (i) ray-parallel, in which rays are distributed among parallel processors and each processor traces its rays from start to finish, and (ii) data-parallel, in which the scene is distributed among multiple processors and a ray is handled by several processors in succession.
The ray-parallel implementation subdivides the image space into a number of disjoint regions and replicates all the scene data on each processor. Each processor renders a number of screen regions using the unaltered sequential version of the path tracing algorithm, until the whole image is completed. Load balancing is achieved dynamically by sending new tasks to processors that have just become idle. However, if a large model needs to be rendered, the local memory of each processor may not be large enough to hold the entire scene. This is evident from FIG. 1, where the performance of CPU-based rendering systems is compared with that of GPUs. A GPU has a limited amount of video memory, so its performance starts to degrade at smaller model sizes than that of a CPU, which can access a much larger memory. Because of the limitation of local memory, for large models a central storage must be used, as pictured in FIG. 2, for the geometric data, acceleration structures, and textures. Each processor needs massive access to these resources. Such centralization of resources causes a severe bottleneck. The bottleneck grows with the data size, and gets even worse when a central mass storage has to be used for large data. The relatively long access times of mass storage, orders of magnitude slower than RAM, become prohibitive for big rendering data.
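The ray-parallel scheme with dynamic load balancing can be sketched as follows. This is a minimal illustration under stated assumptions, not the prior-art implementation: worker threads stand in for processors, so the scene is shared in one address space rather than replicated per processor, and `shade_pixel` is a hypothetical placeholder for the sequential path tracer. The thread pool hands the next pending tile to whichever worker becomes idle, which mirrors the dynamic task assignment described above.

```python
from concurrent.futures import ThreadPoolExecutor


def make_tiles(width, height, tile):
    """Split the image into disjoint rectangular regions (x0, y0, x1, y1)."""
    return [(x0, y0, min(x0 + tile, width), min(y0 + tile, height))
            for y0 in range(0, height, tile)
            for x0 in range(0, width, tile)]


def shade_pixel(scene, x, y):
    # Hypothetical stand-in for the unaltered sequential path tracer.
    return scene["gain"] * (x + y)


def render_tile(scene, bounds):
    """Render one screen region; every worker sees the whole scene."""
    x0, y0, x1, y1 = bounds
    rows = [[shade_pixel(scene, x, y) for x in range(x0, x1)]
            for y in range(y0, y1)]
    return bounds, rows


def render_ray_parallel(scene, width, height, tile=4, workers=4):
    image = [[0.0] * width for _ in range(height)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Tiles wait in the pool's queue; an idle worker takes the next one.
        futures = [pool.submit(render_tile, scene, bounds)
                   for bounds in make_tiles(width, height, tile)]
        for future in futures:
            (x0, y0, x1, y1), rows = future.result()
            for dy, row in enumerate(rows):
                image[y0 + dy][x0:x1] = row
    return image


if __name__ == "__main__":
    picture = render_ray_parallel({"gain": 0.5}, width=10, height=6)
    print(picture[5][9])
```

In this sketch every worker needs the complete `scene` object. On processors with separate memories that requirement becomes full replication of the scene, which is where the memory limit discussed above arises.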
Data-parallel is a different approach to rendering, best suited to large data sets that do not fit into a single processor's memory. Each processor owns a subset of the database and traces rays only when they pass through its own subspace (cell). As shown in FIG. 3, the subsets of the geometry data and textures are kept in private memories, each assigned to one processor. The acceleration structures are broken down into small local substructures and distributed among the subsets. High locality is achieved by processing the relevant segment of a passing ray against the local data and the local acceleration structure, with little need for central resources. Data locality is a desirable feature in path tracing: it reduces movement of massive data, contributes to a higher utilization of cache memories, and reduces the use of main memory. The high locality of the data-parallel approach might be advantageous for very large models. However, the efficiency of data-parallel rendering systems tends to be low, which raises several challenges. Interprocessor communication is heavy because of the massive number of rays that must pass among the subsets of data. These passages involve massive interfacing among the local acceleration structures. Such interfacing must be handled efficiently and be well synchronized. Furthermore, the number of communicating rays must be reduced to achieve satisfactory efficiency.
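The data-parallel idea, and the communication cost it incurs, can be illustrated with a deliberately simplified sketch. The assumptions here are not from the disclosure: space is one-dimensional, the geometry consists of point-like walls on the x axis, each cell keeps only its own walls, and cells are processed in a loop rather than on real processors. The names `Cell`, `build_cells`, and `trace_data_parallel` are hypothetical. Every time a ray leaves one cell for a neighbor, the sketch counts one interprocessor message.

```python
from collections import deque


class Cell:
    """One subspace with its private, local geometry."""

    def __init__(self, lo, hi, walls):
        self.lo, self.hi = lo, hi
        self.walls = sorted(walls)  # local data only, no global scene

    def trace(self, x, direction):
        """Return the nearest local wall ahead of origin x, or None."""
        if direction > 0:
            ahead = [w for w in self.walls if w > x]
            return min(ahead) if ahead else None
        ahead = [w for w in self.walls if w < x]
        return max(ahead) if ahead else None


def build_cells(walls, num_cells, cell_width):
    """Distribute the walls among cells covering [0, num_cells * cell_width)."""
    cells = []
    for i in range(num_cells):
        lo, hi = i * cell_width, (i + 1) * cell_width
        cells.append(Cell(lo, hi, [w for w in walls if lo <= w < hi]))
    return cells


def trace_data_parallel(cells, rays, cell_width):
    """Trace rays given as (origin_x, direction); return (hits, messages)."""
    queues = [deque() for _ in cells]  # one incoming-ray queue per cell
    for ray_id, (x, d) in enumerate(rays):
        queues[int(x // cell_width)].append((ray_id, x, d))
    hits, messages = {}, 0
    while any(queues):
        for i, cell in enumerate(cells):
            while queues[i]:
                ray_id, x, d = queues[i].popleft()
                hit = cell.trace(x, d)
                if hit is not None:
                    hits[ray_id] = hit  # resolved with local data only
                    continue
                neighbor = i + (1 if d > 0 else -1)
                if 0 <= neighbor < len(cells):
                    queues[neighbor].append((ray_id, x, d))
                    messages += 1  # ray handed to another processor
                else:
                    hits[ray_id] = None  # ray left the scene
    return hits, messages
```

Even in this toy setting, a ray that crosses several empty cells generates one message per crossing, which hints at why reducing the number of communicating rays matters for efficiency.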