If you look around the visual scene before you, you will notice that some of the most interesting visual effects you see are produced by light rays interacting with surfaces. This is because light is the only thing we see. We don't see objects—we see the light that is reflected or refracted by the objects. Most of the objects we can see reflect light (the color of an object is determined by which parts of light the object reflects and which parts it absorbs). Shiny surfaces such as metallic surfaces, glossy surfaces, ceramics, the surfaces of liquids and a variety of others (even the corneas of the human eyes) act as mirrors that spectrally reflect light. For example, a shiny metal surface will reflect light at the same angle as it hit the surface. An object can also cast shadows by preventing light from reaching other surfaces that are behind the object relative to a light source. If you look around, you will notice that the number and kinds of reflections and the number, kinds and lengths of shadows depend on many factors including the number and type of lights in the scene. A single point light such as a single faraway light bulb will produce single reflections and hard shadows. Area light sources such as windows or light panels produce different kinds of reflection highlights and softer shadows. Multiple lights will typically produce multiple reflections and more complex shadows (for example, three separated point light sources will produce three shadows which may overlap depending on the positions of the lights relative to an object).
If you move your head as you survey the scene, you will notice that the reflections change in position and shape (the shadows do the same). By changing your viewpoint, you are changing the various angles of the light rays your eyes detect. This occurs instantaneously—you move your head and the visual scene changes immediately.
The simple act of drinking a cup of tea is a complex visual experience. The various shiny surfaces of the glossy ceramic cup on the table before you reflect each light in the room, and the cup casts a shadow for each light. The moving surface of the tea in the cup is itself reflective. You can see small reflected images of the lights on the tea's surface, and even smaller reflections on the part of the tea's surface where the liquid curves up to meet the walls of the cup. The cup walls also cast shadows onto the surface of the liquid in the cup. Lifting the cup to your mouth causes these reflections and shadows to shift and shimmer as your viewpoint changes and as the surface of the liquid is agitated by movement.
We take these complexities of reflections and shadows for granted. Our brains are adept at decoding the positions, sizes and shapes of shadows and reflections and using them as visual cues. This is in part how we discern the position of objects relative to one another, how we distinguish one object from another and how we learn what objects are made of Different object surfaces reflect differently. Specular (mirror type) reflection of hard metal creates images of reflected objects, while diffuse reflection off of rough surfaces is responsible for color and lights up objects in a softer way. Shadows can be soft and diffuse or hard and distinct depending on the type of lighting, and the lengths and directions of the shadows will depend on the angle of the light rays relative to the object and our eyes.
Beginning artists typically don't try to show reflection or shadows. They tend to draw flat scenes that have no shadows and no reflections or highlights. The same was true with computer graphics of the past.
Real time computer graphics have advanced tremendously over the last 30 years. With the development in the 1980's of powerful graphics processing units (GPUs) providing 3D hardware graphics pipelines, it became possible to produce 3D graphical displays based on texture-mapped polygon primitives in real time response to user input. Such real time graphics processors were built upon a technology called scan conversion rasterization, which is a means of determining visibility from a single point or perspective. Using this approach, three-dimensional objects are modelled from surfaces constructed of geometric primitives, typically polygons such as triangles. The scan conversion process establishes and projects primitive polygon vertices onto a view plane and fills in the points inside the edges of the primitives. See e.g., Foley, Van Dam, Hughes et al, Computer Graphics: Principles and Practice (2d Ed. Addison-Wesley 1995 & 3d Ed. Addison-Wesley 2014).
Hardware has long been used to determine how each polygon surface should be shaded and texture-mapped and to rasterize the shaded, texture-mapped polygon surfaces for display. Typical three-dimensional scenes are often constructed from millions of polygons. Fast modern GPU hardware can efficiently process many millions of graphics primitives for each display frame (every 1/30th or 1/60th of a second) in real time response to user input. The resulting graphical displays have been used in a variety of real time graphical user interfaces including but not limited to augmented reality, virtual reality, video games and medical imaging. But traditionally, such interactive graphics hardware has not been able to accurately model and portray reflections and shadows.
Some have built other technologies onto this basic scan conversion rasterization approach to allow real time graphics systems to accomplish a certain amount of realism in rendering shadows and reflections. For example, texture mapping has sometimes been used to simulate reflections and shadows in a 3D scene. One way this is commonly done is to transform, project and rasterize objects from different perspectives, write the rasterized results into texture maps, and sample the texture maps to provide reflection mapping, environment mapping and shadowing. While these techniques have proven to be useful and moderately successful, they do not work well in all situations. For example, so-called “environment mapping” may often require assuming the environment is infinitely distant from the object. In addition, an environment-mapped object may typically be unable to reflect itself. See e.g., http://developer.download.nvidia.com/CgTutorial/cg_tutorial_chapter07.html. These limitations result because conventional computer graphics hardware—while sufficiently fast for excellent polygon rendering—does not perform the light visualization needed for accurate and realistic reflections and shadows. Some have likened raster/texture approximations of reflections and shadows as the visual equivalent of AM radio.
There is another graphics technology which does perform physically realistic visibility determinations for reflection and shadowing. It is called “ray tracing”. Ray tracing was developed at the end of the 1960's and was improved upon in the 1980's. See e.g., Apple, “Some Techniques for Shading Machine Renderings of Solids” (SJCC 1968) pp. 27-45; Whitted, “An Improved Illumination Model for Shaded Display” Pages 343-349 Communications of the ACM Volume 23 Issue 6 (June 1980); and Kajiya, “The Rendering Equation”, Computer Graphics (SIGGRAPH 1986 Proceedings, Vol. 20, pp. 143-150). Since then, ray tracing has been used in non-real time graphics applications such as design and film making. Anyone who has seen “Finding Dory” (2016) or other Pixar animated films has seen the result of the ray tracing approach to computer graphics—namely realistic shadows and reflections. See e.g., Hery et al, “Towards Bidirectional Path Tracing at Pixar” (2016).
Ray tracing is a primitive used in a variety of rendering algorithms including for example path tracing and Metropolis light transport. In an example algorithm, ray tracing simulates the physics of light by modeling light transport through the scene to compute all global effects (including for example reflections from shiny surfaces) using ray optics. In such uses of ray tracing, an attempt may be made to trace each of many hundreds or thousands of light rays as they travel through the three-dimensional scene from potentially multiple light sources to the viewpoint. Often, such rays are traced relative to the eye through the scene and tested against a database of all geometry in the scene. The rays can be traced forward from lights to the eye, or backwards from the eye to the lights, or they can be traced to see if paths starting from the virtual camera and starting at the eye have a clear line of sight. The testing determines either the nearest intersection (in order to determine what is visible from the eye) or traces rays from the surface of an object toward a light source to determine if there is anything intervening that would block the transmission of light to that point in space. Because the rays are similar to the rays of light in reality, they make available a number of realistic effects that are not possible using the raster based real time 3D graphics technology that has been implemented over the last thirty years. Because each illuminating ray from each light source within the scene is evaluated as it passes through each object in the scene, the resulting images can appear as if they were photographed in reality. Accordingly, these ray tracing methods have long been used in professional graphics applications such as design and film, where they have come to dominate over raster-based rendering.
The main challenge with ray tracing has generally been speed. Ray tracing requires the graphics system to compute and analyze, for each frame, each of many millions of light rays impinging on (and potentially reflected by) each surface making up the scene. In the past, this enormous amount of computation complexity was impossible to perform in real time.
One reason modern GPU 3D graphics pipelines are so fast at rendering shaded, texture-mapped surfaces is that they use coherence efficiently. In conventional scan conversion, everything is assumed to be viewed through a common window in a common image plane and projected down to a single vantage point. Each triangle or other primitive is sent through the graphics pipeline and covers some number of pixels. All related computations can be shared for all pixels rendered from that triangle. Rectangular tiles of pixels corresponding to coherent lines of sight passing through the window may thus correspond to groups of threads running in lock-step in the same streaming processor. All the pixels falling between the edges of the triangle are assumed to be the same material running the same shader and fetching adjacent groups of texels from the same textures. In ray tracing, in contrast, rays may start or end at a common point (a light source, or a virtual camera lens) but as they propagate through the scene and interact with different materials, they quickly diverge. For example, each ray performs a search to find the closest object. Some caching and sharing of results can be performed, but because each ray potentially can hit different objects, the kind of coherence that GPU's have traditionally taken advantage of in connection with texture mapped, shaded triangles is not present (e.g., a common vantage point, window and image plane are not there for ray tracing). This makes ray tracing much more computationally challenging than other graphics approaches—and therefore much more difficult to perform on an interactive basis.
Much research has been done on making the process of tracing rays more efficient and timely. See e.g., Glassner, An Introduction to Ray Tracing (Academic Press Inc., 1989). Because each ray in ray tracing is, by its nature, evaluated independently from the rest, ray tracing has been called “embarrassingly parallel.” See e.g., Akenine-Möller et al., Real Time Rendering at Section 9.8.2, page 412 (Third Ed. CRC Press 2008). As discussed above, ray tracing involves effectively testing each ray against all objects and surfaces in the scene. An optimization called “acceleration data structure” and associated processes allows the graphics system to use a “divide-and-conquer” approach across the acceleration data structure to establish what surfaces the ray hits and what surfaces the ray does not hit. Each ray traverses the acceleration data structure in an individualistic way. This means that dedicating more processors to ray tracing gives a nearly linear performance increase. With increasing parallelism of graphics processing systems, some began envisioning the possibility that ray tracing could be performed in real time. For example, work at Saarland University in the mid-2000's produced an early special purpose hardware system for interactive ray tracing that provided some degree of programmability for using geometry, vertex and lighting shaders. See Woop et al., “RPU: A Programmable Ray Processing Unit for Real Time Ray Tracing” (ACM 2005). As another example, Advanced Rendering Technology developed “RenderDrive” based on an array of AR250/350 rendering processors derived from ARM1 and enhanced with custom pipelines for ray/triangle intersection and SIMD vector and texture math but with no fixed-function traversal logic. See e.g., http://www.graphicshardware.org/previous/www_2001/presentations/Hot3D_Daniel_Hall.pdf
Then, in 2010, NVIDIA took advantage of the high degree of parallelism of NVIDIA GPUs and other highly parallel architectures to develop the OptiX™ ray tracing engine. See Parker et al., “OptiX: A General Purpose Ray Tracing Engine” (ACM Transactions on Graphics, Vol. 29, No. 4, Article 66, July 2010). In addition to improvements in API's (application programming interfaces), one of the advances provided by OptiX™ was improving the acceleration data structures used for finding an intersection between a ray and the scene geometry. Such acceleration data structures are usually spatial or object hierarchies used by the ray tracing traversal algorithm to efficiently search for primitives that potentially intersect a given ray. OptiX™ provides a number of different acceleration structure types that the application can choose from. Each acceleration structure in the node graph can be a different type, allowing combinations of high-quality static structures with dynamically updated ones.
The OptiX™ programmable ray tracing pipeline provided significant advances, but was still generally unable by itself to provide real time interactive response to user input on relatively inexpensive computing platforms for complex 3D scenes. Since then, NVIDIA has been developing hardware acceleration capabilities for ray tracing. See e.g., U.S. Pat. Nos. 9,582,607; 9,569,559; US20160070820; and US20160070767.
Given the great potential of a truly interactive real time ray tracing graphics processing system for rendering high quality images of arbitrary complexity in response for example to user input, further work is possible and desirable.
Need for Efficient Dynamic Interaction Between Components Performing Ray Tracing
In some contexts, it may be helpful to report results of an investigation in a particular fashion. Suppose for example that a boss asks an employee to investigate venues for a company summer picnic. The boss lists several criteria she's interested in: availability on certain weekends, cost, convenient location, adequate parking, whether there is a playground, whether there is a pool, whether there is a softball field, etc. The employee calls around, visits some of the sites, and now wants to provide his boss with the results of his investigation.
In what order should the employee provide the results? In the order the employee visited the sites? In the order the employee liked the venues? The employee and the boss can have a conversation and discuss the results, so perhaps the order of reporting is not so important. But now suppose the boss asks the employee to write a report summarizing his findings so a committee of middle managers can decide on the venue. Now, the order in which the employee provides the results becomes more important. Reporting in a random order or in an order known only to the employee is less efficient and could cause confusion.
Often, computers may perform operations and calculate results in orders that cannot be predicted in advance. For example, sometimes a processor needs to wait on a certain resource (e.g., memory contents) becoming available before completing a task, but may be able to make forward progress on another task while it is waiting. Sometimes, the most efficient algorithm or process for the computer to use to perform a task is not the order in which the tasks were requested but rather a completely different order that is known only to the computer. This is especially true in contexts such as computer graphics ray tracers that require a processor to traverse a potentially very large and complicated graph such as a bounding volume hierarchy to discover which parts of certain geometry the ray intersects. The order in which the processor traverses the graph may depend on a lot of variables including for example how the graph is constructed, how it is stored in memory, which parts of the graph are intersected by the ray, what range of primitives are relevant, the characteristics of the primitives found to intersect the ray, and so on.
Much work has been done in devising more efficient ways for ray tracing shaders and pipelines to communicate. However, efficient dynamic interaction between streaming multiprocessors and a traversal coprocessor is a complex matter in which further improvements are possible and desirable.