Article by Kévin Dietrich Digital Trudy created by Reality Labs Research rendered using Cluster-Blender, image courtesy of Feng Xie @ Reality Labs Research. Facebook (now Meta) joined the Blender development fund during 2020 with the main purpose of supporting the development of Cycles. A team from Facebook Reality Labs led by Feng Xie and Christophe […]
Article by Kévin Dietrich
Facebook (now Meta) joined the Blender development fund during 2020 with the main purpose of supporting the development of Cycles. A team from Facebook Reality Labs led by Feng Xie and Christophe Hery are interested in using Cycles as a renderer for some of their projects. They chose Cycles because it is a full featured production renderer; however, they are also interested in improving Cycles’s real time rendering capabilities and features for high quality digital human rendering.
There are three main areas of collaboration:
A short paper about the cloud based real time path tracing renderer Feng’s team built on top of Blender-Cycles will be presented at Siggraph Asia 2021.
This article will discuss the work done so far.
In order to support as many platforms as possible, Cycles stores the data of the whole scene in unified, specific buffers. For example, all UVs of all objects are stored together in a single contiguous buffer in memory, similarly for triangles or shader graphs. A major problem with this approach is that if anything changes, the buffers are then destroyed before being recreated, which involves copying all the data back, first into the buffers, then secondly to the devices. These copies and data transfers can be extremely time consuming.
The solution, proposed by Feng, was to preserve the buffers, as long as the number of objects or their topologies remain unchanged, but to copy into the buffers only the data of the objects that have changed. For example, if only the positions of a mesh have changed, only its positions are copied to the global buffer, while no other data is copied. Previously, all data (UVs, attributes, etc.) had to be copied. In the same way, if only a single mesh changes in the scene, only it will have its data copied: no other copy of any other data of any other object will be made.
Change detection is done through a new API for the nodes of which the Cycles scene is composed (not to be confused with the shading nodes). While in the past client applications could have direct access to the nodes’ sockets, these are now encapsulated and have specific flags to indicate whether their value comes from a modification. These flags are set when setting a new value in the socket if it differs from the previous one. So, for each socket, we can now automatically detect what has changed when synchronizing data with Blender or any other application. Having this information per socket is crucial to avoid doing unnecessary work, and this with a greater granularity than Cycles could do until now.
Another optimization was the addition of BVH refit for OptiX: BVHs of objects are kept in memory and only reconstructed if the topology of the object changes. Otherwise, the BVH is simply modified to take into account the new vertex positions.
All in all, these few changes reduced the memory consumption of Cycles, and made it possible to divide the synchronization time by a factor of 2 on average. Benchmark results using FRL’s Trudy 1.0 character are shared below.
Another new feature was the addition of a procedural system. Procedurals are nodes that can generate other nodes in the scene just before rendering. Such a mechanism is common in other production-ready render engines.
Alembic was chosen for the concrete implementation of the first Cycles procedural. Alembic is widely used by 3D content creation systems to exchange baked geometry animations (including skin and hair deformations). In the future, we may explore developing a native USD procedural.
The Alembic procedural allows the loading of Alembic archives natively, avoiding to go through Blender and its RNA, which is slow to traverse and convert. The procedural is accessible via Blender, and when active, Blender will not load data from the Alembic archive, saving precious time. The procedural also has its own memory cache to preload data from disk to limit costly I/O operations. With this procedural we can load data much faster than from Blender, up to an order of magnitude faster.
Let’s take a look at a case study using Reality Labs’ digital character Trudy. Digital character Trudy consists of 54 meshes with 340K triangles. Rendering Trudy in Blender 3.0 using Cycles’ native alembic Procedural increases the framerate by 12x, from 2 to 24 frames per second. A detailed breakdown of the acceleration for each computation stage is shown in the table below. (Courtesy from Feng Xie, Reality Labs Research)
Statistics: 54 objects, 340K tris, 770MB textures
The team at Facebook is also interested in improving the physical correctness of Cycles BSDFs. For example, a recent contribution saw the rendering of subsurface scattering improved via the addition of an anisotropic phase function to better simulate the interaction between light and skin (tip: human skin has an anisotropy of about 0.8). This contribution was made by Christophe Hery, the co-author of the original Pixar publication introducing this improvement to the rendering community.
Although this collaboration was previously focused on synchronization speed, it is now focused on physical correctness and path tracing speed. A patch for manifold next event estimation, which will improve the rendering of caustics, will be soon submitted for review and inclusion. There are also several optimizations still possible in the work to improve the performance of Cycles’ data transfer and memory usage (especially VRAM).