R&D: Texture Streaming from Substance Painter

These days, real-time engines and hardware keep making incredible leaps forward. However, in Animation and VFX, image quality is critical to success, so ‘offline renderers’ (whether based on CPU or GPU) are still overwhelmingly the principal imaging tool for final quality frames.

An offline renderer, such as RenderMan, Arnold or Iray, is a tool that produces frames, but not in real-time (meaning 25 to 60 times a second for desktop applications, up to 90 times a second for VR); instead, it produces images over a longer period of time – from several seconds up to several hours. Most offline renderers are path tracers; as such they have an interactive mode where the image is first shown at a lower, grainier quality. The image slowly improves until it converges into ‘final quality’, which can be extremely realistic.
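
For intuition, here is a toy sketch of that progressive refinement (not any particular renderer's code): each pass traces one noisy sample per pixel and folds it into a running average, so the grain shrinks as passes accumulate. TraceOneSample() is a hypothetical stand-in for the actual path tracing.

```cpp
// Toy illustration of progressive refinement in a path tracer.
// Each pass adds one noisy sample per pixel to a running average;
// after many passes the noise (grain) averages out and the image
// converges toward 'final quality'. TraceOneSample() is hypothetical.
#include <cstddef>
#include <vector>

struct Color { float r = 0, g = 0, b = 0; };

Color TraceOneSample(std::size_t pixelIndex);  // hypothetical: one noisy path-traced sample

void RefinePass(std::vector<Color>& accum, int pass)
{
    const float w = 1.0f / static_cast<float>(pass + 1);  // weight of the new sample
    for (std::size_t i = 0; i < accum.size(); ++i) {
        const Color s = TraceOneSample(i);
        // Incremental mean: the variance shrinks as 1/N with the pass count N.
        accum[i].r += (s.r - accum[i].r) * w;
        accum[i].g += (s.g - accum[i].g) * w;
        accum[i].b += (s.b - accum[i].b) * w;
    }
}
```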

These interactive modes keep improving. This summer, our friends at Pixar released RenderMan v22, which has a fundamentally different architecture and API: while previously only shaders, lights and cameras were interactive, it is now possible to change and update almost every element interactively.

Texturing Needs More Speed

Perhaps the one thing that is still slow to update is texturing. The most interactive solution we found at the time was the Live Link infrastructure in Substance Painter, which is still in use and has been quite successful with real-time engines such as UE4 and Unity, as well as with a constellation of indie-developed plug-ins.

The perceived speed in the existing Substance Painter to Unreal Engine “Live Link” is achieved by first exporting a low resolution version of the textures.

Unfortunately, we found that while this is a great workflow in theory, in many cases it just isn't fast enough, due to the specifics of some renderers, texture I/O, image processing and the sheer volume of texture data being passed around.

So our Labs team decided to see if we could shorten the delay between making a paint stroke and seeing it appear in the renderer.

For our use case, RenderMan for Maya, we evaluated the standard process:

— Paint
— Stop the interactive render
— Hit export, wait
— Post process exported textures
— Restart the render (If the render includes many assets, this can take a long time)

Even in scenes that are relatively small by VFX standards, this can often take minutes, which really kills the productivity (and the patience) of an artist. And this is not just the case in Substance Painter: it is a problem that impacts most painting tools.

More advanced previsualization in Substance Painter's viewport can provide better previews, and therefore less frequent exports; but for offline rendering, the viewport image is never really going to match ray tracing-heavy effects such as subsurface scattering, refraction, and so on.

So let’s analyze the bottlenecks of this monster!

Production-ready renderers normally like to ingest textures in a special, preprocessed format that includes mipmaps and is organized in small tiles. This helps renders scale to loading hundreds of thousands of textures over the network, and it allows reading only the small amounts of data needed at a given level of detail.
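
To illustrate why this helps, here is a hypothetical sketch (not tied to any specific texture format; the 32-pixel tile size is an assumption) of how a renderer can work out which single tile it needs for a lookup at a given mipmap level:

```cpp
// Hypothetical illustration: which single tile of a tiled, mipmapped
// texture is needed for a lookup at (u, v) and a given mip level?
// The 32-pixel tile size is an assumption, not any specific format.
#include <algorithm>
#include <cstdint>

struct TileId {
    uint32_t level;  // mipmap level (0 = full resolution)
    uint32_t x, y;   // tile coordinates within that level
};

TileId TileForLookup(float u, float v, uint32_t level,
                     uint32_t baseWidth, uint32_t baseHeight,
                     uint32_t tileSize = 32)
{
    // Resolution of the requested mip level.
    const uint32_t w = std::max(1u, baseWidth >> level);
    const uint32_t h = std::max(1u, baseHeight >> level);

    // Texel coordinates, clamped to the level's bounds.
    const uint32_t tx = std::min(static_cast<uint32_t>(u * w), w - 1);
    const uint32_t ty = std::min(static_cast<uint32_t>(v * h), h - 1);

    // Only this one small tile has to be read, which is what lets a
    // render farm stream hundreds of thousands of textures on demand.
    return { level, tx / tileSize, ty / tileSize };
}
```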

When these renders run interactively, a handle to the texture file is kept live so that any newly required texture tiles can be provided quickly. However, keeping that handle open makes it impossible or very inefficient to detect changes, and on some operating systems it also locks the file so that it cannot be overwritten.

Avoiding a Render Reinitialization

These textures were our first target, because they require a reset of the render at every update. Such resets are very expensive: they normally flush all the temporary results that are vital to maintaining rapid iteration.

Here’s how we accomplished this in RenderMan:

— Write a texture plugin for RenderMan. This is a type of plugin that replaces normal texture reads with custom code; the renderer queries it for small buckets of texels at a given mipmap level. With this plugin, rather than loading the processed files, we can load simple images unprocessed in their entirety, which is usually what we need when painting a single asset. With our changes we can now ‘release’ the image as soon as it's loaded, so nothing keeps the file locked.

— Since we aren’t using the processed images, we can skip that post-export processing step entirely or defer it to a separate sub-process, to run whenever convenient, as we no longer need it until we close the project. This is a huge improvement as this processing step can be rather expensive.

— Set up a system, in our case socket-based, to notify the renderer that we have just finished exporting our textures and that it needs to reload them and flush the pixels accumulated so far. This notification channel builds on the pre-existing Live Link for Unity and Unreal Engine (a minimal sketch follows below).

Since most of the time was taken up by the render initialization, these changes reduced iteration time from about a minute to 6-7 seconds for our example assets.
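
For illustration, the renderer-side listener in such a setup could look something like the sketch below, assuming a plain TCP socket on localhost and a one-line text protocol. The message string and the FlushAndReloadTextures() hook are stand-ins, not the actual Live Link protocol.

```cpp
// Sketch of a renderer-side notification listener (assumed protocol).
// Painter connects over localhost and sends a short message after each
// export; on receipt we flush accumulated pixels and re-read textures.
// Error handling is omitted for brevity.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

void FlushAndReloadTextures();  // hypothetical renderer-side hook

void ListenForTextureUpdates(uint16_t port)
{
    int server = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  // Painter runs on the same machine
    addr.sin_port = htons(port);
    bind(server, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    listen(server, 1);

    for (;;) {
        int client = accept(server, nullptr, nullptr);
        char msg[256] = {};
        ssize_t n = read(client, msg, sizeof(msg) - 1);
        if (n > 0 && std::strncmp(msg, "textures_updated", 16) == 0)
            FlushAndReloadTextures();
        close(client);
    }
}
```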

Not bad. But also not quite interactive yet.

Skipping the File System

Moving on, the next highest cost was saving and loading files.

We saw that many common file formats use some form of compression, which adds encoding and decoding time to every save and load. And once all the other bottlenecks are removed, the round trip to disk itself is quite costly. This can be reduced somewhat with RAM disks (virtual disks that actually reside in RAM), but not as much as we'd hoped. However, assuming you are rendering and painting on the same machine, you can skip the file system entirely.

In our case the target is a CPU renderer, so we had to deliver the data somewhere the renderer is ready to consume it from: main RAM.

Here were our next steps:

— Dump the working GL textures as uncompressed buffers straight to a section of RAM. This uses ‘shared memory’, a system that lets you create named buffers to share across processes (as opposed to threads, which already share memory by default). A minimal sketch of this exchange follows the list.

— Modify the RenderMan texture plugin to first look in shared memory and, if that fails, to revert to the file system. That way, textures will still exist if you close Painter or open a new project.

— Now that the texture dump is much faster, we can automate this process after each stroke or at regular intervals (no more export button!).
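
As a concrete illustration, here is a minimal sketch of both sides of that exchange using POSIX shared memory (shm_open and mmap). The buffer naming scheme, the fixed-size layout and the ReadImageFile() fallback are assumptions made for the sketch, not the prototype's actual conventions.

```cpp
// Sketch of the shared-memory exchange using POSIX shm_open/mmap.
// The naming scheme and ReadImageFile() are assumptions for the sketch.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

bool ReadImageFile(const std::string& name, uint8_t* out, std::size_t bytes);  // hypothetical fallback

// Painter side: publish raw, uncompressed texels under a well-known name
// (POSIX shared-memory names start with '/', e.g. "/sp_basecolor_1001").
bool PublishTexture(const std::string& name, const uint8_t* texels, std::size_t bytes)
{
    int fd = shm_open(name.c_str(), O_CREAT | O_RDWR, 0666);
    if (fd < 0) return false;
    if (ftruncate(fd, static_cast<off_t>(bytes)) != 0) { close(fd); return false; }
    void* dst = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (dst == MAP_FAILED) return false;
    std::memcpy(dst, texels, bytes);  // one copy: no encoding, no disk round trip
    munmap(dst, bytes);
    return true;
}

// Renderer side: try shared memory first, then fall back to the exported
// file, so textures survive closing Painter or opening a new project.
bool LoadTexture(const std::string& name, uint8_t* out, std::size_t bytes)
{
    int fd = shm_open(name.c_str(), O_RDONLY, 0);
    if (fd >= 0) {
        void* src = mmap(nullptr, bytes, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);
        if (src != MAP_FAILED) {
            std::memcpy(out, src, bytes);
            munmap(src, bytes);
            return true;
        }
    }
    return ReadImageFile(name, out, bytes);
}
```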

By carefully tweaking this pipeline and the notification system, we were finally able to reduce the whole update to sub-second intervals on high-res textures, which is quite impressive in these kinds of workflows.

Streaming to RenderMan. In this example textures are saved every half second.

Our Labs team partnered with Pixar and we worked quite closely on this. As a result, we had the privilege of demonstrating this prototype at the RenderMan Science Fair at SIGGRAPH in Vancouver this summer. As a Labs project, it isn't quite an alpha of a real product, but rather a proof of concept. The response from the film community, however, was very encouraging, so we hope to deploy the core of this tech with Painter soon, and to open source the example integrations so studios can adopt this system in their own time.

Future Improvements

The first improvement we want to make to this technique is generalization.

François Beaune, who was part of the team that worked on this prototype, is also the owner of the open source path tracer appleseed. He implemented shared-texture support in appleseed (currently in a private branch) and was able to stream textures without any further changes to the Painter prototype, demonstrating that the approach is flexible and renderer-agnostic.

Texture streaming to appleseed. In this example textures are saved with every paint stroke.

If we could export bits and pieces of textures rather than the whole thing, this could scale to even bigger textures and many UV Tiles and Texture Sets. This is because renderers like to access their textures in tiles but also because, generally, at each brushstroke only a portion of the texels needs updating. It will be interesting to see how this fits with the new Sparse Virtual Textures (SVT) in Substance Painter.
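
As a rough sketch of what that could look like (the tile size and the ExportTile() hook are hypothetical), one could track which tiles each stroke's bounding box touches and flush only those:

```cpp
// Illustrative dirty-tile tracking. Only tiles touched by a stroke are
// exported; kTileSize and ExportTile() are assumptions for the sketch.
#include <cstdint>
#include <set>
#include <utility>

constexpr uint32_t kTileSize = 64;  // assumed tile edge length in texels

using TileCoord = std::pair<uint32_t, uint32_t>;

void ExportTile(uint32_t tx, uint32_t ty);  // hypothetical partial export

// Accumulate the tiles overlapped by a stroke's bounding box (in texels).
void MarkDirty(std::set<TileCoord>& dirty,
               uint32_t x0, uint32_t y0, uint32_t x1, uint32_t y1)
{
    for (uint32_t ty = y0 / kTileSize; ty <= y1 / kTileSize; ++ty)
        for (uint32_t tx = x0 / kTileSize; tx <= x1 / kTileSize; ++tx)
            dirty.insert({tx, ty});
}

// After a stroke (or on a timer), only the dirty tiles cross the
// process boundary instead of the full texture.
void FlushDirty(std::set<TileCoord>& dirty)
{
    for (const auto& tile : dirty)
        ExportTile(tile.first, tile.second);
    dirty.clear();
}
```

With per-tile exports, the amount of data crossing the process boundary would be proportional to the stroke, not to the texture resolution.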

If the target renderer were a GPU engine, exchanging or even directly sharing the same portions of VRAM would make this even faster, depending on the engine's architecture and memory layout. While this sounds promising, we haven't yet explored this direction, partly because we're interested in network architectures for GPU renderers.

Finally, can this technique be extended to other Substance products, such as Substance Designer or Substance Alchemist? You bet! The use cases and the types of integrations (as well as the speed gains) would be a little different to asset painting, but streaming texture data via shared memory buffers is a pretty general approach.
