NVIDIA Instant NeRF: Artificial intelligence to turn photos into 3D scenes

NVIDIA Hopper was, without a doubt, the highlight of this year’s GTC, but we must admit that NVIDIA Instant NeRF was also a pleasant surprise. It is a neural rendering model that, in short, is capable of transforming 2D images, such as photographs, into highly realistic three-dimensional scenes.

Interesting, right? Let’s stop for a moment to assess what this means, and there is no better way to do so than to start from what we see in the video that you will find at the end of the article. At the beginning we see four 2D images that have been captured with a Polaroid camera, one of the classic cameras in the world of photography. Those images have been used to “feed” the NVIDIA Instant NeRF system.

Well, with only those four images, this system is capable of creating a highly realistic three-dimensional scene. The result is amazing, and the best thing is that it is extremely fast: the process of creating the scene takes only a few seconds, and the training of the neural network is completed in a few minutes.

How exactly does NVIDIA Instant NeRF work?

We already know that a neural network is used, and therefore artificial intelligence plays a key role. We have also seen that the system must be fed with 2D images taken from different perspectives, since this is how it obtains the information it needs to generate the 3D scene. But how is the process completed? NVIDIA has confirmed that Instant NeRF uses inverse rendering.
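To make the idea of inverse rendering more tangible, here is a deliberately tiny sketch in Python. It is not NVIDIA's method: Instant NeRF optimizes the weights of a neural network, while this toy recovers a single number. The scene (a flat matte patch), the `render` and `fit_albedo` functions, and all the values are our own assumptions for illustration. The principle is the same, though: guess the scene, render it, compare the result with the real images, and correct the guess.

```python
import math

def render(albedo, light_angle_deg):
    """Forward rendering: brightness of a flat matte patch lit from a given angle."""
    return albedo * max(0.0, math.cos(math.radians(light_angle_deg)))

def fit_albedo(observations, steps=200, learning_rate=0.1):
    """Inverse rendering: find the albedo whose renders best match the observations."""
    albedo = 0.5  # arbitrary initial guess
    for _ in range(steps):
        grad = 0.0
        for angle, observed in observations:
            shading = max(0.0, math.cos(math.radians(angle)))
            # Derivative of the squared error (albedo * shading - observed)^2
            grad += 2.0 * (albedo * shading - observed) * shading
        albedo -= learning_rate * grad / len(observations)
    return albedo

# Hypothetical "photos": the same patch observed under four lighting angles.
TRUE_ALBEDO = 0.8
ANGLES = [0.0, 20.0, 40.0, 60.0]
observations = [(angle, render(TRUE_ALBEDO, angle)) for angle in ANGLES]

recovered = fit_albedo(observations)
print(f"recovered albedo: {recovered:.4f}")
```

The optimizer never sees `TRUE_ALBEDO` directly, only the brightness values, and yet `recovered` ends up very close to 0.8. A NeRF does the same kind of fitting, but with millions of parameters and whole photographs instead of four numbers.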

Through inverse rendering, and with the support of artificial intelligence, the system can approximate how light behaves in the real world, and uses that approximation to create a 3D scene from a few 2D images. This is the key idea that NVIDIA has applied to Instant NeRF. David Luebke, vice president of graphics research at NVIDIA, has given a rather interesting comparative explanation in this regard:

“If classical 3D representations, such as polygon meshes, are similar to vector images, NeRFs are like bitmap images: they densely capture the way light radiates from an object, or within a scene. Instant NeRF could be as important to the world of 3D as digital cameras and JPEG were to 2D photography, thanks to its ability to create 3D scenes in seconds that we can share almost instantly.”
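What does it mean to “densely capture the way light radiates”? NVIDIA's announcement does not go into this level of detail, but in the general NeRF formulation, as we understand it, each pixel is obtained by sampling points along a camera ray and blending their density and color from front to back. The Python sketch below illustrates that blending step only. In a real NeRF the densities and colors come from the trained neural network; here they are hand-picked values, and the function name `composite_ray` is our own.

```python
import math

def composite_ray(densities, colors, step):
    """Blend samples along one camera ray, front to back, into one pixel color."""
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light that still reaches the camera
    for sigma, color in zip(densities, colors):
        alpha = 1.0 - math.exp(-sigma * step)  # opacity of this ray segment
        weight = transmittance * alpha         # contribution of this sample
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha           # light left for samples behind
    return pixel, transmittance

# Hand-picked samples: empty space, then a dense red surface, then blue behind it.
densities = [0.0, 0.0, 50.0, 50.0]
colors = [
    (0.0, 0.0, 0.0),
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0),
]

pixel, remaining = composite_ray(densities, colors, step=0.5)
print(pixel, remaining)
```

The pixel comes out almost pure red: the empty samples contribute nothing, the red surface absorbs nearly all the light, and the blue sample behind it is hidden. Because every step is a smooth function of density and color, the error between the rendered pixel and the photographed one can be used to adjust the scene, which is what makes the inverse rendering approach workable.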

This technology could have a huge impact on the tech sector in general, and also on the metaverse, because thanks to it we could create highly realistic avatars in a matter of seconds and with just a few photos. It would also be possible to create realistic scenes to shape virtual worlds, or reconstruct scenes to create 3D digital maps. It could also be used to train robots and autonomous driving systems to better understand the actual shape and size of objects.

NVIDIA has also confirmed that Instant NeRF was developed using the NVIDIA CUDA Toolkit and the Tiny CUDA Neural Networks library. It is a lightweight neural network, so it can be trained and run on a single NVIDIA GPU, although it works much better on graphics solutions equipped with Tensor Cores.