
NVIDIA creates Magic3D, its text-to-3D AI

NVIDIA’s relationship with artificial intelligence is more than evident, although one of its aspects is less well known than the others. First of all, of course, we have to talk about DLSS, its intelligent upscaling technology based on deep learning, thanks to which the same system can deliver better image quality without upgrading the PC's components. DLSS 2, available for all GeForce RTX graphics cards, and DLSS 3, exclusive to the RTX 40 series, mark a turning point in what AI can do for game graphics.

On the other hand, NVIDIA graphics cards are a go-to choice when building systems, workstations and servers that will be used for any task related to artificial intelligence. AI workloads require a great deal of floating-point computing capacity, an area in which graphics cards offer far better performance than processors. This has always been the case, but with the proliferation of AI, manufacturers like NVIDIA have seized the opportunity and now optimize their professional solutions so that their performance in this area is even greater.

These two are, as you may have imagined, the best-known aspects of NVIDIA’s relationship with artificial intelligence. However, there is a third one that, without being unknown (in fact, we have already told you about it here on a few occasions), is less familiar to the general public: the development of solutions for the automatic creation of content, more specifically graphic content (images and video), which of course runs on its hardware.

The best-known example of this is GauGAN 2, an AI built into its NVIDIA Canvas application. In its first generation, GauGAN offers a “materials” selector (clouds, sea, earth, grass, sand, mountains, etc.). We only have to fill areas of the canvas with the type of element we want to include and Canvas automatically generates a landscape based on our draft. The second version keeps this design mode and also lets us simply enter a text description of what we want. The AI processes it and automatically returns an image that matches our description.

Another example of an AI solution designed by NVIDIA is Instant NeRF, a development presented at the beginning of this year that, starting from several ordinary photographs, is capable of generating all the volumetric information needed to turn them into a three-dimensional scene.

And today we have learned of a new project along these lines, Magic3D, a fairly important step for NVIDIA, since in this case we are looking at a general-purpose text-to-3D AI, that is, one capable of generating content of any type, in the same way as some of the solutions of this kind that have become very popular over the last year, but with the particularity of working in three dimensions, compared with the 2D images produced by the most common models today.

As you can see in the video, Magic3D responds to any type of request and, to create the result, it uses a two-stage process: a coarse model is first generated at low resolution and is then upscaled and optimized, an approach we already know very well thanks to DLSS. Additionally, Magic3D provides prompt-based editing of 3D meshes. Thus, we can start from a low-resolution 3D model and a base prompt, then modify the text to change the resulting model. A key point here is that it is possible to preserve the same style across successive generations (a concept often called consistency) and to apply the style of a 2D image (such as a painting) to a 3D model.

NVIDIA has not released the code for Magic3D, at least for now. However, since it states that its intention is to democratize the creation of three-dimensional content, in the same way that other AIs do with two-dimensional images, we can expect it to start reaching users sooner rather than later.

Deepak Gupta

Deepak Gupta is a technical writer with a 10-year track record in business, gaming, and technology journalism. He specializes in translating complex technical data into actionable insights for a global audience.
