
Midjourney turns the tables: from image to prompt

Image-generating artificial intelligence models such as Midjourney, DALL-E 2, and Stable Diffusion have become enormously popular in recent times. Although their splendor has been somewhat eclipsed in recent months by their “siblings,” the chatbots, image generation has already become a daily tool for many professionals, and improvements in the models (DALL-E 2 seems to have finally learned to draw human hands) mean these AIs offer increasingly reliable generations and, therefore, results closer to the goal set by each prompt.

There is, however, a problem at this point: creating a prompt is not as easy as it might seem at first. These models are trained to understand natural language, but even the best implementations have limitations. This has given rise to a proliferation of web pages where we can find prompts created by third parties, but also to commercial services where we can describe the image we want and have it transformed into the most appropriate prompt by an expert in the field. Yes, you read that correctly: an expert in “talking” to AI models, in what has come to be called prompt engineering (the role seems very useful to me, but elevating it to the rank of engineering strikes me as somewhat pretentious, truth be told).

As I said, services such as Midjourney live up to a phrase that is very common in game descriptions and that you have surely read on more than one occasion: easy to play, hard to master. It is something their creators are clearly very, very aware of and constantly try to improve… to the misfortune of those who have found a space for professional development in it.

The latest news in this regard, discovered by AI expert Linus Ekenstam and representing a huge leap in this sense, is that we can now upload an image to Midjourney and the model will generate several prompts that would allow us to create said image. The potential of this new feature is enormous, since it offers us the best tool that has existed so far to learn how to generate prompts that fit the logic Midjourney uses to create images.

On the other hand, this opens the door to something I have been thinking about for some time and that may be even more revolutionary. You will surely remember that one of the main novelties of GPT-4 is that it is multimodal, that is, it accepts a combination of text and images in its input prompts. Now that Midjourney has shown it supports images as an input (although for now only to generate text output), the potential of combining images and text in a prompt to obtain either text or an image as output is spectacular.

Not everything is perfect, of course. Input prompts containing images can pose a huge threat to copyrighted content, something that is already giving many people headaches today. It may be possible to train models like Midjourney to recognize very popular protected content (from Mickey Mouse to the Coca-Cola logo) in image input prompts, but this is more complicated when it comes to less recognizable content.
