VALL-E, Microsoft's AI capable of imitating voices

Artificial intelligence is the technology of the moment, and VALL-E is its latest example (for now, of course). It also shows how seriously Microsoft is taking the need to move forward and stake out positions in a market that is in full bloom. Is it a safe bet, or are we facing a new bubble? It is still too early to tell, but the signs suggest that we are facing something truly disruptive.

As I said, Microsoft is taking it quite seriously. We have recently covered its plans to integrate ChatGPT, first into Bing, and yesterday we learned that it also intends to bring it to some of the Microsoft 365 applications. In addition, today it emerged that the Redmond company is considering increasing its financial stake in OpenAI, the company responsible for ChatGPT, something that fits perfectly with its plans to make the most of that company's technologies in its own products and services.

Now, the fact that Microsoft is very interested in OpenAI's solutions does not mean, far from it, that it relies on that company for everything related to artificial intelligence. In addition to keeping a close eye on what many other companies are doing, it is also working in this field with its own engineering and R&D teams, either on solutions aimed specifically at one of its products or as experiments in what can be done with AI. VALL-E has come out of this type of research.

Although its name is reminiscent of DALL-E, OpenAI's image generation AI, this is a rather different proposal: VALL-E, born in Microsoft's laboratories, only needs to listen to three seconds of a voice to imitate it. And although it is not perfect, the company has published an extensive list of examples of this AI, and the result is, I don't know whether to say, surprising, fascinating or chilling.

On the technical side, of course, the results obtained by VALL-E are amazing, and a solution like this has many practical uses. It is also somewhat reminiscent of the one Apple has developed for automatic audiobook narration. Voice synthesis software is nothing new, and for many years its developers have worked to make it sound more and more natural; VALL-E brings us closer to the top of that mountain.

It is not, however, the first time we have seen an AI capable of generating audio that sounds like a particular person's voice. Adobe left us speechless back in 2016 with Adobe VoCo, an application prototype that was capable of “learning” a voice and, from that point, converting any text into speech in that voice.

This AI opened an important debate about the risks that something like this poses in a world where fake news circulates at the speed of light. The possible malicious use of solutions such as VALL-E obliges those responsible to act with great caution; otherwise, they could see their well-intentioned creations become tools used with the worst of intentions.