Technology advances by leaps and bounds, and so does Chinese censorship. The technology company Baidu has just presented ERNIE-ViLG, a new text-to-image AI capable of generating realistic images of different aspects of Chinese culture. It can even create anime-style images, improving on other leading AIs such as DALL-E 2.
What does not change, however, is censorship in China. For example, Tiananmen Square, the second-largest square in the country and a site of great historical, cultural and political significance, does not appear in this AI tool. The omission did not go unnoticed by users when a demo of the software was launched at the end of August.
They quickly noticed that the names of certain political leaders and words considered controversial were labeled as “sensitive” and blocked from producing results. It is far from the first time that technology and censorship have gone hand in hand in China. For comparison, its ‘competitor’ DALL-E 2 prohibits sexual content, images of medical content and faces of public figures.
How ERNIE-ViLG works
The ERNIE-ViLG model is part of Wenxin, Baidu’s large-scale natural language processing project. It was trained on a dataset of 145 million image-text pairs and contains 10 billion parameters, which the AI uses to discern the subtle differences between concepts and art styles.
And what does this mean in practice? ERNIE-ViLG was trained on a smaller dataset than DALL-E 2, with 650 million pairs, or Stable Diffusion, with 2.3 billion pairs, yet it has more parameters than either of them. For now, Baidu has released a demo version on its own platform and on Hugging Face, one of the most important international AI communities.
The main difference between ERNIE-ViLG and Western software is that Baidu’s model understands prompts written in Chinese. It is also less likely to make mistakes when dealing with culturally specific words.
For example, a Chinese video creator compared how different models handled prompts featuring Chinese historical figures, pop culture celebrities and food, and found that ERNIE-ViLG produced more accurate images than DALL-E 2 or Stable Diffusion. Since its release, ERNIE-ViLG has also been adopted by the Japanese anime community.
Censorship as a ‘guarantor’ of national security and stability
A test conducted by MIT Technology Review found that several Chinese words were blocked, including the names of high-profile Chinese political leaders such as Xi Jinping and Mao Zedong. Also blocked were terms that can be considered politically sensitive, such as ‘revolution’, and the name of Baidu’s founder and CEO, Yanhong (Robin) Li.
While words like ‘democracy’ and ‘government’ are allowed, prompts combining them with other words, like ‘democracy in the Middle East’ or ‘British government’, are blocked.
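The behavior reported above, where a single word passes but a longer prompt containing it is rejected, can be illustrated with a toy blocklist. This is only a sketch under stated assumptions: Baidu has not published how its filter works, the function name is_sensitive and both lists are hypothetical, and the entries simply reuse the examples from the test, rendered in English rather than Chinese.

```python
# Hypothetical sketch: the real ERNIE-ViLG filter is not public.
# Both lists only reuse examples reported in the article (in English).

# Single words reported as blocked on their own.
BLOCKED_TERMS = {"revolution"}

# Combinations reported as blocked even though their individual
# words ("democracy", "government") are allowed.
BLOCKED_PHRASES = {"democracy in the middle east", "british government"}


def is_sensitive(prompt: str) -> bool:
    """Return True if this toy model would flag the prompt."""
    text = prompt.lower()
    # Word-level check: any blocked term appearing as a whole word.
    if set(text.split()) & BLOCKED_TERMS:
        return True
    # Phrase-level check: any blocked combination appearing in the prompt.
    return any(phrase in text for phrase in BLOCKED_PHRASES)


for example in ["democracy", "democracy in the Middle East", "revolution"]:
    print(example, "->", "blocked" if is_sensitive(example) else "allowed")
```

Under these assumptions, ‘democracy’ is allowed while ‘democracy in the Middle East’ and ‘revolution’ are blocked, which matches the pattern seen in the test. A real system might work quite differently, for example by using a classifier rather than fixed lists.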
This is nothing new, far from it. In China, social media companies often maintain lists of sensitive words, which have even been drawn up on the government’s instructions.
In January of this year, the Chinese government proposed a new regulation that would prohibit any AI-generated content that, in its view, endangers national security and social stability. Despite this, the potential of ERNIE-ViLG should not be underestimated, as it will continue to play an important role in the development of large-scale text-to-image AI.