Google makes its first public release of its text-to-image AI model Imagen

Google Imagen generates images from text, much like DALL-E 2. Until now, Google has kept the AI model far from the public’s hands. That changes with the announcement that a very limited form of Imagen will be added to its AI Test Kitchen app.

What is “Imagen”?

Imagen is a diffusion model that converts text to images. Users simply input text, and the system uses a frozen text encoder to turn that text into embeddings, which are then mapped into images. Imagen is developed by Google Brain, a deep-learning artificial intelligence research team. With a deep understanding of language, Imagen can produce high-resolution, photorealistic images.
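
To make the flow from text to embeddings to image concrete, here is a minimal toy sketch in Python. It is not Imagen's code: the hash-based encoder, the stand-in denoiser, and every name and constant in it (EMBED_DIM, IMAGE_SIZE, STEPS, generate) are assumptions made up for illustration. A real system would use large neural networks for both the encoder and the denoiser.

```python
import hashlib
import random

EMBED_DIM = 8    # hypothetical embedding size
IMAGE_SIZE = 4   # hypothetical 4x4 grayscale "image"
STEPS = 50       # hypothetical number of denoising steps


def frozen_text_encoder(prompt: str) -> list[float]:
    """Toy stand-in for a frozen (never retrained) text encoder.

    Maps a prompt deterministically to EMBED_DIM floats in [0, 1).
    """
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    return [byte / 256 for byte in digest[:EMBED_DIM]]


def predict_clean_image(embedding: list[float]) -> list[float]:
    """Toy stand-in for the learned, text-conditioned denoiser.

    A real denoiser is a neural network that also looks at the noisy
    image; here the prediction depends only on the text embedding.
    """
    return [embedding[i % EMBED_DIM] for i in range(IMAGE_SIZE * IMAGE_SIZE)]


def generate(prompt: str, seed: int = 0) -> list[float]:
    """Start from random noise and denoise step by step, guided by the text."""
    rng = random.Random(seed)
    embedding = frozen_text_encoder(prompt)
    image = [rng.gauss(0.0, 1.0) for _ in range(IMAGE_SIZE * IMAGE_SIZE)]
    for step in range(STEPS):
        target = predict_clean_image(embedding)
        # Blend a growing fraction of the prediction into the noisy image;
        # on the final step the fraction is 1, so no noise remains.
        blend = 1.0 / (STEPS - step)
        image = [(1 - blend) * x + blend * t for x, t in zip(image, target)]
    return image


pixels = generate("a city themed around Christmas")
print(len(pixels))  # 16 pixel values for the 4x4 toy image
```

The point of the sketch is the division of labor: the text encoder is fixed and only turns words into numbers, while the iterative loop is what gradually turns noise into an image that matches those numbers.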

Interactions with Imagen

City Dreamer

Inside City Dreamer, users can build an AI-generated city around whatever theme they like. For example, if they enter “Christmas”, Imagen will create a city whose sample buildings and infrastructure carry Christmas-related elements. The result resembles what you would see in SimCity.

Wobble

In Wobble, users create a monster by selecting the material it is made of, such as clay, felt, marzipan or rubber, and an outfit of their choice. Imagen generates the monster, and users can then name it and interact with it.

Limitations of Imagen

The AI Test Kitchen launched earlier this year as a way for Google to beta test some of its AI technologies. Inside the app, Imagen is offered with tight constraints. Compared with DALL-E’s public beta, it gives users much less freedom: they cannot request whatever image they like. Josh Woodward, senior director of product management at Google, explained that the aim is to collect early feedback on the technology and find out how people will break it. For that reason, Google controls how users interact with the system.
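
One way to picture this kind of constraint is a prompt that is assembled from a fixed menu rather than typed freely. The Python sketch below is only an illustration under that assumption: the function name, the prompt wording, and the idea that the app works this way internally are all hypothetical, and the material list simply reuses the examples mentioned for Wobble.

```python
# Materials mentioned in this article as examples; the app's real menu may differ.
ALLOWED_MATERIALS = {"clay", "felt", "marzipan", "rubber"}


def build_monster_prompt(material: str) -> str:
    """Build a prompt from a menu choice instead of free-form text.

    The prompt template is hypothetical, not Google's actual wording.
    """
    choice = material.strip().lower()
    if choice not in ALLOWED_MATERIALS:
        raise ValueError(f"unsupported material: {material!r}")
    return f"a monster made of {choice}"


print(build_monster_prompt("Marzipan"))
```

Because users never write the prompt themselves, the range of images the model can be asked for stays small and easy to review.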

Challenges of Imagen

1. Misuse of the technology

If Imagen becomes a commercial product with full public access, some users may abuse it once they are given total freedom and control.

2. Reliance on datasets

With demand for training data surging, tech companies largely train text-to-image AI models on datasets scraped from the web, which are biased and unsorted. Such data can negatively affect the images the AI model produces.

Whether Imagen will be developed into an AI model for public use is still under discussion. In the meantime, users can try it through City Dreamer and Wobble in AI Test Kitchen, which is free to download on Android and iOS.

References: The Verge, DPReview
