Generate AI Images Directly in ChatGPT with GPT-4o

OpenAI’s Mind-Blowing New AI Image Generator

OpenAI has launched “Images in ChatGPT,” a feature that lets users generate images directly within the ChatGPT interface. It is powered by the GPT-4o model, lets users create images through conversation, and represents a major advance in AI content creation.

The feature is available on every ChatGPT tier, including Plus, Pro, Team, and Free, which gives far more users access to advanced image creation tools. OpenAI’s Taya Christianson said that free-tier users will face image generation limits similar to those that applied to DALL-E 3, and that these limits may be adjusted depending on demand. For users who prefer DALL-E, a dedicated custom GPT will remain available.

OpenAI research lead Gabriel Goh described GPT-4o as an “omnimodal” model, one that can process multiple types of data, including text, images, audio, and video. The model also offers improved “binding,” addressing a long-standing problem in AI image generation: correctly attaching attributes such as color and shape to the right objects. Earlier models struggled with these object-attribute relationships, whereas GPT-4o can handle 15 to 20 distinct objects in a scene without mixing up their colors or shapes.

Text rendering is another area of major progress. Text in AI-generated images often comes out distorted or nonsensical. According to Goh, getting this right took months of repeated iteration. The team has reached a consistent standard for legible text in images, although rendering very small text perfectly remains a challenge.

Unlike the diffusion models behind most image generators, the system uses an autoregressive approach: it generates an image sequentially, from left to right and top to bottom, much as a language model generates text. This design may contribute to its stronger text rendering and binding.
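To make the idea of sequential generation more concrete, here is a toy sketch in Python. It is not OpenAI’s implementation, whose details have not been published: the token vocabulary and the `toy_next_token` function are hypothetical stand-ins for a trained model. What it illustrates is the raster ordering described above, where each position is filled one at a time, left to right and top to bottom, and every choice can depend on everything generated before it. A diffusion model, by contrast, starts from noise covering the whole image and refines all of it together over many denoising steps.

```python
import random

def toy_next_token(context, vocab, rng):
    """Hypothetical stand-in for a learned model.

    A real autoregressive model would predict a probability distribution
    over image tokens from the full context; this toy simply tends to repeat
    the previous token so that neighboring positions look related.
    """
    if context and rng.random() < 0.7:
        return context[-1]
    return rng.choice(vocab)

def generate_raster(height, width, vocab, seed=0):
    """Fill a height x width grid one token at a time in raster order
    (left to right within a row, rows top to bottom)."""
    rng = random.Random(seed)
    sequence = []  # flat token sequence, like the tokens of a sentence
    order = []     # (row, col) positions in the order they were generated
    for row in range(height):
        for col in range(width):
            # Each new token is conditioned on every token generated so far.
            sequence.append(toy_next_token(sequence, vocab, rng))
            order.append((row, col))
    # Reshape the flat sequence back into rows of the image grid.
    grid = [sequence[r * width:(r + 1) * width] for r in range(height)]
    return grid, order

if __name__ == "__main__":
    grid, order = generate_raster(3, 4, vocab=["sky", "tree", "text"])
    print("generation order:", order[:5], "...")
    for row in grid:
        print(row)
```

Replacing the toy function with a trained network is where all the real difficulty lies; the loop structure is the only part this sketch is meant to illustrate.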

In a briefing, OpenAI demonstrated the system’s range: accurately labeled scientific diagrams such as Newton’s prism experiment, multi-panel comics with consistent characters and dialogue, and informational posters with exact text. Other demonstrations covered practical uses such as creating restaurant menus, logos, and transparent-background images for stickers.

ChatGPT’s multimodal product lead, Jackie Shannon, highlighted how the system draws on broad world knowledge. The model is not bound by the user’s own skills; it applies its own extensive knowledge of the world. Because that knowledge is built into the model, a user can ask for an image of Newton’s prism experiment and receive one without having to explain what the experiment involves.

Generating images this way takes extra time, but OpenAI argues the trade-off is worth it. Shannon acknowledged that latency needs improvement, while maintaining that the higher image quality and richer capabilities more than compensate for the longer wait.

Key Features and Safeguards Implemented by OpenAI:

Enhanced Binding: GPT-4o correctly tracks the attributes of 15 to 20 objects in a scene, minimizing mix-ups between their colors and shapes.
Improved Text Rendering: After months of iteration, OpenAI achieved reliable text rendering in generated images, addressing a common weakness of AI image generators.
Autoregressive Approach: The system generates images sequentially rather than through diffusion, which may help it handle text and multiple objects more accurately.
Robust Safeguards: OpenAI has safeguards in place that block attempts to remove watermarks, prevent sexual deepfakes, and refuse requests for child sexual abuse material (CSAM).
C2PA Metadata: Every generated image includes standard C2PA metadata identifying it as created by OpenAI.
User Ownership: Users own the images they generate, subject to OpenAI’s usage policies.

OpenAI emphasized the protections it has put in place against possible misuse. Shannon acknowledged that no system is perfect, but said the company is continually improving its safeguards and regards the current measures as a first step. All images created in ChatGPT belong to the users who made them, and they are free to use them as they choose within OpenAI’s usage policies.

With “Images in ChatGPT,” OpenAI is advancing its flagship product, setting an accessible standard for powerful AI image generation while proactively managing the risks that come with it.