OpenAI CEO Sam Altman announced the first significant upgrade to ChatGPT’s image-generation capabilities in over a year during a livestream on Tuesday.

ChatGPT can now natively create and modify images and photos using the company’s GPT-4o model.

GPT-4o has powered the AI chatbot platform for some time, but until now the model could only generate and edit text, not images.

Altman said GPT-4o native image generation is now available in ChatGPT and in Sora, OpenAI’s AI video-generation product, for subscribers to the company’s $200-per-month Pro plan.

According to OpenAI, the feature will roll out soon to ChatGPT Plus and free users, as well as to developers who use the company’s API service.

GPT-4o with image output “thinks” a bit longer than the image-generation model it effectively replaces, DALL-E 3, to produce what OpenAI characterizes as more accurate and detailed images.

GPT-4o can also edit existing images, including images with people in them, either transforming them or “inpainting” details such as foreground and background objects.

In an interview with the Wall Street Journal, OpenAI said it trained GPT-4o on “publicly available data” as well as proprietary data from its partnerships with companies such as Shutterstock to enable the new image feature.

Many generative AI vendors regard training data as a competitive advantage and keep it, along with any related details, closely guarded.

However, training data details are also a potential source of IP-related litigation, which is another disincentive for companies to disclose much.

Brad Lightcap, OpenAI’s chief operating officer, said in a statement to the Journal, “We are respectful of the artists’ rights in terms of how we produce the output, and we have policies in place that prevent us from generating images that directly mimic any living artists’ work.”

OpenAI provides an opt-out form that enables creators to request that their works be removed from its training datasets.

The company also says it honors requests to block its web crawlers from collecting training data, such as images, from websites.

A similar feature in Google’s Gemini 2.0 Flash recently went viral on social media, and not necessarily for the best reasons.

That model’s image component was found to have minimal guardrails, which let users remove watermarks and generate images featuring copyrighted characters.
