Why the Ghibli and Action-Figure Trends Matter for AI Product Builders
What can AI product builders learn from AI image generators?
Welcome back, Ducktypers!
I'm a bit late to the discussion around OpenAI's Ghibli-style image generation. Actually, it seems that the Ghibli wave has already been superseded by the "action-figure" trend. While I personally skipped the Ghibli hype—having experimented with anime-style transformations back in 2020—I fully jumped aboard the action-figure craze. Let's unpack why these trends are more than just fleeting internet memes.
Simplicity and Accessibility: Keys to Mass Adoption
Previously, creating consistent anime-style images from personal photos required significant technical effort. Even for someone familiar with machine learning, results were brittle and inconsistent. OpenAI has removed both barriers:
Consistency: You get reliable, high-quality results every time.
Simplicity: Generating complex styles no longer demands technical wizardry.
This combination is a powerful catalyst for widespread adoption.
From Static to Interactive: The Multimodal Leap
A substantial enhancement is the shift from static, one-shot models to interactive multimodal ones. Previously, you had a strict encoder-decoder setup: input one image, receive one output. Adjusting details like "make the hair curlier" or blending multiple iterations simply wasn't possible.
Today's models allow for interactive adjustments. It’s akin to sitting with a designer, providing intuitive guidance like, "make it slightly darker" or "remove this detail here," and iterating until perfection.
This interactive user experience is transformative:
Iterative refinements are intuitive and straightforward.
Users remain engaged as the interface matches their mental models of creativity.
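For builders, the practical difference between one-shot and interactive generation is mostly about state: the session has to remember every instruction so far. Here is a minimal sketch of that idea. The `RefinementSession` class and the `fake_render` stand-in are hypothetical names of mine, not any vendor's API; a real product would swap `fake_render` for a call to its image model.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical renderer signature: base description plus all refinements so far.
Renderer = Callable[[str, List[str]], str]


@dataclass
class RefinementSession:
    """Keeps the running list of instructions for one image."""
    base_prompt: str
    render: Renderer
    refinements: List[str] = field(default_factory=list)

    def refine(self, instruction: str) -> str:
        # Each new instruction is added on top of the previous ones,
        # so "make it darker" applies to the already-curlier hair.
        self.refinements.append(instruction)
        return self.render(self.base_prompt, list(self.refinements))

    def undo(self) -> str:
        # Drop the latest instruction and re-render from what is left.
        if self.refinements:
            self.refinements.pop()
        return self.render(self.base_prompt, list(self.refinements))


def fake_render(base_prompt: str, refinements: List[str]) -> str:
    # Stand-in for a real model call: it only shows what would be sent.
    return ", ".join([base_prompt, *refinements])
```

A one-shot model corresponds to calling the renderer once with an empty refinement list. The interactive experience comes from keeping the list around and letting users add to it or step back.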
My Action-Figure Experience
Thanks to these advancements—consistency, ease of use, and interactive guidance—I recently created my "action-figure" avatar. Even without providing the model access to my actual photos, the results impressively captured many personal attributes, demonstrating just how sophisticated these models have become.
The Competitive Landscape: OpenAI vs. Everyone Else
OpenAI initially led the pack in generative imagery with DALL-E but soon found itself overshadowed by competitors like Midjourney, Flux (Black Forest Labs), and, more recently, Grok.
Earlier this year, during our "Building AI Products" class at ITAM, we tested various models (Grok, ChatGPT, Claude, Le Chat) using a provocative Spanish prompt:
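"Genera una imagen de la presidenta de México con Donald Trump agarrados de la mano en el Zócalo de la Ciudad de México." (In English: "Generate an image of the president of Mexico and Donald Trump holding hands in Mexico City's Zócalo.")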
Grok handled this task superbly, provided we named Mexico's president explicitly. OpenAI and Mistral, however, imposed strict restrictions:
No generation of real public figures.
Less creative flexibility than Grok's more permissive approach.
On top of that, their results were rather underwhelming compared to Grok's!
OpenAI’s stringent guidelines hint at a corporate-friendly orientation, often limiting creativity for safety or compliance. However, recent updates suggest a renewed push to reclaim their visual generation crown.


Multilingual Mastery: AI's New Frontier
One remarkable advancement is AI's fluency in multiple languages. Our experiments showed that major models now reliably handle prompts in Spanish, and the same appears to hold for other widely spoken languages. This level of multilingual competence represents significant progress compared to last year's clunky translations.
Implications for AI Product Builders
What can we learn from these developments?
Constant Evaluation is Essential
The fact that a model excelled last month doesn't guarantee it is still the best choice today. Continuous benchmarking is crucial.
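To make "continuous benchmarking" concrete, here is a minimal sketch of a harness that runs one fixed prompt set against several models and ranks them. Everything in it is an assumption for illustration: the models are stub functions rather than real API clients, and the scorer only checks whether anything came back. In practice you would wrap each vendor's client in a function with the same shape and plug in a scorer that reflects what your product cares about.

```python
from typing import Callable, Dict, List

# A "model" is any callable from prompt to output (hypothetical wrapper shape).
Model = Callable[[str], str]
# A scorer maps (prompt, output) to a number between 0 and 1.
Scorer = Callable[[str, str], float]


def benchmark(models: Dict[str, Model], prompts: List[str], scorer: Scorer) -> Dict[str, float]:
    """Return the mean score of each model over the same prompt set."""
    if not prompts:
        raise ValueError("need at least one prompt")
    results = {}
    for name, model in models.items():
        scores = [scorer(prompt, model(prompt)) for prompt in prompts]
        results[name] = sum(scores) / len(scores)
    return results


def rank(results: Dict[str, float]) -> List[str]:
    """Model names ordered from best to worst mean score."""
    return sorted(results, key=lambda name: results[name], reverse=True)


# Stub models, standing in for real API calls.
def permissive_model(prompt: str) -> str:
    return f"image for: {prompt}"


def restrictive_model(prompt: str) -> str:
    # Simulates a refusal on prompts that mention a public figure.
    if "president" in prompt.lower():
        return ""
    return f"image for: {prompt}"


def produced_something(prompt: str, output: str) -> float:
    # Crude scorer: 1.0 if the model returned anything, else 0.0.
    return 1.0 if output else 0.0


PROMPTS = [
    "A duck typing on a laptop",
    "The president of Mexico in the Zócalo",
]
```

Keeping the prompt set fixed is the point: rerunning the same `benchmark` call after each vendor update shows whether last month's winner still holds up.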
Embrace Iterativity as Interactivity
Generative models enable quick variations, mashups, and refinements based on vague, intuitive directions. SaaS products burdened with cluttered interfaces that demand precise input should consider adopting simpler, more intuitive UI patterns.
Consistency through System Prompts and Strategic Choices
Reliable outcomes depend heavily on careful system prompting, thoughtful model selection, and judicious caching strategies.
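One way to read the caching part: if the model, the system prompt, and the user prompt are all the same, serve the result you already have instead of generating a new and possibly different one. Below is a minimal in-memory sketch under that assumption. The `CachedGenerator` class and the `generate` callable it wraps are hypothetical names; a production version would likely use a persistent store and an expiry policy.

```python
import hashlib
import json
from typing import Callable, Dict


class CachedGenerator:
    """Wraps a generate(system_prompt, user_prompt) callable with a cache."""

    def __init__(self, model_name: str, system_prompt: str,
                 generate: Callable[[str, str], str]) -> None:
        self.model_name = model_name
        self.system_prompt = system_prompt
        self._generate = generate
        self._cache: Dict[str, str] = {}
        self.calls = 0  # number of real generations, for inspection

    def _key(self, user_prompt: str) -> str:
        # The key covers everything that influences the output, so changing
        # the model or the system prompt never returns a stale result.
        payload = json.dumps(
            {"model": self.model_name, "system": self.system_prompt, "user": user_prompt},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def __call__(self, user_prompt: str) -> str:
        key = self._key(user_prompt)
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._generate(self.system_prompt, user_prompt)
        return self._cache[key]


def fake_generate(system_prompt: str, user_prompt: str) -> str:
    # Stand-in for a real model call.
    return f"[{system_prompt}] {user_prompt}"
```

The trade-off is deliberate: users who repeat a request get the identical result, at the cost of less variety, so this fits avatars and branded assets better than open-ended exploration.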
Your Turn: Participation or Skepticism?
Have you created your Ghibli-style portraits? Got your own action-figure rendition? Or do you consider these experiments a wasteful fad? Let me know your thoughts!
This Week on AI Product Engineer
If you want to master building innovative AI products, join our community. Sign up at AI Product Engineer—it's free!
🐤 Rod