Three AI Models, One Story: What "The Little Engine That Could" Reveals About AI Capabilities
A morning coffee experiment that uncovered surprising insights into how different AI models approach creative challenges. All three videos are at the bottom of this post.
The Setup
Picture this: It's early morning, coffee in hand, and I'm thinking about how to explain the AI adoption journey to clients. The progression from basic chat interactions to sophisticated agentic workflows can feel overwhelming—especially for organizations just starting their AI transformation.
Then it hit me: What if we told this story through the lens of "The Little Engine That Could"? A plucky little engine chugging along the rails of AI evolution, from those first tentative "I think I can" moments with basic prompts to confidently declaring "I know I can" as we master AI workflows.
But here's where the experiment began. Instead of crafting this narrative myself, I decided to test how three leading AI models—Gemini, Claude, and ChatGPT—would interpret and develop this same creative prompt. Same input, three different AI "engines," no editing allowed.
The Methodology
I provided each model with identical instructions: develop a video script concept using "The Little Engine That Could" framework to illustrate the AI learning journey—from basic chat through prompting, projects, custom GPTs/Gems, agents, and finally AI workflows.
The constraints were simple:
No editing of responses
Direct output to video creation tool (VEO3)
Identical source prompt across all three models
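For readers who want to try the same comparison, here is a minimal sketch of the "one prompt, three engines" step. It assumes the official Python SDKs (openai, anthropic, google-generativeai), API keys in the environment, and placeholder model names and prompt wording; it illustrates the approach rather than reproducing my exact setup.

import os

import anthropic
import google.generativeai as genai
from openai import OpenAI

# Paraphrase of the instruction described above; a placeholder, not the exact prompt.
PROMPT = (
    "Write a video script that uses 'The Little Engine That Could' to illustrate "
    "the AI learning journey: basic chat, prompting, projects, custom GPTs/Gems, "
    "agents, and finally AI workflows."
)

def ask_chatgpt(prompt: str) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def ask_claude(prompt: str) -> str:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
    msg = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

def ask_gemini(prompt: str) -> str:
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-pro")  # placeholder model name
    return model.generate_content(prompt).text

# Same prompt, no editing: each raw script goes straight to a file, ready to paste into VEO3.
for name, ask in [("chatgpt", ask_chatgpt), ("claude", ask_claude), ("gemini", ask_gemini)]:
    with open(f"{name}_script.txt", "w", encoding="utf-8") as f:
        f.write(ask(PROMPT))

The only design choice that matters here is keeping the prompt string literally identical across the three calls; everything else (SDK, model version, output format) is incidental to the comparison.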
What emerged wasn't just three different scripts—it was a fascinating glimpse into how different AI architectures approach creative problem-solving.
What the Results Revealed
While all three models successfully interpreted the core concept, their execution styles diverged in telling ways:
Narrative Structure: Each model organized the progression differently. Some emphasized the emotional journey of learning, others focused on technical milestones, and one approached it more like a capabilities roadmap.
Tone and Voice: The personality differences were striking. Where one model leaned into the whimsical children's story elements, another maintained a more professional, business-focused tone throughout.
Detail Depth: The level of granularity varied significantly. Some models provided rich, scene-by-scene descriptions perfect for video production, while others offered high-level concepts requiring more interpretation.
The Winner (This Time)
In this particular instance, ChatGPT delivered the most polished, video-ready output. Its script struck the right balance between storytelling charm and practical application, with clear scene transitions and engaging narrative flow that translated beautifully to video format.
But here's the crucial insight: "winning" was contextual. ChatGPT excelled at this specific creative-narrative task, but that doesn't make it universally superior. Each model demonstrated distinct strengths that would shine in different scenarios.
Strategic Implications for AI Adoption
This experiment reinforced several key principles I share with clients navigating AI tool selection:
No Universal Best Model: Just as we wouldn't use the same software for every business function, different AI models excel in different domains. The key is matching tool capabilities to specific use cases.
Prompt Consistency Matters: Using identical prompts across models provides genuine comparative insights. Too often, we judge AI tools based on inconsistent inputs, leading to skewed perceptions.
Creative Tasks Reveal Character: While technical benchmarks tell us about raw capabilities, creative challenges like this reveal how models "think" and approach problems—critical insights for workflow integration.
Looking Forward
As organizations mature in their AI adoption journey, moving from basic chat to sophisticated agentic workflows, understanding these model differences becomes increasingly important. What works for ideation might not work for analysis. What excels at creative tasks might struggle with structured outputs.
The little engines of AI are all chugging along the same tracks, but they're carrying different cargo and optimized for different destinations. The art lies in knowing which engine to choose for which journey.
What's your experience been with different AI models? Have you noticed similar patterns in how they approach creative versus analytical tasks? I'd love to hear about your own AI "engine" experiments in the comments.
The videos from this experiment demonstrate these differences in action—sometimes the subtle variations in interpretation and execution matter more than we initially realize. As we continue building AI-enhanced workflows, these nuances become the difference between good and great outcomes.
ChatGPT
Claude
Gemini