Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7
Simon Willison tested two newly released models, Alibaba's Qwen3.6-35B-A3B and Anthropic's Claude Opus 4.7, on his informal "pelican riding a bicycle" benchmark, which asks a model to draw the scene as an SVG. Qwen3.6-35B-A3B, running locally on a MacBook Pro as a 20.9GB quantized model, produced better SVG illustrations than Claude Opus 4.7 in both the pelican and flamingo test cases. Willison notes the result is surprising given what one would expect from the proprietary model, though he emphasizes that the benchmark is primarily a humorous commentary on the absurdity of model comparison.
I'm giving this one to Qwen 3.6. Opus managed to mess up the bicycle frame!
The pelican benchmark has always been meant as a joke—it's mainly a statement on how obtuse and absurd the task of comparing these models is.