Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7

🔍 Let's dive in

Simon Willison tested two newly released models, Alibaba's Qwen3.6-35B-A3B and Anthropic's Claude Opus 4.7, using his informal "pelican riding a bicycle" benchmark, which asks a model to draw the scene as an SVG. Qwen3.6-35B-A3B, running locally on a MacBook Pro as a 20.9GB quantized model, produced better SVG illustrations than Claude Opus 4.7 in both the pelican and flamingo test cases. Willison notes the result is surprising given what one would expect from the proprietary model, though he emphasizes that the benchmark is primarily a joke about how absurd it is to compare these models.

Lead coverage: Simon Willison — Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7 ↗

🕰 The timeline · 1 source

Simon Willison first-party · 5d ago · 2/5

Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7 ↗

I'm giving this one to Qwen 3.6. Opus managed to mess up the bicycle frame!

— Simon Willison

The pelican benchmark has always been meant as a joke—it's mainly a statement on how obtuse and absurd the task of comparing these models is.

— Simon Willison

🏷 Tags

Claude

🔧 Debug

Sources: Simon Willison
Earliest: 2026-04-16T17:16:52.000Z
Latest: 2026-04-16T17:16:52.000Z
Lead URL: https://simonwillison.net/2026/Apr/16/qwen-beats-opus