Starting with the MoE (Qwen3.6-35B-A3B) and borrowing numbers from Artificial Analysis, the good news (multi-turn coordination, long-horizon tasks, and hallucination resistance):

For Tau2-Bench, Qwen3.6 35B is actually 1% better than 27B
For Terminal-Bench Hard, Qwen3.6 35B and 27B are at the same performance level
For AA-Omniscience Non-Hallucination, Qwen3.6 35B and 27B are at the same performance level

The not so good news (with science, constraints/formatting, and long documentation reading):

SciCode has a 2% difference between Qwen3.6-27B and Qwen3.5-35B, since 3.6 got overtrained and lost 2% relative to 3.5
For IFBench there is a 3% difference between Qwen3.5 27B and 35B (Qwen3.6 is overtrained, with an 8-9% loss)
For AA-LCR there is a 5% difference between Qwen3.6 27B and 35B

Qwen3.5-35B-A3B and Qwen3.6-35B-A3B are worth keeping: the benchmarks are tight enough that the MoE speed gains relative to dense linear attention are worth it. But they would then need to be compatible with PFlash+DFlash+DDTree when it comes to bottlenecks.

With Nemotron, the trend of IFBench (prompt adherence) dominating over agentic benchmarks is interesting:

On IFBench, Nemotron-Cascade-2-30B-A3B beat Qwen3.5-27B by 4%, while Nemotron-3-Nano beat Qwen3.6-27B by 3% (we know that 3.6 is overtrained and does not follow instructions as well as it should, as seen when comparing dense vs MoE finetuning)
Nemotron-Nano-4B beat Qwen3.5-4B ONLY in IFBench, by 6% (SciCode was a tie, everything else was no contest), so on the 4B side of things, modular instruction tasks should be doable if we stick to them
On AA-Omniscience Non-Hallucination, Nemotron-Nano-9B-V2 is at 39% while Qwen3.5 9B is at 19% (Qwen3.5 is better in all other benchmarks, including IFBench). If there are use cases where we just want smaller models to not confabulate, Nemotron could have a special niche there
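
The small-model takeaways above could be written down as a simple routing table. This is only a rough sketch: the `choose_model` function, the size labels, and the priority names are hypothetical and exist just for illustration. The only numbers used are the two AA-Omniscience Non-Hallucination scores quoted above.

```python
# Hypothetical routing sketch. Names of priorities and size tiers are
# assumptions made for illustration, not part of any existing tool.

# Non-hallucination scores quoted in the comparison above (percent).
NON_HALLUCINATION = {"Nemotron-Nano-9B-V2": 39, "Qwen3.5-9B": 19}

# Cases where the comparison above favors Nemotron at a given size.
ROUTES = {
    ("4B", "instruction_following"): "Nemotron-Nano-4B",
    ("9B", "non_hallucination"): "Nemotron-Nano-9B-V2",
}

# Qwen3.5 is described as ahead on the other benchmarks at these sizes.
DEFAULTS = {"4B": "Qwen3.5-4B", "9B": "Qwen3.5-9B"}


def choose_model(size: str, priority: str) -> str:
    """Return the model the comparison above would suggest for a size/priority pair."""
    return ROUTES.get((size, priority), DEFAULTS[size])


# Gap between the two 9B-class models on non-hallucination, in points.
gap_9b = NON_HALLUCINATION["Nemotron-Nano-9B-V2"] - NON_HALLUCINATION["Qwen3.5-9B"]

if __name__ == "__main__":
    print(choose_model("9B", "non_hallucination"))
    print(choose_model("9B", "agentic"))
    print(gap_9b)
```

A lookup table with a per-size default keeps the exceptions explicit, so the table is easy to revise if later benchmark runs shift the picture.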

Granite-4.0 and Falcon-H1R are weaker than these two for everything.

How is Qwen3.6-35B-A3B with PFlash+DFlash+DDTree? #161
