Build the recipe — Qwen3-30B-A3B

Parallelism (DP × CP|SP × EP). In the transformers MoE implementation, EP shares the TP mesh: they are the same axis, so TP isn't a separate dimension here. The full state space is precomputed at startup as a catalog of (ctx × nodes × base × feature-subset) entries, so the runtime does nothing but look entries up. Click Export to download the catalog as JSONL.

Context length

Cluster size

Backend rules at this scale

batch / params

seq · ring

seq · ulysses

expert (= TP mesh)

Discovery timeline Apr 28

Window MFU, causal-adjusted MFU

Throughput

Peak GPU mem

Drag a base backend here. Each base owns a unique tab x-position; only dims with a matching notch fit.

Tab positions encode which base mesh each dim accepts: CP fits FSDP2, SP fits DS-Z3, EP fits FSDP2 or DS-Z2, and TP fits any base.
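The fit rules above can be written down as a small compatibility table. The sketch below only transcribes those rules; the names `ACCEPTS`, `fits`, and `compatible_dims` are hypothetical and not taken from the builder's source.

```python
# Rules transcribed from the text; None means "fits any base".
ACCEPTS = {
    "CP": {"FSDP2"},
    "SP": {"DS-Z3"},
    "EP": {"FSDP2", "DS-Z2"},
    "TP": None,
}


def fits(dim, base):
    """Return True if parallelism dim `dim` can attach to base backend `base`."""
    allowed = ACCEPTS[dim]
    return allowed is None or base in allowed


def compatible_dims(base):
    """List, in alphabetical order, the dims that fit the given base."""
    return sorted(dim for dim in ACCEPTS if fits(dim, base))
```

For example, `compatible_dims("DS-Z3")` keeps only SP and TP, since CP and EP have no notch matching that base.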

Palette: 0 of 12

base CP/SP EP attn kernel

Underlying training runs & logs: aminediroHF/qwen3-sft-benchmark · This builder is a viewer over a precomputed catalog generated by build_catalog.py.