Parallelism (DP × CP|SP × EP). In transformers MoE, EP shares the TP mesh: they are the same axis, so TP is not a separate dimension here. The full state space is precomputed at startup as a catalog of (ctx × nodes × base × feature-subset) entries, so runtime is a pure lookup. Click Export to download the catalog as JSONL.
Context length
—
Cluster size
—
Backend rules at this scale
—
DP = 16 (batch / params)
CP = 1 (seq · ring)
SP = 1 (seq · ulysses)
EP = 1 (expert, = TP mesh)
Discovery timeline · Apr 28
—
Window MFU
—%
causal-adj —%
Throughput
—
—
Peak GPU mem
—
—
Drag a base backend here. Each base owns a unique tab x-position; only dims with a matching notch fit.
Ready
Tab positions encode the base mesh each dim accepts. CP fits FSDP2, SP fits DS-Z3, EP fits FSDP2 or DS-Z2, TP fits any.
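The fit rules above can be read as a small compatibility table. The Python sketch below encodes them directly; the names `ACCEPTED_BASES`, `fits`, and `dims_for` are hypothetical, and `None` is used here as an assumed convention for "fits any base".

```python
# Which base backends each dim accepts, transcribed from the rules above.
# None is a convention chosen for this sketch meaning "accepts any base".
ACCEPTED_BASES = {
    "CP": {"FSDP2"},
    "SP": {"DS-Z3"},
    "EP": {"FSDP2", "DS-Z2"},
    "TP": None,
}


def fits(dim, base):
    """Return True if `dim` can attach to `base` (its notch matches the tab)."""
    accepted = ACCEPTED_BASES[dim]
    return accepted is None or base in accepted


def dims_for(base):
    """List, in alphabetical order, the dims that fit the given base."""
    return sorted(dim for dim in ACCEPTED_BASES if fits(dim, base))
```

Under these rules, `dims_for("FSDP2")` gives CP, EP, and TP, while `dims_for("DS-Z3")` gives SP and TP.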