Forge · live training · team only · Reliquary
forge · live training
5H47sFL6-base-reset-qwen35 · step 16,287 · live
status: running
runtime: 19d 20h
last seen: —
loss: 8.89e-4
kl: 8.35e-4
grad_norm: 2.484
reward μ: 0.5781
steps / h: 34 · target met
lr: 4.71e-6
gpu util: 0.0%
gpu mem: 0.0%
ai advisor: reading the last 160 points…
model quality: computing quality signals…
validator rejections: tailing validator logs over ssh…
PPO loss: primary objective
KL divergence: budget kl_beta = 0.04
grad_norm: clip @ 1
learning rate: cosine schedule
rewards: mean ± std
degenerate-group ratio: zero-variance reward groups
valid rollout ratio: GRAIL accepted / submitted
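The two rollout-health panels above are simple ratios, and a minimal sketch of how they could be computed is shown here. The function names, the input shapes (one list of rewards per prompt, with m_rollouts_per_prompt = 8 entries in this run), and the exact-equality tolerance are assumptions made for illustration; the page does not show how Reliquary computes these metrics.

```python
from typing import Sequence


def degenerate_group_ratio(
    groups: Sequence[Sequence[float]], tol: float = 0.0
) -> float:
    """Fraction of prompt groups whose rollout rewards have zero variance.

    A group has zero variance exactly when all of its rewards are equal,
    so comparing max and min avoids floating-point variance arithmetic.
    `tol` (hypothetical parameter) allows treating near-equal rewards as equal.
    """
    if not groups:
        return 0.0
    degenerate = sum(
        1 for rewards in groups if rewards and max(rewards) - min(rewards) <= tol
    )
    return degenerate / len(groups)


def valid_rollout_ratio(accepted: int, submitted: int) -> float:
    """Accepted rollouts divided by submitted rollouts (0.0 if none submitted)."""
    if submitted == 0:
        return 0.0
    return accepted / submitted


# Hypothetical rewards for four prompts, eight rollouts each.
example_groups = [
    [1.0] * 8,                                  # all correct: degenerate
    [0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0],   # mixed: informative
    [0.0] * 8,                                  # all wrong: degenerate
    [0.5, 0.5, 0.25, 0.5, 0.5, 0.5, 0.5, 0.5],  # mixed: informative
]
example_ratio = degenerate_group_ratio(example_groups)
```

If advantages are computed relative to the group mean (an assumption, the page does not say), a zero-variance group contributes no learning signal, which is why this ratio is worth tracking next to the reward mean.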
model improvement · checkpoint evals: held-out pass@1 · math + code
model-improvement evals: checkpoint benchmarks stream once the eval pipeline publishes to R2
gpu util: 0.0%
gpu mem: 0.0%
sm occupancy: 0.0%
gpu temp: 28°C
power: 116 W
run config (11 keys)
b_batch = 8
grad_clip_norm = 1
kl_beta = 0.04
learning_rate = 0.000005
lr_cosine_max_windows = 10000
lr_warmup_windows = 10
m_rollouts_per_prompt = 8
ppo_clip_epsilon = 0.2
reliquary_version = 0.1.0
wandb_training_version = v1
window_length = 5
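The learning-rate keys above suggest a linear warmup followed by cosine decay. The sketch below shows one common way to combine them, using learning_rate, lr_warmup_windows, and lr_cosine_max_windows from the run config. The function name, the linear warmup shape, and the `min_lr` floor of 0.0 are assumptions, not the trainer's confirmed behaviour. It is not claimed to reproduce the lr of 4.71e-6 shown on the dashboard, because the page does not show how steps map to windows or whether a floor is applied.

```python
import math


def lr_at_window(
    window: int,
    base_lr: float = 5e-6,             # learning_rate from the run config
    warmup_windows: int = 10,          # lr_warmup_windows
    cosine_max_windows: int = 10_000,  # lr_cosine_max_windows
    min_lr: float = 0.0,               # assumed floor; not shown in the config
) -> float:
    """Linear warmup to base_lr, then cosine decay to min_lr."""
    if window < warmup_windows:
        # Ramp linearly so the last warmup window reaches base_lr.
        return base_lr * (window + 1) / warmup_windows
    if window >= cosine_max_windows:
        return min_lr
    # Fraction of the decay phase completed, in [0, 1).
    progress = (window - warmup_windows) / (cosine_max_windows - warmup_windows)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Sample the schedule at a few windows.
for w in (0, 10, 5_005, 10_000):
    print(f"window {w:>6}: lr = {lr_at_window(w):.3e}")
```

Under these assumptions the rate peaks at base_lr at the end of warmup, passes through half of base_lr at the midpoint of the decay phase, and reaches the floor at lr_cosine_max_windows.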