rlhf

Everything on Ground Truth tagged “rlhf” — 3 items.

A wave of new methods trains AI without a human answer key News

Several research groups landed on the same idea at once: improve a model by learning from its own attempts instead of expensive human labels. The field is now debating whether this really removes the labeling burden or just hides it.

Your AI judge might be reliable — and still be wrong News

The largest audit of AI language model judges to date — 21 judges, over half a million grading decisions — finds that standard reliability metrics are inflated by roughly a third, that the same judge can score differently on different benchmarks, and that high consistency and severe bias can coexist in the same system.

Reward-based fine-tuning (RLHF and RLVR) Lesson

After a model is first trained, it gets "polished" by being rewarded for good answers. Here's what that phase is, why it works, and the failure mode where models get repetitive and dull.