on-policy
Everything on Ground Truth tagged “on-policy” — 2 items.
Two new papers push 'on-policy distillation' to fix privileged teachers and merge specialist skills (News)
DOPD and MOPD advance on-policy distillation, in which a student model is trained on its own outputs. DOPD routes supervision to avoid a 'privilege illusion', and MOPD merges multiple specialist RL teachers into one model without cross-domain interference.
On-Policy vs Off-Policy Learning (Lesson)
On-policy learning trains a model on data generated by its own current behavior, while off-policy learning trains it on data generated by something else: an older version of the model, a different policy, or a fixed dataset. The choice shapes how stable, sample-efficient, and reliable training is.
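
As a rough illustration of that distinction, the sketch below contrasts where the training samples come from in each case. It is a toy under stated assumptions, not code from the lesson: the helper names (`sample_action`, `collect_on_policy`, `collect_off_policy`), the two-action policy, and the logged dataset are all hypothetical.

```python
import random

def sample_action(policy, rng):
    """Draw one action from a policy given as {action: probability}."""
    actions = list(policy)
    weights = [policy[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

def collect_on_policy(current_policy, n, rng):
    """On-policy: every sample is generated by the policy being trained now."""
    return [sample_action(current_policy, rng) for _ in range(n)]

def collect_off_policy(fixed_dataset, n, rng):
    """Off-policy: samples come from elsewhere, here a fixed logged dataset."""
    return [rng.choice(fixed_dataset) for _ in range(n)]

rng = random.Random(0)

# Hypothetical current behavior: the model always picks "right".
current_policy = {"left": 0.0, "right": 1.0}

# Hypothetical data logged by an older policy that always picked "left".
old_data = ["left", "left", "left"]

on_batch = collect_on_policy(current_policy, 4, rng)
off_batch = collect_off_policy(old_data, 4, rng)

print(on_batch)
print(off_batch)
```

In this toy setup the on-policy batch reflects what the model currently does, while the off-policy batch reflects behavior the model no longer exhibits. That mismatch is one reason the choice affects training stability.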