NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning

Published as an arXiv preprint, 2025

This paper introduces NOVER, an approach to incentive training of language models based on verifier-free reinforcement learning. The method removes the need for an external verifier while still providing a usable reward signal for improving the model.

The work addresses the challenge of training language models with reinforcement learning without requiring expensive or complex verification mechanisms.
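The summary does not spell out how a reward can be computed without a verifier. As an illustration only, the sketch below assumes a reward based on how likely the policy finds a known reference answer after reading its own generated reasoning (lower perplexity means higher reward), followed by GRPO-style normalization within a group of rollouts for the same prompt. The function names, the log-perplexity reward, and the normalization are assumptions made for this example, not the paper's exact formulation.

```python
import math
from statistics import mean, pstdev


def perplexity(token_logprobs: list[float]) -> float:
    # Perplexity = exp(mean negative log-likelihood) over the reference-answer tokens.
    if not token_logprobs:
        raise ValueError("token_logprobs must be non-empty")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    # GRPO-style normalization: center and scale rewards within one prompt's group.
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def verifier_free_advantages(ref_logprobs_per_rollout: list[list[float]]) -> list[float]:
    # Hypothetical reward: each inner list holds the natural-log probabilities the
    # policy assigns to the reference answer tokens, conditioned on one rollout's
    # reasoning. The less "surprised" the model is, the higher the reward.
    rewards = [-math.log(perplexity(lp)) for lp in ref_logprobs_per_rollout]
    return group_advantages(rewards)


# Usage: three rollouts for the same prompt (made-up log-probabilities).
rollouts = [
    [-0.1, -0.2, -0.1],  # reasoning that makes the reference answer likely
    [-1.5, -2.0, -1.0],  # reasoning that makes it unlikely
    [-0.7, -0.6, -0.8],
]
advantages = verifier_free_advantages(rollouts)
# The first rollout gets the largest advantage, the second the smallest.
```

Scoring the reference answer with the model's own likelihood avoids a separate verifier, but it assumes reference answers are available for training prompts, as in ordinary supervised fine-tuning data.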

Recommended citation: W Liu, S Qi, X Wang, C Qian, Y Du, Y He. (2025). "NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning." arXiv preprint arXiv:2505.16022.