- 2025/06/05: 🎉 We release our paper and codebase.
- 2025/09/26: 🎉 We update our paper and code.
- 2025/11/26: 🎉 Our paper was accepted by ICLR 2026.
Knowledgeable-R1 is a reinforcement-learning framework that explicitly trains large language models to use parametric knowledge (PK) to resist contextual interference while still exploiting external context when it is reliably helpful. It introduces a joint sampling scheme that generates paired responses with and without retrieval, and learns both local advantages (within each decoding regime) and global advantages under the same input to quantify when to ignore misleading context and when to adopt it. An asymmetric advantage transformation amplifies exploratory behaviors toward parametric knowledge. Experiments show that Knowledgeable-R1 significantly improves robustness and reasoning accuracy in both knowledge-conflict and general RAG scenarios, outperforming SOTA baselines by 23% in counterfactual scenarios without degradation when the retrieved context is fully accurate.
🎯 Key Benefits:
- No additional cost — only the rollout strategy and the RL objective are modified
- Easy to adopt — no additional components or complex multi-prompt pipelines are required in application
- Superior generalization — Knowledgeable-R1 significantly enhances robustness and reasoning accuracy on both parametric–contextual knowledge-conflict tasks and general RAG tasks
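The joint sampling and advantage scheme described above can be sketched in a few lines. This is an illustrative simplification, not the repository's actual code: the function names, the way local and global advantages are combined, and the `beta` amplification factor are all assumptions made for exposition.

```python
import numpy as np

def group_advantages(rewards):
    # GRPO-style local advantage: normalize rewards within one rollout group
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-6)

def knowledgeable_advantages(rewards_with_ctx, rewards_no_ctx, beta=1.5):
    """Illustrative sketch of paired-rollout advantage estimation.

    rewards_with_ctx: rewards of rollouts conditioned on retrieved context
    rewards_no_ctx:   rewards of rollouts from parametric knowledge only
    beta:             assumed amplification factor (>= 1) for PK exploration
    """
    # Local advantages, computed within each decoding regime separately
    adv_ctx = group_advantages(rewards_with_ctx)
    adv_pk = group_advantages(rewards_no_ctx)

    # Global advantages, normalized over the union of both regimes so the
    # two regimes can be compared under the same input
    all_r = np.concatenate([rewards_with_ctx, rewards_no_ctx]).astype(float)
    global_adv = (all_r - all_r.mean()) / (all_r.std() + 1e-6)
    g_ctx = global_adv[: len(rewards_with_ctx)]
    g_pk = global_adv[len(rewards_with_ctx):]

    # Asymmetric transformation: amplify only the positive advantages of the
    # parametric-knowledge rollouts, encouraging exploration toward PK
    g_pk = np.where(g_pk > 0, beta * g_pk, g_pk)

    return adv_ctx + g_ctx, adv_pk + g_pk
```

Intuitively, when a context-free rollout answers correctly while the retrieved context misleads, its amplified positive advantage pushes the policy to trust parametric knowledge; when the context helps, the context-conditioned rollouts earn the larger advantages instead.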
The runtime environment is specified in requirements.txt:

```bash
conda create -n knowledgeable-r1 python=3.11 -y && conda activate knowledgeable-r1
pip install -r requirements.txt
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
```

Download all datasets through this link and unzip them under the knowledgeable-R1 folder.
Run the following commands:

```bash
# Training
bash training_scripts/qwen2_5_7b_knowledge_confiqa_mc_grpo.sh
bash training_scripts/qwen2_5_7b_knowledge_confiqa_mc_knowledgeable_r1.sh

# Evaluation
bash eval_query_only.sh      # for query only
bash eval_query_with_rag.sh  # for RAG
bash eval_ours.sh
python get_metric.py
```

Our evaluation results can be found in this link.
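The metric step presumably aggregates per-question correctness over the model outputs. A minimal sketch of a normalized exact-match accuracy, a common choice for QA benchmarks like ConFiQA and HotpotQA, is below; the normalization rules and data layout are assumptions, not the repository's actual `get_metric.py`.

```python
import re
import string

def normalize(text):
    # lowercase, drop punctuation and English articles, collapse whitespace
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold_answers):
    # 1.0 if the normalized prediction equals any normalized gold answer
    return float(normalize(prediction) in {normalize(g) for g in gold_answers})

# hypothetical predictions and gold answers keyed by question id
preds = {"q1": "The Eiffel Tower", "q2": "Paris"}
gold = {"q1": ["Eiffel Tower"], "q2": ["London"]}
accuracy = sum(exact_match(preds[q], gold[q]) for q in preds) / len(preds)
```

This style of normalization ("the Eiffel Tower" matching "Eiffel Tower") avoids penalizing surface-form differences while still scoring knowledge-conflict answers strictly.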
If you find our work useful for your research, please consider citing:
```bibtex
@misc{lin2025resistingcontextualinterferencerag,
    title={Resisting Contextual Interference in RAG via Parametric-Knowledge Reinforcement},
    author={Chenyu Lin and Yilin Wen and Du Su and Hexiang Tan and Fei Sun and Muhan Chen and Chenfu Bao and Zhonghou Lyu},
    year={2025},
    eprint={2506.05154},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2506.05154},
}
```

- The training code is built on EasyR1, and the evaluation suite employs vLLM for acceleration.
- The base models are Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, Qwen2.5-3B-Instruct, and Qwen2.5-14B-Instruct.
- The original training datasets are from ConFiQA and HotpotQA.
