
Liang Qiu 邱亮
(pronounced "Leon Chew")
Hi! My name is Liang Qiu. I’m a researcher focusing on language model alignment and reinforcement learning. I was a Senior Applied Scientist at Amazon and earned my Ph.D. in Electrical and Computer Engineering from UCLA, advised by Prof. Song-Chun Zhu and Prof. Achuta Kadambi.
- [09/2025] Our paper "Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models" has been accepted by NeurIPS 2025.
- [09/2025] Our paper "Ask a Strong LLM Judge when Your Reward Model is Uncertain" has been accepted by NeurIPS 2025.
- [08/2025] Our paper "WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning" has been accepted by EMNLP 2025.
- [07/2025] Our paper "Self-Rewarding PPO: Aligning Large Language Models with Demonstrations Only" has been accepted by COLM 2025.
- [05/2025] Our paper "Discriminative Finetuning of Generative Large Language Models without Reward Models and Preference Data" has been accepted by ICML 2025.






















