The ReWiND method consists of three phases: learning a reward function, pre-training a policy, and using the learned reward function and pre-trained policy to learn a new language-specified task online.
In their paper ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations, presented at CoRL 2025, Jiahui Zhang, Yusen Luo, Abrar Anwar, Sumedh A. Sontakke, Joseph J. Lim, Jesse Thomason, Erdem Bıyık and Jesse Zhang introduce a framework for learning robot manipulation tasks solely from language instructions, without per-task demonstrations. We asked Jiahui Zhang and Jesse Zhang to tell us more.
What is the topic of the research in your paper, and what problem were you aiming to solve?
Our research addresses the problem of enabling robot manipulation policies to solve novel, language-conditioned tasks without collecting new demonstrations for each task. We begin with a small set of demonstrations in the deployment environment, train a language-conditioned reward model on them, and then use that learned reward function to fine-tune the policy on unseen tasks, with no additional demonstrations required.
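To make the two-stage idea concrete (fit a language-conditioned reward model on a few demonstrations, then use it to score new rollouts), here is a minimal toy sketch in Python. It is not the authors' implementation. The hashed bag-of-words instruction encoder, the bilinear (outer-product) regressor, the progress labels t/(T-1), and every function name (embed_instruction, features, progress_labels, train_reward_model, reward) are hypothetical stand-ins chosen only for illustration. The per-frame reward it produces is the kind of signal an online fine-tuning loop would consume.

```python
from typing import List, Tuple

# Hypothetical toy setup: each frame is a small feature vector standing in for an
# image embedding, and each demonstration pairs an instruction with its frames.
Demo = Tuple[str, List[List[float]]]
VOCAB_DIM = 8  # size of the toy instruction embedding (assumption)


def embed_instruction(text: str) -> List[float]:
    """Toy language encoder (hypothetical): a deterministic hashed bag of words."""
    vec = [0.0] * VOCAB_DIM
    for word in text.lower().split():
        vec[sum(map(ord, word)) % VOCAB_DIM] += 1.0
    return vec


def features(instruction: str, frame: List[float]) -> List[float]:
    """Outer product of language and frame features, so how each frame
    feature is weighted depends on the instruction."""
    lang = embed_instruction(instruction)
    return [l * f for l in lang for f in frame]


def progress_labels(num_frames: int) -> List[float]:
    """Assumed supervision: frame t of a T-frame demo is labeled t / (T - 1)."""
    if num_frames < 2:
        return [1.0] * num_frames
    return [t / (num_frames - 1) for t in range(num_frames)]


def train_reward_model(demos: List[Demo], lr: float = 0.2,
                       epochs: int = 2000) -> List[float]:
    """Fit a linear progress regressor by full-batch gradient descent
    on the mean of 0.5 * squared error."""
    data = []
    for instruction, frames in demos:
        for frame, target in zip(frames, progress_labels(len(frames))):
            data.append((features(instruction, frame), target))
    weights = [0.0] * len(data[0][0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for phi, target in data:
            err = sum(w * x for w, x in zip(weights, phi)) - target
            for i, x in enumerate(phi):
                grad[i] += err * x / len(data)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def reward(weights: List[float], instruction: str, frame: List[float]) -> float:
    """Dense reward for a frame of a new rollout: predicted progress, clipped to [0, 1]."""
    pred = sum(w * x for w, x in zip(weights, features(instruction, frame)))
    return min(1.0, max(0.0, pred))


if __name__ == "__main__":
    # Hypothetical toy demos: feature 0 is how open the drawer is,
    # feature 1 is a constant bias term.
    openness = [0.0, 0.25, 0.5, 0.75, 1.0]
    demos = [
        ("open the drawer", [[x, 1.0] for x in openness]),
        ("close the drawer", [[x, 1.0] for x in reversed(openness)]),
    ]
    weights = train_reward_model(demos)
    rollout = [[x, 1.0] for x in openness]  # the drawer gets steadily more open
    print([round(reward(weights, "open the drawer", f), 2) for f in rollout])
    print([round(reward(weights, "close the drawer", f), 2) for f in rollout])
```

Because frame features are multiplied by the instruction embedding, the same rollout frames should receive increasing reward under "open the drawer" and decreasing reward under "close the drawer". In practice the toy encoders would presumably be replaced by learned vision and language models, with these rewards passed to a reinforcement learning algorithm during online fine-tuning.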
Tell us about ReWiND – what are the main features and contributions of this framework?
















