arxiv:2312.09187

Vision-Language Models as a Source of Rewards

Published on Dec 14, 2023

Submitted by akhaliq on Dec 15, 2023

Authors:

Feryal Behbahani, Harris Chan, Clare Lyle

Abstract

Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number of reward functions for achieving different goals. We investigate the feasibility of using off-the-shelf vision-language models, or VLMs, as sources of rewards for reinforcement learning agents. We show how rewards for visual achievement of a variety of language goals can be derived from the CLIP family of models, and used to train RL agents that can achieve a variety of language goals. We showcase this approach in two distinct visual domains and present a scaling trend showing how larger VLMs lead to more accurate rewards for visual goal achievement, which in turn produces more capable RL agents.
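The abstract does not spell out how a reward is derived from a CLIP-style model, so the following is only a minimal sketch under stated assumptions: the image and goal texts have already been embedded by some contrastive VLM, the goal is scored against a set of distractor goals with a softmax over cosine similarities, and the resulting probability is thresholded into a binary reward. The function name `vlm_reward`, the `temperature` and `threshold` values, and the toy embeddings are all hypothetical and stand in for real encoder outputs.

```python
import numpy as np


def normalize(x):
    """Scale vectors to unit length along the last axis."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def vlm_reward(image_emb, goal_emb, negative_embs, temperature=0.01, threshold=0.5):
    """Binary reward from precomputed embeddings (illustrative assumption).

    image_emb:     embedding of the current observation, shape (d,)
    goal_emb:      embedding of the language goal, shape (d,)
    negative_embs: embeddings of distractor goals, shape (k, d)
    Returns (reward, p_goal), where p_goal is the softmax probability
    assigned to the goal text among goal + distractors.
    """
    image = normalize(image_emb)
    texts = normalize(np.vstack([goal_emb, negative_embs]))
    # Cosine similarity of each text with the image, scaled by temperature.
    logits = texts @ image / temperature
    logits = logits - logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    p_goal = float(probs[0])
    reward = 1.0 if p_goal > threshold else 0.0
    return reward, p_goal


# Toy usage: hand-made 3-d vectors in place of real VLM embeddings.
goal = np.array([1.0, 0.0, 0.0])
negatives = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
reward_hit, p_hit = vlm_reward(np.array([0.9, 0.1, 0.0]), goal, negatives)
reward_miss, p_miss = vlm_reward(np.array([0.0, 1.0, 0.0]), goal, negatives)
```

In an RL loop, a function like this would be called on each observation in place of a hand-written reward function; a larger VLM would presumably change only the quality of the embeddings, not the surrounding code.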

Community

deepankar68

Dec 15, 2023

What are the colours in this image?

deepankar68

Dec 15, 2023

What is this image?

comdab

Dec 15, 2023

What breeds of dogs are these?

comdab

Dec 15, 2023

Can you tell me what breed these dogs are?

Lyte

Dec 16, 2023

This is not the page where you can try the model, guys; this is a research paper. 😅

Lyte

Dec 16, 2023

@akhaliq sorry to bother you, but I've noticed quite a few papers getting comments from people who think this is a place to try models. Unfortunately, this confusion seems to be growing, so I suggest adding a short notice to make sure everyone understands that this is not the place to test the models.
For example, two pages on 15-Dec-2023 had these types of comments: one of them is this page, and the other one is here.