Welcome to the Awesome-RLHF-Vision repository! This is a curated collection of research papers focusing on Reinforcement Learning with Human Feedback (RLHF) as applied to vision models. Our goal is to create a comprehensive resource that is continuously updated to track the latest advancements in this exciting field.
⭐ If you find this repository useful or interesting, please consider giving it a star to show your support! ⭐
Here you can find research papers and human-preference datasets on RLHF for vision models, organized by topic.
Format:
- [title](paper link) [links]
  - author1, author2, and author3...
  - publisher
  - keyword
  - summary
  - code
  - experiment environments and datasets
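To make the format concrete, here is a hypothetical example entry; the title, authors, venue, and links below are placeholders, not a real paper:

```markdown
- [An Example Paper on RLHF for Vision](https://arxiv.org/abs/0000.00000)
  - First Author, Second Author, and Third Author
  - Example Venue 2024
  - Keyword: Image Generation, Reward Model
  - Summary: One or two sentences describing the paper's main contribution.
  - Code: [Official](https://github.com/example/repo)
  - Experiments: evaluation environments and datasets used in the paper
```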
- RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
  - Tianyu Yu, Yuan Yao, Haoye Zhang, Taiwen He, Yifeng Han, Ganqu Cui, Jinyi Hu, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun, Tat-Seng Chua
  - CVPR 2024
  - Keyword: Multimodal Understanding, Hallucination, Dense Direct Preference Optimization, Human-annotated, MiniCPM
  - Code: Official
- Diffusion Model Alignment Using Direct Preference Optimization
  - Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, Nikhil Naik
  - CVPR 2024
  - Keyword: Image Generation, Image-to-Image Editing, DPO, SDXL-1.0
  - Code: Official
- Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
  - Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, Omer Levy
  - NeurIPS 2023
  - Keyword: Image Generation, Reward Model, Human Preference Dataset
  - Code: Official
- ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
  - Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, Yuxiao Dong
  - NeurIPS 2023
  - Keyword: Image Generation, Reward Model, Reward Feedback Learning
  - Code: Official
- Aligning Text-to-Image Models Using Human Feedback
  - Kimin Lee, Hao Liu, Moonkyung Ryu, Olivia Watkins, Yuqing Du, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Shixiang Shane Gu
  - NeurIPS 2023
  - Keyword: Image Generation, Reward Model, Reward-weighted Learning
- Human Preference Score: Better Aligning Text-to-Image Models with Human Preference
  - Xiaoshi Wu, Keqiang Sun, Feng Zhu, Rui Zhao, Hongsheng Li
  - ICCV 2023
  - Keyword: Image Generation, Reward Model (human preference classifier), Preference Tuning
  - Code: Official
- Rich Human Feedback for Text-to-Image Generation
  - Youwei Liang, Junfeng He, Gang Li, Peizhao Li, Arseniy Klimovskiy, Nicholas Carolan, Jiao Sun, Jordi Pont-Tuset, Sarah Young, Feng Yang, Junjie Ke, Krishnamurthy Dj Dvijotham, Katie Collins, Yiwen Luo, Yang Li, Kai J Kohlhoff, Deepak Ramachandran, Vidhya Navalpakkam
  - CVPR 2024 Best Paper
  - Keyword: Image Generation, High-resolution, Reward/Critic Model, Fine-grained Feedback Signals, RFT
  - Code: Official
- RLHF-V-Dataset
  - OpenBMB
  - Keyword: Human Preference, Rankings, Trustworthiness
  - Task: Diverse multimodal understanding tasks
- RLAIF-V-Dataset
  - OpenBMB
  - Keyword: AI Preference, Rankings, Trustworthiness
  - Task: Diverse multimodal understanding tasks
- Pick-a-Pic v1
  - Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, Omer Levy
  - Keyword: Human Preference, Rankings
  - Task: Image Generation
- Pick-a-Pic v2
  - Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, Omer Levy
  - Keyword: Human Preference, Rankings
  - Task: Image Generation
- ImageRewardDB
  - THUDM
  - Keyword: Human Preference, Rankings
  - Task: Image Generation
- Simulacra Aesthetic Captions
  - John David Pressman, Katherine Crowson, and Simulacra Captions Contributors
  - Keyword: Human Preference, Ratings
  - Task: Image Generation
- Human Preference Dataset (HPD)
  - Xiaoshi Wu, Keqiang Sun, Feng Zhu, Rui Zhao, Hongsheng Li
  - Keyword: Human Preference, Ratings
  - Task: Image Generation
- RichHF-18K
  - Google Research
  - Keyword: Human Preference, Ratings, Human-labeled Heatmaps (e.g., artifact regions of distorted pixels) and Misalignment Tokens in Prompts
  - Task: Image Generation
Note: This list is continually updated. Make sure to check back regularly for the most recent papers.
We welcome contributions from the community! If you have a paper or resource you'd like to add, feel free to submit a pull request or open an issue. Please see our contribution guidelines for details.
This repository is licensed under the MIT License. See the LICENSE file for more information.
We would like to thank the contributors and researchers whose efforts have made this compilation possible.
For any questions or suggestions, feel free to open an issue or contact us directly ([email protected]). We appreciate your feedback!
Thank you for visiting Awesome-RLHF-Vision! Happy reading and researching!