Reward Models | Data Brew | Episode 40
About this listen
In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, dives into cutting-edge advancements in AI model optimization, focusing on Reward Models and Reinforcement Learning from Human Feedback (RLHF).
Highlights include:
- How synthetic data and RLHF enable fine-tuning models to generate preferred outputs.
- Techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for enhancing response quality (a minimal DPO sketch follows this list).
- The role of reward models in improving coding, math, reasoning, and other NLP tasks.
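For listeners new to DPO, here is a minimal sketch of the preference loss the episode refers to, following Rafailov et al. (2023). The function name, tensor arguments, and beta value are illustrative assumptions, not material from the episode:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss over a batch of preference pairs.

    Each argument is a tensor of summed log-probabilities that the trainable
    policy (or the frozen reference model) assigns to the human-preferred
    ("chosen") and dispreferred ("rejected") response in each pair.
    """
    # Implicit reward for each response: how much more likely the policy
    # makes it relative to the reference model.
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # Push the policy to widen the gap between chosen and rejected margins.
    logits = beta * (chosen_margin - rejected_margin)
    return -F.logsigmoid(logits).mean()
```

Unlike PPO-based RLHF, this objective needs no separately trained reward model at optimization time; the preference data supervises the policy directly.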
Connect with Brandon Cui:
https://www.linkedin.com/in/bcui19/