Pulkit Mehta’s Post

Senior Consultant - Data Scientist at Firstsource Solutions Limited

This short course is a great introduction to RLHF. Highlights: 1. The instructor explained what RLHF is and how it is used to align LLM responses with human preferences. 2. How to prepare preference datasets and train a reward model. 3. How to use the reward model in an RL loop to fine-tune the LLM. All of the above was demonstrated on the GCP Vertex AI platform using pipelines and the Llama 2 7B model on a summarization task. A logical next step would be to work through the TRL library from Hugging Face, along with their blogs and their smol course: https://lnkd.in/gBKpJ7vS #rl #rlhf #llm #finetuning #learninginpublic
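
For anyone wondering what the reward modelling step in point 2 can look like, here is a minimal sketch in plain Python. It assumes the commonly used pairwise (Bradley-Terry style) loss, which may or may not be what the course pipeline uses internally. The record fields (`prompt`, `chosen`, `rejected`) and the function name are hypothetical, picked for illustration only.

```python
import math

# Hypothetical preference record: one prompt plus a preferred and a rejected summary.
# Field names are illustrative, not taken from the course or any specific library.
preference_example = {
    "prompt": "Summarize: The city council voted to extend library opening hours.",
    "chosen": "The council extended library hours.",
    "rejected": "Libraries are buildings with books.",
}


def pairwise_reward_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the reward model scores the chosen response
    above the rejected one, and large when the order is reversed.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Scores a reward model might assign to the two summaries (made-up numbers).
loss_when_correct = pairwise_reward_loss(2.0, 0.0)
loss_when_wrong = pairwise_reward_loss(0.0, 2.0)
```

In practice the two scores would come from a trained model scoring `prompt + chosen` and `prompt + rejected`, and the loss would be averaged over many such pairs.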
