
2732 shaares
32 private links



1 result tagged grpo

Training a Rust 1.5B Coder LM with Reinforcement Learning (GRPO) | Oxen.ai

Group Relative Policy Optimization (GRPO) has proven to be a useful algorithm for training LLMs to reason and improve on benchmarks. DeepSeek-R1 showed that you can bootstrap a model through a combination of supervised fine-tuning and GRPO to compete with state-of-the-art models such as OpenAI's o1.

To learn more about how it works in practice, we wanted to try out some of the techniques on a real-world task. This post will outline how to train your own custom small LLM using GRPO, your own

AI · grpo · rust

March 8, 2025 at 11:20:15 AM EST

https://www.oxen.ai/blog/training-a-rust-1-5b-coder-lm-with-reinforcement-learning-grpo
