Group Relative Policy Optimization (GRPO) has proven to be a useful algorithm for training LLMs to reason and improve on benchmarks. DeepSeek-R1 showed that you can bootstrap a model through a combination of supervised fine-tuning and GRPO to compete with state-of-the-art models such as OpenAI's o1.
To learn more about how it works in practice, we wanted to try out some of the techniques on a real-world task. This post will outline how to train your own custom small LLM using GRPO, your own…
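For a sense of what makes GRPO lightweight compared to PPO: instead of learning a separate value network, it samples a group of completions per prompt and normalizes each completion's reward against the group's mean and standard deviation. Below is a minimal sketch of that normalization step; the function name, the example reward values, and the `eps` constant are illustrative placeholders, not code from the post.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Compute group-relative advantages, the core of GRPO.

    rewards: shape (num_prompts, group_size), one row of sampled
    completions' rewards per prompt.
    Returns advantages of the same shape, normalized within each group.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    # eps guards against division by zero when all rewards in a group match.
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each, scored by some reward fn.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.5, 0.2, 0.9, 0.1]])
print(group_relative_advantages(rewards))
```

Because the baseline is just the group mean, completions that beat their siblings on the same prompt get positive advantages and the rest get negative ones, with no critic model to train or store.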
Bjarne Stroustrup wants standards body to respond to memory-safety push as Rust monsters lurk at the door
The new US administration has removed everything from the White House web site and fired most of the CISA people who worked on memory safety…
Maintainability is the wrong value for indie games; what we should strive for instead is iteration speed.