
RL algorithm: from PPO to GRPO and DAPO
https://zhuanlan.zhihu.com/p/1898817630208517687

$\text{Latex Example}\quad {\color{green}green}\quad {\color[rgb]{0.286,0.529,0.808}light-blue}\quad {\color[rgb]{0.553,0.133,0.537}purple}\quad {\color[rgb]{0.820,0.208,0.208}light-red}\quad {\color{brown}brown}$
More Articles
 - conda, pip, docker: source
 - The State of Reinforcement Learning for LLM Reasoning (1)
 - A (Long) Peek into Reinforcement Learning: Part3
 - A (Long) Peek into Reinforcement Learning: Part2
 - A (Long) Peek into Reinforcement Learning: Part1
 - Spinning Up: Part 3: Intro to Policy Optimization
 - Spinning Up: Part 2: Kinds of RL Algorithms
 - Spinning Up: Part 1: Key Concepts in RL
 - Docker, sftp, conda
 - Post with Image and Caption
 - Paper List: RL-Strawberry
 - The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
 - NN-Z2H: Lecture 3: Building makemore Part 2: MLP
 - NN-Z2H: Lecture 2: The spelled-out intro to language modeling: building makemore