
RL algorithm: from PPO to GRPO and DAPO
https://zhuanlan.zhihu.com/p/1898817630208517687
$\text{Latex Example}\quad {\color{green}green}\quad {\color[rgb]{0.286,0.529,0.808}light-blue}\quad {\color[rgb]{0.553,0.133,0.537}purple}\quad {\color[rgb]{0.820,0.208,0.208}light-red}\quad {\color{brown}brown}$
More Articles
- conda, pip, docker: source
- The State of Reinforcement Learning for LLM Reasoning (1)
- A (Long) Peek into Reinforcement Learning: Part 3
- conda, pip, docker: source
- A (Long) Peek into Reinforcement Learning: Part 2
- A (Long) Peek into Reinforcement Learning: Part 1
- Spinning Up: Part 3: Intro to Policy Optimization
- Spinning Up: Part 2: Kinds of RL Algorithms
- Spinning Up: Part 1: Key Concepts in RL
- Docker, sftp, conda
- Post with Image and Caption
- Paper List: RL-Strawberry
- The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
- NN-Z2H: Lecture 3: Building makemore Part 2: MLP
- NN-Z2H: Lecture 2: The spelled-out intro to language modeling: building makemore