Exploring Topics in Distributed RL and HPC

RL Exploration

RLlib

  • Challenge

    • Designing and implementing RL algorithms is hard for researchers because the field lacks a single dominant computational pattern or fundamental rules of composition.
  • Insight

    • Irregularity of RL training workloads: modern RL algorithms are highly irregular in the computation patterns they create:

      1. The duration and resource requirements of tasks differ by orders of magnitude depending on the algorithm
      2. Communication patterns vary across algorithms
      3. Model-based hybrid algorithms generate nested computations
      4. Algorithms must maintain and update substantial amounts of state
  • Contribution

    • RLlib distributes RL components in a composable way by adapting algorithms for top-down hierarchical control, thereby encapsulating parallelism and resource requirements within short-running compute tasks (a sketch of this pattern follows below).
    • These primitives enable a broad range of algorithms to be implemented with high performance, scalability, and substantial code reuse.
  • See more details in the RLlib paper.
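
A minimal sketch of the top-down hierarchical control idea, using plain Ray tasks rather than RLlib's actual API; the function names (`rollout`, `compute_gradient`) and the dummy trajectory/gradient logic are illustrative assumptions, not the paper's implementation:

```python
# Top-down hierarchical control: a single driver schedules short-lived
# remote tasks for rollouts and gradient computation, so parallelism and
# resource requirements stay encapsulated inside each task.
import ray

ray.init()

@ray.remote
def rollout(weights, num_steps):
    # Stand-in for stepping an environment under the given policy.
    return [("obs", "action", 1.0)] * num_steps  # dummy trajectory

@ray.remote
def compute_gradient(weights, trajectory):
    # Stand-in for computing a gradient from one trajectory.
    return sum(reward for _, _, reward in trajectory)

weights = {"w": 0.0}
for it in range(3):
    # Fan out short-running tasks from the top-level driver...
    trajs = [rollout.remote(weights, num_steps=5) for _ in range(4)]
    grads = [compute_gradient.remote(weights, t) for t in trajs]
    # ...then gather results synchronously before the next iteration.
    update = sum(ray.get(grads)) / len(grads)
    weights["w"] += 0.01 * update
    print(f"iteration {it}: w = {weights['w']:.3f}")
```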

Distributed RL (Algorithm)

Distributed Training

https://github.com/PaddlePaddle/PARL/blob/develop/papers/archive.md#distributed-training

Overview

Distributed Deep Reinforcement Learning

Multi-Agent RL

MARLlib: Extending RLlib for Multi-agent Reinforcement Learning

https://arxiv.org/abs/2210.13708

  • MARLlib unifies tens of algorithms, including different types of independent learning, centralized critic, and value decomposition methods; this yields a highly composable integration of MARL algorithms that could not be unified before.
  • Furthermore, MARLlib goes beyond existing work by integrating diverse environment interfaces and providing flexible parameter-sharing strategies (see the sketch after this list); this lets end users create versatile solutions to cooperative, competitive, and mixed tasks with minimal code modification.
  • Extensive experiments substantiate the correctness of the implementation and yield new insights into the relationship between performance and the design of algorithmic components.
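
A tiny illustration of flexible parameter sharing expressed as an agent-to-policy mapping, in the spirit of RLlib-style policy mapping; the agent names and mapping functions are hypothetical, not MARLlib's actual API:

```python
# Three parameter-sharing strategies as mappings from agent id to
# policy id: the fewer distinct policy ids, the more sharing.
AGENTS = ["red_0", "red_1", "blue_0", "blue_1"]

def full_sharing(agent_id: str) -> str:
    return "shared"                # all agents share one parameter set

def group_sharing(agent_id: str) -> str:
    return agent_id.split("_")[0]  # teammates share (red vs. blue)

def no_sharing(agent_id: str) -> str:
    return agent_id                # independent learning: one policy each

for strategy in (full_sharing, group_sharing, no_sharing):
    mapping = {agent: strategy(agent) for agent in AGENTS}
    print(f"{strategy.__name__}: {len(set(mapping.values()))} policies")
```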

Exploration in Deep Reinforcement Learning: From Single-Agent to Multi-Agent Domain

https://arxiv.org/abs/2109.06668

Distributed Deep Reinforcement Learning: A Survey and A Multi-Player Multi-Agent Learning Toolbox

https://arxiv.org/abs/2212.00253

  • The paper surveys the state of this field by comparing classical distributed deep reinforcement learning methods and studying the components needed for efficient distributed learning, covering everything from single-player single-agent settings to the most complex multi-player multi-agent settings (a minimal actor-learner sketch follows this list).
  • Furthermore, it reviews recently released toolboxes that help realize distributed deep reinforcement learning with few modifications to the non-distributed versions of the code.
  • By analyzing the strengths and weaknesses of these toolboxes, the authors develop and release a multi-player multi-agent distributed deep reinforcement learning toolbox, which they validate on Wargame, a complex environment, demonstrating its usability for distributed deep reinforcement learning in complex multi-player multi-agent games.
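
Many of the classical methods such surveys compare share a decoupled actor-learner skeleton. Below is a minimal single-process sketch of that pattern using Python threads, with toy transitions and a dummy parameter version standing in for real gradient updates; nothing here corresponds to a specific toolbox's API:

```python
# Decoupled actor-learner pattern: actors generate experience
# asynchronously with possibly stale parameters; a learner consumes
# batches and publishes new parameter versions.
import queue
import random
import threading
import time

experience_q = queue.Queue(maxsize=100)
params = {"version": 0}

def actor(actor_id: int, n: int):
    for _ in range(n):
        # Act with a (possibly stale) snapshot of the parameters.
        transition = (actor_id, params["version"], random.random())
        experience_q.put(transition)
        time.sleep(0.001)

def learner(total_steps: int):
    for _ in range(total_steps):
        batch = [experience_q.get() for _ in range(4)]
        # A real learner would compute gradients on the batch here.
        params["version"] += 1  # publish updated parameters

actors = [threading.Thread(target=actor, args=(i, 40)) for i in range(4)]
learn = threading.Thread(target=learner, args=(40,))
for t in actors + [learn]:
    t.start()
for t in actors + [learn]:
    t.join()
print("final parameter version:", params["version"])
```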

RL for HPC

DRAS-CQSim: A Reinforcement Learning based Framework for HPC Cluster Scheduling

https://arxiv.org/abs/2105.07526

  • Increasingly complex HPC systems combined with highly diverse workloads make manual tuning of scheduling policies challenging, time-consuming, and error-prone. DRAS-CQSim is a reinforcement learning based HPC scheduling framework that learns scheduling policies automatically.
  • DRAS-CQSim encapsulates simulation environments, agents, hyperparameter tuning options, and different reinforcement learning algorithms, allowing system administrators to quickly obtain customized scheduling policies (a toy environment illustrating the setup follows this list).
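
To make the RL-for-scheduling framing concrete, here is a toy gym-style scheduling environment; the state, action, and reward definitions are illustrative assumptions, not DRAS-CQSim's actual interface:

```python
# Toy cluster-scheduling environment. State: free nodes plus the sizes
# of waiting jobs. Action: index of the waiting job to start next.
# Reward: negative count of jobs that currently cannot fit (a crude
# proxy for minimizing job wait time).
import random

class ToySchedulingEnv:
    def __init__(self, nodes=8, queue_len=4):
        self.total_nodes = nodes
        self.queue_len = queue_len

    def reset(self):
        self.free_nodes = self.total_nodes
        self.queue = [random.randint(1, 4) for _ in range(self.queue_len)]
        return (self.free_nodes, tuple(self.queue))

    def step(self, action):
        job = self.queue[action]
        if job <= self.free_nodes:
            self.free_nodes -= job                     # start the job
            self.queue[action] = random.randint(1, 4)  # new arrival
        reward = -sum(1 for j in self.queue if j > self.free_nodes)
        done = (self.free_nodes == 0
                or all(j > self.free_nodes for j in self.queue))
        return (self.free_nodes, tuple(self.queue)), reward, done, {}

env = ToySchedulingEnv()
obs, done = env.reset(), False
while not done:  # random policy; an RL agent would act here instead
    obs, reward, done, _ = env.step(random.randrange(env.queue_len))
print("episode ended; free nodes left:", obs[0])
```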

Analyzing I/O Performance of a Hierarchical HPC Storage System for Distributed Deep Learning

https://arxiv.org/abs/2301.01494

  • In large-scale distributed deep neural network (DDNN) training on HPC clusters, I/O performance is critical because it is becoming a bottleneck. Most flagship-class HPC clusters have hierarchical storage systems, so designing future HPC storage systems requires quantifying how much the hierarchy improves performance on these workloads.
  • The paper presents a quantitative performance analysis of the hierarchical storage system for DDNN workloads on a flagship-class supercomputer, showing how much performance improvement and storage volume increase will be required to meet the performance goal (the data-staging pattern such a hierarchy enables is sketched below).
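
One concrete pattern a hierarchical storage system enables is staging the training set from the shared parallel filesystem to node-local storage once, then serving every subsequent epoch locally. A self-contained sketch, with temporary directories standing in for hypothetical Lustre and node-local NVMe paths:

```python
# Stage-in pattern: pay one sequential read over the shared filesystem,
# then shift all repeated (random-access) epoch reads to local storage.
import shutil
import tempfile
from pathlib import Path

def stage_in(shared: Path, local: Path) -> Path:
    """Copy the dataset to node-local storage if not already staged."""
    if not local.exists():
        shutil.copytree(shared, local)  # one-time cost over the shared FS
    return local

# Demo with temp dirs in place of real PFS / NVMe mount points.
tmp = Path(tempfile.mkdtemp())
shared = tmp / "lustre" / "dataset"
shared.mkdir(parents=True)
(shared / "shard0.bin").write_bytes(b"\x00" * 1024)

local = stage_in(shared, tmp / "nvme" / "dataset")
print("staged files:", [p.name for p in local.iterdir()])
```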

Review, Analysis and Design of a Comprehensive Deep Reinforcement Learning Framework

https://arxiv.org/abs/2002.11883

  • This paper proposes a comprehensive software framework that not only plays a vital role in designing a connect-the-dots deep RL architecture but also provides a guideline for developing a realistic RL application in a short time span. The authors design and develop a deep RL software framework that ensures flexibility, robustness, and scalability; by inheriting the proposed architecture, software managers can foresee challenges when designing a deep RL-based system.
  • As a result, they can expedite the design process and actively control every stage of software development, which is especially critical in agile development environments. To ensure generality, the proposed architecture does not depend on a specific RL algorithm, network configuration, number of agents, or type of agent.
  • Using the framework, software developers can develop and integrate new RL algorithms or new types of agents, and can flexibly change the network configuration or the number of agents (a sketch of such an algorithm-agnostic interface follows this list).
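
A sketch of the kind of algorithm-agnostic abstraction the paper advocates: the driver loop depends only on an Agent interface, so algorithms, networks, and agent counts can be swapped freely. Class and method names here are illustrative, not the paper's actual code:

```python
# The driver (run_episode) never inspects concrete agent types, so new
# algorithms or agent types plug in by implementing the Agent interface.
import random
from abc import ABC, abstractmethod

class Agent(ABC):
    @abstractmethod
    def act(self, observation): ...
    @abstractmethod
    def learn(self, transition): ...

class RandomAgent(Agent):
    def __init__(self, n_actions):
        self.n_actions = n_actions
    def act(self, observation):
        return random.randrange(self.n_actions)
    def learn(self, transition):
        pass  # a learning agent would update its parameters here

class TinyEnv:
    def reset(self):
        self.t = 0
        return 0
    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 5, {}

def run_episode(agents, env):
    obs, done = env.reset(), False
    while not done:
        actions = [agent.act(obs) for agent in agents]
        obs, reward, done, _ = env.step(actions[0])
        for agent, action in zip(agents, actions):
            agent.learn((obs, action, reward))

run_episode([RandomAgent(3), RandomAgent(3)], TinyEnv())
print("episode complete")
```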

Other RL

Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress

https://agarwl.github.io/reincarnating_rl/
https://arxiv.org/pdf/2206.01626.pdf
https://ai.googleblog.com/2022/11/beyond-tabula-rasa-reincarnating.html

  • Most RL research trains agents tabula rasa, discarding the computation invested in prior runs. Reincarnating RL is presented as an alternative workflow, or class of problem settings, in which prior computational work (e.g., learned policies) is reused or transferred between design iterations of an RL agent, or from one RL agent to another. As a step toward enabling reincarnating RL from any agent to any other agent, the paper focuses on the specific setting of efficiently transferring an existing sub-optimal policy to a standalone value-based RL agent.
  • The authors find that existing approaches fail in this setting and propose a simple algorithm to address their limitations (its core objective is sketched below). Equipped with this algorithm, they demonstrate reincarnating RL's gains over tabula rasa RL on Atari 2600 games, a challenging locomotion task, and the real-world problem of navigating stratospheric balloons. Overall, the work argues for an alternative approach to RL research that could significantly improve real-world RL adoption and help democratize it further.
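
The core objective behind this kind of policy-to-value transfer can be sketched as a TD loss plus a decaying distillation term toward the teacher's policy. This mirrors the spirit of the paper's approach but is not the authors' implementation; the tensor shapes, action selection, and decay schedule below are all assumptions:

```python
# Student Q-learning augmented with a distillation term toward a
# sub-optimal teacher policy; the distillation weight decays to zero
# so the student eventually learns purely from its own TD signal.
import torch
import torch.nn.functional as F

def reincarnation_loss(q_values, target_q, teacher_probs, step,
                       decay_steps=10_000):
    """q_values: (B, A) student Q-values; target_q: (B,) TD targets;
    teacher_probs: (B, A) teacher action distribution."""
    # TD loss (on the greedy action, for simplicity of the sketch).
    td_loss = F.mse_loss(q_values.max(dim=1).values, target_q)
    # Cross-entropy between the teacher's policy and the student's
    # softmax policy over its Q-values.
    log_student = F.log_softmax(q_values, dim=1)
    distill = -(teacher_probs * log_student).sum(dim=1).mean()
    # Linearly decay the distillation weight to zero.
    lam = max(0.0, 1.0 - step / decay_steps)
    return td_loss + lam * distill

# Toy usage with random tensors (batch of 4, 3 actions).
q = torch.randn(4, 3, requires_grad=True)
tgt = torch.randn(4)
teacher = torch.softmax(torch.randn(4, 3), dim=1)
loss = reincarnation_loss(q, tgt, teacher, step=100)
loss.backward()
print("loss:", loss.item())
```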

Written by Yiran