
Distributed Training


DataParallel is no longer recommended; use DistributedDataParallel instead.[1]

rank: a process's index within the distributed job

gpu

local_rank: process rank within a single node

global_rank: process rank across the whole job (all nodes)

world_size: total number of processes

torchrun

torch.distributed.launch (deprecated in favor of torchrun)

nproc_per_node: number of processes (typically one per GPU) launched on each node; see the sketch below for how these values reach the training script
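
As a rough sketch (not part of the original notes): torchrun exports these values as environment variables to every worker process, so a training script can read them and join the process group, assuming one GPU per process and the NCCL backend:

```python
import os

import torch
import torch.distributed as dist


def setup_distributed():
    """Read the environment variables that torchrun exports for each worker."""
    rank = int(os.environ["RANK"])              # global rank of this process
    local_rank = int(os.environ["LOCAL_RANK"])  # rank within the current node
    world_size = int(os.environ["WORLD_SIZE"])  # total number of processes

    # Bind this process to its own GPU and join the process group.
    # init_process_group() picks up MASTER_ADDR/MASTER_PORT from the
    # environment, which torchrun also sets.
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")
    return rank, local_rank, world_size


if __name__ == "__main__":
    rank, local_rank, world_size = setup_distributed()
    print(f"rank={rank} local_rank={local_rank} world_size={world_size}")
    dist.destroy_process_group()
```

Launched, for example, with `torchrun --nproc_per_node=4 train.py` on a single node, or with `--nnodes` and a rendezvous endpoint for multi-node runs.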

What is DDP?

DistributedDataParallel

NCCL (NVIDIA Collective Communications Library)

RDMA (Remote Direct Memory Access)

NVML (NVIDIA Management Library)

torch.distributed

Distributed computing

All-Reduce

Ring All-Reduce (see the DDP sketch below)
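
A minimal sketch of how these pieces fit together, assuming one process per GPU and the NCCL backend; the model, data, and optimizer here are placeholders, not code from the notes. DistributedDataParallel wraps the local model replica and synchronizes gradients with All-Reduce (implemented by NCCL, typically as a ring algorithm) during `backward()`:

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun provides LOCAL_RANK; NCCL is the usual backend for GPU training.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")
    device = f"cuda:{local_rank}"

    # Placeholder model; DDP all-reduces gradients across processes in backward().
    model = nn.Linear(10, 10).to(device)
    ddp_model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    inputs = torch.randn(32, 10, device=device)
    targets = torch.randn(32, 10, device=device)

    loss = nn.functional.mse_loss(ddp_model(inputs), targets)
    loss.backward()   # gradient All-Reduce happens here
    optimizer.step()

    # An explicit all-reduce, e.g. to average a metric across processes.
    metric = loss.detach().clone()
    dist.all_reduce(metric, op=dist.ReduceOp.SUM)
    metric /= dist.get_world_size()
    if dist.get_rank() == 0:
        print(f"mean loss across processes: {metric.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```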

In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. The GIL prevents race conditions and ensures thread safety. A nice explanation of how the Python GIL helps in these areas can be found here. In short, this mutex is necessary mainly because CPython's memory management is not thread-safe.

Source: GlobalInterpreterLock - Python Wiki
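
A small illustration of the point (not from the Python Wiki page): pure-Python CPU-bound work does not speed up with threads, because the GIL lets only one thread execute bytecode at a time. This is one reason PyTorch distributed training uses one process per GPU rather than threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count(n: int) -> int:
    """Pure-Python CPU-bound loop; the GIL serializes it across threads."""
    total = 0
    for i in range(n):
        total += i
    return total


N = 10_000_000

start = time.perf_counter()
count(N)
count(N)
print(f"sequential:  {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(count, [N, N]))
# Roughly the same wall-clock time as the sequential run, because of the GIL.
print(f"two threads: {time.perf_counter() - start:.2f}s")
```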

Distributed training with 🤗 Accelerate https://huggingface.co/docs/transformers/accelerate

TorchX — PyTorch/TorchX main documentation

pytorch/torchx: TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and support for E2E production ML pipelines when you're ready.

TorchElastic Kubernetes — PyTorch 2.1 documentation

```shell
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/3
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 3 processes
----------------------------------------------------------------------------------------------------

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name     | Type    | Params
-------------------------------------
0 | conv1    | Conv2d  | 320
1 | conv2    | Conv2d  | 18.5 K
2 | dropout1 | Dropout | 0
3 | dropout2 | Dropout | 0
4 | fc1      | Linear  | 1.2 M
5 | fc2      | Linear  | 1.3 K
-------------------------------------
1.2 M     Trainable params
0         Non-trainable params
1.2 M     Total params
4.800     Total estimated model params size (MB)
```

```shell
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/3
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
```

```shell
Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/3
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
```
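
The log above has the shape of a PyTorch Lightning DDP run with three processes, one GPU per node (GLOBAL_RANK 0-2, LOCAL_RANK always 0). As a hedged sketch, a Trainer configured like the one below produces output of this form; the module is a guess chosen to match the parameter counts in the summary table, not the notes' actual code:

```python
import torch
import torch.nn.functional as F
from torch import nn
import pytorch_lightning as pl


class LitMNISTLike(pl.LightningModule):
    """Placeholder module; the layer names and sizes mirror the summary table above."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3)      # 320 params
        self.conv2 = nn.Conv2d(32, 64, 3)     # 18.5 K params
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)       # 1.2 M params
        self.fc2 = nn.Linear(128, 10)         # 1.3 K params

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(self.dropout1(x), 1)
        x = self.dropout2(F.relu(self.fc1(x)))
        return self.fc2(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


# 3 nodes x 1 GPU each -> GLOBAL_RANK 0..2 and LOCAL_RANK 0 on every node,
# matching the "MEMBER: n/3" lines in the log above.
trainer = pl.Trainer(accelerator="gpu", devices=1, num_nodes=3, strategy="ddp")
# trainer.fit(LitMNISTLike(), train_dataloaders=...)  # dataloader omitted here
```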

PyTorch DDP

Getting Started with Distributed Data Parallel — PyTorch Tutorials 2.2.0+cu121 documentation

Writing Distributed Applications with PyTorch — PyTorch Tutorials 2.2.0+cu121 documentation

Saving and Loading Models — PyTorch Tutorials 2.2.0+cu121 documentation

Torch Distributed Elastic — PyTorch 2.1 documentation

TorchElastic Kubernetes — PyTorch 2.1 documentation

elastic/kubernetes at master · pytorch/elastic (github.com)

Kubeflow Pipelines — PyTorch/TorchX main documentation

TorchX — PyTorch/TorchX main documentation
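
The saving/loading tutorial linked above pairs naturally with DDP: only one process should write the checkpoint, and every process should map it onto its own device when loading. A minimal sketch under those assumptions; the path and helper names are illustrative, not from the notes:

```python
import torch
import torch.distributed as dist

CKPT_PATH = "checkpoint.pt"  # illustrative path


def save_checkpoint(ddp_model, path=CKPT_PATH):
    # Save from a single rank to avoid concurrent writes; the state_dict
    # lives on the wrapped module, not on the DDP wrapper itself.
    if dist.get_rank() == 0:
        torch.save(ddp_model.module.state_dict(), path)
    dist.barrier()  # make sure the file exists before any rank tries to load it


def load_checkpoint(ddp_model, local_rank, path=CKPT_PATH):
    # Map tensors saved from rank 0's GPU onto this process's GPU.
    map_location = {"cuda:0": f"cuda:{local_rank}"}
    state_dict = torch.load(path, map_location=map_location)
    ddp_model.module.load_state_dict(state_dict)
    dist.barrier()
```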

Monitoring PyTorch

Metrics — PyTorch 2.1 documentation

Trainer

Distributed training with 🤗 Accelerate (huggingface.co)
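
A hedged sketch of the 🤗 Accelerate pattern described in that guide; the model, optimizer, and dataloader here are placeholders for your own training setup:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()  # picks up the distributed config from `accelerate launch`

# Placeholder model and data.
model = nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
dataloader = DataLoader(dataset, batch_size=32)

# Accelerate moves everything to the right device and wraps the model for DDP.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(inputs), targets)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```

Run with `accelerate launch train.py` after configuring the machine(s) via `accelerate config`.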

References

  1. Getting Started with Distributed Data Parallel — PyTorch Tutorials 2.2.0+cu121 documentation ↩︎
