| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| About the "Distributed Parallelism" (分布式并行) category | 0 | 10 | June 5, 2025 |
| MindSpore optimizer parallelism: Zero or FSDP? | 0 | 3 | December 10, 2025 |
| Fine-tuning qwen3-32B: single-node multi-card signal synchronization fails (Sync run failed) | 0 | 17 | November 21, 2025 |
| MindFormers single-node 8-card run reports "No parameter is entered. Notice that the program will run on default 8 cards." | 0 | 21 | September 24, 2025 |
| Single-node 4-card distributed inference reports "RuntimeError: Ascend kernel runtime initialization failed. The details refer to 'Ascend Error Message'." | 0 | 8 | October 24, 2025 |
| MindSpore error: "wq.weight in the argument 'net' should have the same shape as wq.weight in the argument 'parameter_dict'." | 0 | 9 | July 25, 2025 |
| Model parallelism reports out-of-memory | 0 | 11 | October 23, 2025 |
| MindSpore error "Please try to reduce 'batch_size' or check whether exists extra large shape." (solution 2) | 0 | 4 | October 21, 2025 |
| MindSpore pipeline-parallel training reports "RuntimeError: Stage 0 should has at least 1 parameter. but got none." | 0 | 13 | October 11, 2025 |
| Pipeline parallelism without Cell sharing enabled leads to very long compile times | 0 | 21 | October 11, 2025 |
| Distributed model parallelism error: "operator Mul init failed" or "CheckStrategy failed." | 0 | 13 | October 11, 2025 |
| MindSpore distributed run reports "TypeError: The parameters number of the function is 636, but the number of provided arguments is 635." | 0 | 10 | October 10, 2025 |
| MindSpore pipeline parallelism: loss is 0 in the logs of some cards | 0 | 15 | October 8, 2025 |
| MTP generating mindrecord with multiple processes reports "RuntimeError: Unexpected error. [Internal ERROR] Failed to write mindrecord meta files." | 0 | 9 | October 8, 2025 |
| [Case study] [MindSpore] [Offline weight conversion, part 2] Converting distributed MindSpore ckpt weights A into distributed weights B under a different strategy | 0 | 13 | October 6, 2025 |
| MindSpore 8-card "Socket times out" issue | 0 | 11 | October 6, 2025 |
| Parallel strategy 8:1:1 reports "RuntimeError: May you need to check if the batch size etc. in your 'net' and 'parameter dict' are same." | 0 | 6 | October 4, 2025 |
| Running in Docker reports "RuntimeError: Maybe you are trying to call 'mindspore.communication.init()' without using 'mpirun'" | 0 | 10 | October 4, 2025 |
| Which yaml settings are required for MindSpore large-model parallelism? | 0 | 10 | October 1, 2025 |
| Pipeline parallelism error "Reshape op can't be a border." | 0 | 9 | September 30, 2025 |
| MindSpore model parallelism reports "ValueError: array split does not result in an equal division" | 0 | 9 | September 29, 2025 |
| MindSpore large-model training error: "BrokenPipeError: [Errno 32] Broken pipe", "EOFError" | 0 | 7 | September 29, 2025 |
| MindSpore with profiling enabled under a parallel strategy reports "ValueError: When distributed loads are sliced weights, sink mode must be set True." | 0 | 12 | September 28, 2025 |
| MindSpore SafeTensors deep dive: efficient model storage and lazy loading | 0 | 84 | September 17, 2025 |
| hccl_tools usage errors under MindSpore parallel strategies | 0 | 14 | September 28, 2025 |
| Model parallel strategy 1:1:8 reports "RuntimeError: Stage num is 8 is not equal to stage used: 5" | 0 | 12 | September 27, 2025 |
| Model memory usage grows after increasing the data-parallel degree | 0 | 16 | September 26, 2025 |
| MindSpore distributed parallelism error "The strategy is XXX, shape XXX cannot be divisible by strategy value XXX" | 0 | 11 | September 25, 2025 |
| MindSpore distributed 8-node error "Call GE RunGraphWithStreamAsync Failed, ret is: 4294967295" | 0 | 13 | September 24, 2025 |
| MindSpore + MindFormer-r.1.2.0 fine-tuning qwen1.5 error | 1 | 33 | August 3, 2025 |