| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| About the "Distributed Parallelism" (分布式并行) category | 0 | 10 | June 5, 2025 |
| MindSpore optimizer parallelism: Zero or FSDP? | 0 | 3 | December 10, 2025 |
| Fine-tuning qwen3-32B: single-node multi-card signal synchronization fails (Sync run failed) | 0 | 17 | November 21, 2025 |
| MindFormers single-node 8-card run reports "No parameter is entered. Notice that the program will run on default 8 cards." | 0 | 21 | September 24, 2025 |
| Single-node 4-card distributed inference reports "RuntimeError: Ascend kernel runtime initialization failed. The details refer to 'Ascend Error Message'." | 0 | 8 | October 24, 2025 |
| MindSpore error: "wq.weight in the argument 'net' should have the same shape as wq.weight in the argument 'parameter_dict'." | 0 | 9 | July 25, 2025 |
| Model parallelism reports out-of-memory | 0 | 11 | October 23, 2025 |
| MindSpore error "Please try to reduce 'batch_size' or check whether exists extra large shape." (solution 2) | 0 | 4 | October 21, 2025 |
| MindSpore pipeline-parallel training reports "RuntimeError: Stage 0 should has at least 1 parameter. but got none." | 0 | 13 | October 11, 2025 |
| Pipeline parallelism without Cell sharing enabled leads to very long compile times | 0 | 21 | October 11, 2025 |
| Distributed model parallelism error: "operator Mul init failed" or "CheckStrategy failed." | 0 | 13 | October 11, 2025 |
| MindSpore distributed run reports "TypeError: The parameters number of the function is 636, but the number of provided arguments is 635." | 0 | 10 | October 10, 2025 |
| MindSpore pipeline parallelism: loss is 0 in the logs of some cards | 0 | 15 | October 8, 2025 |
| MTP generating mindrecord with multiple processes reports "RuntimeError: Unexpected error. [Internal ERROR] Failed to write mindrecord meta files." | 0 | 9 | October 8, 2025 |
| [Case study] [MindSpore] [Offline weight conversion, part 2] Converting distributed MindSpore ckpt weights A into distributed weights B under a different strategy | 0 | 13 | October 6, 2025 |
| MindSpore 8-card "Socket times out" issue | 0 | 11 | October 6, 2025 |
| Parallel strategy 8:1:1 reports "RuntimeError: May you need to check if the batch size etc. in your 'net' and 'parameter dict' are same." | 0 | 6 | October 4, 2025 |
| Running in Docker reports "RuntimeError: Maybe you are trying to call 'mindspore.communication.init()' without using 'mpirun'" | 0 | 10 | October 4, 2025 |
| Which yaml settings are required for MindSpore large-model parallelism? | 0 | 10 | October 1, 2025 |
| Pipeline parallelism error "Reshape op can't be a border." | 0 | 9 | September 30, 2025 |
| MindSpore model parallelism reports "ValueError: array split does not result in an equal division" | 0 | 9 | September 29, 2025 |
| MindSpore large-model training error: "BrokenPipeError: [Errno 32] Broken pipe", "EOFError" | 0 | 7 | September 29, 2025 |
| MindSpore with profiling enabled under a parallel strategy reports "ValueError: When distributed loads are sliced weights, sink mode must be set True." | 0 | 12 | September 28, 2025 |
| MindSpore SafeTensors deep dive: efficient model storage and lazy loading | 0 | 84 | September 17, 2025 |
| hccl_tools usage errors under MindSpore parallel strategies | 0 | 14 | September 28, 2025 |
| Model parallel strategy 1:1:8 reports "RuntimeError: Stage num is 8 is not equal to stage used: 5" | 0 | 12 | September 27, 2025 |
| Model memory usage grows after increasing the data-parallel degree | 0 | 16 | September 26, 2025 |
| MindSpore distributed parallelism error "The strategy is XXX, shape XXX cannot be divisible by strategy value XXX" | 0 | 11 | September 25, 2025 |
| MindSpore distributed 8-node error "Call GE RunGraphWithStreamAsync Failed, ret is: 4294967295" | 0 | 13 | September 24, 2025 |
| MindSpore + MindFormer-r.1.2.0 fine-tuning qwen1.5 error | 1 | 33 | August 3, 2025 |