Model Training

Topic	Replies	Views	Date
About the "Model Training" category	0	28	June 5, 2025
Model conversion failed: Convert failed. Ret: Common error code.	1	9	April 18, 2026
MindSpore reports "RuntimeError: Memory allocation failed" when training a custom Transformer model	1	10	April 18, 2026
Analysis and resolution of "RuntimeError: Memory allocation failed" when training a model with MindSpore	0	10	April 18, 2026
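Gradient computation in MindSpore dynamic graph mode fails with AttributeError: module 'mindspore' has no attribute 'value_and_grad'	1	25	March 1, 2026
MindSpore distributed training of Qwen-7B crashes with "Graph execution failed."	1	25	March 1, 2026
Running a network in MindSpore Graph mode: the first run is very slow, and any change in input shape triggers recompilation	1	30	January 30, 2026
MindSpore model inference fails with TypeError: cell_reuse() takes 0 positional arguments but 1 was given	0	22	October 7, 2025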
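Loss becomes NaN when training with MindSpore mixed-precision mode	1	48	October 28, 2025
Incorrect gradient computation in a MindSpore custom operator causes abnormal loss	1	30	October 27, 2025
MindSpore training error: TypeError: For 'MatMul', the input data must be float16, float32, uint16 but got int32	1	32	October 27, 2025
MindSpore error: Please try to reduce 'batch_size' or check whether exists extra large shape.	0	26	October 21, 2025
Running a MindSpore script on Ascend reports that the device used by the network script is occupied; MindSpore on Ascend currently supports only one network script per card	0	29	October 21, 2025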
MindSpore reports RuntimeError: Primitive ScatterAdd's bprop not defined	0	15	August 28, 2025
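LLaMA2-7B inference on a single Ascend 910 card is slow	1	40	October 13, 2025
MindSpore large model: after enabling pipeline parallelism (pp) or gradient accumulation, loss neither overflows nor converges	0	20	October 10, 2025
Overflow reported during MindSpore large-model fine-tuning, and how to resolve it	0	20	October 9, 2025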
[Case] [MindSpore] [MindFormers] Training plog reports halMemAlloc failed, drvRetCode=6	0	37	October 9, 2025
MindSpore training aborts abnormally: Try to send request before Open(), Try to get response before Open(), Response is empty	0	17	October 9, 2025
pangu-100b: diagnosing a scaling linearity problem on a 2k cluster	0	14	October 8, 2025
MindSpore model inference error: memory isn't enough and alloc failed, kernel name: kernel_graph_@ HostDSActor, alloc size: 8192B	0	18	October 7, 2025
Issue adapting FlashAttention to alibi on Ascend 910	0	32	October 7, 2025
[Case] [MindSpore] [Offline weight conversion series, part 3] Converting between complete and distributed weights in MindSpore ckpt format	0	18	October 7, 2025
Model inference fails with RuntimeError: A model class needs to define a `prepare_inputs_for_generation` method in order to use `.generate()`	0	22	October 6, 2025
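CodeLlama inference on Ascend 910 fails with "get fail deviceLogicId[0]"	0	11	October 5, 2025
LoRA fine-tuning of llama3.1-8b: without weight conversion the dimensions do not match; with it enabled, an error says the ckpt for rank1 cannot be found, although the strategy directory is complete	0	14	October 5, 2025
baichuan2-13b operator overflow and runaway loss: the problem and its diagnosis	0	14	October 5, 2025
MindSpore model error: Reason: Memory resources are exhausted.	0	37	October 5, 2025
Llama inference reports a parameter validation error: TypeError: The input value must be int. but got 'NoneType.	0	17	October 4, 2025
MindFormers model startup: task killed because of host-side OOM	0	68	October 4, 2025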