Tech Blogs (Experience Sharing)

Topic	Category	Views	Date
MindSpore 8-card job reports Socket times out	Distributed Parallelism	14	October 6, 2025
Pangu-Zhizi (盘古-智子) 38B on Ascend 910: output is not deterministic in greedy mode	Function Debugging	37	October 5, 2025
CodeLlama inference on Ascend 910 fails with: get fail deviceLogicId[0]	Model Training	10	October 5, 2025
LoRA fine-tuning of llama3.1-8b: dimension mismatch without weight conversion; with it enabled, the rank1 ckpt is reported missing although the strategy directory is complete	Model Training	13	October 5, 2025
baichuan2-13b operator overflow and diverging loss: problem and diagnosis	Model Training	13	October 5, 2025
MindSpore model fails with: Reason: Memory resources are exhausted.	Model Training	33	October 5, 2025
Llama inference fails parameter validation with: TypeError: The input value must be int. but got 'NoneType.	Model Training	16	October 4, 2025
Parallel strategy 8:1:1 fails with: RuntimeError: May you need to check if the batch size etc. in your 'net' and 'parameter dict' are same.	Distributed Parallelism	14	October 4, 2025
Running in Docker fails with: RuntimeError: Maybe you are trying to call 'mindspore.communication.init()' without using 'mpirun'	Distributed Parallelism	29	October 4, 2025
Mindformers job is killed at model startup because of host-side OOM	Model Training	65	October 4, 2025
Enabling MS_DISABLE_REF_MODE in MindSpore causes: The device address type is wrong:type name in address:CPU,type name in context:Ascend	Installation Experience	27	October 3, 2025
MTP: switching between Ascend910 device models fails with: KeyError: 'group_list'	Function Debugging	9	October 3, 2025
MindSpore warns when saving a model: need to check whether you is batch size and so on in the 'net' and 'parameter dict' are same.	Installation Experience	22	October 3, 2025
Generating mindrecord on NFS fails with: Failed to write mindrecord meta files	Function Debugging	10	October 3, 2025
Transformers fails with: google.protobuf.message.DecodeError: Wrong wire type in tag.	Function Debugging	18	October 3, 2025
Tokenizer reference fails with: TypeError GPT2Tokenizer: __init__ () missing 2 required positional arguments: 'vocab_file' and 'merges_file'	Model Training	12	October 2, 2025
MindSpore 2.2.10 GE graph mode fails with: Current execute mode is KernelByKernel, the processes must be launched with OpenMPI or ...	Model Training	17	October 2, 2025
Using the Flash attention feature in MindSpore 2.2.10 fails with: AttributeError: module 'mindspore.nn' has no attribute 'FlashAttention'	Installation Experience	32	October 2, 2025
MTP: distributed dataset reads and writes deadlock, Failed to execute the sql [SELECT NAME from SHARD NAME;] while verifying meta file, database is locked]	Data Loading & Processing	9	October 2, 2025
llama2 model conversion fails with: ImportError: cannot import name 'swap_cache' from 'mindspore._c_expression'	Function Debugging	18	October 2, 2025
[Case] [Mindspore] [Offline Weight Conversion Series 0] MindSpore offline weight conversion: interface description and conversion process	Model Training	59	October 1, 2025
Enabling the MindSpore profiler fails with: IndexError: list index out of range	Installation Experience	15	October 1, 2025
Requests time out with a separated deployment in an Ascend910 environment	Installation Experience	26	October 1, 2025
What needs to be configured in the corresponding yaml for MindSpore large-model parallelism	Distributed Parallelism	30	October 1, 2025
MindSpore fails with: TypeError: Multiply values for specific argument: query_embeds	Model Training	18	October 1, 2025
Pipeline parallelism fails with: Reshape op can't be a border.	Distributed Parallelism	19	September 30, 2025
Log shows the pretrained model was not loaded: model built, but weights is unloaded, since the config has no attribute or is None.	Function Debugging	16	September 30, 2025
model.train fails with: Exception in training: The input value must be int and must > 0, but got '0' with type 'int'.	Data Loading & Processing	18	September 30, 2025
Merging weights after LoRA fine-tuning with mindformers	Model Training	38	September 30, 2025
MindSpore model parallelism fails with: ValueError: array split does not result in an equal division	Distributed Parallelism	17	September 29, 2025