Model conversion failed: not support onnx data type IsNaN

When using the converted model, just make sure the two generated files are in the same directory; the msw path is derived automatically from the ms path. For this scenario where weights and model structure are separated, it is recommended to pass a model path to Build rather than a buffer, because the buffer does not contain the weights.
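The co-location rule described above can be sketched as a small check. This is a minimal illustration only; `derive_msw_path` and `check_colocated` are hypothetical helpers mimicking the described behavior, not part of the MindSpore Lite API:

```python
import os

def derive_msw_path(ms_path: str) -> str:
    """Mimic the described runtime behavior: the weight file path is the
    model path with its .ms suffix replaced by .msw."""
    root, ext = os.path.splitext(ms_path)
    if ext != ".ms":
        raise ValueError(f"expected a .ms file, got: {ms_path}")
    return root + ".msw"

def check_colocated(ms_path: str) -> bool:
    """Return True if the derived .msw file sits next to the .ms file."""
    return os.path.isfile(derive_msw_path(ms_path))

# Example: for qwen.ms the runtime would look for qwen.msw in the same dir.
print(derive_msw_path("/models/qwen.ms"))  # /models/qwen.msw
```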

Could someone please help take a look at how to handle this error: Model conversion failed: not support onnx data type IsNaN - #19, from Fate_sky :sob:

[ERROR] ME(3029707,7f0deed37e80,mindspore_quick_start_cpp):2025-09-30-20:04:03.106.272 [heduler.cc:1534] ScheduleSubGraphToKernels] schedule node return nullptr, name: /model/rotary_emb/Sin, type: Sin

This means the Sin operator was not found. In the ONNX you exported, is this operator's input fp16 or fp32? From the current error the kernel lookup failed, and that generally has two causes: 1. the operator genuinely does not exist, but Sin is supported, so this can be ruled out; 2. the operator's data type is unsupported, so no matching kernel is found, which is the more likely cause here. In the Qwen model, the rotary_emb logic involves complex numbers, and the CPU operators currently do not support complex types, so it's worth confirming this first.
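The complex-number path in rotary embeddings can always be expressed with real-valued sin/cos, since multiplying by e^{iθ} is just a 2-D rotation. A minimal sketch of that equivalence (illustrative only, not the actual Qwen export code):

```python
import cmath
import math

def rotate_complex(x1: float, x2: float, theta: float) -> tuple:
    """Rotary embedding via complex multiplication: (x1 + i*x2) * e^{i*theta}."""
    z = complex(x1, x2) * cmath.exp(1j * theta)
    return z.real, z.imag

def rotate_real(x1: float, x2: float, theta: float) -> tuple:
    """The same rotation using only real sin/cos, which CPU kernels do support."""
    c, s = math.cos(theta), math.sin(theta)
    return x1 * c - x2 * s, x1 * s + x2 * c

# Both forms give the same result, so the complex path can be replaced.
print(rotate_complex(1.0, 2.0, 0.3))
print(rotate_real(1.0, 2.0, 0.3))
```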

Yes, it does look like an unsupported data type. What I'm actually trying to do is run the qwen model on device. Is there any best practice for that?

If you replace the currently unsupported data type with an equivalent construct, the model should then be able to run.
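As a concrete example of such an equivalent replacement: `IsNaN(x)` is equivalent to `x != x`, since NaN is the only floating-point value that compares unequal to itself. A minimal sketch (illustrative only, not the optimum patch itself):

```python
import math

def is_nan_equiv(x: float) -> bool:
    """Equivalent of the IsNaN operator using only a comparison:
    NaN is the only float that is not equal to itself."""
    return x != x

print(is_nan_equiv(math.nan))  # True
print(is_nan_equiv(1.5))       # False
```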

Sure. If anyone has time, please help take a look. Recording the full reproduction steps here first:

  1. Download the model

git clone https://huggingface.co/Qwen/Qwen2.5-0.5B
  2. Convert the format

File: optimum\exporters\onnx\model_patcher.py (to avoid introducing the IsNaN operator)

# Original code:
# original_scaled_dot_product_attention = torch.nn.functional.scaled_dot_product_attention
# Replacement (note: the replacement below needs `import math` at the top of the file):
def original_scaled_dot_product_attention(query, key, value, attn_mask=None, dropout_p=0.0,
        is_causal=False, scale=None, enable_gqa=False) -> torch.Tensor:
    L, S = query.size(-2), key.size(-2)
    scale_factor = 1 / math.sqrt(query.size(-1)) if scale is None else scale
    if attn_mask is not None:
        attn_bias = torch.zeros_like(attn_mask)
    else:
        attn_bias = torch.zeros(L, S, dtype=query.dtype, device=query.device)
    if is_causal:
        assert attn_mask is None
        temp_mask = torch.ones(L, S, dtype=torch.bool, device=query.device).tril(diagonal=0)
        attn_bias.masked_fill_(temp_mask.logical_not(), float("-inf"))
        attn_bias = attn_bias.to(query.dtype)  # assign the result: .to() is not in-place

    if attn_mask is not None:
        if attn_mask.dtype == torch.bool:
            attn_bias.masked_fill_(attn_mask.logical_not(), float("-inf"))
        else:
            attn_bias = attn_mask + attn_bias

    if enable_gqa:
        key = key.repeat_interleave(query.size(-3)//key.size(-3), -3)
        value = value.repeat_interleave(query.size(-3)//value.size(-3), -3)

    attn_weight = query @ key.transpose(-2, -1) * scale_factor
    attn_weight += attn_bias
    attn_weight = torch.softmax(attn_weight, dim=-1)
    attn_weight = torch.dropout(attn_weight, dropout_p, train=True)
    return attn_weight @ value
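The masking-and-softmax core of this replacement can be sanity-checked on a tiny case without torch. A pure-Python sketch of causal attention weighting for two positions (illustrative only; the real patch operates on torch tensors):

```python
import math

def softmax(row):
    """Numerically stable softmax over one attention row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

# Two query positions, two key positions, causal mask:
# position 0 may attend only to key 0; position 1 to keys 0 and 1.
scores = [[0.2, 0.7],
          [0.5, 0.1]]
neg_inf = float("-inf")
masked = [[scores[0][0], neg_inf],          # future key masked with -inf
          [scores[1][0], scores[1][1]]]
weights = [softmax(r) for r in masked]
print(weights[0])  # first row attends entirely to key 0: [1.0, 0.0]
```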

Command:

optimum-cli export onnx --model Qwen2.5-0.5B --task text-generation onnx_model_optimum

This produces two files: model.onnx and model.onnx_data.

  3. Generate the ms model

Path: ~/workspace/mindspore-lite-2.7.0-linux-x64/tools/converter/converter/

./converter_lite --fmk=ONNX \
--modelFile=model.onnx \
--outputFile=qwen

This produces two files: qwen.ms and qwen.msw.

  4. Run verification

文件: mindspore-lite/examples/quick_start_cpp/main.cc

// Original code:
// auto build_ret = model->Build(model_buf, size, mindspore::kMindIR, context);
// delete[](model_buf);
// Replacement:
// Build model directly from file path
auto build_ret = model->Build(model_path, mindspore::kMindIR, context);

Error log:

[ERROR] ME(3029707,7f0deed37e80,mindspore_quick_start_cpp):2025-09-30-20:04:03.106.235 [heduler.cc:1442] ScheduleNodeToKernel] FindBackendKernel return nullptr, name: /model/rotary_emb/Sin, type: Sin
[ERROR] ME(3029707,7f0deed37e80,mindspore_quick_start_cpp):2025-09-30-20:04:03.106.272 [heduler.cc:1534] ScheduleSubGraphToKernels] schedule node return nullptr, name: /model/rotary_emb/Sin, type: Sin
[ERROR] ME(3029707,7f0deed37e80,mindspore_quick_start_cpp):2025-09-30-20:04:03.106.280 [heduler.cc:1364] ScheduleMainSubGraphToKernels] Schedule subgraph failed, index: 0
[ERROR] ME(3029707,7f0deed37e80,mindspore_quick_start_cpp):2025-09-30-20:04:03.106.307 [heduler.cc:1485] ScheduleGraphToKernels] ScheduleSubGraphToSubGraphKernel failed
[ERROR] ME(3029707,7f0deed37e80,mindspore_quick_start_cpp):2025-09-30-20:04:03.106.485 [heduler.cc:390] Schedule] Schedule graph to kernels failed.
[ERROR] ME(3029707,7f0deed37e80,mindspore_quick_start_cpp):2025-09-30-20:04:03.106.494 [te_session.cc:604] CompileGraph] Schedule kernels failed: -1
[ERROR] ME(3029707,7f0deed37e80,mindspore_quick_start_cpp):2025-09-30-20:04:03.106.761 [te_session.cc:2110] LoadModelAndCompileByPath] Compile model failed
[ERROR] ME(3029707,7f0deed37e80,mindspore_quick_start_cpp):2025-09-30-20:04:03.107.411 [x_api/model/model_impl.cc:237] Build] Init session failed

Also, I see there is an MNN build of Qwen on HF: https://huggingface.co/taobao-mnn/Qwen2.5-0.5B-Instruct-MNN
Do we have a corresponding Qwen-xxx-MS version?

For running qwen on MindSpore Lite, what are your target scenario and deployment device?

The MindSpore Lite development team is already working on supporting qwen inference; a few operators in the original transformer network need adjusting, and we can discuss this together.


For Lite-related issues, please move them under this thread, so follow-up discussion and analysis are easier.