Resolving slow model initialization and loading

kang_zl · July 13, 2025, 13:13

System environment

Hardware environment (Ascend/GPU/CPU): GPU
MindSpore version: MindSpore=2.4, MindFormer=1.1.0
Execution mode (PyNative/Graph): any
Python version: Python=3.8
Operating system: Linux

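Error message

Problem description

The code that initializes and loads the model is shown below.
The line model = GLAForCausalLM(config) takes several hundred seconds to execute, which is far too long.

Script: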

import mindspore as ms
from mindnlp.transformers import AutoTokenizer, AutoConfig
from gla_mindspore.model.modeling_gla import GLAForCausalLM
from gla_mindspore.model.configuration_gla import GLAConfig
import json
import mindspore.numpy as mnp
from time import time

ms.set_device("Ascend")

def load_gla_config(config_path):
    with open(config_path, "r") as f:
        config_data = json.load(f)
    return GLAConfig(**config_data)

# config = AutoConfig.from_pretrained('/home/HWHiAiUser/HUST-VL/GLA/flash-linear-attention/my_model/gla-1.3B-100B')
config = load_gla_config("/home/HwHiAiUser/HUST-VL/GLA/flash-linear-attention/my_model/gla-1.3B-100B/config.json")
model = GLAForCausalLM(config)

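Root cause analysis

Static graph mode is normally enabled with mindspore.set_context(mode=0), but that is not how static graphs are used with mindnlp models: the inference code has to be decorated with mindspore.jit. Setting static graph mode in the context only makes the computation inside the construct method of mindspore.nn.Cell subclasses run as one whole static graph; it has no effect on code outside construct. The mindnlp model implementations do not use nn.Cell or its construct method, so their code has to be wrapped with mindspore.jit to run as a static graph. Otherwise the model cannot be compiled into a single static graph, and the framework may treat every API or operator call as its own small graph, which adds a large overhead for entering and leaving graphs. As a result, running with mindspore.set_context(mode=0) can be tens of times slower than the default dynamic graph (PyNative) mode.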
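The wrapping pattern described above can be sketched as follows. This is a minimal illustration, not mindnlp's own code: forward is a hypothetical stand-in for inference logic written outside nn.Cell.construct, and the fallback to plain NumPy is an assumption added only so the sketch also runs where MindSpore is not installed.

```python
import numpy as np

try:
    import mindspore as ms
    HAVE_MS = True
except ImportError:
    # Assumption: fall back to NumPy so the sketch can run without MindSpore.
    HAVE_MS = False


def forward(x, w):
    # Hypothetical stand-in for inference code that is NOT inside
    # nn.Cell.construct. It only uses operators that both ms.Tensor and
    # np.ndarray support.
    return (x @ w).sum()


if HAVE_MS:
    # Wrap the whole function so it is compiled as one graph instead of
    # relying on set_context(mode=0).
    forward_fast = ms.jit(forward)
    to_tensor = ms.Tensor
else:
    forward_fast = forward
    to_tensor = np.asarray

x = to_tensor(np.ones((2, 3), dtype=np.float32))
w = to_tensor(np.ones((3, 4), dtype=np.float32))
result = float(forward_fast(x, w))
print(result)
```

With a real model, the function passed to mindspore.jit would be the one that calls the model's forward pass, so that the whole inference step is compiled together.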

Solution

For how to use static graphs with mindnlp, see the following link:
Static graphs in mindnlp

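Also note that even when static graph mode is actually in effect, the model's input and output shapes must stay fixed. Otherwise recompilation is likely to be triggered, and if the graph is recompiled, every inference call pays the compilation time on top of the compute time, so it will still be slow.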
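One common way to keep input shapes fixed is to pad or truncate every sequence to the same length before it reaches the compiled function. The sketch below assumes token-id inputs; pad_to_fixed_length, max_length, and pad_id are hypothetical names, not part of mindnlp or MindSpore.

```python
def pad_to_fixed_length(token_ids, max_length, pad_id=0):
    """Right-pad or truncate token_ids to exactly max_length.

    Returns (ids, mask); mask is 1 for real tokens and 0 for padding.
    """
    ids = list(token_ids)[:max_length]
    n_real = len(ids)
    n_pad = max_length - n_real
    return ids + [pad_id] * n_pad, [1] * n_real + [0] * n_pad


# A short and a long input both come out with the same length.
short_ids, short_mask = pad_to_fixed_length([101, 7, 8], max_length=8)
long_ids, long_mask = pad_to_fixed_length(range(20), max_length=8)
print(len(short_ids), len(long_ids))
```

Both calls return sequences of length 8, so a function compiled for that shape can be reused for either input rather than being compiled again.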

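As for the line model = GLAForCausalLM(config) being slow: if the model weights are stored as a bin file, loading is usually faster, whereas the safetensor format is usually slower. With safetensor it also depends on the storage medium; loading from a high-performance SSD on a physical machine is relatively fast.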
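Before changing the checkpoint format or the storage, it can help to measure which step actually dominates. The following is a minimal timing sketch using only the standard library; timed and timings are hypothetical helper names, and the bodies of the with-blocks are placeholders for load_gla_config(...) and GLAForCausalLM(config) from the script in the question.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label):
    """Record the wall-clock seconds spent inside the with-block under label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


# Placeholder steps; swap in the real calls when profiling the script.
with timed("load config"):
    config = {"hidden_size": 8}   # stands in for load_gla_config(...)
with timed("build model"):
    time.sleep(0.05)              # stands in for GLAForCausalLM(config)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f}s")
```

If weight loading is a separate step in the real script, giving it its own with-block shows whether the time goes to constructing the model or to reading the checkpoint.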
