MindSpore Lite model loading fails with RuntimeError: build_from_file failed! Error is Common error code.

1 System Environment

Hardware environment (Ascend/GPU/CPU): Ascend
MindSpore version: 2.1
Execution mode (PyNative/Graph): Any

2 Error Information

2.1 Problem Description

An error is raised while loading the model for incremental inference with MindSpore Lite.

2.2 Error Message

[TRACE] GE(334540,python):2023-10-13-18:54:37.246.214 [status:STOP] [ge_api.cc:316]334540 GEInitializeImpl:GEInitialize finished  
[ERROR] ME(334540,ffff887bdbe0,python):2023-10-13-18:55:54.973.902 [mindspore/lite/src/extendrt/cxx_api/model/model.cc:101] Build] Catch exception: The pointer[mng] is null.  
- Framework Unexpected Exception Raised:  
This exception is caused by framework's unexpected error. Please create an issue at https://gitee.com/mindspore/mindspore/issues to get help.  
- C++ Call Stack: (For framework developers)  
mindspore/core/ir/func_graph.cc:352 free_variables_total  
Traceback (most recent call last):  
    File "lite_chat.py", line 206, in <module>  
        chat(tokenizer_file, prefill_mindir_path, decode_mindir_path, lite_config, device_id)  
    File "lite_chat.py", line 75, in load_model  
        model0.build_from_file(model_path0, mslite.ModelType.MINDIR, context, config_file)  
    File "/home/miniconda3/lib/python3.7/site-packages/mindspore_lite/model.py", line 95, in warpper  
        return func(*args, **kwargs)  
    File "/home/miniconda3/lib/python3.7/site-packages/mindspore_lite/model.py", line 235, in build_from_file  
        raise RuntimeError(f"build_from_file failed! Error is {ret.ToString()}")  
RuntimeError: build_from_file failed! Error is Common error code.  
[TRACE] GE(334540,python):2023-10-13-18:55:55.355.775 [status:INIT] [ge_api.cc:362]334540 GEFinalize:GEFinalize start  
[TRACE] GE(334540,python):2023-10-13-18:55:55.355.910 [status:RUNNING] [ge_api.cc:373]334540 GEFinalize:Finalizing environment  
[TRACE] GE(334540,python):2023-10-13-18:55:58.021.147 [status:STOP] [ge_api.cc:401]334540 GEFinalize:GEFinalize finished

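3 Root Cause Analysis

To find the cause, print the Ascend logs to the console and raise their verbosity by setting: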

export ASCEND_SLOG_PRINT_TO_STDOUT=1  
export ASCEND_GLOBAL_LOG_LEVEL=1
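The `export` lines above apply to a shell session. If the inference script is instead started from another Python program, a minimal sketch is to hand the same two variables to the child process through its environment. The helper name `with_ascend_debug_logging` is hypothetical, and the level mapping in the comment follows the CANN convention (0 = DEBUG, 1 = INFO, 2 = WARNING, 3 = ERROR).

```python
import os
from typing import Mapping, Optional


def with_ascend_debug_logging(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of `base` (default: os.environ) with verbose Ascend logging enabled.

    Hypothetical helper: the input mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    env["ASCEND_SLOG_PRINT_TO_STDOUT"] = "1"  # print the Ascend logs to stdout
    env["ASCEND_GLOBAL_LOG_LEVEL"] = "1"      # 1 = INFO (0 = DEBUG, 2 = WARNING, 3 = ERROR)
    return env
```

For example, `subprocess.run([sys.executable, "lite_chat.py"], env=with_ascend_debug_logging())` would start the script from the traceback above with verbose logging; `lite_chat.py` stands in for your own script. Passing the variables to a fresh process is a simple way to make sure they are present before the Ascend runtime initializes.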

The resulting log shows that the error is caused by insufficient memory; the driver's halMemAlloc call fails:

DevMemAllocHugePageManaged:[LOAD][LOAD][drv api] halMemAlloc failed: size=1572400(Byte), type=2, moduleId=45, drvFlag=3242591731706905600, drvRetCode=6!
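With logs printed to stdout, the output can be long. As a sketch, the failed allocation can be picked out with a regular expression modeled on the single `halMemAlloc failed` line above. The function name is hypothetical, and other CANN versions may format the message differently.

```python
import re
from typing import List, Tuple

# Pattern modeled on the one halMemAlloc line in this report; adjust it if your log differs.
_HAL_MEM_ALLOC_FAILED = re.compile(
    r"halMemAlloc failed: size=(\d+)\(Byte\).*?drvRetCode=(\d+)"
)


def find_hal_mem_alloc_failures(log_text: str) -> List[Tuple[int, int]]:
    """Return (requested_bytes, drv_ret_code) for every failed halMemAlloc line in log_text."""
    return [
        (int(size), int(ret_code))
        for size, ret_code in _HAL_MEM_ALLOC_FAILED.findall(log_text)
    ]
```

A non-empty result (for the line above, `[(1572400, 6)]`) points to the same memory shortage; an empty result only means this particular message was not found, not that memory is sufficient.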

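4 Solution

Experiments show that with MindSpore versions earlier than 2.2, the model loads successfully after either removing the MS_GE_TRAIN environment variable or setting MS_GE_TRAIN to 0.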
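In a shell, the workaround is `unset MS_GE_TRAIN` or `export MS_GE_TRAIN=0` before starting the script. For a Python launcher, a minimal sketch follows. Both helper names are hypothetical, and the version cutoff simply encodes the observation above that the workaround applies to MindSpore versions before 2.2.

```python
import os
import re
from typing import Mapping, Optional


def needs_ge_train_workaround(mindspore_version: str) -> bool:
    """True for MindSpore versions before 2.2, per the observation in this report."""
    match = re.match(r"(\d+)\.(\d+)", mindspore_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {mindspore_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (2, 2)


def without_ge_train(base: Optional[Mapping[str, str]] = None, set_zero: bool = False) -> dict:
    """Return a copy of `base` (default: os.environ) with MS_GE_TRAIN removed,
    or set to "0" when set_zero is True. The input mapping is never modified."""
    env = dict(os.environ if base is None else base)
    if set_zero:
        env["MS_GE_TRAIN"] = "0"
    else:
        env.pop("MS_GE_TRAIN", None)
    return env
```

For instance, when `needs_ge_train_workaround("2.1")` is true (the version from section 1), the script could be launched with `subprocess.run([sys.executable, "lite_chat.py"], env=without_ge_train())`, where `lite_chat.py` again stands in for your own script.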