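1 System Environment
Hardware environment (Ascend/GPU/CPU): Ascend
MindSpore version: 2.1
Execution mode (PyNative/Graph): any
2 Error Information
2.1 Problem Description
When running incremental inference with MindSpore Lite, an error is raised while the model is being loaded.
2.2 Error Message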
[TRACE] GE(334540,python):2023-10-13-18:54:37.246.214 [status:STOP] [ge_api.cc:316]334540 GEInitializeImpl:GEInitialize finished
[ERROR] ME(334540,ffff887bdbe0,python):2023-10-13-18:55:54.973.902 [mindspore/lite/src/extendrt/cxx_api/model/model.cc:101] Build] Catch exception: The pointer[mng] is null.
- Framework Unexpected Exception Raised:
This exception is caused by framework's unexpected error. Please create an issue at https://gitee.com/mindspore/mindspore/issues to get help.
- C++ Call Stack: (For framework developers)
mindspore/core/ir/func_graph.cc:352 free_variables_total
Traceback (most recent call last):
  File "lite_chat.py", line 206, in <module>
    chat(tokenizer_file, prefill_mindir_path, decode_mindir_path, lite_config, device_id)
  File "lite_chat.py", line 75, in load_model
    model0.build_from_file(model_path0, mslite.ModelType.MINDIR, context, config_file)
  File "/home/miniconda3/lib/python3.7/site-packages/mindspore_lite/model.py", line 95, in warpper
    return func(*args, **kwargs)
  File "/home/miniconda3/lib/python3.7/site-packages/mindspore_lite/model.py", line 235, in build_from_file
    raise RuntimeError(f"build_from_file failed! Error is {ret.ToString()}")
RuntimeError: build_from_file failed! Error is Common error code.
[TRACE] GE(334540,python):2023-10-13-18:55:55.355.775 [status:INIT] [ge_api.cc:362]334540 GEFinalize:GEFinalize start
[TRACE] GE(334540,python):2023-10-13-18:55:55.355.910 [status:RUNNING] [ge_api.cc:373]334540 GEFinalize:Finalizing environment
[TRACE] GE(334540,python):2023-10-13-18:55:58.021.147 [status:STOP] [ge_api.cc:401]334540 GEFinalize:GEFinalize finished
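3 Root Cause Analysis
The Python traceback only reports "Common error code", so raise the Ascend log verbosity with the following settings and rerun: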
export ASCEND_SLOG_PRINT_TO_STDOUT=1
export ASCEND_GLOBAL_LOG_LEVEL=1
The more detailed logs identify the cause of the error: a memory allocation failed because memory was insufficient.
DevMemAllocHugePageManaged:[LOAD][LOAD][drv api] halMemAlloc failed: size=1572400(Byte), type=2, moduleId=45, drvFlag=3242591731706905600, drvRetCode=6!
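The diagnosis step above can be sketched in Python. This is only an illustrative helper, not part of MindSpore or CANN: the function names `ascend_debug_env` and `find_alloc_failures` are hypothetical, and the regular expression assumes the failure line has the form `halMemAlloc failed: size=<N>(Byte), ..., drvRetCode=<N>` as in the log line shown above.

```python
import re
from typing import Dict, List, Mapping

# Assumed shape of the driver allocation-failure line (taken from the log above).
_ALLOC_FAIL = re.compile(
    r"halMemAlloc failed:\s*size=(\d+)\(Byte\).*?drvRetCode=(\d+)"
)


def ascend_debug_env(base: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `base` with the two logging variables from this post set."""
    env = dict(base)
    env["ASCEND_SLOG_PRINT_TO_STDOUT"] = "1"
    env["ASCEND_GLOBAL_LOG_LEVEL"] = "1"
    return env


def find_alloc_failures(log_text: str) -> List[Dict[str, int]]:
    """Extract the requested size and driver return code of each failed allocation."""
    return [
        {"size_bytes": int(m.group(1)), "drv_ret_code": int(m.group(2))}
        for m in _ALLOC_FAIL.finditer(log_text)
    ]
```

One possible use is to pass `ascend_debug_env(os.environ)` as the `env` argument of `subprocess.run` when rerunning the inference script, capture its output, and feed that text to `find_alloc_failures`. A non-empty result points to the memory allocation failure described here.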
4 Solution
Experiments show that, for MindSpore versions earlier than 2.2, the model loads successfully once the environment variable MS_GE_TRAIN is unset or set to 0.
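The workaround can also be applied from inside the Python script, as a minimal sketch. The helper name `disable_ge_train` is hypothetical. The post does not say at which point the framework reads MS_GE_TRAIN, so calling the helper before importing `mindspore_lite` is an assumption made to stay on the safe side.

```python
import os
from typing import MutableMapping


def disable_ge_train(
    env: MutableMapping[str, str] = os.environ, unset: bool = True
) -> None:
    """Apply the workaround: remove MS_GE_TRAIN, or set it to "0" if unset=False."""
    if unset:
        # pop() with a default does nothing if the variable is not present.
        env.pop("MS_GE_TRAIN", None)
    else:
        env["MS_GE_TRAIN"] = "0"
```

Calling `disable_ge_train()` at the very top of the inference script, before the model is built, corresponds to running `unset MS_GE_TRAIN` in the shell before launching it. Passing `unset=False` corresponds to `export MS_GE_TRAIN=0`.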