MindSpore Error: Please try to reduce 'batch_size' or check whether exists extra large shape. (Method 2)

1 System Environment

Hardware environment (Ascend/GPU/CPU): Ascend/GPU/CPU
MindSpore version: mindspore=2.0.0
Execution mode (PyNative/Graph): either
Python version: Python=3.9
Operating system platform: any

2 Error Message

After completing the distributed configuration, running the script produces the following error:

RuntimeError: Preprocess failed before run graph 1.
----------------------------------------------------
- Framework Error Message:
----------------------------------------------------
Out of Memory!!! Request memory size: 13518844928B, Memory Statistic:
Device HBM memory size: 32768M
MindSpore Used memory size: 30720M
MindSpore memory base address: 0x124180000000
Total Static Memory size: 19792M
Total Dynamic memory size: 0M
Dynamic memory size of this graph: 0M
Please try to reduce 'batch_size' or check whether exists extra large shape. For more details, please refer to 'Out of Memory' at https://www.mindspore.cn .
----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore/ccsrc/plugin/device/ascend/hal/hardware/ascend_kernel_executor.cc:244 PreprocessBeforeRunGraph
mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_memory_adapter.cc:169 MallocDynamicDevMem
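
For reference, a script that hits this error typically initializes distributed communication and then trains with a large batch size. A minimal sketch of such a setup (the data-parallel mode and the GRAPH_MODE/Ascend settings here are illustrative assumptions, not taken from the failing script):

from mindspore import context
from mindspore.communication import init

context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")
init()  # starts HCCL on Ascend (NCCL on GPU)
context.set_auto_parallel_context(parallel_mode="data_parallel",
                                  gradients_mean=True)
# ... building the network and training with a large batch_size then
# fails with the OOM above when graph memory is allocated.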

3 Root Cause Analysis

Even without executing the code, the error message makes the cause clear: this is an out-of-memory (OOM) failure, where the memory requested for the graph exceeds the device memory actually available to MindSpore.
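
The numbers reported in the log confirm this; a quick back-of-the-envelope check:

# Figures taken from the "Memory Statistic" section of the log above:
requested_mb = 13518844928 / 2**20   # ≈ 12892 MB requested for this graph
pool_mb      = 30720                 # "MindSpore Used memory size"
static_mb    = 19792                 # "Total Static Memory size"
free_mb      = pool_mb - static_mb   # 10928 MB left in MindSpore's pool
print(free_mb < requested_mb)        # True -> the allocation must fail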

4 Solution

Reduce the amount of device memory MindSpore reserves by adding the following code before the distributed setup:

from mindspore import context

# Cap the memory pool MindSpore reserves on the device (the log above
# shows the default reservation of 30720M on a 32 GB Ascend device).
context.set_context(max_device_memory="25GB")
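
In a full script, this call must come before communication initialization so the lower cap is already in effect when device memory is reserved. A minimal sketch of the placement (GRAPH_MODE and the Ascend target are assumptions; adapt to your own script):

from mindspore import context
from mindspore.communication import init

# Set the memory cap first, then initialize distributed communication.
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend",
                    max_device_memory="25GB")
init()
# ... build the network, dataset, and training loop as before ...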