1 System environment
Hardware environment (Ascend/GPU/CPU): Ascend910
MindSpore version: mindspore=2.2.10
Execution mode (PyNative/Graph): any
Python version: Python=3.8.15
Operating system platform: Linux
2 Problem description
The following error is reported: rankTablePath is invalid
Ascend Error Message:
I0003: In [HcomInitByFile], value [/user/config/nbstart_hccl.json] for parameter [rankTablePath] is invalid. Reason: The collec
as an invalid argument. Reason[/user/config/nbstart_hccl.json]
Solution: Try again with a valid argument.
TraceBack (most recent call last):
PluginManager InvokeAll failed. [FUNC:Initialize] [FILE:ops_kernel_manager.cc] [LINE:96]
OpsManager initialize failed. [FUNC:InnerInitialize] [FILE:gelib.cc] [LINE:237]
GELib::InnerInitialize failed. [FUNC:Initialize] [FILE:gelib.cc] [LINE:165]
(Please search "Ascend Error Message" at https://w.mindspore.cn for error code description)
C++ Call Stack: (For framework developers)
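3 Root cause analysis
The RANK_TABLE_FILE was not generated, so the path passed as rankTablePath (/user/config/nbstart_hccl.json) does not point to a valid rank table file.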
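A quick way to confirm this diagnosis before launching a job is to check that the path in RANK_TABLE_FILE exists and parses as JSON. The following is only a minimal sketch: the helper name check_rank_table is hypothetical, and it does not validate the rank table's contents, only that the file is present and readable.

```python
import json
import os


def check_rank_table(path):
    """Return a list of problems found with the rank table file at `path`.

    Hypothetical helper: it only checks presence and JSON syntax,
    not the fields that HCCL expects inside the file.
    """
    problems = []
    if not path:
        problems.append("RANK_TABLE_FILE is not set")
        return problems
    if not os.path.isfile(path):
        problems.append(f"file does not exist: {path}")
        return problems
    try:
        with open(path, "r", encoding="utf-8") as f:
            json.load(f)
    except (OSError, ValueError) as exc:
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses
        problems.append(f"file is not readable JSON: {exc}")
    return problems


if __name__ == "__main__":
    issues = check_rank_table(os.environ.get("RANK_TABLE_FILE"))
    print("OK" if not issues else "\n".join(issues))
```

If the script reports that the file does not exist, the rank table still has to be generated.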
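4 Solution
Run mindformers/tools/hccl_tools.py to generate the RANK_TABLE_FILE JSON file:
export PATH=/usr/local/Ascend/driver/tools/:${PATH}
# Run the following command to generate the RANK_TABLE_FILE JSON file for the current machine
python ./mindformers/tools/hccl_tools.py --device_num "[0,8)"
Then set the environment variable, replacing xxx.json with the generated file: export RANK_TABLE_FILE=xxx.json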
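Since the generated file name is left as xxx.json above, a small helper can locate the newest candidate and print the matching export line. This is a sketch under assumptions: the function name newest_rank_table is hypothetical, and the pattern hccl*.json is only a guess at how the tool names its output in the current directory, so adjust it to whatever file the tool actually writes.

```python
import glob
import os


def newest_rank_table(directory=".", pattern="hccl*.json"):
    """Return the absolute path of the most recently modified match, or None.

    The default pattern is an assumption about the generated file name.
    """
    matches = glob.glob(os.path.join(directory, pattern))
    if not matches:
        return None
    return os.path.abspath(max(matches, key=os.path.getmtime))


if __name__ == "__main__":
    path = newest_rank_table()
    if path is None:
        print("No rank table found; run hccl_tools.py first.")
    else:
        # Print the line to paste into the shell; an absolute path avoids
        # surprises when the training job starts from another directory.
        print(f"export RANK_TABLE_FILE={path}")
```

A child Python process cannot change the parent shell's environment, so the script prints the export line instead of setting the variable itself.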