Exporting a CodeLlama MindIR model on Ascend 910 fails with "rankTablePath is invalid"

1 System Environment

Hardware environment (Ascend/GPU/CPU): Ascend910
MindSpore version: mindspore=2.2.10
Execution mode (PyNative/Graph): any
Python version: Python=3.8.15
Operating system platform: linux

2 Problem Description

The export fails with the error "rankTablePath is invalid":

Ascend Error Message:  
I0003: In [HcomInitByFile], value [/user/config/nbstart_hccl.json] for parameter [rankTablePath] is invalid. Reason: The collec  
as an invalid argument. Reason[/user/config/nbstart_hccl.json]  
	Solution: Try again with a valid argument.  
	TraceBack (most recent call last):  
	PluginManager InvokeAll failed.[FUNC:Initialize][FILE:ops_kernel_manager.cc][LINE:96]  
	OpsManager initialize failed.[FUNC:InnerInitialize][FILE:gelib.cc][LINE:237]  
	GELib::InnerInitialize failed.[FUNC:Initialize][FILE:gelib.cc][LINE:165]  
(Please search "Ascend Error Message" at https://www.mindspore.cn for error code description)  
C++ Call Stack: (For framework developers

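3 Root Cause Analysis

No RANK_TABLE_FILE had been generated, so the path passed to HCCL as rankTablePath (/user/config/nbstart_hccl.json in the log above) does not point to a valid rank table file.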
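
To confirm this root cause before changing anything, a quick check of the rank table path can help. The following is a minimal sketch, not part of MindSpore or MindFormers: the helper name diagnose_rank_table is hypothetical, and it assumes the path is supplied through the RANK_TABLE_FILE environment variable. It only checks that the file exists and parses as JSON; it does not validate the rank table contents.

```python
import json
import os


def diagnose_rank_table(path):
    """Return a short status string for a rank table path.

    Hypothetical helper for illustration only:
    "unset"      - no path was given (e.g. RANK_TABLE_FILE is not exported)
    "missing"    - the path does not point to an existing file
    "unreadable" - the file exists but cannot be read or parsed as JSON
    "ok"         - the file exists and parses as JSON
    """
    if not path:
        return "unset"
    if not os.path.isfile(path):
        return "missing"
    try:
        with open(path, "r", encoding="utf-8") as f:
            json.load(f)
    except (OSError, ValueError):
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses
        return "unreadable"
    return "ok"


if __name__ == "__main__":
    rank_table = os.environ.get("RANK_TABLE_FILE")
    print(f"RANK_TABLE_FILE={rank_table!r}: {diagnose_rank_table(rank_table)}")
```

A result of "unset" or "missing" is consistent with the root cause described above.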

4 Solution

Run mindformers/tools/hccl_tools.py to generate the RANK_TABLE_FILE JSON file:

export PATH=/usr/local/Ascend/driver/tools/:${PATH}  
# Run the following command to generate the RANK_TABLE_FILE JSON file for the current machine  
python ./mindformers/tools/hccl_tools.py --device_num "[0,8)"

Then set the environment variable to point to the generated file: export RANK_TABLE_FILE=xxx.json
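
To fill in the xxx.json placeholder, the generated file has to be located first. The sketch below rests on one assumption: that hccl_tools.py writes its output into the current working directory under a name matching hccl_*.json. Verify this against the script's own output message. The helper name newest_rank_table is hypothetical.

```python
import glob
import os


def newest_rank_table(directory="."):
    """Return the absolute path of the most recently modified hccl_*.json
    in `directory`, or None if there is none.

    Assumption: the generated rank table follows the hccl_*.json naming pattern.
    """
    candidates = glob.glob(os.path.join(directory, "hccl_*.json"))
    if not candidates:
        return None
    return os.path.abspath(max(candidates, key=os.path.getmtime))


if __name__ == "__main__":
    path = newest_rank_table()
    if path is None:
        print("No hccl_*.json found here; run hccl_tools.py first.")
    else:
        # Print the line to paste into the shell; this script cannot
        # change the environment of the parent shell itself.
        print(f"export RANK_TABLE_FILE={path}")
```

Using an absolute path avoids the variable becoming invalid when the export job is launched from a different directory.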