MindSpore error "RuntimeError: Load op info form json config failed, version: Ascend310" and how to resolve it

1 System environment

Hardware environment (Ascend/GPU/CPU): Ascend310
MindSpore version: mindspore=2.1.1
Execution mode (PyNative/Graph): any
Python version: Python=3.7.10
Operating system platform: Ubuntu 18.04

2 Error information

2.1 Script

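import mindspore.context
from mindformers import pipeline

# Run on the Ascend backend
mindspore.context.set_context(device_target="Ascend")
# Build a text-generation pipeline with the glm_6b model
task_pipeline = pipeline(task='text_generation', model='glm_6b', max_length=2048)

# The prompt means "Please introduce Musk"
task_pipeline('请介绍一下马斯克')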

2.2 Error message

[mindspore/ccsrc/kernel/oplib/op_info_utils.cc:172] LoadOpInfoJson] Get op info json suffix path failed, soc_version: Ascend310  
[ERROR] KERNEL(2130,7fc13dcf4700,python):2023-10-25-16:52:29.160.750 [mindspore/ccsrc/kernel/oplib/op_info_utils.cc:111] GenerateOpInfos] Load op info json failed, version: Ascend310  
Traceback (most recent call last):  
  File "pipeline.py", line 8, in <module>  
    task_pipeline('请介绍一下马斯克')  
  File "/root/anaconda3/envs/chatglm/lib/python3.7/site-packages/mindformers/pipeline/base_pipeline.py", line 123, in __call__  
    outputs = self.run_single(inputs, preprocess_params, forward_params, postprocess_params)  
  File "/root/anaconda3/envs/chatglm/lib/python3.7/site-packages/mindformers/pipeline/base_pipeline.py", line 154, in run_single  
    model_outputs = self.forward(model_inputs, **forward_params)  
  File "/root/anaconda3/envs/chatglm/lib/python3.7/site-packages/mindformers/pipeline/text_generation_pipeline.py", line 145, in forward  
    output_ids = self.network.generate(input_ids, **forward_params)  
  File "/root/anaconda3/envs/chatglm/lib/python3.7/site-packages/mindformers/models/text_generator.py", line 466, in generate  
    streamer=streamer)  
  File "/root/anaconda3/envs/chatglm/lib/python3.7/site-packages/mindformers/models/glm/glm.py", line 505, in _forward  
    attention_mask=Tensor(attention_mask, mstype.float32)  
  File "/root/anaconda3/envs/chatglm/lib/python3.7/site-packages/mindspore/nn/cell.py", line 662, in __call__  
    raise err  
  File "/root/anaconda3/envs/chatglm/lib/python3.7/site-packages/mindspore/nn/cell.py", line 659, in __call__  
    _pynative_executor.end_graph(self, output, *args, **kwargs)  
  File "/root/anaconda3/envs/chatglm/lib/python3.7/site-packages/mindspore/common/api.py", line 1304, in end_graph  
    self._executor.end_graph(obj, output, *args, *(kwargs.values()))  
RuntimeError: Load op info form json config failed, version: Ascend310  
  
----------------------------------------------------  
- C++ Call Stack: (For framework developers)  
----------------------------------------------------  
mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:431 Init  
  
[ERROR] PIPELINE(2130,7fc4b488b740,python):2023-10-25-16:52:29.181.777 [mindspore/ccsrc/pipeline/jit/pipeline.cc:2311] ClearResAtexit] Check exception before process exit: Load op info form json config failed, version: Ascend310  
  
----------------------------------------------------  
- C++ Call Stack: (For framework developers)  
----------------------------------------------------  
mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:431 Init

3 Root cause analysis

The dynamic learning rate is indexed by train_step (the training step count), not by the number of epochs.
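To make the step-versus-epoch distinction concrete, here is a minimal pure-Python sketch. The helper `epoch_of_step` and the value `steps_per_epoch = 100` are assumptions for illustration only; neither comes from the original post or from MindSpore.

```python
def epoch_of_step(step, steps_per_epoch):
    """Hypothetical helper: 1-based epoch that a 0-based training step falls in."""
    return step // steps_per_epoch + 1

steps_per_epoch = 100  # assumed dataset size in batches, for illustration only

# Milestones written as if they were epochs, but interpreted as steps
milestone_in_steps = [5, 10, 15]

# Epoch containing the last step covered by each milestone
epochs_reached = [epoch_of_step(m - 1, steps_per_epoch) for m in milestone_in_steps]
print(epochs_reached)
```

Under these assumptions every milestone is reached inside the first epoch, so a schedule written in "epochs" but consumed as steps would finish changing almost immediately.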

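4 Solution

If you want the learning rate to change according to the epoch count, convert the milestone values from epochs to steps by multiplying each one by the number of steps per epoch.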

from mindspore import nn

# train_dataset is assumed to be an already-built mindspore.dataset pipeline,
# and network the nn.Cell being trained.
# steps_per_epoch: number of training steps in one epoch
steps_per_epoch = train_dataset.get_dataset_size()

# learning_rate = 0.05   if epoch <= 5
# learning_rate = 0.01   if 5 < epoch <= 10
# learning_rate = 0.005  if 10 < epoch <= 15
milestone = [i * steps_per_epoch for i in [5, 10, 15]]
learning_rates = [0.05, 0.01, 0.005]
lr = nn.piecewise_constant_lr(milestone, learning_rates)
optimizer = nn.SGD(network.trainable_params(), learning_rate=lr)
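The conversion can be sanity-checked without an Ascend device. The sketch below uses a hypothetical pure-Python helper, `expand_piecewise_lr`, that mimics the documented behavior of a piecewise-constant schedule (one value per step, held constant between milestones). It is not the MindSpore implementation, and `steps_per_epoch = 3` is an assumed toy value.

```python
def expand_piecewise_lr(milestone, learning_rates):
    """Hypothetical helper: return one learning rate per training step."""
    if len(milestone) != len(learning_rates):
        raise ValueError("milestone and learning_rates must have the same length")
    schedule = []
    start = 0
    for end, rate in zip(milestone, learning_rates):
        # Steps in [start, end) all use the same rate
        schedule.extend([rate] * (end - start))
        start = end
    return schedule

steps_per_epoch = 3  # assumed toy value, for illustration only
milestone = [i * steps_per_epoch for i in [5, 10, 15]]
schedule = expand_piecewise_lr(milestone, [0.05, 0.01, 0.005])

def lr_for_epoch(epoch):
    """Learning rate at the first step of a 1-based epoch."""
    return schedule[(epoch - 1) * steps_per_epoch]

for epoch in (1, 5, 6, 10, 11, 15):
    print(epoch, lr_for_epoch(epoch))
```

With the milestones scaled by `steps_per_epoch`, the rate stays fixed for whole epochs and changes only at the boundaries after epochs 5 and 10, which matches the commented table in the solution.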