1 System Environment
Hardware environment (Ascend/GPU/CPU): Ascend310
MindSpore version: mindspore=2.1.1
Execution mode (PyNative/Graph): any
Python version: Python=3.7.10
OS platform: Ubuntu 18.04
2 Error Information
2.1 Script
import mindspore.context
from mindformers import pipeline
mindspore.context.set_context(device_target="Ascend")
task_pipeline = pipeline(task='text_generation', model='glm_6b', max_length=2048)
task_pipeline('请介绍一下马斯克')  # prompt: "Please tell me about Musk"
2.2 Error Message
[mindspore/ccsrc/kernel/oplib/op_info_utils.cc:172] LoadOpInfoJson] Get op info json suffix path failed, soc_version: Ascend310
[ERROR] KERNEL(2130,7fc13dcf4700,python):2023-10-25-16:52:29.160.750 [mindspore/ccsrc/kernel/oplib/op_info_utils.cc:111] GenerateOpInfos] Load op info json failed, version: Ascend310
Traceback (most recent call last):
  File "pipeline.py", line 8, in <module>
    task_pipeline('请介绍一下马斯克')
  File "/root/anaconda3/envs/chatglm/lib/python3.7/site-packages/mindformers/pipeline/base_pipeline.py", line 123, in __call__
    outputs = self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
  File "/root/anaconda3/envs/chatglm/lib/python3.7/site-packages/mindformers/pipeline/base_pipeline.py", line 154, in run_single
    model_outputs = self.forward(model_inputs, **forward_params)
  File "/root/anaconda3/envs/chatglm/lib/python3.7/site-packages/mindformers/pipeline/text_generation_pipeline.py", line 145, in forward
    output_ids = self.network.generate(input_ids, **forward_params)
  File "/root/anaconda3/envs/chatglm/lib/python3.7/site-packages/mindformers/models/text_generator.py", line 466, in generate
    streamer=streamer)
  File "/root/anaconda3/envs/chatglm/lib/python3.7/site-packages/mindformers/models/glm/glm.py", line 505, in _forward
    attention_mask=Tensor(attention_mask, mstype.float32)
  File "/root/anaconda3/envs/chatglm/lib/python3.7/site-packages/mindspore/nn/cell.py", line 662, in __call__
    raise err
  File "/root/anaconda3/envs/chatglm/lib/python3.7/site-packages/mindspore/nn/cell.py", line 659, in __call__
    _pynative_executor.end_graph(self, output, *args, **kwargs)
  File "/root/anaconda3/envs/chatglm/lib/python3.7/site-packages/mindspore/common/api.py", line 1304, in end_graph
    self._executor.end_graph(obj, output, *args, *(kwargs.values()))
RuntimeError: Load op info form json config failed, version: Ascend310
----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:431 Init
[ERROR] PIPELINE(2130,7fc4b488b740,python):2023-10-25-16:52:29.181.777 [mindspore/ccsrc/pipeline/jit/pipeline.cc:2311] ClearResAtexit] Check exception before process exit: Load op info form json config failed, version: Ascend310
----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:431 Init
3 Root Cause Analysis
The dynamic learning rate is scheduled per training step (train_step), not per epoch.
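To illustrate the root cause, here is a minimal plain-Python sketch of the piecewise-constant rule. It assumes the behavior documented for `nn.piecewise_constant_lr`: step `i` uses `learning_rates[j]` for the first `j` with `i < milestone[j]`, and the returned list has one entry per step (`milestone[-1]` entries in total). `piecewise_constant` is a hypothetical stand-in, not the MindSpore function, and `steps_per_epoch = 100` is an illustrative value. Passing epoch numbers directly as milestones makes the rate drop after 5 and 10 *steps*:

```python
from bisect import bisect_right
from typing import List


def piecewise_constant(milestone: List[int], learning_rates: List[float]) -> List[float]:
    """Hypothetical stand-in for nn.piecewise_constant_lr: one learning rate per step."""
    if len(milestone) != len(learning_rates):
        raise ValueError("milestone and learning_rates must have the same length")
    # Step i belongs to the first interval whose milestone is greater than i
    return [learning_rates[bisect_right(milestone, i)] for i in range(milestone[-1])]


steps_per_epoch = 100  # illustrative value only

# Epoch numbers used directly: they are interpreted as step counts
naive = piecewise_constant([5, 10, 15], [0.05, 0.01, 0.005])
print(len(naive))           # 15 steps in total, far less than one epoch
print(naive[4], naive[5])   # 0.05 0.01, the rate already drops at step 5

# Epoch milestones scaled to steps: the rate drops after 5 full epochs
scaled = piecewise_constant([e * steps_per_epoch for e in [5, 10, 15]], [0.05, 0.01, 0.005])
print(len(scaled))            # 1500
print(scaled[499], scaled[500])  # 0.05 0.01
```

The scaled version keeps 0.05 for the first 500 steps (5 epochs of 100 steps), which is the behavior the solution below aims for.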
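4 Solution
To change the learning rate by epoch count instead, convert the milestones from epochs to steps by multiplying each one by the number of steps per epoch.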
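import mindspore.nn as nn

# train_dataset: an existing mindspore.dataset.Dataset instance (already batched)
# network: the nn.Cell being trained; both are assumed to be defined earlier
# steps_per_epoch: number of training steps (batches) per epoch
steps_per_epoch = train_dataset.get_dataset_size()
# learning_rate = 0.05  if epoch <= 5
# learning_rate = 0.01  if 5 < epoch <= 10
# learning_rate = 0.005 if 10 < epoch <= 15
# Convert the epoch milestones into step milestones
milestone = [i * steps_per_epoch for i in [5, 10, 15]]
learning_rates = [0.05, 0.01, 0.005]
# Returns a list with one learning rate per step (milestone[-1] entries)
lr = nn.piecewise_constant_lr(milestone, learning_rates)
optimizer = nn.SGD(network.trainable_params(), learning_rate=lr)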
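One way to sanity-check the fix, sketched in plain Python under the same assumptions: expand a per-epoch schedule into the per-step list that `piecewise_constant_lr` is expected to return, then confirm that the rate only changes at epoch boundaries. `epoch_schedule_to_steps` is a hypothetical helper, not a MindSpore API, and `steps_per_epoch = 4` is illustrative; in practice it comes from `train_dataset.get_dataset_size()`.

```python
from typing import List


def epoch_schedule_to_steps(epoch_milestones: List[int],
                            learning_rates: List[float],
                            steps_per_epoch: int) -> List[float]:
    """Hypothetical helper: expand epoch-based milestones into one rate per step."""
    per_step: List[float] = []
    start_epoch = 0
    for end_epoch, rate in zip(epoch_milestones, learning_rates):
        # Every step of epochs [start_epoch, end_epoch) uses the same rate
        per_step.extend([rate] * ((end_epoch - start_epoch) * steps_per_epoch))
        start_epoch = end_epoch
    return per_step


steps_per_epoch = 4  # illustrative; normally train_dataset.get_dataset_size()
lr = epoch_schedule_to_steps([5, 10, 15], [0.05, 0.01, 0.005], steps_per_epoch)

# Rate used in each 1-based epoch, read from the epoch's first step
per_epoch = [lr[(epoch - 1) * steps_per_epoch] for epoch in range(1, 16)]
print(len(lr))  # 60 = 15 epochs * 4 steps
print(per_epoch)
```

The list has `15 * steps_per_epoch` entries, matching `milestone[-1]` in the corrected code, so the schedule as written covers exactly 15 epochs.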