1 System Environment
Hardware environment (Ascend/GPU/CPU): Ascend/GPU/CPU
MindSpore version: mindspore=2.2.0
Execution mode (PyNative/Graph): Graph
Python version: Python=3.7
Operating system platform: any
2 Error Information
2.1 Script Code
import mindspore
from mindformers import AutoConfig, AutoModel, AutoTokenizer
# Use graph mode and specify the device card id
mindspore.set_context(mode=0, device_id=0)
tokenizer = AutoTokenizer.from_pretrained('/home/ma-user/work/model_save/llama2')
# The model can be instantiated in either of the following two ways; pick one
# 1. Instantiate directly from the default configuration
model = AutoModel.from_pretrained('/home/ma-user/work/model_save/llama2')
# 2. Instantiate after customizing the configuration
config = AutoConfig.from_pretrained('/home/ma-user/work/model_save/llama2/llama2_7b.yaml')
config.use_past = True  # Change the default config here; enabling incremental inference speeds up generation
# config.xxx = xxx  # Customize other model configuration items as needed
model = AutoModel.from_config(config)  # Instantiate the model from the customized config
inputs = tokenizer("I love Beijing, because")["input_ids"]
# The first call to model.generate() includes graph compilation time, so the measured
# inference performance is inaccurate; call it repeatedly to obtain accurate numbers
outputs = model.generate(inputs, max_new_tokens=30, do_sample=False)
response = tokenizer.decode(outputs)
print(response)
['<s>I love Beijing, because it’s a city that is constantly changing. I have been living here for 10 years and I have seen the city change so much.I']
2.2 Error Log
2023-12-14 18:03:45,101 - mindformers[mindformers/auto_class.py:704] - INFO - Config in the yaml file /home/ma-user/work/model_save/llama2/llama2_7b.yaml are used for tokenizer building.
2023-12-14 18:03:45,141 - mindformers[mindformers/auto_class.py:711] - INFO - Load the tokenizer name LlamaTokenizer from the /home/ma-user/work/model_save/llama2
2023-12-14 18:03:45,142 - mindformers[mindformers/models/base_tokenizer.py:1979] - INFO - config in the yaml file /home/ma-user/work/model_save/llama2/llama2_7b.yaml are used for tokenizer building.
2023-12-14 18:03:45,332 - mindformers[mindformers/models/base_tokenizer.py:1988] - WARNING - Can't find the tokenizer_config.json in the file_dict. The content of file_dict is : {}
2023-12-14 18:03:45,333 - mindformers[mindformers/models/base_tokenizer.py:1995] - INFO - build tokenizer class name is: LlamaTokenizer using args {'unk_token': '<unk>', 'bos_token': '<s>', 'eos_token': '</s>', 'pad_token': '<unk>', 'vocab_file': '/home/ma-user/work/model_save/llama2/tokenizer.model'}.
2023-12-14 18:03:45,353 - mindformers[mindformers/auto_class.py:790] - INFO - LlamaTokenizer Tokenizer built successfully!
2023-12-14 18:03:45,392 - mindformers[mindformers/auto_class.py:368] - INFO - model config: /home/ma-user/work/model_save/llama2/llama2_7b.yaml and checkpoint_name_or_path: /home/ma-user/work/model_save/llama2/llama2_7b.ckpt are used for model building.
2023-12-14 18:03:45,393 - mindformers[mindformers/version_control.py:33] - INFO - The Cell Reuse compilation acceleration feature is not supported when the environment variable ENABLE_CELL_REUSE is 0 or MindSpore version is earlier than 2.1.0 or stand_alone mode or pipeline_stages <= 1
2023-12-14 18:03:45,393 - mindformers[mindformers/version_control.py:37] - INFO -
The current ENABLE_CELL_REUSE=0, please set the environment variable as follows:
export ENABLE_CELL_REUSE=1 to enable the Cell Reuse compilation acceleration feature.
2023-12-14 18:03:45,393 - mindformers[mindformers/version_control.py:43] - INFO - The Cell Reuse compilation acceleration feature does not support single-card mode.This feature is disabled by default. ENABLE_CELL_REUSE=1 does not take effect.
2023-12-14 18:03:45,393 - mindformers[mindformers/version_control.py:46] - INFO - The Cell Reuse compilation acceleration feature only works in pipeline parallel mode(pipeline_stage>1).Current pipeline stage=1, the feature is disabled by default.
2023-12-14 18:05:42,907 - mindformers[mindformers/models/base_model.py:115] - INFO - weights in /home/ma-user/work/model_save/llama2/llama2_7b.ckpt are loaded
2023-12-14 18:05:42,911 - mindformers[mindformers/auto_class.py:453] - INFO - model built successfully!
2023-12-14 18:05:44,344 - mindformers[mindformers/auto_class.py:125] - INFO - the content in /home/ma-user/work/model_save/llama2/llama2_7b.yaml is used for config building.
2023-12-14 18:05:44,345 - mindformers[mindformers/version_control.py:33] - INFO - The Cell Reuse compilation acceleration feature is not supported when the environment variable ENABLE_CELL_REUSE is 0 or MindSpore version is earlier than 2.1.0 or stand_alone mode or pipeline_stages <= 1
2023-12-14 18:05:44,345 - mindformers[mindformers/version_control.py:37] - INFO -
The current ENABLE_CELL_REUSE=0, please set the environment variable as follows:
export ENABLE_CELL_REUSE=1 to enable the Cell Reuse compilation acceleration feature.
2023-12-14 18:05:44,345 - mindformers[mindformers/version_control.py:43] - INFO - The Cell Reuse compilation acceleration feature does not support single-card mode.This feature is disabled by default. ENABLE_CELL_REUSE=1 does not take effect.
2023-12-14 18:05:44,345 - mindformers[mindformers/version_control.py:46] - INFO - The Cell Reuse compilation acceleration feature only works in pipeline parallel mode(pipeline_stage>1).Current pipeline stage=1, the feature is disabled by default.
2023-12-14 18:06:54,088 - mindformers[mindformers/tools/download_tools.py:71] - ERROR - Connect error, please download https://ascend-repo-modelzoo.obs.cn-east-2.myhuaweicloud.com/MindFormers/llama2/llama2_7b.ckpt to ./checkpoint_download/llama2/llama2_7b.ckpt.
2023-12-14 18:06:54,089 - mindformers[mindformers/models/base_model.py:106] - INFO - checkpoint download failed, and pretrained weights are unloaded.
2023-12-14 18:06:54,089 - mindformers[mindformers/auto_class.py:292] - INFO - model built successfully!
2023-12-14 18:06:54,138 - mindformers[mindformers/generation/text_generator.py:1097] - INFO - Generation Config is: {'max_length': 512, 'max_new_tokens': 30, 'num_beams': 1, 'do_sample': False, 'use_past': True, 'temperature': 1.0, 'top_k': 0, 'top_p': 1.0, 'repetition_penalty': 1, 'encoder_repetition_penalty': 1.0, 'renormalize_logits': False, 'pad_token_id': 0, 'bos_token_id': 1, 'eos_token_id': 2, '_from_model_config': True}
2023-12-14 18:06:54,138 - mindformers[mindformers/generation/text_generator.py:176] - INFO - The generation mode will be **GREEDY_SEARCH**.
[WARNING] UTILS(2539493,ffff98fd9b70,python):2023-12-14-18:07:03.679.634 [mindspore/ccsrc/utils/comm_manager.cc:80] GetInstance] CommManager instance for CPU not found, return default instance.
Traceback (most recent call last):
File "/home/ma-user/work/mindformers/for_test.py", line 20, in <module>
outputs = model.generate(inputs, max_new_tokens=30, do_sample=False)
File "/home/ma-user/work/mindformers/mindformers/generation/text_generator.py", line 1114, in generate
output_ids = self._greedy_search(
File "/home/ma-user/work/mindformers/mindformers/generation/text_generator.py", line 394, in _greedy_search
res = self._incremental_infer(
File "/home/ma-user/work/mindformers/mindformers/generation/text_generator.py", line 225, in _incremental_infer
res = self(
File "/home/ma-user/.conda/envs/mindformer/lib/python3.9/site-packages/mindspore/nn/cell.py", line 680, in __call__
out = self.compile_and_run(*args, **kwargs)
File "/home/ma-user/.conda/envs/mindformer/lib/python3.9/site-packages/mindspore/nn/cell.py", line 1020, in compile_and_run
self.compile(*args, **kwargs)
File "/home/ma-user/.conda/envs/mindformer/lib/python3.9/site-packages/mindspore/nn/cell.py", line 997, in compile
_cell_graph_executor.compile(self, phase=self.phase,
File "/home/ma-user/.conda/envs/mindformer/lib/python3.9/site-packages/mindspore/common/api.py", line 1547, in compile
result = self._graph_executor.compile(obj, args, kwargs, phase, self._use_vm_mode())
ValueError: For BatchMatMul, inputs shape cannot be broadcast on CPU/GPU, with x shape [const vector]{1, 32, 4096, 128}, y shape [const vector]{128, 128}
----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore/core/ops/batch_matmul.cc:119 CheckBatchMatmulInputWhetherCanBeBroadcast
----------------------------------------------------
- The Traceback of Net Construct Code:
----------------------------------------------------
# 0 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama.py:379
if self.use_past:
# 1 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama.py:380
if not isinstance(init_reset, Tensor):
# 2 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama.py:384
if self.training:
# 3 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama.py:387
tokens = input_ids
^
# 4 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama.py:402
if not self.training:
# 5 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama.py:407
if (not self.use_past or self.is_first_iteration) and input_position is not None:
^
# 6 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama.py:389
output = self.model(tokens, input_position, init_reset, batch_valid_length)
^
# 7 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama.py:229
if self.is_first_iteration:
# 8 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama.py:230
freqs_cis = (self.tile(self.reshape(self.freqs_cos, (1, 1, seq_len, -1)), (bs, 1, 1, 1)),
^
# 9 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama.py:245
if not self.use_flash_attention:
# 10 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama.py:246
mask = self.expand_dims(mask, 1)
# 11 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama.py:389
output = self.model(tokens, input_position, init_reset, batch_valid_length)
^
# 12 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama.py:252
for i in range(self.num_layers):
# 13 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama.py:253
h, _ = self.layers[i](h, freqs_cis, mask,
# 14 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama_transformer.py:532
if self.compute_in_2d and x.ndim != 2:
# 15 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama_transformer.py:539
if self.use_past and self.is_first_iteration:
# 16 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama_transformer.py:541
self.assign_past(self.key_past, self.mul_past(self.key_past, self.cast(init_reset, self.dtype)))
^
# 17 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama_transformer.py:558
if self.use_past:
# 18 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama_transformer.py:549
h, layer_present = self.attention(input_x, freqs_cis, mask,
# 19 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama_transformer.py:253
query, key = self.apply_rotary_emb(query, key, freqs_cis) # dp, mp, 1, 1
^
# 20 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama_layer.py:151
self.mul(self.rotate_half(xq, swap_mask), freqs_sin))
^
# 21 In file /home/ma-user/work/mindformers/mindformers/models/llama/llama_layer.py:140
x = self.bmm_swap(x, swap_mask)
^
(See file '/home/ma-user/work/mindformers/rank_0/om/analyze_fail.ir' for more details. Get instructions about `analyze_fail.ir` at https://www.mindspore.cn/search?inputValue=analyze_fail.ir)
3 Root Cause Analysis
The ValueError is raised by the BatchMatMul shape check CheckBatchMatmulInputWhetherCanBeBroadcast: on CPU/GPU, MindSpore's BatchMatMul cannot broadcast the given input shapes. The failing op is self.bmm_swap in the rotary position embedding (llama_layer.py:140, reached via apply_rotary_emb -> rotate_half), which multiplies a 4-D tensor of shape (1, 32, 4096, 128) by the 2-D swap_mask of shape (128, 128). This rank-mismatched broadcast is rejected by the CPU/GPU kernels.
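The batched product itself is mathematically well defined; NumPy, for instance, broadcasts a 2-D matrix across the leading batch dimensions. The sketch below uses scaled-down stand-ins for the failing shapes (1, 32, 4096, 128) x (128, 128) to illustrate the operation the CPU/GPU BatchMatMul kernel rejects, and the rank-matching expansion that makes both operands 4-D; the small shape values are illustrative only.

```python
import numpy as np

# Scaled-down analogue of the failing shapes:
# x stands in for the (1, 32, 4096, 128) query/key tensor,
# m stands in for the (128, 128) swap_mask.
x = np.random.rand(1, 4, 8, 6)   # (batch, heads, seq_len, head_dim)
m = np.random.rand(6, 6)         # (head_dim, head_dim)

# NumPy broadcasts the 2-D matrix across the leading batch dims.
out = x @ m
print(out.shape)  # (1, 4, 8, 6)

# Rank-matched equivalent: expand m to 4-D so both operands have the
# same number of dimensions before the batched matmul.
m4 = np.broadcast_to(m, (1, 4, 6, 6))
out2 = x @ m4
print(np.allclose(out, out2))  # True
```

The two products are numerically identical; the difference is only whether the kernel is asked to broadcast across a rank mismatch.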
4 Solution
MindFormers does not support running this model on CPU/GPU hardware. Switch the runtime environment to Ascend hardware; inference succeeds on Ascend 910 or Ascend 310.
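A minimal sketch of the change to the script above, assuming an Ascend environment is available (the device_id value is illustrative and depends on the local card assignment):

```python
import mindspore

# Run in graph mode on an Ascend NPU. Leaving device_target as CPU/GPU
# reproduces the BatchMatMul broadcast error for this model.
mindspore.set_context(mode=0, device_target="Ascend", device_id=0)
```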