1 System Environment
Hardware environment (Ascend/GPU/CPU): Ascend
MindSpore version: mindspore=2.2.0 & 2.2.10
Execution mode (PyNative/Graph): Graph mode
Python version: Python=3.7
Operating system platform: Linux
2 Error Message
Traceback (most recent call last):
File "multi_gpu_infer.py", line 346, in <module>
eval_model(args)
File "multi_gpu_infer.py", line 260, in eval_model
warm_up_model.infer_predict_layout(ms.Tensor(np.ones(shape=(1, 3, 336,336)), ms.float16))
File "/root/miniconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/model.py", line 1882, in infer_predict_layout
predict_net.compile(*predict_data)
File "/root/miniconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/nn/cell.py", line 998, in compile
jit_config_dict=self._jit_config_dict, *args, **kwargs)
File "/root/miniconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/common/api.py", line 1547, in compile
result = self._graph_executor.compile(obj, args, kwargs, phase, self._use_vm_mode())
TypeError: The parameters number of the function is 636, but the number of provided arguments is 635.
FunctionGraph : mindformers_models_blip2_blip2_llama_Blip2ImageToTextGeneration_construct_1
NodeInfo: In file /disk1/fail/scripts/mf_parallel0/mindformers/models/blip2/blip2_llama.py:536
def construct(self, image: ms.Tensor, text_input_ids: ms.Tensor):
^
----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore/ccsrc/pipeline/jit/ps/static_analysis/evaluator.cc:486 Eval
----------------------------------------------------
- The Traceback of Net Construct Code:
----------------------------------------------------
# In file /disk1/fail/scripts/mf_parallel0/mindformers/models/blip2/blip2_llama.py:536
def construct(self, image: ms.Tensor, text_input_ids: ms.Tensor):
^
# In file /disk1/fail/scripts/mf_parallel0/mindformers/models/blip2/blip2_llama.py:544
projected_qformer_output = self.forward_qformer_and_proj(image)
^
# In file /disk1/fail/scripts/mf_parallel0/mindformers/models/blip2/blip2_llama.py:484
def forward_qformer_and_proj(self, image: ms.Tensor):
^
# In file /disk1/fail/scripts/mf_parallel0/mindformers/models/blip2/blip2_llama.py:486
image_embeds = self.visual_encoder(image)
^
# In file /disk1/fail/scripts/mf_parallel0/mindformers/models/blip2/blip2.py:112
def construct(self, image):
# In file /disk1/fail/scripts/mf_parallel0/mindformers/models/blip2/blip2.py:113
return self.construct_without_pool(image)
^
# In file /disk1/fail/scripts/mf_parallel0/mindformers/models/vit/vit.py:163
def construct_without_pool(self, image, mask=None):
^
# In file /disk1/fail/scripts/mf_parallel0/mindformers/models/vit/vit.py:174
for block in self.blocks:
^
# In file /disk1/fail/scripts/mf_parallel0/mindformers/models/vit/vit_modules.py:342
def construct(self, x, input_mask, rel_pos_bias=None):
^
# In file /disk1/fail/scripts/mf_parallel0/mindformers/models/vit/vit_modules.py:372
mlp_logit = self.output(output_x)
^
# In file /disk1/fail/scripts/mf_parallel0/mindformers/models/vit/vit_modules.py:452
def construct(self, x):
^
# In file /disk1/fail/scripts/mf_parallel0/mindformers/models/vit/vit_modules.py:456
hidden = self.mapping(x)
^
# In file /disk1/fail/scripts/mf_parallel0/mindformers/models/vit/vit_modules.py:198
def construct(self, x):
^
# In file /disk1/fail/scripts/mf_parallel0/mindformers/models/vit/vit_modules.py:201
x = P.Reshape()(x, (-1, self.in_channels))
2.1 Problem Description
When running distributed inference, the call to infer_predict_layout fails with a parameter-count mismatch:
TypeError: The parameters number of the function is 636, but the number of provided arguments is 635.
2.2 Script Code
# call site
# shard model and load sharded ckpt
warm_up_model = Model(model)
warm_up_model.infer_predict_layout(ms.Tensor(np.ones(shape=(1, 3, 336,336)), ms.float32))
# definition
class Blip2ImageToTextGeneration(Blip2Llama):
"""
Blip2ImageToTextGeneration relies on Blip2Llama and is used for image-to-text generation.
Args:
config (Blip2Config): The config of Blip2ImageToTextGeneration.
Examples:
>>> from mindformers import Blip2ImageToTextGeneration
>>> model = Blip2ImageToTextGeneration.from_pretrained('itt_blip2_stage2_vit_g_llama_7b')
>>> type(model)
<class 'mindformers.models.blip2.blip2_llama.Blip2ImageToTextGeneration'>
"""
_support_list = MindFormerBook.get_model_support_list()['itt']['blip2']
def __init__(self, config: Blip2Config, **kwargs):
super(Blip2ImageToTextGeneration, self).__init__(config, **kwargs)
self.llama_model.set_train(False)
self.one_prefix = ops.Ones()
self.expand_dims = P.ExpandDims()
self.query_length = self.config.qformer_config.query_length
def construct(self, image: ms.Tensor, text_input_ids: ms.Tensor):
if len(text_input_ids.shape) == 1:
text_input_ids = self.expand_dims(text_input_ids, 0)
............
3 Root Cause Analysis
From the error we know the function expects 636 parameters but only 635 were provided, i.e. one is missing. (In graph mode the compiler counts the network's weight parameters together with the inputs, which is why the numbers are so large; a single missing input still shows up as a difference of exactly one.) Combined with the code snippets above: the call site passes only one input (the image tensor), while construct is defined with two inputs (image and text_input_ids). This is why the parameter-count mismatch is reported.
4 Solution
Modify the call as follows:
# shard model and load sharded ckpt
warm_up_model = Model(model)
warm_up_model.infer_predict_layout(ms.Tensor(np.ones(shape=(1, 3, 336, 336)), ms.float32), ms.Tensor(np.ones(shape=(1, config.seq_length)), ms.int32))
The call stack below this error is of limited use. When a parameter-count mismatch error appears, the first thing to check is whether the inputs passed to the network match the signature of its construct definition.