1 Error Description
1.1 System Environment
Hardware Environment(Ascend/GPU/CPU): Ascend
Software Environment:
-- MindSpore version (source or binary): 1.6.0
-- Python version (e.g., Python 3.7.5): 3.7.6
-- OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
-- GCC/Compiler version (if compiled from source):
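1.2 Basic Information
1.2.1 Script
The training script builds a single-operator Conv2d network that computes a 2D convolution of the input tensor. The script is as follows (import statements omitted):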
01 class Net(nn.Cell):
02     def __init__(self, in_channels, out_channels, kernel_size):
03         super(Net, self).__init__()
04         self.in_channels = in_channels
05         self.out_channels = out_channels
06         self.kernel_size = kernel_size
07         self.conv2d = nn.Conv2d(self.in_channels, self.out_channels, self.kernel_size)
08
09     def construct(self, x):
10         result = self.conv2d(x)
11         return result
12
13 net = Net(in_channels=1, out_channels=240, kernel_size=4)
14 x = Tensor(np.ones([3, 3, 1024, 640]), mindspore.float32)
15 out = net(x)
16 print('out', out.shape)
1.2.2 Error
The error message is as follows:
[CRITICAL] CORE(117119,ffff837a2010,python):2022-04-07-09:46:50.529.443 [build/mindspore/merge/mindspore/core/ops_merge.cc:6648] Conv2dInferShape] For 'Conv2D', 'C_in' of input 'x' shape divide by parameter 'group' should be equal to 'C_in' of input 'weight' shape: 1, but got 'C_in' of input 'x' shape: 3, and 'group': 1
Traceback (most recent call last):
File "demo.py", line 15, in <module>
out = net(x)
File "../lib/python3.7/site-packages/mindspore/nn/cell.py", line 576, in __call__
out = self.compile_and_run(*args)
File "../lib/python3.7/site-packages/mindspore/nn/cell.py", line 942, in compile_and_run
self.compile(*inputs)
File "../lib/python3.7/site-packages/mindspore/nn/cell.py", line 915, in compile
_cell_graph_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
File "../lib/python3.7/site-packages/mindspore/common/api.py", line 791, in compile
result = self._graph_executor.compile(obj, args_list, phase, self._use_vm_mode())
RuntimeError: build/mindspore/merge/mindspore/core/ops_merge.cc:6648 Conv2dInferShape] For 'Conv2D', 'C_in' of input 'x' shape divide by parameter 'group' should be equal to 'C_in' of input 'weight' shape: 1, but got 'C_in' of input 'x' shape: 3, and 'group': 1
The function call stack (See file 'demo/rank_0/om/analyze_fail.dat' for more details):
# 0 In file demo.py(10)
result = self.conv2d (x)
^
# 1 In file ../lib/python3.7/site-packages/mindspore/nn/layer/conv.py(286)
if self.has_bias:
# 2 In file ../lib/python3.7/site-packages/mindspore/nn/layer/conv.py(285)
output = self.conv2d(x, self.weight)
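2 Cause Analysis
Next, look at the error message. The RuntimeError states: *'C_in' of input 'x' shape divide by parameter 'group' should be equal to 'C_in' of input 'weight' shape: 1, but got 'C_in' of input 'x' shape: 3, and 'group': 1*. In other words, the C_in of the input x shape divided by group must equal the C_in of the weight shape, that is, x_shape[C_in] / group == w_shape[C_in]. Here w_shape[C_in] is 1, whereas x_shape[C_in] / group is 3 / 1 = 3. w_shape[C_in] is the size of the weight's channel dimension, which is determined by the in_channels argument passed to nn.Conv2d (with group = 1 it is exactly in_channels), so check whether in_channels was set to 1 when nn.Conv2d was initialized. The official documentation describes C_in and in_channels in nearly the same terms.
Inspecting the code confirms that in_channels on line 13 (1) does not match the C_in of the input on line 14 (3). Setting in_channels to the C_in of the data resolves the problem.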
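The constraint in the error message can be reproduced without MindSpore. The following is a minimal sketch in plain Python, assuming an NCHW input layout and a weight laid out as (out_channels, in_channels // group, kernel_h, kernel_w). The helper names `conv2d_weight_shape` and `check_conv2d_channels` are hypothetical and are not part of the MindSpore API.

```python
def conv2d_weight_shape(in_channels, out_channels, kernel_size, group=1):
    """Weight shape of a 2D convolution, assuming (C_out, C_in // group, kH, kW) layout."""
    return (out_channels, in_channels // group, kernel_size, kernel_size)


def check_conv2d_channels(x_shape, w_shape, group=1):
    """Return True when x_shape[C_in] / group equals w_shape[C_in] (NCHW layout assumed)."""
    x_c_in = x_shape[1]
    w_c_in = w_shape[1]
    return x_c_in % group == 0 and x_c_in // group == w_c_in


x_shape = (3, 3, 1024, 640)  # input from the script: N=3, C_in=3

# in_channels=1 as in the failing script: weight C_in is 1, input C_in is 3.
bad_weight = conv2d_weight_shape(in_channels=1, out_channels=240, kernel_size=4)
print(check_conv2d_channels(x_shape, bad_weight))   # False

# in_channels taken from the data: weight C_in is 3, so the check passes.
good_weight = conv2d_weight_shape(in_channels=x_shape[1], out_channels=240, kernel_size=4)
print(check_conv2d_channels(x_shape, good_weight))  # True
```

Deriving in_channels from the data shape (x_shape[1]) rather than hard-coding it is one way to keep the two values from drifting apart.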
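3 Solution
Based on the cause identified above, the fix is to set in_channels on line 13 to 3 so that it matches the C_in of the input:
13 net = Net(in_channels=3, out_channels=240, kernel_size=4)
With this change the script runs successfully and prints: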
out: (3, 240, 1024, 640)
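The spatial size in the printed shape is unchanged because nn.Conv2d defaults to pad_mode='same' and stride 1, while the channel dimension becomes out_channels. A small sketch of that arithmetic in plain Python, assuming those defaults and NCHW layout; `same_pad_output_shape` is a hypothetical helper, not a MindSpore function.

```python
import math


def same_pad_output_shape(x_shape, out_channels, stride=1):
    """Output shape of a 2D convolution with 'same' padding (NCHW layout assumed).

    With 'same' padding each spatial size is ceil(input_size / stride),
    independent of the kernel size.
    """
    n, _, h, w = x_shape
    return (n, out_channels, math.ceil(h / stride), math.ceil(w / stride))


print(same_pad_output_shape((3, 3, 1024, 640), out_channels=240))
# (3, 240, 1024, 640)
```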
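4 Summary
Steps for locating the problem:
1. Find the user code line associated with the error: result = self.conv2d(x);
2. Use the keywords in the error log to narrow down the scope of the analysis: 'C_in' of input 'x' shape divide by parameter 'group' should be equal to 'C_in' of input 'weight' shape: 1, but got 'C_in' of input 'x' shape: 3, and 'group': 1;
3. Pay particular attention to whether variables are defined and initialized correctly.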
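Step 2 can be partly automated by pulling the three numbers out of the error text. The sketch below uses only the Python standard library; the regular expression is written for the exact wording in this log and is an assumption, since other MindSpore versions may phrase the message differently. `parse_channel_mismatch` is a hypothetical helper name.

```python
import re

# Error text copied from the RuntimeError in the log above.
message = (
    "For 'Conv2D', 'C_in' of input 'x' shape divide by parameter 'group' "
    "should be equal to 'C_in' of input 'weight' shape: 1, but got 'C_in' "
    "of input 'x' shape: 3, and 'group': 1"
)

# Pattern tailored to this specific message (assumption: wording is unchanged).
pattern = re.compile(
    r"'C_in' of input 'weight' shape: (\d+), "
    r"but got 'C_in' of input 'x' shape: (\d+), and 'group': (\d+)"
)


def parse_channel_mismatch(text):
    """Return (weight_c_in, x_c_in, group) from the message, or None if it does not match."""
    match = pattern.search(text)
    if match is None:
        return None
    return tuple(int(value) for value in match.groups())


weight_c_in, x_c_in, group = parse_channel_mismatch(message)
print(f"weight C_in={weight_c_in}, x C_in={x_c_in}, group={group}")
# weight C_in=1, x C_in=3, group=1
```

Comparing the extracted weight C_in (1) with x C_in / group (3) points directly at the in_channels argument as the value to check.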