MindSpore error: ValueError: x.shape and y.shape can not broadcast, got i: -2, x.shape: [2, 5], y.shape: [3, 5]

1 Error description

1.1 System environment

Hardware Environment (Ascend/GPU/CPU): Ascend
Software Environment:
-- MindSpore version (source or binary): 1.6.0
-- Python version (e.g., Python 3.7.5): 3.7.6
-- OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
-- GCC/Compiler version (if compiled from source):

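1.2 Basic information

1.2.1 Script

The training script builds a single-operator Abs network: it applies Sub to the two input tensors and then computes Abs of the result. The script is as follows: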

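# Imports required by the script (the traceback below numbers lines starting at `class Net`).
import numpy as np
import mindspore
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import Tensor

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.abs = ops.Abs()

    def construct(self, x1, x2):
        # x1 - x2 is lowered to the Sub operator, which must broadcast its inputs
        output = self.abs(x1 - x2)
        return output

net = Net()
# Shapes (2, 5) and (3, 5) differ in the first dimension and neither size is 1
x1 = Tensor(np.ones((2, 5), dtype=np.float32), mindspore.float32)
x2 = Tensor(np.ones((3, 5), dtype=np.float32), mindspore.float32)
out = net(x1, x2)
print('out:', out.shape)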

2 Error message

The error message is as follows:

The function call stack (See file '/demo/rank_0/om/analyze_fail.dat' for more details):
# 0 In file demo.py(7)
         output = self.abs(x1 - x2)
                           ^

Traceback (most recent call last):
  File "demo.py", line 13, in <module>
    out = net(x1,x2)
  File "/lib/python3.7/site-packages/mindspore/nn/cell.py", line 576, in __call__
    out = self.compile_and_run(*args)
  File "/lib/python3.7/site-packages/mindspore/nn/cell.py", line 942, in compile_and_run
    self.compile(*inputs)
  File "/lib/python3.7/site-packages/mindspore/nn/cell.py", line 915, in compile
    _cell_graph_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
  File "/lib/python3.7/site-packages/mindspore/common/api.py", line 791, in compile
    result = self._graph_executor.compile(obj, args_list, phase, self._use_vm_mode())
  File "/lib/python3.7/site-packages/mindspore/ops/primitive.py", line 575, in __infer__
    out[track] = fn(*(x[track] for x in args))
  File "/lib/python3.7/site-packages/mindspore/ops/operations/math_ops.py", line 78, in infer_shape
    return get_broadcast_shape(x_shape, y_shape, self.name)
  File "/lib/python3.7/site-packages/mindspore/ops/_utils/utils.py", line 70, in get_broadcast_shape
    raise ValueError(f"For '{prim_name}', {arg_name1}.shape and {arg_name2}.shape are supposed "
ValueError: For 'Sub', x.shape and y.shape are supposed to broadcast, where broadcast means that x.shape[i] = 1 or -1 or y.shape[i] = 1 or -1 or x.shape[i] = y.shape[i], but now x.shape and y.shape can not broadcast, got i: -2, x.shape: [2, 5], y.shape: [3, 5].

Cause analysis

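Look at the error message. The ValueError reads: For 'Sub', x.shape and y.shape are supposed to broadcast, where broadcast means that x.shape[i] = 1 or -1 or y.shape[i] = 1 or -1 or x.shape[i] = y.shape[i]. In other words, the two operands of Sub (the x1 - x2 expression passed to Abs) cannot be broadcast. Broadcasting requires that, for every dimension i counted from the end, either one of the two sizes is 1 (or -1) or the two sizes are equal.

The message continues: but now x.shape and y.shape can not broadcast, got i: -2, x.shape: [2, 5], y.shape: [3, 5]. Here i: -2 points at the second-to-last dimension, which is the first dimension of both tensors. x has size 2 there and y has size 3, and neither is 1, so the rule fails. That is the cause of the problem.

The official documentation places this restriction on the inputs: the shapes of the two tensors must either be identical or be broadcastable. Many other two-input operators also rely on broadcast, so the same care applies to them.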
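The rule quoted in the error message can be sketched in plain Python. This is only an illustration of the check, not MindSpore's actual get_broadcast_shape implementation: the function name broadcast_shape is hypothetical, and the -1 case mentioned in the message is deliberately left out.

```python
def broadcast_shape(x_shape, y_shape):
    """Return the broadcast shape, or raise ValueError naming the failing index.

    Illustrative sketch only; the -1 sizes mentioned in the error message
    are not handled here.
    """
    x_shape, y_shape = list(x_shape), list(y_shape)
    result = []
    # Compare dimensions from the last one backwards: i = -1, -2, ...
    for i in range(-1, -min(len(x_shape), len(y_shape)) - 1, -1):
        x_dim, y_dim = x_shape[i], y_shape[i]
        if x_dim == y_dim or y_dim == 1:
            result.append(x_dim)
        elif x_dim == 1:
            result.append(y_dim)
        else:
            raise ValueError(
                f"can not broadcast, got i: {i}, "
                f"x.shape: {x_shape}, y.shape: {y_shape}"
            )
    # Leading dimensions of the longer shape are carried over unchanged.
    longer = x_shape if len(x_shape) > len(y_shape) else y_shape
    extra = len(longer) - len(result)
    return tuple(longer[:extra]) + tuple(reversed(result))


# The shapes from the failing script: the check fails at i = -2 (2 vs 3).
try:
    broadcast_shape((2, 5), (3, 5))
except ValueError as err:
    print(err)

# A shape that does broadcast against (3, 5).
print(broadcast_shape((5,), (3, 5)))
```

Walking the dimensions from the end is what makes the reported index negative: i = -1 compares 5 with 5 and passes, while i = -2 compares 2 with 3 and fails.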

3 Solution

Based on the cause identified above, the script can be fixed as follows.
Example 1:


Execution now succeeds, with the following output:
out: (3, 5)
Example 2:

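import numpy as np
import mindspore
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import Tensor

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.abs = ops.Abs()

    def construct(self, x1, x2):
        output = self.abs(x1 - x2)
        return output

net = Net()
# x1 has shape (5,), which broadcasts against (3, 5): the last dimensions match
x1 = Tensor(np.ones((5,), dtype=np.float32), mindspore.float32)
x2 = Tensor(np.ones((3, 5), dtype=np.float32), mindspore.float32)
out = net(x1, x2)
print('out:', out.shape)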

Execution now succeeds, with the following output:
out: (3, 5)
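If the two inputs really do have 2 and 3 rows and the goal is to compare every row of x1 with every row of x2, another option is to insert a size-1 axis so the shapes become broadcastable. The sketch below uses NumPy, which follows the same broadcasting rule. The data and the pairwise intent are assumptions made for illustration, and the MindSpore counterpart (for example via ops.ExpandDims) is not verified here.

```python
import numpy as np

# Hypothetical data: 2 rows in x1, 3 rows in x2, 5 values per row.
x1 = np.arange(10, dtype=np.float32).reshape(2, 5)
x2 = np.arange(15, dtype=np.float32).reshape(3, 5)

# Insert a size-1 axis so the shapes are (2, 1, 5) and (3, 5).
# Broadcasting then pairs every row of x1 with every row of x2.
pairwise = np.abs(x1[:, np.newaxis, :] - x2)

print(pairwise.shape)
```

The result has one entry per (row of x1, row of x2) pair, so this only makes sense when a pairwise comparison is actually wanted. Otherwise, fixing the input shapes as in the examples above is the simpler route.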

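4 Summary

Steps for locating the problem:
1. Find the line of user code that raised the error: output = self.abs(x1 - x2);
2. Use the keywords in the error log to narrow the scope of the analysis: x.shape: [2, 5], y.shape: [3, 5];
3. Pay particular attention to whether variables are defined and initialized correctly.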
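Step 2 can be partly automated by pulling the index and the two shapes out of the error text. The sketch below assumes the message format shown in this post for MindSpore 1.6.0. The helper name parse_broadcast_error is hypothetical, and the wording may differ in other versions, in which case the function returns None.

```python
import re


def parse_broadcast_error(message):
    """Extract (i, x_shape, y_shape) from a broadcast ValueError message.

    Returns None when the message does not match the assumed format.
    """
    pattern = r"got i: (-?\d+), x\.shape: \[([^\]]*)\], y\.shape: \[([^\]]*)\]"
    match = re.search(pattern, message)
    if match is None:
        return None
    index = int(match.group(1))
    x_shape = [int(d) for d in match.group(2).split(",") if d.strip()]
    y_shape = [int(d) for d in match.group(3).split(",") if d.strip()]
    return index, x_shape, y_shape


# Tail of the message from this post.
message = (
    "but now x.shape and y.shape can not broadcast, "
    "got i: -2, x.shape: [2, 5], y.shape: [3, 5]."
)
index, x_shape, y_shape = parse_broadcast_error(message)
print(f"dimension {index}: x has {x_shape[index]}, y has {y_shape[index]}")
```

Indexing both shapes with the reported i gives the two conflicting sizes directly, which points back at the tensor definitions to check in step 3.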

5 References

5.1 The broadcast method