1 System Environment
Hardware environment (Ascend/GPU/CPU): GPU
MindSpore version: mindspore=2.0.0
Execution mode (static graph): GRAPH_MODE
Python version: Python=3.7.5
Operating system platform: win10
2 Error Information
2.1 Problem Description
After setting up the environment and running the code below, a "Shape Join Failed" error was raised.
2.2 Error Message
Traceback (most recent call last):
File "D:/ai.py", line 24, in
out = net(input_x, input_a, input_b)
File "D:\python3.7\lib\site-packages\mindspore\nn\cell.py", line 622, in __call__
out = self.compile_and_run(*args)
File "D:\python3.7\lib\site-packages\mindspore\nn\cell.py", line 1007, in compile_and_run
self.compile(*inputs)
File "D:\python3.7\lib\site-packages\mindspore\nn\cell.py", line 979, in compile
jit_config_dict=self._jit_config_dict)
File "D:\python3.7\lib\site-packages\mindspore\common\api.py", line 1147, in compile
result = self._graph_executor.compile(obj, args_list, phase, self._use_vm_mode())
RuntimeError: Cannot join the return values of different branches, perhaps you need to make them equal.
Shape Join Failed: shape1 = (2, 3, 4, 5), shape2 = ().
For more details, please refer to https://www.mindspore.cn/search?inputValue=Shape%20Join%20Failed
Inner Message:
The abstract type of the return value of the current branch is AbstractTensor(shape: (), element: AbstractScalar(Type: Float32, Value: AnyValue, Shape: NoShape), value_ptr: 000001D93E556DB0, value: AnyValue), and that of the previous branch is AbstractTensor(shape: (2, 3, 4, 5), element: AbstractScalar(Type: Float32, Value: AnyValue, Shape: NoShape), value_ptr: 000001D93E556DB0, value: AnyValue).
The node is @Default.6:[CNode]7{[0]: @Default.6:[CNode]8{[0]: ValueNode Switch, [1]: [CNode]13, [2]: ValueNode ?Default.4, [3]: ValueNode ?Default.5}}, true branch: ?Default.4, false branch: ?Default.5
----------------------------------------------------
- The Traceback of Net Construct Code:
----------------------------------------------------
The function call stack (See file 'D:\rank_0\om/analyze_fail.dat' for more details. Get instructions about `analyze_fail.dat` at https://www.mindspore.cn/search?inputValue=analyze_fail.dat):
# 0 In file D:/ai.py:15
if a > b:
----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore\ccsrc\pipeline\jit\static_analysis\static_analysis.cc:877 mindspore::abstract::AnalysisEngine::ProcessEvalResults
2.3 Script Code
import numpy as np
import mindspore as ms
import mindspore.ops as ops
from mindspore import nn, Tensor, context

context.set_context(mode=context.GRAPH_MODE)

class Net(nn.Cell):
    def __init__(self):
        super().__init__()
        self.relu = ops.ReLU()
        self.reducesum = ops.ReduceSum()

    def construct(self, x, a, b):
        if a > b:
            return self.relu(x)
        else:
            return self.reducesum(x)

input_x = Tensor(np.random.rand(2, 3, 4, 5).astype(np.float32))
input_a = Tensor(2, ms.float32)
input_b = Tensor(6, ms.float32)
net = Net()
out = net(input_x, input_a, input_b)
3 Root Cause Analysis
In static graph mode, the Python code is not executed by the Python interpreter; instead, it is compiled into a static computation graph, which is then executed. The control-flow syntax MindSpore supports covers if, for, and while statements.
RuntimeError: Cannot join the return values of different branches, perhaps you need to make them equal.
Shape Join Failed: shape1 = (2, 3, 4, 5), shape2 = ().
The error message shows that the two branches of the if statement return values with inconsistent shapes: one is a four-dimensional 2×3×4×5 Tensor, the other is a scalar, so graph compilation fails.
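The shape mismatch can be reproduced with NumPy alone (a sketch; np.maximum and np.sum stand in for ops.ReLU and ops.ReduceSum, which behave the same way with respect to output shape):

```python
import numpy as np

x = np.random.rand(2, 3, 4, 5).astype(np.float32)

relu_out = np.maximum(x, 0)  # ReLU is elementwise: output shape equals input shape
sum_out = np.sum(x)          # reducing over all axes collapses the result to a scalar

print(np.shape(relu_out))    # (2, 3, 4, 5)
print(np.shape(sum_out))     # ()
```

These are exactly the two shapes in the error message: (2, 3, 4, 5) from the true branch and () from the false branch, which the compiler cannot join into one return type.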
4 Solution
By default, ReduceSum sums over every dimension of the input Tensor, reducing all axes away.
The keep_dims parameter controls whether the output keeps the same number of dimensions as the input.
Change
self.reducesum = ops.ReduceSum()
in the code to
self.reducesum = ops.ReduceSum(keep_dims=True)
The complete corrected code:
import numpy as np
import mindspore as ms
import mindspore.ops as ops
from mindspore import nn, Tensor, context

context.set_context(mode=context.GRAPH_MODE)

class Net(nn.Cell):
    def __init__(self):
        super().__init__()
        self.relu = ops.ReLU()
        self.reducesum = ops.ReduceSum(keep_dims=True)

    def construct(self, x, a, b):
        if a > b:
            return self.relu(x)
        else:
            return self.reducesum(x)

input_x = Tensor(np.random.rand(2, 3, 4, 5).astype(np.float32))
input_a = Tensor(2, ms.float32)
input_b = Tensor(6, ms.float32)
net = Net()
out = net(input_x, input_a, input_b)
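As a quick sanity check of why the fix works, the same shapes can be inspected with NumPy (np.sum(..., keepdims=True) mirrors ReduceSum(keep_dims=True)): with keep_dims=True both branches now return rank-4 tensors, which the graph compiler can join.

```python
import numpy as np

x = np.random.rand(2, 3, 4, 5).astype(np.float32)

relu_out = np.maximum(x, 0)          # true branch: shape (2, 3, 4, 5)
sum_keep = np.sum(x, keepdims=True)  # false branch with keep_dims: shape (1, 1, 1, 1)

# Both branches keep the same number of dimensions as the input.
print(relu_out.ndim, sum_keep.ndim)  # 4 4
```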