1 System Environment
Hardware environment (Ascend/GPU/CPU): Ascend
MindSpore version: mindspore=2.2.10
Execution mode (PyNative/Graph): PyNative/Graph
Python version: Python=3.8.15
Operating system platform: Linux
2 Error Message
  File "/home/ma-user/anaconda3/envs/python-3.9/lib/python3.9/site-packages/mindspore/train/model.py", line 1274, in build
    self._init(train_dataset, valid_dataset, sink_size, epoch)
  File "/home/ma-user/anaconda3/envs/python-3.9/lib/python3.9/site-packages/mindspore/train/model.py", line 529, in _init
    train_network.compile(*inputs)
  File "/home/ma-user/anaconda3/envs/python-3.9/lib/python3.9/site-packages/mindspore/nn/cell.py", line 997, in compile
    _cell_graph_executor.compile(self, phase=self.phase,
  File "/home/ma-user/anaconda3/envs/python-3.9/lib/python3.9/site-packages/mindspore/common/api.py", line 1547, in compile
    result = self._graph_executor.compile(obj, args, kwargs, phase, self._use_vm_mode())
RuntimeError: Reshape op can't be a border. node:@12339_src_model_mllm_visual_bridge_VisualBridge_construct_852:output{[0]: ValueNode<PrimitivePy> Reshape, [1]: output, [2]: ValueNode<ValueTuple> (1, 32, 10240)}
-----------------------------------------------------------
- C++ Call Stack: (For framework developers)
-----------------------------------------------------------
mindspore/ccsrc/frontend/parallel/pipeline_transformer/pipeline_transformer.cc:556 CreateOpInfo
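3 Root Cause Analysis
The Reshape operator cannot be the boundary of a pipeline-parallel stage. When you hit this error, check whether any operator you call uses Reshape internally. For example, nn.Dense finishes with a Reshape whenever its input is not 2-D, so placing it at a pipeline boundary fails in the same way.
Network script: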
self.pangu_proj = init_dense(
    in_channels=encoder_config.hidden_size,
    out_channels=hidden_size,
    activation=None,
    init_method=init_method_normal('mindspore', sigma=0.02),
    dtype=ms.float32,
    compute_type=ms.float32,
    has_bias=True
)
nn.Dense() source code:
def construct(self, x):
    x_shape = self.shape_op(x)
    check_dense_input_shape(x_shape, self.cls_name)
    if len(x_shape) != 2:
        x = self.reshape(x, (-1, x_shape[-1]))
    x = self.matmul(x, self.weight)
    if self.has_bias:
        x = self.bias_add(x, self.bias)
    if self.activation_flag:
        x = self.activation(x)
    if len(x_shape) != 2:
        out_shape = x_shape[:-1] + (F.shape(x)[-1],)
        x = self.reshape(x, out_shape)
    return x
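To make the root cause concrete, here is a minimal pure-Python sketch that mirrors the control flow of the nn.Dense construct above and lists the operators it would emit. It does not call MindSpore; the helper name dense_op_trace and the input shape (1, 32, 768) are made up for illustration, and out_channels=10240 is chosen only so that the result matches the shape shown in the error message.

```python
def dense_op_trace(x_shape, out_channels, has_bias=True, has_activation=False):
    """Model the op sequence and output shape of the Dense.construct shown above.

    This is an illustrative stand-in, not MindSpore code.
    """
    x_shape = tuple(x_shape)
    needs_reshape = len(x_shape) != 2
    ops = []
    if needs_reshape:
        ops.append("Reshape")      # flatten to (-1, in_channels)
    ops.append("MatMul")
    if has_bias:
        ops.append("BiasAdd")
    if has_activation:
        ops.append("Activation")
    if needs_reshape:
        ops.append("Reshape")      # restore the leading dimensions
        out_shape = x_shape[:-1] + (out_channels,)
    else:
        out_shape = (x_shape[0], out_channels)
    return ops, out_shape


# Hypothetical 3-D input; 768 is an assumed in_channels value.
ops, out_shape = dense_op_trace((1, 32, 768), out_channels=10240)
print(ops)        # ['Reshape', 'MatMul', 'BiasAdd', 'Reshape']
print(out_shape)  # (1, 32, 10240)
```

With a 3-D input the last operator is Reshape, which is exactly the node the error reports at the stage border. With a 2-D input neither Reshape is emitted.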
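4 Solution
- Use the copy method. In the example above, return output.copy() instead of output. (Internally, copy divides the original data by 1.0, so Reshape is no longer the last operator of the stage.)
- Move the stage boundary so that the stage does not end with a Reshape.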
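
The two options can be illustrated with a toy model in plain Python, where each pipeline stage is just a list of operator names. This is only a sketch for reasoning about the boundary, not MindSpore API: the helper stages_ending_in_reshape, the "Div" label standing in for what copy adds, and the ops placed in the second stage are all hypothetical.

```python
def stages_ending_in_reshape(stages):
    """Return the indices of stages whose last operator is Reshape."""
    return [i for i, ops in enumerate(stages) if ops and ops[-1] == "Reshape"]


# Toy model: stage 0 ends with a Dense applied to a non-2-D input.
original = [
    ["Reshape", "MatMul", "BiasAdd", "Reshape"],
    ["LayerNorm", "MatMul"],  # hypothetical ops in the next stage
]

# Option 1: return output.copy(), which adds one more op after the Reshape.
with_copy = [original[0] + ["Div"], original[1]]

# Option 2: move the boundary so the next op joins stage 0.
moved = [original[0] + original[1][:1], original[1][1:]]

print(stages_ending_in_reshape(original))   # [0]
print(stages_ending_in_reshape(with_copy))  # []
print(stages_ending_in_reshape(moved))      # []
```

Either change leaves a non-Reshape operator as the stage output, which is the condition the pipeline transformer checks for.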