Pipeline parallelism error: Reshape op can't be a border.

1 System Environment

Hardware environment (Ascend/GPU/CPU): Ascend
MindSpore version: mindspore=2.2.10
Execution mode (PyNative/Graph): PyNative/Graph
Python version: Python=3.8.15
Operating system: Linux

2 Error Message

    File "/home/ma-user/anaconda3/envs/python-3.9/lib/python3.9/site-packages/mindspore/train/model.py", line 1274, in build  
        self._init(train_dataset, valid_dataset, sink_size, epoch)  
    File "/home/ma-user/anaconda3/envs/python-3.9/lib/python3.9/site-packages/mindspore/train/model.py", line 529, in _init  
        train_network.compile(*inputs)  
    File "/home/ma-user/anaconda3/envs/python-3.9/lib/python3.9/site-packages/mindspore/nn/cell.py", line 997, in compile  
        _cell_graph_executor.compile(self, phase=self.phase,  
    File "/home/ma-user/anaconda3/envs/python-3.9/lib/python3.9/site-packages/mindspore/common/api.py", line 1547, in compile  
        result = self._graph_executor.compile(obj, args, kwargs, phase, self._use_vm_mode())  
RuntimeError: Reshape op can't be a border. node:@12339_src_model_mllm_visual_bridge_VisualBridge_construct_852: output {[0]: ValueNode<PrimitivePy> Reshape, [1]: output, [2]: ValueNode<ValueTuple> (1, 32, 10240)}  
-----------------------------------------------------------  
- C++ Call Stack: (For framework developers)  
-----------------------------------------------------------  
mindspore/ccsrc/frontend/parallel/pipeline_transformer/pipeline_transformer.cc:556 CreateOpInfo

3 Root Cause Analysis

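A Reshape operator cannot be the boundary of a pipeline-parallel stage. Check not only the operators you call directly but also whether any of them use Reshape internally. For example, nn.Dense ends with a Reshape whenever its input is not 2-D, so placing an nn.Dense at a pipeline boundary fails in the same way.
Network script: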

self.pangu_proj = init_dense(  
    in_channels=encoder_config.hidden_size,  
    out_channels=hidden_size,  
    activation=None,  
    init_method=init_method_normal('mindspore', sigma=0.02),  
    dtype=ms.float32,  
    compute_type=ms.float32,  
    has_bias=True  
)

Source code of nn.Dense():

def construct(self, x):  
    x_shape = self.shape_op(x)  
    check_dense_input_shape(x_shape, self.cls_name)  
    if len(x_shape) != 2:  
        # Flatten all leading dimensions before the matmul.  
        x = self.reshape(x, (-1, x_shape[-1]))  
    x = self.matmul(x, self.weight)  
    if self.has_bias:  
        x = self.bias_add(x, self.bias)  
    if self.activation_flag:  
        x = self.activation(x)  
    if len(x_shape) != 2:  
        # Restore the leading dimensions; this Reshape is the last operator.  
        out_shape = x_shape[:-1] + (F.shape(x)[-1],)  
        x = self.reshape(x, out_shape)  
    return x
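
To make it easier to see why the last operator is a Reshape, here is a minimal pure-Python sketch that traces only the shapes through the construct() logic above. It does not use MindSpore. The function name dense_op_trace and the input width 1024 are assumptions chosen for illustration; the output shape (1, 32, 10240) is taken from the error message.

```python
def dense_op_trace(x_shape, out_channels, has_bias=True, has_activation=False):
    """Return the (op_name, output_shape) sequence of the construct() logic above.

    Hypothetical helper for illustration only; it models shapes, not values.
    """
    trace = []
    shape = tuple(x_shape)
    if len(x_shape) != 2:
        # Flatten all leading dimensions into one before the matmul.
        rows = 1
        for dim in x_shape[:-1]:
            rows *= dim
        shape = (rows, x_shape[-1])
        trace.append(("Reshape", shape))
    shape = (shape[0], out_channels)
    trace.append(("MatMul", shape))
    if has_bias:
        trace.append(("BiasAdd", shape))
    if has_activation:
        trace.append(("Activation", shape))
    if len(x_shape) != 2:
        # Restore the leading dimensions; this Reshape is the final operator.
        shape = tuple(x_shape[:-1]) + (out_channels,)
        trace.append(("Reshape", shape))
    return trace


# 3-D input (input width 1024 is an assumed value): the trace ends with Reshape.
trace_3d = dense_op_trace((1, 32, 1024), out_channels=10240)
# 2-D input: no Reshape is emitted at all.
trace_2d = dense_op_trace((32, 1024), out_channels=10240)

print(trace_3d[-1])  # ('Reshape', (1, 32, 10240))
print(trace_2d[-1])  # ('BiasAdd', (32, 10240))
```

With a 3-D input the final entry matches the node reported in the error, which is why the output of this layer cannot sit directly on a stage boundary.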

4 Solution

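  1. Use the copy method. In the example above, return output.copy() so that the last operator of the stage is no longer a Reshape. (Internally, copy divides the original data by 1.0.)
  2. Move the stage boundary so that it does not fall on a Reshape operator.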
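
The two fixes can be illustrated with a toy model of the border check. This is a simplified sketch, not MindSpore's implementation: it assumes a linear chain of operators and treats an operator as a border when the next operator runs in a different stage. The names find_border_ops, has_reshape_border, "DivByOne", "NextLayer", and "FollowingLayer" are hypothetical labels.

```python
def find_border_ops(ops):
    """ops is a list of (op_name, stage) in execution order (a linear chain).

    An op counts as a border when the op that consumes its output
    belongs to a different stage. Toy model for illustration only.
    """
    return [
        name
        for (name, stage), (_, next_stage) in zip(ops, ops[1:])
        if stage != next_stage
    ]


def has_reshape_border(ops):
    """Return True if any stage border in the chain is a Reshape."""
    return "Reshape" in find_border_ops(ops)


# Original layout: the Dense layer closes stage 0, so its final Reshape is the border.
original = [("Reshape", 0), ("MatMul", 0), ("BiasAdd", 0), ("Reshape", 0),
            ("NextLayer", 1)]

# Fix 1: output.copy() adds a division by 1.0 after the Reshape, inside stage 0.
with_copy = [("Reshape", 0), ("MatMul", 0), ("BiasAdd", 0), ("Reshape", 0),
             ("DivByOne", 0), ("NextLayer", 1)]

# Fix 2: move the boundary one layer later, so the border is no longer a Reshape.
moved_boundary = [("Reshape", 0), ("MatMul", 0), ("BiasAdd", 0), ("Reshape", 0),
                  ("NextLayer", 0), ("FollowingLayer", 1)]

print(has_reshape_border(original))        # True
print(has_reshape_border(with_copy))       # False
print(has_reshape_border(moved_boundary))  # False
```

Fix 1 costs one extra elementwise operation but leaves the stage partition unchanged, whereas fix 2 changes which layers each stage holds and may therefore affect load balance across stages.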