MindSpore Error "PyNative Only support STAND_ALONE, DATA_PARALLEL and AUTO_PARALLEL under shard function for ParallelMode"

With the parallel mode configured as SEMI_AUTO_PARALLEL in PyNative mode, the following error is reported: PyNative Only support STAND_ALONE, DATA_PARALLEL and AUTO_PARALLEL under shard function for ParallelMode, but got SEMI_AUTO_PARALLEL.

1. System Environment

Hardware Environment (Ascend/GPU/CPU): Ascend  
Software Environment:  
MindSpore version (source or binary): 1.6.1  
Python version (e.g., Python 3.7.5): 3.7.6  
OS platform and distribution (e.g., Linux Ubuntu 16.04):  
GCC/Compiler version (if compiled from source):

2. Scripts

The launch script is as follows:

ulimit -u unlimited
ulimit -SHn 65535
export DEVICE_NUM=$1
export RANK_SIZE=$1
RANK_TABLE_FILE=$(realpath $2)
export RANK_TABLE_FILE
echo "RANK_TABLE_FILE=${RANK_TABLE_FILE}"

export SERVER_ID=0
rank_start=$((DEVICE_NUM * SERVER_ID))
# Launch one training process per device, each in its own working directory.
for ((i = 0; i < DEVICE_NUM; i++)); do
    export DEVICE_ID=$i
    export RANK_ID=$((rank_start + i))
    rm -rf ./train_parallel$i
    mkdir ./train_parallel$i
    echo "start training for rank $RANK_ID, device $DEVICE_ID"
    cd ./train_parallel$i || exit
    env > env.log
    python ../test.py > log 2>&1 &
    cd ..
done
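
For reference, the script takes the device count and an HCCL rank table file as its two arguments. Assuming it is saved as run_distribute.sh (the script name and rank table path below are placeholders), an 8-device run would be started with:

bash run_distribute.sh 8 ./hccl_8p.json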

The content of test.py is as follows:

import numpy as np
import mindspore.dataset as ds
import mindspore.communication.management as D
from mindspore import context, Model, Parameter
from mindspore.nn import Cell, Momentum, SoftmaxCrossEntropyWithLogits
from mindspore.ops import operations as P
from mindspore.train.callback import LossMonitor
from mindspore.train.callback import ModelCheckpoint
from mindspore.common.initializer import initializer

step_per_epoch = 4

def get_dataset(*inputs):
    def generate():
        for _ in range(step_per_epoch):
            yield inputs
    return generate

class Net(Cell):
    """define net"""
    def __init__(self):
        super().__init__()
        self.matmul = P.MatMul().shard(((2, 4), (4, 1)))
        self.weight = Parameter(initializer("normal", [32, 16]), "w1")
        self.relu = P.ReLU().shard(((8, 1),))

    def construct(self, x):
        out = self.matmul(x, self.weight)
        out = self.relu(out)
        return out

if __name__ == "__main__":
    context.set_context(mode=context.PYNATIVE_MODE, device_target="Ascend", save_graphs=True)
    D.init()
    rank = D.get_rank()
    # Setting semi_auto_parallel in PyNative mode is what triggers the error.
    context.set_auto_parallel_context(parallel_mode="semi_auto_parallel", device_num=8, full_batch=True)

    np.random.seed(1)
    input_data = np.random.rand(16, 32).astype(np.float32)
    label_data = np.random.rand(16, 16).astype(np.float32)
    fake_dataset = get_dataset(input_data, label_data)
    net = Net()

    callback = [LossMonitor(), ModelCheckpoint(directory="{}".format(rank))]
    dataset = ds.GeneratorDataset(fake_dataset, ["input", "label"])
    loss = SoftmaxCrossEntropyWithLogits()

    learning_rate = 0.001
    momentum = 0.1
    epoch_size = 1
    opt = Momentum(net.trainable_params(), learning_rate, momentum)

    model = Model(net, loss_fn=loss, optimizer=opt)
    model.train(epoch_size, dataset, callbacks=callback, dataset_sink_mode=False)

3. Error Message

Running the scripts above reports:

PyNative Only support STAND_ALONE, DATA_PARALLEL and AUTO_PARALLEL under shard function for ParallelMode, but got SEMI_AUTO_PARALLEL

4. Error Analysis

In PyNative mode, specifying context.set_auto_parallel_context(parallel_mode="semi_auto_parallel") raises the error above.


The reason is that auto-parallel is currently implemented as a graph-optimization pass during graph compilation. PyNative mode does not go through the same optimization pipeline and therefore never reaches this pass, so the frontend validates the configuration and rejects it: PyNative mode does not support semi_auto_parallel.
To use auto-parallel capabilities in PyNative mode, refer to https://mindspore.cn/docs/programming_guide/zh-CN/master/pynative_shard_function_parallel.html and use the shard function under the auto_parallel mode. The shard function designates a part of the network to be executed in graph mode, and operator-level model parallelism can be applied inside that part.
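
For illustration, below is a minimal sketch of that approach adapted from the network above: the MatMul part is wrapped in a sub-Cell that is marked with Cell.shard, while the surrounding code keeps running in PyNative mode and parallel_mode is set to auto_parallel instead of semi_auto_parallel. The exact shard interface (Cell.shard vs. ops.shard, its parameters, and whether a sharding-propagation search mode must also be configured) varies between MindSpore versions, so follow the linked guide for the version in use; the strategy values below merely mirror the 8-device setup of this example and are not taken from the original post.

import numpy as np
import mindspore.communication.management as D
from mindspore import context, Parameter, Tensor
from mindspore.common.initializer import initializer
from mindspore.nn import Cell
from mindspore.ops import operations as P

class MatMulCell(Cell):
    """Sub-network to be executed in graph mode via the shard function."""
    def __init__(self):
        super().__init__()
        self.matmul = P.MatMul()
        self.weight = Parameter(initializer("normal", [32, 16]), "w1")

    def construct(self, x):
        return self.matmul(x, self.weight)

class Net(Cell):
    def __init__(self):
        super().__init__()
        self.matmul_cell = MatMulCell()
        # Assumed Cell.shard usage per the linked guide: split the first input's
        # row dimension across the 8 devices; everything outside this sub-Cell
        # stays in ordinary PyNative execution.
        self.matmul_cell.shard(in_strategy=((8, 1),))
        self.relu = P.ReLU()

    def construct(self, x):
        out = self.matmul_cell(x)
        return self.relu(out)

if __name__ == "__main__":
    context.set_context(mode=context.PYNATIVE_MODE, device_target="Ascend")
    D.init()
    # auto_parallel (not semi_auto_parallel) is the mode accepted by the shard
    # function; depending on the version, the guide may also require setting a
    # sharding-propagation search mode here.
    context.set_auto_parallel_context(parallel_mode="auto_parallel", device_num=8, full_batch=True)
    x = Tensor(np.random.rand(16, 32).astype(np.float32))
    print(Net()(x).shape)

As with the original test.py, such a script would still be launched on 8 Ascend devices through the rank-table startup script shown earlier.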