CSRTensor matrix multiplication fails with RuntimeError: CUDA Error: cudaMemcpy failed. | Error Number: 700 an illegal memory access was encountered

1 System Environment

Hardware environment (Ascend/GPU/CPU): GPU
MindSpore version: 2.0.0-alpha
Execution mode (PyNative/Graph): PyNative
Python version: 3.8
Operating system: Linux

2 Error Information

2.1 Problem Description

CSRTensor matrix multiplication raises an error when run on GPU.

2.2 Error Message

RuntimeError: CUDA Error: cudaMemcpy failed. | Error Number: 700 an illegal memory access was encountered

2.3 Script Code

import mindspore as ms

a = ms.numpy.ones((7600, 7600), ms.float32).to_csr()
b = ms.numpy.ones((7600, 7600), ms.float32).to_csr()
a.mm(b)

3 Root Cause Analysis

Traceback (most recent call last):  
  File "<stdin>", line 1, in <module>  
  File "/python3.7/site-packages/mindspore/common/sparse_tensor.py", line 825, in mm  
    return tensor_operator_registry.get("csr_mm")(self, matrix)  
  File "/python3.7/site-packages/mindspore/ops/function/sparse_func.py", line 299, in csr_mm  
    b.values)  
  File "/python3.7/site-packages/mindspore/ops/primitive.py", line 316, in __call__  
    return _run_op(self, self.name, args)  
  File "/python3.7/site-packages/mindspore/common/api.py", line 97, in wrapper  
    results = fn(*arg, **kwargs)  
  File "/python3.7/site-packages/mindspore/ops/primitive.py", line 808, in _run_op  
    output = _pynative_executor.real_run_op(obj, op_name, args)  
  File "/python3.7/site-packages/mindspore/common/api.py", line 977, in real_run_op  
    return self._executor.real_run_op(*args)  
RuntimeError: CUDA Error: SparseMatrixSparseMatMul cudaStreamSynchronized failed | Error Number: 700 an illegal memory access was encountered  
  
----------------------------------------------------  
- C++ Call Stack: (For framework developers)  
----------------------------------------------------  
mindspore/ccsrc/plugin/device/gpu/kernel/sparse/sparse_matrix_sparse_matmul_gpu_kernel.cc:439 SyncData

As the traceback shows, a.mm(b) ultimately calls ops.csr_mm, so the problem is equivalent to:

import mindspore as ms  
from mindspore import Tensor, CSRTensor  
from mindspore import dtype as mstype  
import mindspore.ops as ops

a = ms.numpy.ones((7600, 7600), ms.float32).to_csr()  
b = ms.numpy.ones((7600, 7600), ms.float32).to_csr()  
c = ops.csr_mm(a, b)

The code above reproduces the same problem.
The code that reports the error is:

void SparseMatrixSparseMatMulGpuKernelMod::SyncData() {
CHECK_CUDA_RET_WITH_EXCEPT_NOTRACE(cudaStreamSynchronize(stream),
"SparseMatrixSparseMatMul cudaStreamSynchronized failed");

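This check is located at mindspore/ccsrc/plugin/device/gpu/kernel/sparse/sparse_matrix_sparse_matmul_gpu_kernel.cc:439.
cudaStreamSynchronize is the call that reports the error, which suggests that work already queued on the stream failed on the device, rather than the synchronization itself. Once an illegal memory access has occurred, subsequent CUDA operations are affected as well, which would explain why the failure can also surface as the cudaMemcpy error shown in section 2.2.
Sparse matrix multiplication uses the following Infer function, defined in mindspore/core/ops/sparse_matrix_sparse_mat_mul.cc: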

AbstractBasePtr SparseMatrixSparseMatMulInfer(const abstract::AnalysisEnginePtr &, const PrimitivePtr &primitive,  
                                              const std::vector<AbstractBasePtr> &input_args) {  
  MS_EXCEPTION_IF_NULL(primitive);  
  for (const auto &item : input_args) {  
    MS_EXCEPTION_IF_NULL(item);  
  }  
  auto prim_name = primitive->name();  
  const int64_t input_num = 10;  
  (void)CheckAndConvertUtils::CheckInteger("input number", SizeToLong(input_args.size()), kEqual, input_num, prim_name);  
  auto infer_type = SparseMatrixSparseMatMulInferType(primitive, input_args);  
  auto infer_shape = SparseMatrixSparseMatMulInferShape(primitive, input_args);  
  return abstract::MakeAbstract(infer_shape, infer_type);  
}

Next, look at the SparseMatrixSparseMatMulInferShape function:

 abstract::ShapePtr y_row_shape = nullptr;  
  abstract::ShapePtr y_col_shape = nullptr;  
  abstract::ShapePtr y_values_shape = nullptr;  
  
  ShapeVector col_shape = {abstract::Shape::kShapeDimAny};  
  ShapeVector values_shape = {abstract::Shape::kShapeDimAny};  
  ShapeVector infer_shape_min = {1};  
  ShapeVector infer_shape_max = {MAX_LENGTH};  
  y_col_shape = std::make_shared<abstract::Shape>(col_shape, infer_shape_min, infer_shape_max);  
  y_values_shape = std::make_shared<abstract::Shape>(values_shape, infer_shape_min, infer_shape_max);
Note the upper bound set by the line ShapeVector infer_shape_max = {MAX_LENGTH};. MAX_LENGTH is defined as:

const int MAX_LENGTH = 100000;

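The output shapes created here (held in smart pointers) are bounded by MAX_LENGTH. When the input matrices are too large, the actual result falls outside this range, and the CUDA computation then fails with an illegal memory access. For the 7600 x 7600 all-ones matrices in the script, each operand already stores 7600 * 7600 = 57,760,000 nonzero values.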
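To make the size mismatch concrete, the arithmetic can be sketched in plain Python. This is only an illustration: MAX_LENGTH is copied from the C++ excerpt, and dense_ones_nnz is a hypothetical helper, not part of MindSpore. The comparison shows the order of magnitude involved; it is not claimed to be the exact condition under which the kernel fails.

```python
# Illustrative arithmetic only; MAX_LENGTH is copied from the C++ excerpt above.
MAX_LENGTH = 100000


def dense_ones_nnz(rows: int, cols: int) -> int:
    """Count the values stored by the CSR form of an all-ones rows x cols matrix."""
    # Every element is nonzero, so the sparse format drops nothing.
    return rows * cols


nnz = dense_ones_nnz(7600, 7600)
print(f"stored values per operand: {nnz}")
print(f"exceeds MAX_LENGTH: {nnz > MAX_LENGTH}")
```

The product of two all-ones matrices is itself fully dense, so the output holds the same number of stored values as each input.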

4 Solution

Since the failure appears when the data volume is large, reducing the data size avoids it:

import mindspore as ms  
    
a = ms.numpy.ones((760, 760), ms.float32).to_csr()  
b = ms.numpy.ones((760, 760), ms.float32).to_csr()  
a.mm(b)

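Alternatively, make b a dense (non-sparse) matrix. When every element is 1, the sparse matrix holds exactly the same data as the dense one, so the sparse format brings no benefit here: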

import mindspore as ms  
    
a = ms.numpy.ones((7600, 7600), ms.float32).to_csr()  
b = ms.numpy.ones((7600, 7600), ms.float32)  
a.mm(b)
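The claim that an all-ones sparse matrix is equivalent to a dense one can be checked with a small pure-Python sketch of the CSR layout. The helpers dense_to_csr and density are hypothetical names written for this illustration; they do not call MindSpore and only mirror the general CSR idea (row pointers, column indices, values).

```python
def dense_to_csr(matrix):
    """Convert a list-of-lists matrix to CSR arrays (indptr, indices, values)."""
    indptr = [0]
    indices = []
    values = []
    for row in matrix:
        for col, value in enumerate(row):
            if value != 0:
                # Only nonzero entries are stored.
                indices.append(col)
                values.append(value)
        # indptr[i + 1] marks where row i ends in indices/values.
        indptr.append(len(indices))
    return indptr, indices, values


def density(matrix):
    """Fraction of entries that the CSR form has to store."""
    _, _, values = dense_to_csr(matrix)
    total = sum(len(row) for row in matrix)
    return len(values) / total if total else 0.0


ones = [[1.0] * 4 for _ in range(3)]
indptr, indices, values = dense_to_csr(ones)
print(indptr)
print(len(values), density(ones))
```

A density of 1.0 means CSR stores every element plus the index arrays, so keeping such an operand dense is the cheaper representation.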