MindSpore Lite: zero-copy input and output data

lujiale · January 9, 2026, 14:42

Principle

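During model execution, input data must first be copied from the host side (system memory) to the device side (device memory). After inference finishes, the results are copied back from the device side to the host side. When the input and output data are large, these copies take a significant amount of time and cannot be ignored.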

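In fact, when several models run back to back, a model's output does not need to be copied to the host. It can stay on the device and serve as the input of the next model, which saves the copy time and improves pipeline performance.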
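
To get a feel for how much host/device traffic is at stake, the sketch below estimates the bytes moved per hop for a list of tensor shapes and converts that into a copy time. The helper names, the example shapes, and the 10 GB/s bandwidth are assumptions chosen for illustration; they are not measured values from MindSpore Lite or from any Ascend device.

```python
from math import prod

def tensor_bytes(shape, itemsize=4):
    """Bytes needed by one dense tensor; itemsize=4 corresponds to float32."""
    return prod(shape) * itemsize

def copy_time_seconds(shapes, bandwidth_bytes_per_s, itemsize=4):
    """Time to copy all tensors once at an assumed (hypothetical) bandwidth."""
    total = sum(tensor_bytes(shape, itemsize) for shape in shapes)
    return total / bandwidth_bytes_per_s

# Hypothetical intermediate outputs passed from model1 to model2.
shapes = [[1, 3, 256, 256], [1, 64, 128, 128]]
total_bytes = sum(tensor_bytes(s) for s in shapes)

# Assumed bandwidth: 10 GB/s. Without zero-copy each tensor crosses twice
# (device to host after model1, then host to device before model2).
saved = 2 * copy_time_seconds(shapes, bandwidth_bytes_per_s=10e9)
print(f"{total_bytes} bytes per hop, about {saved * 1e3:.3f} ms saved per inference")
```

The real saving depends on the actual interconnect and on any synchronization overhead, so this is only a rough lower-level estimate to decide whether zero-copy is worth trying.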

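Implementation basis: the model.Predict interface accepts both input and output. When an output is specified as a device-side tensor, the model output is stored directly on the device. The elements of output may also be mixed, with some tensors on the host and others on the device, which makes it flexible in real-world scenarios.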

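Applicable scenario: several models run sequentially in a pipeline, the output of model1 is the input of model2, and there is no extra computation between the two models.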

Python interface

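import mindspore_lite as mslite

def run_pipeline(model1, model2, model1_inputs, model2_outputs):
    # Shapes of model1's outputs, which are also model2's inputs
    output_shapes = [[1, 1], [2, 2]]

    # Allocate the intermediate tensors on the device so model1 writes its outputs there
    output_tensors = [mslite.Tensor(shape=shape, device="ascend:0", dtype=mslite.DataType.FLOAT32) for shape in output_shapes]

    model1.predict(model1_inputs, output_tensors)
    # model2 reads its inputs directly from the device, with no host round trip
    model2.predict(output_tensors, model2_outputs)

    return model2_outputs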

Notes on using zero-copy in Python

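Allocating a host-side tensor differs from allocating a device-side tensor in two ways: 1) whether device_id is set, and 2) a host-side tensor must be given its data first.
When a device tensor is passed between two models, do not process it any further (for example with get_data_to_numpy). If your scenario needs to process an intermediate tensor, use a host-side tensor for it instead.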
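
One way to apply these two notes is to decide up front which outputs stay on the device and which go to the host. The helper below is a hypothetical planning function (plan_output_placement is not part of mindspore_lite); it only returns the placement for each output and does not create any tensors itself.

```python
def plan_output_placement(shapes, host_indices, device="ascend:0"):
    """Pair each output shape with its target device.

    Outputs listed in host_indices are post-processed on the host, so they get
    None (meaning a host tensor); all other outputs stay on the given device.
    """
    host_indices = set(host_indices)
    return [
        (list(shape), None if index in host_indices else device)
        for index, shape in enumerate(shapes)
    ]

# Output 0 feeds the next model directly; output 1 is read back on the host.
plan = plan_output_placement([[1, 1], [2, 2]], host_indices=[1])
for shape, target in plan:
    print(shape, "->", target or "host")
```

For entries with a device string, the tensor would then be created with mslite.Tensor(shape=shape, device=target, dtype=mslite.DataType.FLOAT32); host entries need their data set before use, as described in the first note.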

Single-model example with zero-copy input and output

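Creating a Tensor through the mslite.Tensor interface with device specified places the Tensor directly on the device.
Note: for the list(Tensor) returned by model.get_inputs(), calling set_data_from_numpy on those Tensors does not place them on the device.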

import time
import mindspore_lite as mslite

# `model` is assumed to be an mslite.Model that has already been built
output_shapes = [[1, 1, 1, 1], [2, 2, 2, 2]]
input_shapes = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4], [5]]

# Create the input and output tensors directly on device ascend:3
output_tensors = [mslite.Tensor(shape=shape, device="ascend:3", dtype=mslite.DataType.FLOAT32) for shape in output_shapes]
input_tensors = [mslite.Tensor(shape=shape, device="ascend:3", dtype=mslite.DataType.FLOAT32) for shape in input_shapes]

# Warm up before timing
warmup_time = 3
for i in range(warmup_time):
    model.predict(input_tensors, output_tensors)
begin = time.time()
model.predict(input_tensors, output_tensors)
print("model_predict time :", time.time() - begin)
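
A single timed call can be noisy, so it may help to average over several runs when comparing the zero-copy and regular paths. The helper below is a minimal sketch; time_call and its parameters are hypothetical names, and the lambda is a stand-in workload so the block runs without a model.

```python
import time

def time_call(fn, warmup=3, repeat=10):
    """Run fn() `warmup` times untimed, then return the mean seconds over `repeat` runs."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(repeat):
        fn()
    return (time.perf_counter() - start) / repeat

# Stand-in workload; in practice pass
# lambda: model.predict(input_tensors, output_tensors)
calls = []
mean_s = time_call(lambda: calls.append(1), warmup=3, repeat=10)
print(f"mean time per call: {mean_s:.6f} s")
```

Running the same helper once with host-side tensors and once with device-side tensors gives a like-for-like comparison of the two paths.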

C++ interface

Prototype of the interface for creating a device-side tensor:

static inline MSTensor *CreateTensor(const std::string &name, DataType type, const std::vector<int64_t> &shape, const void *data, size_t data_len, const std::string &device = "", int device_id = -1) noexcept;

Zero-copy sample code:

std::string model_path = "your model path";

// create and init context, add Ascend device info
auto context = std::make_shared<mindspore::Context>();
auto &device_list = context->MutableDeviceInfo();
// note: CPUDeviceInfo -> AscendDeviceInfo
auto device_info = std::make_shared<mindspore::AscendDeviceInfo>();
device_list.push_back(device_info);

std::vector<mindspore::MSTensor> outputs;
// note: create a device tensor and add it to 'outputs'
auto tensor = mindspore::MSTensor::CreateTensor(tensor_name, mindspore::DataType::kNumberTypeFloat32, {1, 3, 256, 256}, cpu_data_ptr, 1*3*256*256);
outputs.push_back(*tensor);

auto model = std::make_shared<mindspore::Model>();
model->Build(model_path, mindspore::kMindIR, context);
auto inputs = model->GetInputs();
model->Predict(inputs, &outputs);

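Note: when a device-side output has to be moved to the host, do not call the MutableData interface on it directly, because MutableData does not perform a device-to-host copy. Use the following code instead: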

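// Assume every tensor in outputs is a device-side tensor and the first one must be moved from the device to the host; CreateTensor performs the device-to-host copy
auto tensor = mindspore::MSTensor::CreateTensor("output", outputs[0]);
// The host-side data is stored in the tensor returned by CreateTensor; call MutableData to get its data pointer
auto data = tensor->MutableData();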
