8-bit weight quantization with MindSpore: the final model is the same size as the float16 model

Zhangtian · June 11, 2026, 01:20

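With the fixed-bit quantization settings below, the quantized model is the same size as the fp16 model and inference speed is also the same, so int8 quantization has no visible effect. After removing --optimize=ascend_oriented the model size is as expected, but inference fails with an error.
Conversion command: ./converter_lite --fmk=MINDIR --saveType=MINDIR --modelFile=model.mindir --outputFile=model --optimize=ascend_oriented
Quantization parameters: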
[common_quant_param]
quant_type=WEIGHT_QUANT
# Weight quantization supports the number of bits [0,16]. Set to 0 is mixed bit quantization, otherwise it is fixed bit quantization
bit_num=8
# Layers with size of weights exceeds threshold `min_quant_weight_size` will be quantized.
min_quant_weight_size=0
# Layers with channel size of weights exceeds threshold `min_quant_weight_channel` will be quantized.
min_quant_weight_channel=16
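
For reference, the size expectation behind this question can be checked with a little arithmetic. The sketch below is only an illustration in NumPy, not MindSpore Lite's implementation: it assumes symmetric per-output-channel quantization, and the function names and the 256x512 weight shape are made up for the example. It shows that int8 weights plus one float32 scale per channel should take roughly half the bytes of the same weights stored as fp16.

```python
import numpy as np


def quantize_weights_int8(weights: np.ndarray):
    """Symmetric per-output-channel int8 quantization of a 2-D weight matrix.

    Illustrative only; this is an assumed scheme, not MindSpore Lite's code.
    """
    # One scale per output channel (row): the largest magnitude maps to 127.
    max_abs = np.max(np.abs(weights), axis=1, keepdims=True)
    scales = np.where(max_abs > 0, max_abs / 127.0, 1.0).astype(np.float32)
    q = np.clip(np.round(weights / scales), -127, 127).astype(np.int8)
    return q, scales


def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover approximate float32 weights from int8 values and scales."""
    return q.astype(np.float32) * scales


rng = np.random.default_rng(0)
w_fp32 = rng.standard_normal((256, 512)).astype(np.float32)  # hypothetical layer
w_fp16 = w_fp32.astype(np.float16)
q, scales = quantize_weights_int8(w_fp32)

fp16_bytes = w_fp16.nbytes              # 2 bytes per weight
int8_bytes = q.nbytes + scales.nbytes   # 1 byte per weight + per-channel scales
max_err = float(np.max(np.abs(dequantize(q, scales) - w_fp32)))

print("fp16 bytes:", fp16_bytes)
print("int8 bytes (with scales):", int8_bytes)
print("max abs reconstruction error:", max_err)
```

If a converted model file is no smaller than its fp16 counterpart, the weights are most likely not being stored as 8-bit values in that file, whatever the configuration requested.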

chengxiaoli · June 11, 2026, 03:10

Hello, we have received your question and will analyze it and reply as soon as possible. Thank you for your patience.

YeFeng24 · June 15, 2026, 09:21

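Quantization support in the Ascend backend is currently incomplete. Setting --optimize=ascend_oriented selects Ascend inference, so the quantization feature does not work fully in that mode.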

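We recommend quantizing the model with torch or onnxruntime, exporting the quantized ONNX model, converting it with the MindSpore Lite offline conversion tool, and then deploying it for inference on Ascend hardware.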
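
A rough sketch of that workflow, under several assumptions: it uses ONNX Runtime's dynamic quantization (`quantize_dynamic`) as one possible way to get int8 weights, the file names are placeholders, and the thread does not say which quantization mode or operator format the Ascend converter accepts, so static (A8W8) quantization may be what is actually needed. The converter flags are the ones from the original command, with `--fmk` switched to ONNX.

```python
import shlex


def quantize_onnx_weights(fp_model: str, int8_model: str) -> None:
    """Quantize the weights of an ONNX model to int8 with ONNX Runtime.

    Assumption: dynamic quantization is used here for brevity; the thread
    does not confirm that the Ascend converter supports its output.
    """
    # Imported lazily so the command builder below works without onnxruntime.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    quantize_dynamic(fp_model, int8_model, weight_type=QuantType.QInt8)


def converter_command(onnx_model: str, output_name: str) -> list[str]:
    """Build the converter_lite command line for a quantized ONNX model."""
    return [
        "./converter_lite",
        "--fmk=ONNX",
        "--saveType=MINDIR",
        f"--modelFile={onnx_model}",
        f"--outputFile={output_name}",
        "--optimize=ascend_oriented",
    ]


# Placeholder file names; call quantize_onnx_weights("model.onnx",
# "model_int8.onnx") first to produce the quantized model.
cmd = converter_command("model_int8.onnx", "model_int8")
print(shlex.join(cmd))
```

Comparing the size of the quantized ONNX file with the original before running the converter is a quick way to confirm the quantization step itself took effect.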

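The MindSpore Lite code repository recently added an A8W8 quantization skill. See this PR for reference: AtomGit | GitCode - The open-source community for developers worldwide, an open-source code hosting platform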

system · June 17, 2026, 05:46

This topic was automatically closed 60 minutes after the last reply. New replies are no longer allowed.
