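I quantized the TinyClip image encoder with MindSpore Lite using the full-quantization scheme. After quantization the model file is smaller, but inference in the benchmark tool is much slower. Most of the time is spent in MatMulFusion (about 93% of the total op time in the profile below). Is there any way to fix this?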

opType           avg(ms)     percent   calledTimes  opTotalTime
Activation       1.551100    0.003909  100          15.511001
AddFusion        0.306400    0.000772  210          3.063999
Concat           0.129300    0.000326  10           1.293000
Conv2DFusion     7.485800    0.018865  10           74.858002
DivFusion        0.109000    0.000275  100          1.090000
Gather           0.004400    0.000011  10           0.044000
LayerNormFusion  0.203700    0.000513  220          2.037000
MatMulFusion     369.641052  0.931557  610          3696.410645
QuantDTypeCast   14.433500   0.036375  1690         144.335007
Reshape          0.275301    0.000694  410          2.753009
Softmax          0.985100    0.002483  100          9.851000
StridedSlice     0.326700    0.000823  300          3.266996
Transpose        1.346900    0.003394  430          13.468996
Model = .\out\tinyclip_8M_image_int8_1.ms, NumThreads = 8, MinRunTime = 390.117004 ms, MaxRuntime = 421.143005 ms, AvgRunTime = 398.230011 ms
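
To make the bottleneck easier to see, here is a small Python sketch that ranks the ops in the per-op profile above by their share of the total op time and computes the average cost of a single call. The numbers are copied from the benchmark output. The helper names (`parse_profile`, `summarize`) are my own and are not part of MindSpore Lite, and I am assuming that `opTotalTime` is in milliseconds and that `percent` is a fraction of the summed op time.

```python
# Rows copied from the benchmark output above:
# opType, avg(ms), percent, calledTimes, opTotalTime (assumed to be ms)
PROFILE = """\
Activation 1.551100 0.003909 100 15.511001
AddFusion 0.306400 0.000772 210 3.063999
Concat 0.129300 0.000326 10 1.293000
Conv2DFusion 7.485800 0.018865 10 74.858002
DivFusion 0.109000 0.000275 100 1.090000
Gather 0.004400 0.000011 10 0.044000
LayerNormFusion 0.203700 0.000513 220 2.037000
MatMulFusion 369.641052 0.931557 610 3696.410645
QuantDTypeCast 14.433500 0.036375 1690 144.335007
Reshape 0.275301 0.000694 410 2.753009
Softmax 0.985100 0.002483 100 9.851000
StridedSlice 0.326700 0.000823 300 3.266996
Transpose 1.346900 0.003394 430 13.468996
"""


def parse_profile(text):
    """Turn the whitespace-separated profile rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        op, _avg_ms, _percent, calls, total_ms = line.split()
        rows.append({"op": op, "calls": int(calls), "total_ms": float(total_ms)})
    return rows


def summarize(rows):
    """Return (op, share of total op time, ms per call), slowest op first."""
    grand_total = sum(r["total_ms"] for r in rows)
    ranked = sorted(rows, key=lambda r: r["total_ms"], reverse=True)
    return [
        (r["op"], r["total_ms"] / grand_total, r["total_ms"] / r["calls"])
        for r in ranked
    ]


if __name__ == "__main__":
    for op, share, ms_per_call in summarize(parse_profile(PROFILE)):
        print(f"{op:16s} {share:8.2%} {ms_per_call:10.4f} ms/call")
```

With these numbers MatMulFusion comes first, at roughly 93% of the summed op time and about 6 ms per call, followed by QuantDTypeCast and Conv2DFusion, so everything other than the matrix multiplications accounts for only about 7% of the time.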