1 System Environment
Hardware environment (Ascend/GPU/CPU): Ascend/GPU/CPU
MindSpore version: 2.1
Execution mode (PyNative/Graph): either
2 Error Information
2.1 Problem Description
In autoregressive inference without a KV cache, each generated token is concatenated to the end of the current input, and the whole sequence is fed back in as the next input. In MindSpore this mode is enabled by setting use_past to False. In Hugging Face Transformers, the use_cache option in the model's config.json indicates whether the KV cache is enabled, but setting use_cache to false in config.json alone does not take effect.
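The concatenation loop described above can be sketched as follows. This is a toy illustration, not MindSpore or Transformers code: `next_token_logits` is a hypothetical stand-in for a real model forward pass.

```python
# Toy sketch of greedy autoregressive decoding WITHOUT a KV cache:
# every step re-runs the forward pass on the FULL sequence, and the
# newly generated token is appended to the input for the next step.

def next_token_logits(tokens):
    # Hypothetical "model": predicts (last token + 1) mod 5.
    target = (tokens[-1] + 1) % 5
    return [1.0 if i == target else 0.0 for i in range(5)]

def generate_no_cache(prompt, max_new_tokens):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)  # full-sequence forward pass each step
        next_token = max(range(len(logits)), key=logits.__getitem__)  # greedy pick
        tokens.append(next_token)           # concatenate for the next iteration
    return tokens

print(generate_no_cache([0], 4))  # [0, 1, 2, 3, 4]
```

With a KV cache enabled, each step would instead feed only the newest token and reuse the cached key/value states; disabling the cache forces the full-sequence recomputation shown here.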
{
    "_name_or_path": "bigcode/starcoder",
    "activation_function": "gelu",
    "architectures": [
        "GPTBigCodeForCausalLM"
    ],
    "attention_softmax_in_fp32": true,
    "attn_pdrop": 0.1,
    "bos_token_id": 0,
    "embd_pdrop": 0.1,
    "eos_token_id": 0,
    "inference_runner": 0,
    "initializer_range": 0.02,
    "layer_norm_epsilon": 1e-05,
    "max_batch_size": null,
    "max_sequence_length": null,
    "model_type": "gpt_bigcode",
    "multi_query": true,
    "n_embd": 6144,
    "n_head": 48,
    "n_inner": 24576,
    "n_layer": 40,
    "n_positions": 8192,
    "pad_key_length": true,
    "pre_allocate_kv_cache": false,
    "resid_pdrop": 0.1,
    "scale_attention_softmax_in_fp32": false,
    "scale_attn_weights": true,
    "summary_activation": null,
    "summary_first_dropout": 0.1,
    "summary_proj_to_labels": true,
    "summary_type": "cls_index",
    "summary_use_proj": true,
    "torch_dtype": "float16",
    "transformers_version": "4.30.0.dev0",
    "use_cache": true,
    "validate_runner_input": true,
    "vocab_size": 49153
}
3 Solution
use_cache=False must also be passed to model.generate:
with torch.no_grad():
    generation_output = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=max_new_tokens,
        output_hidden_states=True,
        use_cache=False,
    )
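The reason this works is that keyword arguments passed to generate() at call time take precedence over the defaults loaded from config.json. The sketch below illustrates that precedence with plain dicts; it is not the actual Transformers internals, and the names `config_defaults` and `resolve_generation_options` are made up for illustration.

```python
# Illustrative only: call-time kwargs override config-file defaults,
# which is why use_cache=False must be passed to generate() itself.

config_defaults = {"use_cache": True, "max_new_tokens": 20}  # stands in for config.json

def resolve_generation_options(**call_kwargs):
    # Later dict entries win, so call-time kwargs shadow the defaults.
    return {**config_defaults, **call_kwargs}

opts = resolve_generation_options(use_cache=False)
print(opts["use_cache"])  # False
```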