Precision Alignment Settings for Autoregressive Inference in MindSpore

1 System Environment

Hardware environment (Ascend/GPU/CPU): Ascend/GPU/CPU
MindSpore version: 2.1
Execution mode (PyNative/Graph): either

2 Error Information

2.1 Problem Description

During autoregressive inference, the KV cache is not used: each generated token is appended to the end of the current input, and the concatenated sequence serves as the next input. In MindSpore, autoregressive inference is enabled by setting use_past to False. In Hugging Face Transformers, the use_cache entry in the model configuration file config.json controls whether the KV cache is enabled, but setting use_cache to false there alone does not take effect.
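The cache-free decoding loop described above can be sketched as follows. This is a minimal illustration, not MindSpore or Transformers code; `toy_model` is a hypothetical stand-in for a real forward pass.

```python
def toy_model(token_ids):
    # Hypothetical "model": the next token is (last token + 1) mod 10.
    return (token_ids[-1] + 1) % 10

def generate_no_cache(input_ids, max_new_tokens):
    tokens = list(input_ids)
    for _ in range(max_new_tokens):
        # No KV cache: the full sequence is re-fed to the model
        # at every step, so earlier attention states are recomputed.
        next_token = toy_model(tokens)
        tokens.append(next_token)  # appended, then used as the next input
    return tokens

print(generate_no_cache([3, 7], 4))  # [3, 7, 8, 9, 0, 1]
```

With a KV cache enabled, only the newest token would be fed in at each step; recomputing from the full sequence is what makes this mode useful for precision alignment.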

{
    "_name_or_path": "bigcode/starcoder",
    "activation_function": "gelu",
    "architectures": [
        "GPTBigCodeForCausalLM"
    ],
    "attention_softmax_in_fp32": true,
    "attn_pdrop": 0.1,
    "bos_token_id": 0,
    "embd_pdrop": 0.1,
    "eos_token_id": 0,
    "inference_runner": 0,
    "initializer_range": 0.02,
    "layer_norm_epsilon": 1e-05,
    "max_batch_size": null,
    "max_sequence_length": null,
    "model_type": "gpt_bigcode",
    "multi_query": true,
    "n_embd": 6144,
    "n_head": 48,
    "n_inner": 24576,
    "n_layer": 40,
    "n_positions": 8192,
    "pad_key_length": true,
    "pre_allocate_kv_cache": false,
    "resid_pdrop": 0.1,
    "scale_attention_softmax_in_fp32": false,
    "scale_attn_weights": true,
    "summary_activation": null,
    "summary_first_dropout": 0.1,
    "summary_proj_to_labels": true,
    "summary_type": "cls_index",
    "summary_use_proj": true,
    "torch_dtype": "float16",
    "transformers_version": "4.30.0.dev0",
    "use_cache": true,
    "validate_runner_input": true,
    "vocab_size": 49153
}

3 Solution

use_cache=False must also be passed to the model.generate method:

with torch.no_grad():
    generation_output = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=max_new_tokens,
        output_hidden_states=True,
        use_cache=False
    )
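The reason the config.json setting alone was not enough can be mirrored with a toy model: the argument passed at the generate() call site takes priority, and when none is given a generation-time default is used rather than the stored config value. The class and names below are hypothetical stand-ins for illustration, not the real Transformers internals.

```python
class ToyModel:
    """Hypothetical model mimicking the observed priority of use_cache."""

    def __init__(self, config):
        self.config = config  # e.g. loaded from config.json

    def generate(self, use_cache=None):
        # The explicit call-site argument wins; absent that, a
        # generation-time default (True) applies, so the value stored
        # in the config file is effectively ignored here.
        if use_cache is None:
            use_cache = True
        return "cached" if use_cache else "no cache"

model = ToyModel(config={"use_cache": False})
print(model.generate())                 # "cached" -- config entry alone has no effect
print(model.generate(use_cache=False))  # "no cache" -- explicit argument works
```

This mirrors why passing use_cache=False directly to model.generate, as in the snippet above, is the reliable way to disable the KV cache.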