Merging weights after LoRA fine-tuning with mindformers

1. Checkpoints saved by mindformers after LoRA fine-tuning keep the LoRA weights unmerged and cannot be used directly for offline inference

model.layers.37.attention.wq.mindpet_delta_lora_a
model.layers.37.attention.wq.mindpet_delta_lora_b

The checkpoint contains the original weights together with the two LoRA low-rank weights a and b, which must be merged before offline inference is possible.
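As a quick check, you can list the parameter names in a fine-tuned checkpoint and confirm that the lora_a/lora_b entries are present. A minimal sketch, assuming a hypothetical checkpoint path:

import mindspore as ms

# Hypothetical path to a rank-0 checkpoint produced by LoRA fine-tuning.
ckpt_path = "./output/checkpoint/rank_0/llama_7b_lora_rank_0-10_2.ckpt"
param_dict = ms.load_checkpoint(ckpt_path)

# Print only the LoRA-related parameter names.
for name in param_dict:
    if "mindpet_delta_lora" in name:
        print(name)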

2. The principle of LoRA low-rank adaptation

According to the LoRA implementation, the final model weight is obtained by simply adding the scaled low-rank weight to the original weight.
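In other words, with lora_scaling = lora_alpha / lora_rank (the standard LoRA scaling), each adapted weight is merged as:

W_merged = W + lora_scaling * (B @ A)

where B is the lora_b weight of shape (out_features, lora_rank) and A is the lora_a weight of shape (lora_rank, in_features), so B @ A has the same shape as the original weight W. The shape convention follows the usual LoRA layout and is an assumption about the mindpet implementation.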

3. Merging the LoRA weights with a script

import os

import mindspore as ms
from mindspore import Parameter


def transpose(weight, fan_in_fan_out):
    """Return weight.T for layers stored as (in, out); the calls below always pass fan_in_fan_out=False."""
    return weight.T if fan_in_fan_out else weight

# src_ckpt_dir, dst_ckpt_dir, prefix, the strategy files and args come from the script's argument parsing (omitted here).
# Transform the distributed checkpoint to the target strategy; the rank_0 result is loaded and merged below.
ms.transform_checkpoints(src_ckpt_dir, dst_ckpt_dir, prefix, src_ckpt_strategy, dst_ckpt_strategy)
print("......Over transform......")
print("......Start transform lora......")
#           LoRA scaling hyperparameter: lora_scaling = lora_alpha / lora_rank (lora_r is the LoRA rank)
lora_scaling = args.lora_alpha / args.lora_r
save_path = dst_ckpt_dir + "/rank_0/" + prefix + "0.ckpt"
# such as   model.layers.attention.wq.weight
#           model.layers.attention.wq.mindpet_delta_lora_a
#           model.layers.attention.wq.mindpet_delta_lora_b
param_dict = ms.load_checkpoint(save_path)
lora_keys = [k for k in param_dict if 'lora_a' in k]
non_lora_keys = [k for k in param_dict if 'lora_' not in k]
# Copy the non-LoRA parameters over unchanged.
param_dict_lora = {}
for k in non_lora_keys:
    param_dict_lora[k] = param_dict[k].clone()
for k in lora_keys:
    print(f"merging {k}")
    # e.g. model.layers.37.attention.wq.mindpet_delta_lora_a -> model.layers.37.attention.wq.weight
    original_key = k.replace('_lora_a', '').replace('mindpet_delta', 'weight')
    assert original_key in param_dict
    lora_a_key = k
    lora_b_key = k.replace('lora_a', 'lora_b')

    # W_merged = W + lora_scaling * (B @ A)
    merged_weight = param_dict_lora[original_key].value() + transpose(
        param_dict[lora_b_key].value() @ param_dict[lora_a_key].value(), False) * lora_scaling
    param_dict_lora[original_key] = Parameter(merged_weight, name=original_key)
print("......Start save transform lora......")
# Remove the intermediate (unmerged) checkpoint before saving the merged one under a new name.
os.remove(save_path)
save_path = dst_ckpt_dir + "/rank_0/" + prefix + "_lora_merged.ckpt"
ms.save_checkpoint(param_dict_lora, save_path)
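
After the merge, you can verify that the new checkpoint no longer contains any LoRA parameters before using it for offline inference. A minimal check, reusing the save_path defined above:

merged_dict = ms.load_checkpoint(save_path)
# All mindpet_delta_lora_a / mindpet_delta_lora_b entries should have been folded into the plain weights.
assert not any("mindpet_delta_lora" in name for name in merged_dict)
print(f"merged checkpoint saved to {save_path} with {len(merged_dict)} parameters")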