Scenario 4: Converting a complete ckpt weight file into distributed weights
Converting through the safetensors format is the recommended route.
Method 1: ckpt_to_safetensors + transform_checkpoints
import mindspore as ms
ms.transform_checkpoints(src_checkpoints_dir, dst_checkpoints_dir, ckpt_prefix, src_strategy_file=None, dst_strategy_file=None, process_num=1, output_format='ckpt')
print("Transform ckpt DONE", flush=True)
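- src_checkpoints_dir: path to weight A. For distributed weights, point to the directory one level above the rank folders; for a complete weight, point to the file itself.
- dst_checkpoints_dir: path where weight B is generated.
- ckpt_prefix: file-name prefix for the generated weight B.
- src_strategy_file: merged strategy file of weight A; use None for a complete weight.
- dst_strategy_file: merged strategy file of weight B; use None for a complete weight.
- process_num: number of parallel workers; choose it according to the CPU and memory available on your machine.
- output_format: "ckpt" or "safetensors".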
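The path rule in the list above is easy to get wrong, so a small pre-flight check can catch mistakes before a long conversion starts. The sketch below is not part of MindSpore: `build_transform_kwargs` is a hypothetical helper, and treating "`src_strategy_file` is None" as the marker for a complete weight is an assumption drawn from the parameter notes.

```python
import os

WEIGHT_SUFFIXES = (".ckpt", ".safetensors")
OUTPUT_FORMATS = ("ckpt", "safetensors")


def build_transform_kwargs(src, dst_dir, ckpt_prefix,
                           src_strategy_file=None, dst_strategy_file=None,
                           process_num=1, output_format="ckpt"):
    """Hypothetical helper: validate the arguments and return them as a kwargs dict."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {OUTPUT_FORMATS}")
    if process_num < 1:
        raise ValueError("process_num must be at least 1")

    is_weight_file = os.path.splitext(src)[1].lower() in WEIGHT_SUFFIXES
    if src_strategy_file is None and not is_weight_file:
        # Complete weight (no source strategy): src must name the file itself.
        raise ValueError("a complete weight must be given as a file path")
    if src_strategy_file is not None and is_weight_file:
        # Distributed weight: src must be the directory above the rank folders.
        raise ValueError("distributed weights must be given as a directory")

    return {
        "src_checkpoints_dir": src,
        "dst_checkpoints_dir": dst_dir,
        "ckpt_prefix": ckpt_prefix,
        "src_strategy_file": src_strategy_file,
        "dst_strategy_file": dst_strategy_file,
        "process_num": process_num,
        "output_format": output_format,
    }


kwargs = build_transform_kwargs(
    "/xxx/ckpt_convert/weight_o/ckpt_to_sf_all/rank_0/net.safetensors",
    "/xxx/ckpt_convert/weight_o/distributed_ckpt",
    "dst_ckpt_prefix",
    dst_strategy_file="/xxx/ckpt_convert/142merge.ckpt",
    process_num=4,
)
# With MindSpore installed, the call would then be:
#     ms.transform_checkpoints(**kwargs)
```

Returning a plain dict keeps the check independent of MindSpore, so it can run on a machine where the framework is not installed.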
Example:
1. Merge the target strategy files
2. Convert ckpt to safetensors
import mindspore as ms
ms.ckpt_to_safetensors(
    file_path="/xxx/ckpt_convert/weight_o/net.ckpt",
    save_path="/xxx/ckpt_convert/weight_o/ckpt_to_sf_all/",
    processes_num=8)
Input:
Output:
3. Call transform_checkpoints to convert the safetensors file into distributed ckpt weights
import mindspore as ms
ms.transform_checkpoints("/xxx/ckpt_convert/weight_o/ckpt_to_sf_all/rank_0/net.safetensors", "/xxx/ckpt_convert/weight_o/distributed_ckpt", "dst_ckpt_prefix", src_strategy_file=None, dst_strategy_file="/xxx/ckpt_convert/142merge.ckpt", process_num=4, output_format='ckpt')
print("Transform ckpt DONE", flush=True)
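After the conversion finishes, it can be worth confirming that every target rank actually received a file. The sketch below assumes the distributed output uses one `rank_<id>` subdirectory per rank, mirroring the rank layout described for distributed source weights; `list_rank_outputs` and `missing_ranks` are hypothetical helper names, not MindSpore functions.

```python
import os
import re

_RANK_DIR = re.compile(r"^rank_(\d+)$")


def list_rank_outputs(dst_dir, suffix=".ckpt"):
    """Return {rank_id: [file names]} for every rank_<id> subdirectory of dst_dir."""
    found = {}
    for entry in sorted(os.listdir(dst_dir)):
        match = _RANK_DIR.match(entry)
        path = os.path.join(dst_dir, entry)
        if match and os.path.isdir(path):
            files = sorted(f for f in os.listdir(path) if f.endswith(suffix))
            found[int(match.group(1))] = files
    return found


def missing_ranks(found, expected_ranks):
    """Return the rank ids in [0, expected_ranks) that have no weight file."""
    return [rank for rank in range(expected_ranks) if not found.get(rank)]
```

For example, `missing_ranks(list_rank_outputs("/xxx/ckpt_convert/weight_o/distributed_ckpt"), 8)` would return an empty list when all eight ranks hold at least one `.ckpt` file; the rank count of 8 is only an illustration.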
Method 2: ckpt_to_safetensors + load_distributed_checkpoint
import mindspore as ms
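ms.load_distributed_checkpoint(
    network=None,  # need not be None; a Net instance can be passed directly
    predict_strategy="<merged strategy file of weight B>",  # use None for a complete weight
    format='safetensors',  # defaults to 'ckpt'; accepts 'ckpt' or 'safetensors'
    output_format='ckpt',  # defaults to 'safetensors'; accepts 'ckpt' or 'safetensors'; available from 2.4.0 onward
    unified_safetensors_dir="<directory produced by the offline merge in the previous step>",
    dst_safetensors_dir="<output directory for weight B>")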
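Because `output_format` is noted above as available only from 2.4.0 onward, a script that must run on several installations may want to check the version before passing it. This is a minimal sketch: `parse_version` and `supports_output_format` are hypothetical helpers, and pre-release tags such as `rc1` are simply ignored, which is an assumption rather than MindSpore's own versioning rule.

```python
import re


def parse_version(version):
    """Turn '2.4.0' or '2.4.0rc1' into a comparable tuple such as (2, 4, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch component (e.g. '2.3') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def supports_output_format(version, minimum=(2, 4, 0)):
    """Return True when the given version is at least the minimum."""
    return parse_version(version) >= minimum
```

With MindSpore installed, the check would typically be `supports_output_format(ms.__version__)`.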
Example:
1. Merge the target strategy files
2. Convert ckpt to safetensors
import mindspore as ms
ms.ckpt_to_safetensors(
    file_path="/xxx/ckpt_convert/weight_o/net.ckpt",
    save_path="/xxx/ckpt_convert/weight_o/ckpt_to_sf_all/",
    processes_num=8)
3. Call load_distributed_checkpoint to convert the safetensors file into distributed ckpt weights
import mindspore as ms
ms.load_distributed_checkpoint(
    network=None,
    checkpoint_filenames=None,
    predict_strategy="/xxx/ckpt_convert/142merge.ckpt",
    format='safetensors',
    output_format='ckpt',  # needed for ckpt output; the default is 'safetensors'
    unified_safetensors_dir="/xxx/ckpt_convert/weight_o/ckpt_to_sf_all/rank_0/net.safetensors",
    dst_safetensors_dir="/xxx/ckpt_convert/weight_o/142distributed_ckpt_1")
Method 3: transform_checkpoints alone (not recommended)
transform_checkpoints detects the input format automatically.
import mindspore as ms
ms.transform_checkpoints("/xxx/ckpt_convert/weight_o/ckpt_to_sf_all/rank_0/net.ckpt", "/xxx/ckpt_convert/weight_o/distributed_ckpt", "dst_ckpt_prefix", src_strategy_file=None, dst_strategy_file="/xxx/ckpt_convert/142merge.ckpt", process_num=4, output_format='ckpt')
print("Transform ckpt DONE", flush=True)
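Scenario 5: Converting distributed weights into a complete ckpt weight file
Converting through the safetensors format is again the recommended route.
Method 1: ckpt_to_safetensors + transform_checkpoints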
import mindspore as ms
ms.transform_checkpoints(src_checkpoints_dir, dst_checkpoints_dir, ckpt_prefix, src_strategy_file=None, dst_strategy_file=None, process_num=1, output_format='ckpt')
print("Transform ckpt DONE", flush=True)
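- src_checkpoints_dir: path to weight A. For distributed weights, point to the directory one level above the rank folders; for a complete weight, point to the file itself.
- dst_checkpoints_dir: path where weight B is generated.
- ckpt_prefix: file-name prefix for the generated weight B.
- src_strategy_file: merged strategy file of weight A; use None for a complete weight.
- dst_strategy_file: merged strategy file of weight B; use None for a complete weight.
- process_num: number of parallel workers; choose it according to the CPU and memory available on your machine.
- output_format: "ckpt" or "safetensors".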
Example:
import mindspore as ms
ms.transform_checkpoints("/xxx/ckpt_convert/weight_o/distributed_ckpt", "/xxx/ckpt_convert/weight_o/all_ckpt", "dst_ckpt_prefix", src_strategy_file="src_strategy", dst_strategy_file=None, process_num=4, output_format='ckpt')
print("Transform ckpt DONE", flush=True)
Method 2: ckpt_to_safetensors + unified_safetensors + load_distributed_checkpoint

ckpt_to_safetensors + unified_safetensors: this produces sharded safetensors files, which can be converted into sharded ckpt files and then merged manually.
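The manual merge mentioned above can be sketched as a union of per-file parameter dictionaries. This rests on an assumption the text does not confirm: that each sharded file holds whole parameters whose names do not overlap with those in other files. Weights sliced along a tensor dimension would need strategy-aware merging instead. `merge_param_shards` is a hypothetical helper; reading and writing the files is left out and would presumably go through `ms.load_checkpoint` and `ms.save_checkpoint`.

```python
def merge_param_shards(shards):
    """Union per-file {name: value} dicts; fail if a name appears twice."""
    merged = {}
    for index, shard in enumerate(shards):
        for name, value in shard.items():
            if name in merged:
                # A repeated name suggests the shards are slices of one tensor,
                # which this simple union cannot merge correctly.
                raise ValueError(f"parameter {name!r} appears again in shard {index}")
            merged[name] = value
    return merged
```

Failing on duplicate names, rather than letting a later shard overwrite an earlier one, makes a wrong assumption about the shard layout visible immediately.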










