The following changes were made starting from pretrain_qwen3_32b_4k.yaml (the diff below compares it against the modified copy, pretrain_qwen3_0_6b_4k.yaml):
(py39) [mindspore@cloud-9ba17f41-df67-4c06-8d88-680308453fb4-7649d677b6-9ps9p mindformers]$ diff configs/qwen3/pretrain_qwen3_32b_4k.yaml configs/qwen3/pretrain_qwen3_0_6b_4k.yaml
9,11c9,11
< use_parallel: True
< run_mode: 'train'
< use_legacy: False
---
> use_parallel: False
> run_mode: 'finetune'
> use_legacy: True
20c20
< epochs: 2
---
> epochs: 1
44c44
< - 8000 # Number of samples in the training set
---
> - 400 # Number of samples in the training set
61c61
< - "/path/to/wiki103-megatron_text_document"
---
> - "/home/mindspore/work/demo/megatron_data/wikitext-2-v1-qwen3_text_document_text_document"
64c64
< num_parallel_workers: 8
---
> num_parallel_workers: 1
89,92c89,92
< model_parallel: 4 # Number of model parallel
< pipeline_stage: 4 # Number of pipeline parallel
< micro_batch_num: 4 # Pipeline parallel microbatch size
< vocab_emb_dp: True # Whether to split the vocabulary in the data parallel dimension
---
> model_parallel: 1 # Number of model parallel
> pipeline_stage: 1 # Number of pipeline parallel
> micro_batch_num: 1 # Pipeline parallel microbatch size
> vocab_emb_dp: False # Whether to split the vocabulary in the data parallel dimension
102c102
< full_batch: False # Whether to load the full batch of data in parallel mode
---
> full_batch: True # Whether to load the full batch of data in parallel mode
120c120
< parallel_optimizer_comm_recompute: True
---
> parallel_optimizer_comm_recompute: False
125a126
> type: "Qwen3ForCausalLM"
126a128
> type: "Qwen3Config"
129,132c131,134
< hidden_size: 5120
< intermediate_size: 25600
< num_hidden_layers: 64
< num_attention_heads: 64
---
> hidden_size: 1024
> intermediate_size: 3072
> num_hidden_layers: 28
> num_attention_heads: 16
158c160
< offset: [-1, -1, 1, 1]
---
> offset: 0
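
For reference, the single-device settings above need to stay mutually consistent: in MindFormers the number of launched workers generally has to equal data_parallel * model_parallel * pipeline_stage. A minimal sketch of the parallel section implied by the diff (data_parallel does not appear in the hunks above and is assumed to be 1):

# Sketch only, not the verified file contents.
# Assumption: data_parallel stays at 1, so that
# data_parallel * model_parallel * pipeline_stage == worker_num == 1.
parallel_config:
  data_parallel: 1
  model_parallel: 1
  pipeline_stage: 1
  micro_batch_num: 1
  vocab_emb_dp: False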
The job was launched with:

msrun --bind_core=True --worker_num=1 --local_worker_num=1 --master_port=7118 --log_dir=output/msrun_log --join=True --cluster_time_out=300 run_mindformer.py --config /home/mindspore/work/demo/mindformers/configs/qwen3/pretrain_qwen3_0_6b_4k.yaml

and failed with the following error:
Traceback (most recent call last):
  File "/home/mindspore/miniconda3/envs/py39/bin/msrun", line 7, in <module>
    sys.exit(main())
  File "/home/mindspore/miniconda3/envs/py39/lib/python3.9/site-packages/mindspore/parallel/cluster/run.py", line 191, in main
    run(args)
  File "/home/mindspore/miniconda3/envs/py39/lib/python3.9/site-packages/mindspore/parallel/cluster/run.py", line 185, in run
    process_manager.run()
  File "/home/mindspore/miniconda3/envs/py39/lib/python3.9/site-packages/mindspore/parallel/cluster/process_entity/_api.py", line 268, in run
    self.join_processes()
  File "/home/mindspore/miniconda3/envs/py39/lib/python3.9/site-packages/mindspore/parallel/cluster/process_entity/_api.py", line 387, in join_processes
    raise RuntimeError("Distributed job exited with exception. Please check logs in "
RuntimeError: Distributed job exited with exception. Please check logs in directory: output/msrun_log.
The log contents are:
(py39) [mindspore@cloud-9ba17f41-df67-4c06-8d88-680308453fb4-7649d677b6-9ps9p mindformers]$ tail -f output/msrun_log/worker_0.log
    network = build_model(config, default_args=default_args)
  File "/home/mindspore/work/demo/mindformers/mindformers/models/build_model.py", line 63, in build_model
    model_config = build_model_config(config.model_config, default_args=default_args)
  File "/home/mindspore/work/demo/mindformers/mindformers/models/build_config.py", line 62, in build_model_config
    return MindFormerRegister.get_instance_from_cfg(
  File "/home/mindspore/work/demo/mindformers/mindformers/tools/register/register.py", line 389, in get_instance_from_cfg
    obj_cls = cls.get_cls(module_type, obj_type)
  File "/home/mindspore/work/demo/mindformers/mindformers/tools/register/register.py", line 287, in get_cls
    raise ValueError(f"Can't find class type {module_type} class name {class_name} in class registry "
ValueError: Can't find class type config class name Qwen3Config in class registry when use_legacy=True
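
The last line points at the root cause: with use_legacy: True, MindFormers resolves model and config classes through the legacy class registry, and per the error text Qwen3Config is not registered there. A minimal sketch of the fields to revert, assuming (as the message implies) the Qwen3 classes are only reachable with the new code path:

# Sketch only: flip the registry switch back so Qwen3Config can be resolved.
use_legacy: False  # True routes lookups to the legacy registry, which lacks Qwen3Config
model:
  arch:
    type: "Qwen3ForCausalLM"   # layout assumed from the 125a126 hunk above
  model_config:
    type: "Qwen3Config"        # layout assumed from the 126a128 hunk above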