Qwen3 single-card YAML file configuration

The following modifications were made to the pretrain_qwen3_32b_4k.yaml file:

(py39) [mindspore@cloud-9ba17f41-df67-4c06-8d88-680308453fb4-7649d677b6-9ps9p mindformers]$ diff configs/qwen3/pretrain_qwen3_32b_4k.yaml configs/qwen3/pretrain_qwen3_0_6b_4k.yaml
9,11c9,11
< use_parallel: True
< run_mode: 'train'
< use_legacy: False
---
> use_parallel: False
> run_mode: 'finetune'
> use_legacy: True
20c20
< epochs: 2
---
> epochs: 1
44c44
<   - 8000  # Number of samples in the training set
---
>   - 400   # Number of samples in the training set
61c61
<   - "/path/to/wiki103-megatron_text_document"
---
>   - "/home/mindspore/work/demo/megatron_data/wikitext-2-v1-qwen3_text_document_text_document"
64c64
< num_parallel_workers: 8
---
> num_parallel_workers: 1
89,92c89,92
< model_parallel: 4  # Number of model parallel
< pipeline_stage: 4  # Number of pipeline parallel
< micro_batch_num: 4  # Pipeline parallel microbatch size
< vocab_emb_dp: True  # Whether to split the vocabulary in the data parallel dimension
---
> model_parallel: 1  # Number of model parallel
> pipeline_stage: 1  # Number of pipeline parallel
> micro_batch_num: 1  # Pipeline parallel microbatch size
> vocab_emb_dp: False  # Whether to split the vocabulary in the data parallel dimension
102c102
< full_batch: False  # Whether to load the full batch of data in parallel mode
---
> full_batch: True  # Whether to load the full batch of data in parallel mode
120c120
< parallel_optimizer_comm_recompute: True
---
> parallel_optimizer_comm_recompute: False
125a126
> type: "Qwen3ForCausalLM"
126a128
> type: "Qwen3Config"
129,132c131,134
< hidden_size: 5120
< intermediate_size: 25600
< num_hidden_layers: 64
< num_attention_heads: 64
---
> hidden_size: 1024
> intermediate_size: 3072
> num_hidden_layers: 28
> num_attention_heads: 16
158c160
< offset: [-1, -1, 1, 1]
---
> offset: 0
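Taken together, the right-hand (0.6B, single-card) side of the diff amounts to the following parallel-related settings. This is a sketch assembled only from the lines shown in the diff above; any key not shown there (e.g. data_parallel) is an assumption:

```yaml
# Single-card (standalone) settings, collected from the diff above
use_parallel: False
run_mode: 'finetune'
use_legacy: True          # this line turns out to be the problem, see below

parallel_config:
  data_parallel: 1        # assumed; not shown in the diff
  model_parallel: 1       # Number of model parallel
  pipeline_stage: 1       # Number of pipeline parallel
  micro_batch_num: 1      # Pipeline parallel microbatch size
  vocab_emb_dp: False     # Whether to split the vocabulary in the data parallel dimension
```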

Launching with msrun --bind_core=True --worker_num=1 --local_worker_num=1 --master_port=7118 --log_dir=output/msrun_log --join=True --cluster_time_out=300 run_mindformer.py --config /home/mindspore/work/demo/mindformers/configs/qwen3/pretrain_qwen3_0_6b_4k.yaml reports the following error:

Traceback (most recent call last):
  File "/home/mindspore/miniconda3/envs/py39/bin/msrun", line 7, in <module>
    sys.exit(main())
  File "/home/mindspore/miniconda3/envs/py39/lib/python3.9/site-packages/mindspore/parallel/cluster/run.py", line 191, in main
    run(args)
  File "/home/mindspore/miniconda3/envs/py39/lib/python3.9/site-packages/mindspore/parallel/cluster/run.py", line 185, in run
    process_manager.run()
  File "/home/mindspore/miniconda3/envs/py39/lib/python3.9/site-packages/mindspore/parallel/cluster/process_entity/_api.py", line 268, in run
    self.join_processes()
  File "/home/mindspore/miniconda3/envs/py39/lib/python3.9/site-packages/mindspore/parallel/cluster/process_entity/_api.py", line 387, in join_processes
    raise RuntimeError("Distributed job exited with exception. Please check logs in "
RuntimeError: Distributed job exited with exception. Please check logs in directory: output/msrun_log.

The log contents are:

(py39) [mindspore@cloud-9ba17f41-df67-4c06-8d88-680308453fb4-7649d677b6-9ps9p mindformers]$ tail -f output/msrun_log/worker_0.log
    network = build_model(config, default_args=default_args)
  File "/home/mindspore/work/demo/mindformers/mindformers/models/build_model.py", line 63, in build_model
    model_config = build_model_config(config.model_config, default_args=default_args)
  File "/home/mindspore/work/demo/mindformers/mindformers/models/build_config.py", line 62, in build_model_config
    return MindFormerRegister.get_instance_from_cfg(
  File "/home/mindspore/work/demo/mindformers/mindformers/tools/register/register.py", line 389, in get_instance_from_cfg
    obj_cls = cls.get_cls(module_type, obj_type)
  File "/home/mindspore/work/demo/mindformers/mindformers/tools/register/register.py", line 287, in get_cls
    raise ValueError(f"Can't find class type {module_type} class name {class_name} in class registry "
ValueError: Can't find class type config class name Qwen3Config in class registry when use_legacy=True

Hello, and welcome to MindSpore. We have received your question above; please wait patiently for a reply~

Thanks

use_legacy cannot be changed arbitrarily; only change it if you are sure the architecture supports it.

use_legacy controls whether mindformers or mindspore is used to run the model; I forget which value corresponds to which.
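If that explanation holds, the fix suggested by the error message ("... in class registry when use_legacy=True") would be to leave use_legacy at the value it had in the original 32B config, since Qwen3Config and Qwen3ForCausalLM are evidently only registered on the non-legacy code path. A sketch of the corrected lines, under that assumption:

```yaml
# Sketch: keep the non-legacy code path (which registers Qwen3Config /
# Qwen3ForCausalLM) while still running single-card.
use_parallel: False
run_mode: 'finetune'
use_legacy: False   # revert to the value from pretrain_qwen3_32b_4k.yaml
```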

Got it, I'll give it a try. Thank you very much.

Hello, the MindSpore support staff have analyzed the issue and given its cause. Since the answer has not been marked as accepted for some time, the moderator will close this thread by accepting the answer. If you have further questions, please open a new thread. Thank you for your support~
