GeneratorDataset in MindSpore 2.7.0 keeps crashing with Segmentation fault (core dumped)

I migrated a DataLoader-based pipeline from torch, so I rewrote the Dataset class (i.e. `__len__` and `__getitem__`) and replaced PyTorch's DataLoader with GeneratorDataset. Demo:

```python
from pathlib import Path

from mindspore.dataset import GeneratorDataset

# FlowFragmentDataset, FlowFragmentCNNRESCollateFn and category_map come from my own project code
set_base_path = Path(r"D:\datasets\test\test_pcap\test_save")

train_sets = FlowFragmentDataset(set_base_path / "train", category_map=category_map)
test_sets = FlowFragmentDataset(set_base_path / "test", category_map=category_map)

collate_fn = FlowFragmentCNNRESCollateFn(mode="train", max_packet_length=8, packet_num=72, max_seq_num=512)

train_batch = 32
test_batch = 32

train_loader = (GeneratorDataset(train_sets, column_names=["flow"], shuffle=True, num_parallel_workers=4)
                .batch(train_batch, per_batch_map=collate_fn,
                       output_columns=["flow", "seq", "seq_attention_mask", "labels"]))

test_loader = (GeneratorDataset(test_sets, column_names=["flow"], shuffle=True, num_parallel_workers=4)
               .batch(test_batch, per_batch_map=collate_fn,
                      output_columns=["flow", "seq", "seq_attention_mask", "labels"]))

for it in train_loader:
    pass
```
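For reference, the class passed to GeneratorDataset is just a random-access source implementing `__getitem__` and `__len__`. The real FlowFragmentDataset is not included in this post, so the class name, shape and dtype in this minimal sketch are placeholders:

```python
import numpy as np

class MinimalFlowSource:
    """Hypothetical stand-in for FlowFragmentDataset: a random-access source
    that GeneratorDataset can wrap row by row."""

    def __init__(self, num_samples=100):
        self.num_samples = num_samples

    def __len__(self):
        return self.num_samples

    def __getitem__(self, index):
        # Return plain NumPy data; the shape/dtype here are arbitrary placeholders.
        return np.random.rand(72, 8).astype(np.float32)
```

With column_names=["flow"], whatever `__getitem__` returns becomes one row of the "flow" column that the later per_batch_map receives.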

I use this test code to check whether the data-loading module works, but while it runs I get the error from the title. The debug output is as follows:

Fatal Python error: Segmentation fault

Thread 0x0000ffff9d922120 (most recent call first):

Thread 0x0000ffffab9a2120 (most recent call first):
File "/usr/local/lib/python3.11/site-packages/mindspore/common/tensor.py", line 978 in asnumpy
File "/usr/local/lib/python3.11/site-packages/mindspore/common/tensor.py", line 474 in __array__

Thread 0x0000ffffae1b2120 (most recent call first):
File "/usr/local/lib/python3.11/site-packages/mindspore/common/tensor.py", line 978 in asnumpy
File "/usr/local/lib/python3.11/site-packages/mindspore/common/tensor.py", line 474 in __array__

Thread 0x0000ffffb09c2120 (most recent call first):
File "/usr/local/lib/python3.11/site-packages/mindspore/common/tensor.py", line 978 in asnumpy
File "/usr/local/lib/python3.11/site-packages/mindspore/common/tensor.py", line 474 in __array__

Thread 0x0000ffffb31d2120 (most recent call first):
File "/usr/local/lib/python3.11/site-packages/mindspore/common/tensor.py", line 978 in asnumpy
File "/usr/local/lib/python3.11/site-packages/mindspore/common/tensor.py", line 474 in __array__

Thread 0x0000ffff03283120 (most recent call first):
File "/usr/local/lib/python3.11/threading.py", line 327 in wait
File "/usr/local/lib/python3.11/multiprocessing/queues.py", line 231 in _feed
File "/usr/local/lib/python3.11/threading.py", line 982 in run
File "/usr/local/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
File "/usr/local/lib/python3.11/threading.py", line 1002 in _bootstrap

Thread 0x0000ffffa1962120 (most recent call first):
File "/usr/local/lib/python3.11/threading.py", line 327 in wait
File "/usr/local/lib/python3.11/multiprocessing/queues.py", line 231 in _feed
File "/usr/local/lib/python3.11/threading.py", line 982 in run
File "/usr/local/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
File "/usr/local/lib/python3.11/threading.py", line 1002 in _bootstrap

Thread 0x0000ffffa4172120 (most recent call first):
File "/usr/local/lib/python3.11/threading.py", line 327 in wait
File "/usr/local/lib/python3.11/multiprocessing/queues.py", line 231 in _feed
File "/usr/local/lib/python3.11/threading.py", line 982 in run
File "/usr/local/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
File "/usr/local/lib/python3.11/threading.py", line 1002 in _bootstrap

Thread 0x0000ffffa6982120 (most recent call first):
File "/usr/local/lib/python3.11/threading.py", line 327 in wait
File "/usr/local/lib/python3.11/multiprocessing/queues.py", line 231 in _feed
File "/usr/local/lib/python3.11/threading.py", line 982 in run
File "/usr/local/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
File "/usr/local/lib/python3.11/threading.py", line 1002 in _bootstrap

Thread 0x0000ffffbbf62020 (most recent call first):
File "/usr/local/lib/python3.11/site-packages/mindspore/dataset/engine/iterators.py", line 513 in _get_next
File "/usr/local/lib/python3.11/site-packages/mindspore/dataset/engine/iterators.py", line 341 in serial_conversion_iteration
File "/usr/local/lib/python3.11/site-packages/mindspore/dataset/engine/iterators.py", line 360 in __next__
File "/home/jy/programs/./test_error.py", line 35 in <module>

Hello, and welcome to MindSpore. We have received your question above; please wait patiently for a reply~

@minder Hi, I can see that your data processing goes FlowFragmentDataset → GeneratorDataset → .batch, and that usage is correct. However, I cannot see the actual implementation of FlowFragmentDataset or of collate_fn, so I cannot tell you directly which line is failing.

Below are some suggestions based on my experience; you can use them to narrow the problem down:

  1. Due to current limitations of the MindSpore framework, you cannot use Tensor, nn, ops or other compute operators inside the user-defined parts of the dataset pipeline (e.g. inside FlowFragmentDataset or collate_fn); doing so leads to hard-to-explain core dumps.
  2. Add some print logging inside FlowFragmentDataset and collate_fn; this will help you locate which line of code triggers the core dump.
  3. Once you have located the offending line, replace it with the equivalent NumPy operation (see the sketch after this list).
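To make point 3 concrete, here is a minimal sketch of a NumPy-only per_batch_map. The real FlowFragmentCNNRESCollateFn is not shown in this thread, so the shapes, dtypes and padding logic below are placeholders; the point is simply that every output column is built with numpy, and that the callable takes the BatchInfo object that .batch() passes as its last argument:

```python
import numpy as np

class NumpyOnlyCollateFn:
    """Hypothetical per_batch_map: builds every output column with NumPy only,
    never with mindspore Tensor / nn / ops inside the dataset pipeline."""

    def __init__(self, max_seq_num=512):
        self.max_seq_num = max_seq_num

    def __call__(self, flow_batch, batch_info):
        # flow_batch is a list with one entry per sample of the "flow" column;
        # batch_info is the BatchInfo object that .batch() always passes last.
        flow = np.stack(flow_batch).astype(np.float32)

        batch_size = flow.shape[0]
        seq = np.zeros((batch_size, self.max_seq_num), dtype=np.int32)                  # placeholder token ids
        seq_attention_mask = np.zeros((batch_size, self.max_seq_num), dtype=np.float32) # placeholder mask
        labels = np.zeros((batch_size,), dtype=np.int32)                                # placeholder labels

        # One NumPy array per name in output_columns=["flow", "seq", "seq_attention_mask", "labels"].
        return flow, seq, seq_attention_mask, labels
```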

You can follow the ideas above to confirm this further. Alternatively, package your project code into a minimal runnable example and upload it here; that would make it much easier to point out exactly where things go wrong.

Found the problem: I had kept the approach from torch's DataLoader, so the values coming out of collate_fn were Tensors. After changing them to NumPy arrays, everything works normally.
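For anyone hitting the same crash, the change boils down to the return type of the collate function. A simplified before/after sketch (reduced to a single output column, with placeholder names):

```python
import numpy as np
import mindspore as ms

def collate_returning_tensor(flow_batch, batch_info):
    # PyTorch-style habit: wrapping the batch into framework tensors inside the
    # per_batch_map. Returning mindspore.Tensor here is what triggered the
    # segmentation fault in my case.
    return (ms.Tensor(np.stack(flow_batch)),)

def collate_returning_numpy(flow_batch, batch_info):
    # Fix: return plain NumPy arrays and let the pipeline / training loop do
    # the conversion to Tensor.
    return (np.stack(flow_batch),)
```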

