MindSpore error MRMOpenError: MindRecord File could not open successfully.

1 System Environment

Hardware environment (Ascend/GPU/CPU): Ascend/GPU/CPU
MindSpore version: mindspore=1.5.0
Execution mode (PyNative/Graph): any
Python version: Python=3.7.5
OS platform: Linux

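2 Error Information

2.1 Problem Description

The following script, which converts data to the MindSpore (MindRecord) format, was run on Ascend with MindSpore 1.5: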

from mindspore.mindrecord import FileWriter

data_record_path = './datasets/convert_dataset_to_mindrecord/data_to_mindrecord/test.mindrecord'
writer = FileWriter(file_name=data_record_path, shard_num=4)

# Define the schema
data_schema = {"file_name": {"type": "string"}, "label": {"type": "int32"}, "data": {"type": "bytes"}}
writer.add_schema(data_schema, "test_schema")

# Prepare the data
file_name = "./datasets/convert_dataset_to_mindrecord/images/transform.jpg"

with open(file_name, "rb") as f:
    bytes_data = f.read()

data = [{"file_name": "transform.jpg", "label": 1, "data": bytes_data}]

indexes = ["file_name", "label"]
writer.add_index(indexes)

# Write the data
writer.write_raw_data(data)

# Generate the local data files
writer.commit()

2.2 Error Message

MRMOpenError                              Traceback (most recent call last)
/tmp/ipykernel_108737/3747345416.py in <module>
     17
     18 # Write the data
---> 19 writer.write_raw_data(data)
     20
     21 # Generate the local data files

/opt/nvme0n1/root/miniforge3/envs/mdp/lib/python3.7/site-packages/mindspore/mindrecord/filewriter.py in write_raw_data(self, raw_data, parallel_writer)
    250         """
    251         if not self._writer.is_open:
--> 252             self._writer.open(self._paths)
    253         if not self._writer.get_shard_header():
    254             self._writer.set_shard_header(self._header)

/opt/nvme0n1/root/miniforge3/envs/mdp/lib/python3.7/site-packages/mindspore/mindrecord/shardwriter.py in open(self, paths)
     53         if ret != ms.MSRStatus.SUCCESS:
     54             logger.error("Failed to open paths")
---> 55             raise MRMOpenError
     56         self._is_open = True
     57         return ret

MRMOpenError: [MRMOpenError]: MindRecord File could not open successfully.

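3 Root Cause Analysis

Before MindSpore 1.6.0, generating a MindRecord dataset does not support overwriting existing files. When a file with the same name already exists in the output directory, the raised exception does not accurately describe the cause, so you need to check the log. As shown below, the second line of the log reports that a MindRecord file with the same name already exists in the output directory; that file must be deleted before the script is run again.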

[ERROR] ME(108737:281473404131632,MainProcess):2022-04-11-17:39:33.512.803 [mindspore/mindrecord/shardwriter.py:54] Failed to open paths  
[ERROR] MD(108737,ffffa2445930,python3.7):2022-04-11-17:39:33.512.713 [mindspore/ccsrc/minddata/mindrecord/io/shard_writer.cc:92] OpenDataFiles] MindRecord file already existed, please delete file: /opt/nvme0n1/l00475263/workspace/datasets/convert_dataset_to_mindrecord/data_to_mindrecord/test.mindrecord0  
[ERROR] MD(108737,ffffa2445930,python3.7):2022-04-11-17:39:33.512.752 [mindspore/ccsrc/minddata/mindrecord/io/shard_writer.cc:167] Open] Open data files failed.
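On versions earlier than 1.6.0, the stale files therefore have to be removed before the FileWriter is created. The following is a minimal sketch of that cleanup. The helper name remove_existing_shards is hypothetical, and the file-naming scheme it relies on (file_name plus a shard index when shard_num > 1, file_name itself when shard_num == 1, and a companion ".db" index file next to each data file) is an assumption based on the test.mindrecord0 path in the log above.

```python
import os


def remove_existing_shards(file_name, shard_num=1):
    """Delete MindRecord data files and their .db index files from a previous run.

    Assumed naming scheme (hypothetical helper, not a MindSpore API):
      - shard_num == 1: the data file is file_name itself
      - shard_num > 1:  the data files are file_name + "0", file_name + "1", ...
      - every data file has a companion index file with a ".db" suffix
    Returns the list of paths that were actually deleted.
    """
    if shard_num == 1:
        data_files = [file_name]
    else:
        data_files = [file_name + str(i) for i in range(shard_num)]

    removed = []
    for data_file in data_files:
        for path in (data_file, data_file + ".db"):
            if os.path.exists(path):
                os.remove(path)
                removed.append(path)
    return removed
```

For example, calling remove_existing_shards(data_record_path, shard_num=4) right before FileWriter(file_name=data_record_path, shard_num=4) clears the previous run's output so the writer can open its files again.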

Starting with MindSpore 1.6.0, you can pass overwrite=True when creating the FileWriter object so that existing files with the same name are overwritten.
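If the same script has to run on MindSpore versions both before and after 1.6.0, one option is to pass overwrite=True only when the installed FileWriter accepts it. The sketch below checks the constructor signature with Python's inspect.signature. The helper name filewriter_kwargs is hypothetical, and the approach assumes FileWriter is a plain Python class whose constructor signature can be introspected.

```python
import inspect


def filewriter_kwargs(writer_cls, file_name, shard_num):
    """Build FileWriter constructor kwargs (hypothetical helper).

    overwrite=True is added only if writer_cls declares an 'overwrite'
    parameter, so older versions do not receive an unknown keyword.
    """
    kwargs = {"file_name": file_name, "shard_num": shard_num}
    try:
        params = inspect.signature(writer_cls).parameters
    except (TypeError, ValueError):
        # Signature cannot be introspected: leave overwrite out to be safe.
        return kwargs
    if "overwrite" in params:
        kwargs["overwrite"] = True
    return kwargs
```

Usage: writer = FileWriter(**filewriter_kwargs(FileWriter, data_record_path, 4)). On versions without the parameter, the helper leaves overwrite out, so existing files with the same name still have to be deleted manually before the writer is created.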

4 Solution

Environment used:

Hardware environment (Ascend/GPU/CPU): CPU
MindSpore version: mindspore=1.8.0
Execution mode (PyNative/Graph): any
Python version: Python=3.7.5
OS platform: Linux

Fix:

In the original code, change:

writer = FileWriter(file_name=data_record_path, shard_num=4)

to:

writer = FileWriter(file_name=data_record_path, shard_num=4, overwrite=True)

Complete code:

from mindspore.mindrecord import FileWriter

data_record_path = './datasets/convert_dataset_to_mindrecord/data_to_mindrecord/test.mindrecord'
# overwrite=True replaces existing files with the same name (MindSpore 1.6.0 and later)
writer = FileWriter(file_name=data_record_path, shard_num=4, overwrite=True)

# Define the schema
data_schema = {"file_name": {"type": "string"}, "label": {"type": "int32"}, "data": {"type": "bytes"}}
writer.add_schema(data_schema, "test_schema")

# Prepare the data
file_name = "./datasets/convert_dataset_to_mindrecord/images/transform.jpg"

with open(file_name, "rb") as f:
    bytes_data = f.read()

data = [{"file_name": "transform.jpg", "label": 1, "data": bytes_data}]

indexes = ["file_name", "label"]
writer.add_index(indexes)

# Write the data
writer.write_raw_data(data)

# Generate the local data files
writer.commit()