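1 System Environment
Hardware environment (Ascend/GPU/CPU): Ascend/GPU/CPU
MindSpore version: mindspore=1.5.0
Execution mode (dynamic graph/static graph): any
Python version: Python=3.7.5
OS platform: Linux
2 Error Information
2.1 Problem Description
The following script converts data to the MindRecord format on Ascend with MindSpore 1.5: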
from mindspore.mindrecord import FileWriter
data_record_path = './datasets/convert_dataset_to_mindrecord/data_to_mindrecord/test.mindrecord'
writer = FileWriter(file_name=data_record_path, shard_num=4)
# Define the schema
data_schema = {"file_name": {"type": "string"}, "label": {"type": "int32"}, "data": {"type": "bytes"}}
writer.add_schema(data_schema, "test_schema")
# Prepare the data
file_name = "./datasets/convert_dataset_to_mindrecord/images/transform.jpg"
with open(file_name, "rb") as f:
    bytes_data = f.read()
data = [{"file_name": "transform.jpg", "label": 1, "data": bytes_data}]
indexes = ["file_name", "label"]
writer.add_index(indexes)
# Write the data
writer.write_raw_data(data)
# Commit to generate the MindRecord files on disk
writer.commit()
2.2 Error Message
MRMOpenError Traceback (most recent call last)
/tmp/ipykernel_108737/3747345416.py in <module>
17
18 # Write the data
---> 19 writer.write_raw_data(data)
20
21 # Commit to generate the MindRecord files on disk
/opt/nvme0n1/root/miniforge3/envs/mdp/lib/python3.7/site-packages/mindspore/mindrecord/filewriter.py in write_raw_data(self, raw_data, parallel_writer)
250 """
251 if not self._writer.is_open:
--> 252 self._writer.open(self._paths)
253 if not self._writer.get_shard_header():
254 self._writer.set_shard_header(self._header)
/opt/nvme0n1/root/miniforge3/envs/mdp/lib/python3.7/site-packages/mindspore/mindrecord/shardwriter.py in open(self, paths)
53 if ret != ms.MSRStatus.SUCCESS:
54 logger.error("Failed to open paths")
---> 55 raise MRMOpenError
56 self._is_open = True
57 return ret
MRMOpenError: [MRMOpenError]: MindRecord File could not open successfully.
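3 Root Cause Analysis
Before MindSpore 1.6.0, generating a MindRecord dataset did not support overwriting existing files. When a file with the same name already exists in the output directory, the exception message does not accurately describe the cause, so you need to check the log. As shown below, the second line of the log reports that a MindRecord file with the same name already exists in the output directory and must be deleted first.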
[ERROR] ME(108737:281473404131632,MainProcess):2022-04-11-17:39:33.512.803 [mindspore/mindrecord/shardwriter.py:54] Failed to open paths
[ERROR] MD(108737,ffffa2445930,python3.7):2022-04-11-17:39:33.512.713 [mindspore/ccsrc/minddata/mindrecord/io/shard_writer.cc:92] OpenDataFiles] MindRecord file already existed, please delete file: /opt/nvme0n1/l00475263/workspace/datasets/convert_dataset_to_mindrecord/data_to_mindrecord/test.mindrecord0
[ERROR] MD(108737,ffffa2445930,python3.7):2022-04-11-17:39:33.512.752 [mindspore/ccsrc/minddata/mindrecord/io/shard_writer.cc:167] Open] Open data files failed.
From MindSpore 1.6.0 onward, you can pass overwrite=True when creating the FileWriter object to overwrite existing files.
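For versions before 1.6.0, where overwrite=True is not available, the leftover files have to be deleted before the writer is created. The following is a minimal sketch of that cleanup using only the Python standard library. The helper name remove_stale_mindrecord is hypothetical and not part of MindSpore. It assumes that every file whose name starts with the given file_name (for example test.mindrecord0, as in the log above, plus any companion index files) belongs to the previous run, so it should not be used if unrelated files share that prefix.

```python
import glob
import os


def remove_stale_mindrecord(file_name):
    """Delete files from a previous run whose names start with file_name.

    Hypothetical helper (not a MindSpore API). Returns the sorted list of
    paths that were removed.
    """
    removed = []
    # glob.escape keeps special characters in the path from acting as wildcards.
    for path in sorted(glob.glob(glob.escape(file_name) + "*")):
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed
```

Calling remove_stale_mindrecord(data_record_path) immediately before constructing FileWriter would clear the shard files that trigger the "MindRecord file already existed" log entry.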
4 Solution
Environment used:
Hardware environment (Ascend/GPU/CPU): CPU
MindSpore version: mindspore=1.8.0
Execution mode (dynamic graph/static graph): any
Python version: Python=3.7.5
OS platform: Linux
Fix:
Change this line in the original code:
writer = FileWriter(file_name=data_record_path, shard_num=4)
to:
writer = FileWriter(file_name=data_record_path, shard_num=4, overwrite=True)
Complete code:
from mindspore.mindrecord import FileWriter
data_record_path = './datasets/convert_dataset_to_mindrecord/data_to_mindrecord/test.mindrecord'
writer = FileWriter(file_name=data_record_path, shard_num=4, overwrite=True)
# Define the schema
data_schema = {"file_name": {"type": "string"}, "label": {"type": "int32"}, "data": {"type": "bytes"}}
writer.add_schema(data_schema, "test_schema")
# Prepare the data
file_name = "./datasets/convert_dataset_to_mindrecord/images/transform.jpg"
with open(file_name, "rb") as f:
    bytes_data = f.read()
data = [{"file_name": "transform.jpg", "label": 1, "data": bytes_data}]
indexes = ["file_name", "label"]
writer.add_index(indexes)
# Write the data
writer.write_raw_data(data)
# Commit to generate the MindRecord files on disk
writer.commit()
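If the same script has to run on both older and newer installations, the overwrite argument can be added only when the installed version accepts it. The sketch below is one possible way to do this. The helper names supports_overwrite and filewriter_kwargs are hypothetical, and the 1.6 threshold is taken from the root cause analysis above rather than checked against every release.

```python
import re


def supports_overwrite(version):
    """Return True if the version string is 1.6 or newer.

    Hypothetical helper; the 1.6 threshold is an assumption taken from
    the root cause analysis in this document.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (1, 6)


def filewriter_kwargs(version):
    """Build the extra FileWriter keyword arguments for this version."""
    return {"overwrite": True} if supports_overwrite(version) else {}
```

In the script, the writer could then be created as FileWriter(file_name=data_record_path, shard_num=4, **filewriter_kwargs(mindspore.__version__)). On versions before 1.6.0 this passes no extra argument, so existing MindRecord files would still need to be deleted beforehand.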