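[PaddlePaddle Developer Talk] Hou Jixu is a third-year undergraduate majoring in automation at Hainan Normal University and an AI development enthusiast. He won first prize in Hainan Province at the 2019 China Collegiate Computing Contest - AI Creativity Competition, and the 2019 Hainan Province Higher Education Scientific Research "Artificial Intelligence" Outstanding Achievement Award.
All materials used in this project (PaddleDetection, Paddle Lite Demo, Paddle Lite, opt) are also packaged on Baidu Netdisk and can be downloaded to try locally.
Link: https://pan.baidu.com/s/1IKT-ByVN9BaVxfqQC1VaMw Extraction code: mdd1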
This project uses a dataset in VOC format. The images were photographed by hand and annotated with labelimg.
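For readers unfamiliar with the format, the sketch below parses a Pascal VOC style annotation of the kind labelimg writes. The sample XML, the file name `cola_001.jpg`, the label `cola`, and the helper `parse_voc_xml` are all hypothetical and only illustrate the structure; real files from this project may carry additional fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation for illustration; not taken from the project's dataset.
SAMPLE_XML = """<annotation>
    <filename>cola_001.jpg</filename>
    <size><width>640</width><height>480</height><depth>3</depth></size>
    <object>
        <name>cola</name>
        <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>360</xmax><ymax>420</ymax></bndbox>
    </object>
</annotation>"""


def parse_voc_xml(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...]) from VOC XML text."""
    root = ET.fromstring(xml_text)
    filename = root.findtext('filename')
    boxes = []
    for obj in root.iter('object'):
        label = obj.findtext('name')
        bndbox = obj.find('bndbox')
        coords = tuple(int(bndbox.findtext(tag))
                       for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        boxes.append((label,) + coords)
    return filename, boxes


filename, boxes = parse_voc_xml(SAMPLE_XML)
print(filename, boxes)
```

Each `object` element carries one label and one bounding box, so an image with several objects simply yields several tuples.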
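To organize the data into a VOC-format dataset:
Create three folders: Annotations, ImageSets, and JPEGImages.
Put the XML files produced by annotation into Annotations and the images into JPEGImages. The files describing the training, test, and validation split go into ImageSets.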
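Create a Main folder under ImageSets, and inside Main create labellist.txt containing the labels used during annotation.
Place a copy of this labellist.txt at the same level as Annotations, ImageSets, and JPEGImages.
Its contents are as follows: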
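As a rough sketch of the steps above (not the author's own labellist.txt), the following Python creates the folder layout and writes a label file in both locations. The function name `create_voc_layout`, the temporary root directory, the placeholder label `cola`, and the one-label-per-line layout are assumptions made for illustration.

```python
import os
import tempfile


def create_voc_layout(root, labels):
    """Create the VOC folders under `root` and write labellist.txt in both places."""
    for sub_dir in ('Annotations', 'JPEGImages', os.path.join('ImageSets', 'Main')):
        os.makedirs(os.path.join(root, sub_dir), exist_ok=True)
    # One copy inside ImageSets/Main, one next to the three top-level folders.
    for target_dir in (os.path.join(root, 'ImageSets', 'Main'), root):
        with open(os.path.join(target_dir, 'labellist.txt'), 'w') as f:
            # Assumption: one label name per line.
            f.write('\n'.join(labels) + '\n')


# Demo in a throwaway directory; 'cola' is a placeholder label,
# not the project's actual label list.
demo_root = tempfile.mkdtemp()
create_voc_layout(demo_root, ['cola'])
print(sorted(os.listdir(demo_root)))
```

In practice `root` would be the dataset directory (for example the `F:/Cola` path that appears in the split script), and `labels` would be the class names used in labelimg.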
Running the following script generates trainval.txt, train.txt, val.txt, and test.txt, splitting the 600 annotated images into training, validation, and test sets.
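import os
import random

trainval_percent = 0.95  # share of all images used for training + validation
train_percent = 0.9      # share of the trainval subset used for training

xmlfilepath = 'F:/Cola/Annotations'
txtsavepath = 'F:/Cola/ImageSets/Main'

total_xml = os.listdir(xmlfilepath)
num = len(total_xml)
indices = range(num)  # renamed from `list` to avoid shadowing the built-in
tv = int(num * trainval_percent)
tr = int(tv * train_percent)

# Sample while trainval is still a list, then convert to sets so the
# membership tests in the loop below are fast.
trainval = random.sample(indices, tv)
train = set(random.sample(trainval, tr))
trainval = set(trainval)

# `with` closes all four files even if an error occurs part-way through.
with open(os.path.join(txtsavepath, 'trainval.txt'), 'w') as ftrainval, \
        open(os.path.join(txtsavepath, 'test.txt'), 'w') as ftest, \
        open(os.path.join(txtsavepath, 'train.txt'), 'w') as ftrain, \
        open(os.path.join(txtsavepath, 'val.txt'), 'w') as fval:
    for i in indices:
        # Strip the '.xml' extension to get the bare sample name.
        name = os.path.splitext(total_xml[i])[0] + '\n'
        if i in trainval:
            ftrainval.write(name)
            if i in train:
                ftrain.write(name)
            else:
                fval.write(name)
        else:
            ftest.write(name)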
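With the 600 annotated images mentioned above, the two percentages work out as in the short sketch below. It mirrors the `int()` truncation used in the script; the function name `split_counts` is introduced here only for illustration.

```python
def split_counts(num, trainval_percent=0.95, train_percent=0.9):
    """Return (train, val, test) sizes using the same truncation as the split script."""
    tv = int(num * trainval_percent)  # images in trainval
    tr = int(tv * train_percent)      # images in train
    return tr, tv - tr, num - tv


train_n, val_n, test_n = split_counts(600)
print(train_n, val_n, test_n)
```

For 600 images this gives 513 training, 57 validation, and 30 test images, with trainval.txt holding the 570 names from the first two groups.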
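The following code uses the split files in the Main folder to build index files that list, for each sample, the path of the image and the path of its XML annotation.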
import os

devkit_dir = './'
output_dir = './'


def get_dir(devkit_dir, sub_dir):
    return os.path.join(devkit_dir, sub_dir)


def walk_dir(devkit_dir):
    filelist_dir = get_dir(devkit_dir, 'ImageSets/Main')
    annotation_dir = get_dir(devkit_dir, 'Annotations')
    img_dir = get_dir(devkit_dir, 'JPEGImages')
    trainval_list = []
    train_list = []
    val_list = []
    test_list = []
    added = set()

    for _, _, files in os.walk(filelist_dir):
        for fname in files:
            print(fname)
            # Exact comparison instead of re.match, which treats '.' as a
            # wildcard and only anchors at the start of the name.
            if fname == 'trainval.txt':
                img_ann_list = trainval_list
            elif fname == 'train.txt':
                img_ann_list = train_list
            elif fname == 'val.txt':
                img_ann_list = val_list
            elif fname == 'test.txt':
                img_ann_list = test_list
            else:
                continue  # e.g. labellist.txt
            fpath = os.path.join(filelist_dir, fname)
            with open(fpath) as f:
                for line in f:
                    name_prefix = line.strip().split()[0]
                    print(name_prefix)
                    added.add(name_prefix)
                    # Joined with '/' rather than os.path.join so the
                    # separator is the same on every platform.
                    ann_path = annotation_dir + '/' + name_prefix + '.xml'
                    print(ann_path)
                    img_path = img_dir + '/' + name_prefix + '.jpg'
                    assert os.path.isfile(ann_path), 'file %s not found.' % ann_path
                    assert os.path.isfile(img_path), 'file %s not found.' % img_path
                    img_ann_list.append((img_path, ann_path))
            print(img_ann_list)

    return trainval_list, train_list, val_list, test_list


def prepare_filelist(devkit_dir, output_dir):
    trainval_list = []
    train_list = []
    val_list = []
    test_list = []
    trainval, train, val, test = walk_dir(devkit_dir)
    trainval_list.extend(trainval)
    train_list.extend(train)
    val_list.extend(val)
    test_list.extend(test)

    # Each output line is "<image path> <annotation path>".
    with open(os.path.join(output_dir, 'trainval.txt'), 'w') as ftrainval:
        for item in trainval_list:
            ftrainval.write(item[0] + ' ' + item[1] + '\n')

    with open(os.path.join(output_dir, 'train.txt'), 'w') as ftrain:
        for item in train_list:
            ftrain.write(item[0] + ' ' + item[1] + '\n')

    with open(os.path.join(output_dir, 'val.txt'), 'w') as fval:
        for item in val_list:
            fval.write(item[0] + ' ' + item[1] + '\n')