In 2017, Microsoft, Facebook and others introduced ONNX (Open Neural Network Exchange), a format standard for deep learning and machine learning models that aims to unify model formats and make deployment easier. Most deep learning frameworks now support exporting to ONNX and provide the corresponding export interfaces.
ONNXRuntime is Microsoft's inference framework for the ONNX model format; with it, running an onnx model is very straightforward. ONNXRuntime supports multiple execution backends, including CPU, GPU, TensorRT and DML. It can be regarded as the most native support for ONNX models: once you know the export steps, models from different frameworks can be deployed the same way, which speeds up development.
With onnx and onnxruntime, a model trained in the pytorch deep learning framework can be served from C++ on a server, and C++ inference is considerably faster than python.
This post deploys a custom model with C++ ONNXRuntime. A network built with Keras is used as the example: it is converted to an onnx file and deployed from C++; TensorRT can additionally be used for acceleration.
GitHub repository for this post's code: https://github.com/zouyuelin/SLAM_Learning_notes/tree/main/PoseEstimation
onnxruntime download (prebuilt packages):
Releases · microsoft/onnxruntime · GitHub
The downloaded onnxruntime is a prebuilt library; just place it in a folder of your choice and reference its headers and library files in CMakeLists.txt.
# include the onnxruntime headers
include_directories(......../onnxruntime/include)
# link against the onnxruntime libraries
link_directories(......../onnxruntime/lib)
1. Exporting an .onnx model from PyTorch
First, export the .onnx model file with pytorch's built-in torch.onnx module (see the corresponding section of the pytorch official documentation). The main steps are as follows:
import torch

checkpoint = torch.load(model_path)
model = ModelNet(params)
model.load_state_dict(checkpoint['model'])
model.eval()

input_x_1 = torch.randn(10,20)
input_x_2 = torch.randn(1,20,5)
output, mask = model(input_x_1, input_x_2)

torch.onnx.export(model,
                  (input_x_1, input_x_2),
                  'model.onnx',
                  input_names = ['input','input_mask'],
                  output_names = ['output','output_mask'],
                  opset_version=11,
                  verbose = True,
                  dynamic_axes={'input':{1:'seqlen'}, 'input_mask':{1:'seqlen',2:'time'}, 'output_mask':{0:'time'}})
All torch.onnx.export parameters are described in the documentation; the opset_version matters a great deal. dynamic_axes marks specific input/output dimensions as dynamic; without it the input and output tensor shapes are fixed, so if your inputs never change shape you can simply omit it.
Whether the exported model works can first be checked from python:
import onnxruntime as ort
import numpy as np

ort_session = ort.InferenceSession('model.onnx')
# the inputs must match the model's element type (float32 here)
outputs = ort_session.run(None, {'input': np.random.randn(10,20).astype(np.float32),
                                 'input_mask': np.random.randn(1,20,5).astype(np.float32)})
# because dynamic_axes was set, the corresponding dimensions may vary
outputs = ort_session.run(None, {'input': np.random.randn(10,5).astype(np.float32),
                                 'input_mask': np.random.randn(1,26,2).astype(np.float32)})
# outputs is a list containing 'output' and 'output_mask'
import onnx
model = onnx.load('model.onnx')
onnx.checker.check_model(model)
If no exception is raised, the exported model is fine. At the moment torch.onnx.export only recognizes a subset of tensor operations (see the list of supported operators); basic models such as transformers export without problems. If you run into ATen-related errors, the unsupported tensor operations have to be reworked so that they do not break the model's use from C++.
2. Exporting an .onnx model from TensorFlow Keras
Build and train the network model (see: tensorflow keras 搭建相机位姿估计网络--例).
The network's inputs and outputs are:
Network input: [image_ref , image_cur]
Network output: [tx , ty , tz , roll , pitch , yaw]
The trained model is saved under kerasTempModel\. Be sure to save it with model.save(); do not use save_model().
onnxruntime consumes an onnx model, so the keras model has to be converted to onnx first.
Install the conversion tool:
pip install tf2onnx
After installing, run:
python -m tf2onnx.convert --saved-model kerasTempModel --output "model.onnx" --opset 14
Tip: in my tests, opset 14 currently optimizes best; inference is faster than with opset 11 or 12.
Once it finishes, the end of the terminal output reports the network's inputs and outputs:
2022-01-21 15:48:00,766 - INFO -
2022-01-21 15:48:00,766 - INFO - Successfully converted TensorFlow model kerasTempModel to ONNX
2022-01-21 15:48:00,766 - INFO - Model inputs: ['input1', 'input2']
2022-01-21 15:48:00,766 - INFO - Model outputs: ['Output']
2022-01-21 15:48:00,766 - INFO - ONNX model is saved at model.onnx
The model has two inputs, whose node names are ['input1', 'input2'], and one output node named ['Output'].
You do not actually have to know the node names in advance: they can be printed from onnxruntime by querying the model's inputs and outputs.
2.1 Dataset preparation
The dataset format is:
# image_ref image_cur tx ty tz roll(x) pitch(y) yaw(z)
0 images/0.jpg images/1.jpg 0.000999509 -0.00102794 0.00987293 0.00473228 -0.0160252 -0.0222079
1 images/1.jpg images/2.jpg -0.00544488 -0.00282174 0.00828871 -0.00271557 -0.00770117 -0.0195182
2 images/2.jpg images/3.jpg -0.0074375 -0.00368121 0.0114751 -0.00721246 -0.0103843 -0.0171883
3 images/3.jpg images/4.jpg -0.00238111 -0.00371362 0.0120466 -0.0081171 -0.0149111 -0.0198595
4 images/4.jpg images/5.jpg 0.000965841 -0.00520437 0.0135452 -0.0141721 -0.0126401 -0.0182697
5 images/5.jpg images/6.jpg -0.00295753 -0.00340146 0.0144557 -0.013633 -0.00463747 -0.0143332
load_image maps each dataset entry to TF tensors:
class datasets:
def __init__(self, datasetsPath:str):
self.dataPath = datasetsPath
self.dim = 512
self.epochs = 40
self.batch_size = 8
self.train_percent = 0.92
self.learning_rate = 2e-4
self.model_path = 'kerasTempModel/'
self.posetxt = os.path.join(self.dataPath,'pose.txt')
self.GetTheImagesAndPose()
self.buildTrainData()
def GetTheImagesAndPose(self):
self.poselist = []
with open(self.posetxt,'r') as f:
for line in f.readlines():
line = line.strip()
line = line.split(' ')
line.remove(line[0])
self.poselist.append(line)
# im1 im2 tx,ty,tz,roll,pitch,yaw
# shuffle the dataset
length = np.shape(self.poselist)[0]
train_num =int(length * self.train_percent)
test_num = length - train_num
randomPoses = np.array(random.sample(self.poselist,length)) # draw all samples in random order
self.train_pose_list = randomPoses[0:train_num,:]
self.test_pose_list = randomPoses[train_num:length+1,:]
print(f"The size of train pose list is : {np.shape(self.train_pose_list)[0]}")
print(f"The size of test pose list is : {np.shape(self.test_pose_list)[0]}")
def load_image(self,index:tf.Tensor):
img_ref = tf.io.read_file(index[0])
img_ref = tf.image.decode_jpeg(img_ref) # images are stored as jpeg
img_ref = tf.image.resize(img_ref,(self.dim,self.dim))/255.0
#img = tf.reshape(img,[self.dim,self.dim,3])
img_ref = tf.cast(img_ref,tf.float32)
img_cur = tf.io.read_file(index[1])
img_cur = tf.image.decode_jpeg(img_cur) # images are stored as jpeg
img_cur = tf.image.resize(img_cur,(self.dim,self.dim))/255.0
#img = tf.reshape(img,[self.dim,self.dim,3])
img_cur = tf.cast(img_cur,tf.float32)
pose = tf.strings.to_number(index[2:8],tf.float32)
return (img_ref,img_cur),(pose)
def buildTrainData(self):
"""Build the tf.data pipelines. For example:
>>> poses = dataset.y_train.take(20)
>>> imgs = dataset.x1_train.take(40)
>>> print(np.array(list(imgs.as_numpy_iterator()))[39])
>>> imgs = dataset.x2_train.take(40)
>>> print(np.array(list(imgs.as_numpy_iterator()))[39])
>>> print(np.array(list(poses.as_numpy_iterator()))[19])
"""
self.traindata = tf.data.Dataset.from_tensor_slices(self.train_pose_list) \
.map(self.load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
.shuffle(500)\
.repeat(10)\
.batch(self.batch_size) \
.prefetch(tf.data.experimental.AUTOTUNE)#.cache()
self.testdata = tf.data.Dataset.from_tensor_slices(self.test_pose_list) \
.map(self.load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
.shuffle(500)\
.repeat(10)\
.batch(self.batch_size) \
.prefetch(tf.data.experimental.AUTOTUNE)
2.2 Building the network model
A simple model:
def model(dim):
First = K.layers.Input(shape=(dim,dim,3),name="input1")
Second = K.layers.Input(shape=(dim,dim,3),name="input2")
x1 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(First)
x1 = K.layers.Conv2D(512,kernel_size=(3,3), strides=2,padding='same')(x1)
x1 = K.layers.BatchNormalization()(x1)
x1 = K.layers.ReLU()(x1)
x1 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x1)
x1 = K.layers.BatchNormalization()(x1)
x1 = K.layers.ReLU()(x1)
x1 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(x1)
x2 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(Second)
x2 = K.layers.Conv2D(512,kernel_size=(3,3), strides=2,padding='same')(x2)
x2 = K.layers.BatchNormalization()(x2)
x2 = K.layers.ReLU()(x2)
x2 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x2)
x2 = K.layers.BatchNormalization()(x2)
x2 = K.layers.ReLU()(x2)
x2 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(x2)
x = K.layers.concatenate([x1,x2])
x = K.layers.Conv2D(256,kernel_size=(3,3), strides=1,padding='same',
activation='relu')(x)
x = K.layers.BatchNormalization()(x)
x = K.layers.ReLU()(x)
x = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(x)
x = K.layers.Conv2D(128,kernel_size=(3,3), strides=1,padding='same',
activation='relu')(x)
x = K.layers.BatchNormalization()(x)
x = K.layers.ReLU()(x)
x = K.layers.Flatten()(x)
x = K.layers.Dense(1024)(x)
x = K.layers.Dense(6,name='Output')(x)
poseModel = K.Model([First,Second],x)
return poseModel
# loss function
def loss_fn(y_true,y_pre):
loss_value = K.backend.mean(K.backend.square(y_true-y_pre))
return loss_value
# learning-rate decay callback
class learningDecay(K.callbacks.Callback):
def __init__(self,schedule=None,alpha=1,verbose=0):
super().__init__()
self.schedule = schedule
self.verbose = verbose
self.alpha = alpha
def on_epoch_begin(self, epoch, logs=None):
lr = float(K.backend.get_value(self.model.optimizer.lr))
if self.schedule != None:
lr = self.schedule(epoch,lr)
else:
if epoch != 0:
lr = lr*self.alpha
K.backend.set_value(self.model.optimizer.lr,K.backend.get_value(lr))
if self.verbose > 0:
print(f"Current learning rate is {lr}")
# learning-rate schedule
def scheduler(epoch, lr):
if epoch < 10:
return lr
else:
return lr * tf.math.exp(-0.1)
With resnet34 as the backbone:
#-------resnet 34-------------
def conv_block(inputs,
neuron_num,
kernel_size,
use_bias,
padding= 'same',
strides= (1, 1),
with_conv_short_cut = False):
conv1 = K.layers.Conv2D(
neuron_num,
kernel_size = kernel_size,
activation= 'relu',
strides= strides,
use_bias= use_bias,
padding= padding
)(inputs)
conv1 = K.layers.BatchNormalization(axis = 1)(conv1)
conv2 = K.layers.Conv2D(
neuron_num,
kernel_size= kernel_size,
activation= 'relu',
use_bias= use_bias,
padding= padding)(conv1)
conv2 = K.layers.BatchNormalization(axis = 1)(conv2)
if with_conv_short_cut:
inputs = K.layers.Conv2D(
neuron_num,
kernel_size= kernel_size,
strides= strides,
use_bias= use_bias,
padding= padding
)(inputs)
return K.layers.add([inputs, conv2])
else:
return K.layers.add([inputs, conv2])
def ResNet34(inputs,namescope = ""):
x = K.layers.ZeroPadding2D((3, 3))(inputs)
# Define the converlutional block 1
x = K.layers.Conv2D(64, kernel_size= (7, 7), strides= (2, 2), padding= 'valid')(x)
x = K.layers.BatchNormalization(axis= 1)(x)
x = K.layers.MaxPooling2D(pool_size= (3, 3), strides= (2, 2), padding= 'same')(x)
# Define the converlutional block 2
x = conv_block(x, neuron_num= 64, kernel_size= (3, 3), use_bias= True)
x = conv_block(x, neuron_num= 64, kernel_size= (3, 3), use_bias= True)
x = conv_block(x, neuron_num= 64, kernel_size= (3, 3), use_bias= True)
# Define the converlutional block 3
x = conv_block(x, neuron_num= 128, kernel_size= (3, 3), use_bias= True, strides= (2, 2), with_conv_short_cut= True)
x = conv_block(x, neuron_num= 128, kernel_size= (3, 3), use_bias= True)
x = conv_block(x, neuron_num= 128, kernel_size= (3, 3), use_bias= True)
# Define the converlutional block 4
x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True, strides= (2, 2), with_conv_short_cut= True)
x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
# Define the converltional block 5
x = conv_block(x, neuron_num= 512, kernel_size= (3, 3), use_bias= True, strides= (2, 2), with_conv_short_cut= True)
x = conv_block(x, neuron_num= 512, kernel_size= (3, 3), use_bias= True)
x = conv_block(x, neuron_num= 512, kernel_size= (3, 3), use_bias= True)
x = K.layers.AveragePooling2D(pool_size=(7, 7))(x)
return x
def model(dim_w,dim_h):
First = K.layers.Input(shape=(dim_w,dim_h,3),name="input1")
Second = K.layers.Input(shape=(dim_w,dim_h,3),name="input2")
# x1 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(First)
x1 = K.layers.Conv2D(128,kernel_size=(3,3), strides=2,padding='same')(First)
x1 = K.layers.BatchNormalization()(x1)
x1 = K.layers.LeakyReLU()(x1)
# x1 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x1)
# x1 = K.layers.BatchNormalization()(x1)
# x1 = K.layers.ReLU()(x1)
x1 = ResNet34(x1,"x1")
# x2 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(Second)
x2 = K.layers.Conv2D(128,kernel_size=(3,3), strides=2,padding='same')(Second)
x2 = K.layers.BatchNormalization()(x2)
x2 = K.layers.LeakyReLU()(x2)
# x2 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x2)
# x2 = K.layers.BatchNormalization()(x2)
# x2 = K.layers.ReLU()(x2)
x2 = ResNet34(x2,"x2")
x = K.layers.concatenate([x1,x2])
x = K.layers.Flatten()(x)
x = K.layers.Dense(6,name='Output')(x)
poseModel = K.Model([First,Second],x)
return poseModel
def loss_fn(y_true,y_pre):
loss_value_translation = K.backend.square(y_true[-1,0:3]-y_pre[-1,0:3])
loss_value_rotation = 1/5.7*K.backend.square(y_true[-1,3:6]-y_pre[-1,3:6])
loss_value = K.backend.mean(loss_value_translation + loss_value_rotation)
# loss_value = K.backend.mean(K.backend.square(y_true-y_pre))
# tf.print(y_pre)
return loss_value
2.3 Training the model
build() compiles the model and creates the callbacks; train_fit() trains with the keras fit() function; train_gradient() trains with apply_gradients, which lets the loss and gradients be monitored in real time and is more flexible; save_model() saves the model, and several formats are possible: model.save() can write either an h5 file or a TF SavedModel directory.
class Posenet:
def __init__(self,dataset:datasets):
self.dataset = dataset
self.build()
def build(self):
self.poseModel = model(self.dataset.dim)
self.poseModel.summary()
self.optm = K.optimizers.RMSprop(1e-4,momentum=0.9) #,decay=1e-5/self.dataset.epochs
self.decayCallback = learningDecay(schedule = None,alpha = 0.99,verbose = 1)
decayCallbackScheduler = K.callbacks.LearningRateScheduler(scheduler)
self.callbacks = [decayCallbackScheduler]
print("************************loading the model weights***********************************")
self.poseModel.load_weights("model.h5")
except:
def train_fit(self):
self.poseModel.compile(optimizer=self.optm,loss=loss_fn,metrics=['accuracy'])
self.poseModel.fit(self.dataset.traindata,
validation_data=self.dataset.testdata,
epochs=self.dataset.epochs,
callbacks=[self.decayCallback],
verbose=1)
def train_gradient(self):
for step in range(self.dataset.epochs):
loss = 0
val_loss = 0
lr = float(self.optm.lr)
tf.print(">>> [Epoch is %s/%s]"%(step,self.dataset.epochs))
for (x1,x2),y in self.dataset.traindata:
with tf.GradientTape() as tape:
prediction = self.poseModel([x1,x2])
# y = tf.cast(y,dtype=prediction.dtype)
loss = loss_fn(y,prediction)
gradients = tape.gradient(loss,self.poseModel.trainable_variables)
self.optm.apply_gradients(zip(gradients,self.poseModel.trainable_variables))
for (x1,x2),y in self.dataset.testdata:
prediction = self.poseModel([x1,x2])
val_loss = loss_fn(y,prediction)
tf.print("The loss is %s,the learning rate is : %s, test loss is %s]"%(np.array(loss),lr,val_loss))
K.backend.set_value(self.optm.lr,K.backend.get_value(lr*0.99))
def save_model(self):
"""Save with model.save(): it can write an h5 file or a SavedModel directory.
The directory form is recommended; it can then be converted with tf2onnx:
>>> python -m tf2onnx.convert --saved-model kerasTempModel --output "model.onnx" --opset 14
"""
self.poseModel.save("model.h5")
self.poseModel.save(self.dataset.model_path)
# self.poseModel.save_weights("model.h5") # saves only the weights, not the architecture
# tf.saved_model.save(self.poseModel,'tf2TempModel') # this saving style is no longer used
if __name__ == "__main__":
dataset = datasets("images")
posenet = Posenet(dataset)
posenet.train_fit()
# posenet.train_gradient() # alternatively, train with apply_gradients
posenet.save_model()
2.4 Loading the saved model
The saved model directory is kerasTempModel\:
model = K.models.load_model(dataset.model_path,compile=False)
Test the model:
output = model([img_ref,img_cur])
2.5 Complete code
import argparse
import tensorflow as tf
import tensorflow.keras as K
import numpy as np
import cv2 as cv
import os
import time
import random
from tensorflow.keras import optimizers
from tensorflow.keras import callbacks
class datasets:
def __init__(self, datasetsPath:str):
self.dataPath = datasetsPath
self.dim = 512
self.epochs = 40
self.batch_size = 8
self.train_percent = 0.92
self.learning_rate = 2e-4
self.model_path = 'kerasTempModel/'
self.posetxt = os.path.join(self.dataPath,'pose.txt')
self.GetTheImagesAndPose()
self.buildTrainData()
def GetTheImagesAndPose(self):
self.poselist = []
with open(self.posetxt,'r') as f:
for line in f.readlines():
line = line.strip()
line = line.split(' ')
line.remove(line[0])
self.poselist.append(line)
# im1 im2 tx,ty,tz,roll,pitch,yaw
# shuffle the dataset
length = np.shape(self.poselist)[0]
train_num =int(length * self.train_percent)
test_num = length - train_num
randomPoses = np.array(random.sample(self.poselist,length)) # draw all samples in random order
self.train_pose_list = randomPoses[0:train_num,:]
self.test_pose_list = randomPoses[train_num:length+1,:]
print(f"The size of train pose list is : {np.shape(self.train_pose_list)[0]}")
print(f"The size of test pose list is : {np.shape(self.test_pose_list)[0]}")
def load_image(self,index:tf.Tensor):
img_ref = tf.io.read_file(index[0])
img_ref = tf.image.decode_jpeg(img_ref) # images are stored as jpeg
img_ref = tf.image.resize(img_ref,(self.dim,self.dim))/255.0
#img = tf.reshape(img,[self.dim,self.dim,3])
img_ref = tf.cast(img_ref,tf.float32)
img_cur = tf.io.read_file(index[1])
img_cur = tf.image.decode_jpeg(img_cur) # images are stored as jpeg
img_cur = tf.image.resize(img_cur,(self.dim,self.dim))/255.0
#img = tf.reshape(img,[self.dim,self.dim,3])
img_cur = tf.cast(img_cur,tf.float32)
pose = tf.strings.to_number(index[2:8],tf.float32)
return (img_ref,img_cur),(pose)
def buildTrainData(self):
"""Build the tf.data pipelines. For example:
>>> poses = dataset.y_train.take(20)
>>> imgs = dataset.x1_train.take(40)
>>> print(np.array(list(imgs.as_numpy_iterator()))[39])
>>> imgs = dataset.x2_train.take(40)
>>> print(np.array(list(imgs.as_numpy_iterator()))[39])
>>> print(np.array(list(poses.as_numpy_iterator()))[19])
"""
self.traindata = tf.data.Dataset.from_tensor_slices(self.train_pose_list) \
.map(self.load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
.shuffle(500)\
.repeat(10)\
.batch(self.batch_size) \
.prefetch(tf.data.experimental.AUTOTUNE)#.cache()
self.testdata = tf.data.Dataset.from_tensor_slices(self.test_pose_list) \
.map(self.load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
.shuffle(500)\
.repeat(10)\
.batch(self.batch_size) \
.prefetch(tf.data.experimental.AUTOTUNE)
def model(dim):
First = K.layers.Input(shape=(dim,dim,3),name="input1")
Second = K.layers.Input(shape=(dim,dim,3),name="input2")
x1 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(First)
x1 = K.layers.Conv2D(512,kernel_size=(3,3), strides=2,padding='same')(x1)
x1 = K.layers.BatchNormalization()(x1)
x1 = K.layers.ReLU()(x1)
x1 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x1)
x1 = K.layers.BatchNormalization()(x1)
x1 = K.layers.ReLU()(x1)
x1 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(x1)
x2 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(Second)
x2 = K.layers.Conv2D(512,kernel_size=(3,3), strides=2,padding='same')(x2)
x2 = K.layers.BatchNormalization()(x2)
x2 = K.layers.ReLU()(x2)
x2 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x2)
x2 = K.layers.BatchNormalization()(x2)
x2 = K.layers.ReLU()(x2)
x2 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(x2)
x = K.layers.concatenate([x1,x2])
x = K.layers.Conv2D(256,kernel_size=(3,3), strides=1,padding='same',
activation='relu')(x)
x = K.layers.BatchNormalization()(x)
x = K.layers.ReLU()(x)
x = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(x)
x = K.layers.Conv2D(128,kernel_size=(3,3), strides=1,padding='same',
activation='relu')(x)
x = K.layers.BatchNormalization()(x)
x = K.layers.ReLU()(x)
x = K.layers.Flatten()(x)
x = K.layers.Dense(1024)(x)
x = K.layers.Dense(6,name='Output')(x)
poseModel = K.Model([First,Second],x)
return poseModel
def loss_fn(y_true,y_pre):
loss_value = K.backend.mean(K.backend.square(y_true-y_pre))
return loss_value
class learningDecay(K.callbacks.Callback):
def __init__(self,schedule=None,alpha=1,verbose=0):
super().__init__()
self.schedule = schedule
self.verbose = verbose
self.alpha = alpha
def on_epoch_begin(self, epoch, logs=None):
lr = float(K.backend.get_value(self.model.optimizer.lr))
if self.schedule != None:
lr = self.schedule(epoch,lr)
else:
if epoch != 0:
lr = lr*self.alpha
K.backend.set_value(self.model.optimizer.lr,K.backend.get_value(lr))
if self.verbose > 0:
print(f"Current learning rate is {lr}")
def scheduler(epoch, lr):
if epoch < 10:
return lr
else:
return lr * tf.math.exp(-0.1)
class Posenet:
def __init__(self,dataset:datasets):
self.dataset = dataset
self.build()
def build(self):
self.poseModel = model(self.dataset.dim)
self.poseModel.summary()
self.optm = K.optimizers.RMSprop(1e-4,momentum=0.9) #,decay=1e-5/self.dataset.epochs
self.decayCallback = learningDecay(schedule = None,alpha = 0.99,verbose = 1)
decayCallbackScheduler = K.callbacks.LearningRateScheduler(scheduler)
self.callbacks = [decayCallbackScheduler]
print("************************loading the model weights***********************************")
self.poseModel.load_weights("model.h5")
except:
def train_fit(self):
self.poseModel.compile(optimizer=self.optm,loss=loss_fn,metrics=['accuracy'])
self.poseModel.fit(self.dataset.traindata,
validation_data=self.dataset.testdata,
epochs=self.dataset.epochs,
callbacks=[self.decayCallback],
verbose=1)
def train_gradient(self):
for step in range(self.dataset.epochs):
loss = 0
val_loss = 0
lr = float(self.optm.lr)
tf.print(">>> [Epoch is %s/%s]"%(step,self.dataset.epochs))
for (x1,x2),y in self.dataset.traindata:
with tf.GradientTape() as tape:
prediction = self.poseModel([x1,x2])
# y = tf.cast(y,dtype=prediction.dtype)
loss = loss_fn(y,prediction)
gradients = tape.gradient(loss,self.poseModel.trainable_variables)
self.optm.apply_gradients(zip(gradients,self.poseModel.trainable_variables))
for (x1,x2),y in self.dataset.testdata:
prediction = self.poseModel([x1,x2])
val_loss = loss_fn(y,prediction)
tf.print("The loss is %s,the learning rate is : %s, test loss is %s]"%(np.array(loss),lr,val_loss))
K.backend.set_value(self.optm.lr,K.backend.get_value(lr*0.99))
def save_model(self):
"""Save with model.save(): it can write an h5 file or a SavedModel directory.
The directory form is recommended; it can then be converted with tf2onnx:
>>> python -m tf2onnx.convert --saved-model kerasTempModel --output "model.onnx" --opset 14
"""
self.poseModel.save("model.h5")
self.poseModel.save(self.dataset.model_path)
# self.poseModel.save_weights("model.h5") # saves only the weights, not the architecture
# tf.saved_model.save(self.poseModel,'tf2TempModel') # this saving style is no longer used
if __name__ == "__main__":
dataset = datasets("images")
posenet = Posenet(dataset)
posenet.train_fit()
# posenet.train_gradient() # alternatively, train with apply_gradients
posenet.save_model()
import argparse
import tensorflow as tf
import tensorflow.keras as K
import numpy as np
import cv2 as cv
import os
import time
import sys
import random
from tensorflow.keras import optimizers
from tensorflow.keras import callbacks
from tensorflow.python.keras.saving.save import save_model
class datasets:
def __init__(self, datasetsPath:str):
self.dataPath = datasetsPath
self.dim_w = 512
self.dim_h = 512
self.epochs = 200
self.batch_size = 8
self.train_percent = 0.92
self.learning_rate = 2e-4
self.model_path = 'kerasTempModel/'
self.posetxt = os.path.join(self.dataPath,'pose.txt')
self.GetTheImagesAndPose()
self.buildTrainData()
def GetTheImagesAndPose(self):
self.poselist = []
with open(self.posetxt,'r') as f:
for line in f.readlines():
line = line.strip()
line = line.split(' ')
line.remove(line[0])
self.poselist.append(line)
# im1 im2 tx,ty,tz,roll,pitch,yaw
# shuffle the dataset
length = np.shape(self.poselist)[0]
train_num =int(length * self.train_percent)
test_num = length - train_num
randomPoses = np.array(random.sample(self.poselist,length)) # draw all samples in random order
self.train_pose_list = randomPoses[0:train_num,:]
self.test_pose_list = randomPoses[train_num:length+1,:]
print(f"The size of train pose list is : {np.shape(self.train_pose_list)[0]}")
print(f"The size of test pose list is : {np.shape(self.test_pose_list)[0]}")
def load_image(self,index:tf.Tensor):
img_ref = tf.io.read_file(index[0])
img_ref = tf.image.decode_jpeg(img_ref) # images are stored as jpeg
#img = tf.reshape(img,[self.dim,self.dim,3])
img_ref = tf.image.resize(img_ref,(self.dim_w,self.dim_h))/255.0
img_ref = tf.cast(img_ref,tf.float32)
img_cur = tf.io.read_file(index[1])
img_cur = tf.image.decode_jpeg(img_cur) # images are stored as jpeg
img_cur = tf.image.resize(img_cur,(self.dim_w,self.dim_h))/255.0
#img = tf.reshape(img,[self.dim,self.dim,3])
img_cur = tf.cast(img_cur,tf.float32)
pose = tf.strings.to_number(index[2:8],tf.float32)
return (img_ref,img_cur),(pose)
def buildTrainData(self):
"""Build the tf.data pipelines. For example:
>>> poses = dataset.y_train.take(20)
>>> imgs = dataset.x1_train.take(40)
>>> print(np.array(list(imgs.as_numpy_iterator()))[39])
>>> imgs = dataset.x2_train.take(40)
>>> print(np.array(list(imgs.as_numpy_iterator()))[39])
>>> print(np.array(list(poses.as_numpy_iterator()))[19])
"""
self.traindata = tf.data.Dataset.from_tensor_slices(self.train_pose_list) \
.map(self.load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
.shuffle(500)\
.repeat(10)\
.batch(self.batch_size) \
.prefetch(tf.data.experimental.AUTOTUNE)#.cache()
self.testdata = tf.data.Dataset.from_tensor_slices(self.test_pose_list) \
.map(self.load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
.shuffle(500)\
.repeat(10)\
.batch(self.batch_size) \
.prefetch(tf.data.experimental.AUTOTUNE)
#-------resnet 34-------------
def conv_block(inputs,
neuron_num,
kernel_size,
use_bias,
padding= 'same',
strides= (1, 1),
with_conv_short_cut = False):
conv1 = K.layers.Conv2D(
neuron_num,
kernel_size = kernel_size,
activation= 'relu',
strides= strides,
use_bias= use_bias,
padding= padding
)(inputs)
conv1 = K.layers.BatchNormalization(axis = 1)(conv1)
conv2 = K.layers.Conv2D(
neuron_num,
kernel_size= kernel_size,
activation= 'relu',
use_bias= use_bias,
padding= padding)(conv1)
conv2 = K.layers.BatchNormalization(axis = 1)(conv2)
if with_conv_short_cut:
inputs = K.layers.Conv2D(
neuron_num,
kernel_size= kernel_size,
strides= strides,
use_bias= use_bias,
padding= padding
)(inputs)
return K.layers.add([inputs, conv2])
else:
return K.layers.add([inputs, conv2])
def ResNet34(inputs,namescope = ""):
x = K.layers.ZeroPadding2D((3, 3))(inputs)
# Define the converlutional block 1
x = K.layers.Conv2D(64, kernel_size= (7, 7), strides= (2, 2), padding= 'valid')(x)
x = K.layers.BatchNormalization(axis= 1)(x)
x = K.layers.MaxPooling2D(pool_size= (3, 3), strides= (2, 2), padding= 'same')(x)
# Define the converlutional block 2
x = conv_block(x, neuron_num= 64, kernel_size= (3, 3), use_bias= True)
x = conv_block(x, neuron_num= 64, kernel_size= (3, 3), use_bias= True)
x = conv_block(x, neuron_num= 64, kernel_size= (3, 3), use_bias= True)
# Define the converlutional block 3
x = conv_block(x, neuron_num= 128, kernel_size= (3, 3), use_bias= True, strides= (2, 2), with_conv_short_cut= True)
x = conv_block(x, neuron_num= 128, kernel_size= (3, 3), use_bias= True)
x = conv_block(x, neuron_num= 128, kernel_size= (3, 3), use_bias= True)
# Define the converlutional block 4
x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True, strides= (2, 2), with_conv_short_cut= True)
x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
# Define the converltional block 5
x = conv_block(x, neuron_num= 512, kernel_size= (3, 3), use_bias= True, strides= (2, 2), with_conv_short_cut= True)
x = conv_block(x, neuron_num= 512, kernel_size= (3, 3), use_bias= True)
x = conv_block(x, neuron_num= 512, kernel_size= (3, 3), use_bias= True)
x = K.layers.AveragePooling2D(pool_size=(7, 7))(x)
return x
def model(dim_w,dim_h):
First = K.layers.Input(shape=(dim_w,dim_h,3),name="input1")
Second = K.layers.Input(shape=(dim_w,dim_h,3),name="input2")
# x1 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(First)
x1 = K.layers.Conv2D(128,kernel_size=(3,3), strides=2,padding='same')(First)
x1 = K.layers.BatchNormalization()(x1)
x1 = K.layers.LeakyReLU()(x1)
# x1 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x1)
# x1 = K.layers.BatchNormalization()(x1)
# x1 = K.layers.ReLU()(x1)
x1 = ResNet34(x1,"x1")
# x2 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(Second)
x2 = K.layers.Conv2D(128,kernel_size=(3,3), strides=2,padding='same')(Second)
x2 = K.layers.BatchNormalization()(x2)
x2 = K.layers.LeakyReLU()(x2)
# x2 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x2)
# x2 = K.layers.BatchNormalization()(x2)
# x2 = K.layers.ReLU()(x2)
x2 = ResNet34(x2,"x2")
x = K.layers.concatenate([x1,x2])
x = K.layers.Flatten()(x)
x = K.layers.Dense(6,name='Output')(x)
poseModel = K.Model([First,Second],x)
return poseModel
def loss_fn(y_true,y_pre):
loss_value_translation = K.backend.square(y_true[-1,0:3]-y_pre[-1,0:3])
loss_value_rotation = 1/5.7*K.backend.square(y_true[-1,3:6]-y_pre[-1,3:6])
loss_value = K.backend.mean(loss_value_translation + loss_value_rotation)
# loss_value = K.backend.mean(K.backend.square(y_true-y_pre))
# tf.print(y_pre)
return loss_value
class learningDecay(K.callbacks.Callback):
def __init__(self,schedule=None,alpha=1,verbose=0):
super().__init__()
self.schedule = schedule
self.verbose = verbose
self.alpha = alpha
def on_epoch_begin(self, epoch, logs=None):
lr = float(K.backend.get_value(self.model.optimizer.lr))
if self.schedule != None:
lr = self.schedule(epoch,lr)
else:
if epoch >= 30:
lr = lr*self.alpha
K.backend.set_value(self.model.optimizer.lr,K.backend.get_value(lr))
if self.verbose > 0:
print(f"Current learning rate is {lr}")
#save the model
if epoch % 20 == 0 and epoch != 0:
self.model.save("model.h5")
def scheduler(epoch, lr):
if epoch < 10:
return lr
else:
return lr * tf.math.exp(-0.1)
class Posenet:
def __init__(self,dataset:datasets):
self.dataset = dataset
self.build()
def build(self):
self.poseModel = model(self.dataset.dim_w,self.dataset.dim_h)
self.poseModel.summary()
self.optm = K.optimizers.RMSprop(1e-4,momentum=0.9) #,decay=1e-5/self.dataset.epochs
# self.optm = K.optimizers.Adam(1e-4)
self.decayCallback = learningDecay(schedule = None,alpha = 0.99,verbose = 1)
decayCallbackScheduler = K.callbacks.LearningRateScheduler(scheduler)
self.callbacks = [decayCallbackScheduler]
print("************************loading the model weights***********************************")
self.poseModel.load_weights("model.h5")
except:
def train_fit(self):
self.poseModel.compile(optimizer=self.optm,loss=loss_fn,metrics=['accuracy'])
self.poseModel.fit(self.dataset.traindata,
validation_data=self.dataset.testdata,
epochs=self.dataset.epochs,
callbacks=[self.decayCallback],
verbose=1)
def train_gradient(self):
for step in range(self.dataset.epochs):
loss = 0
val_loss = 0
index = 0
lr = float(self.optm.lr)
tf.print(">>> [Epoch is %s/%s]"%(step,self.dataset.epochs))
for (x1,x2),y in self.dataset.traindata:
with tf.GradientTape() as tape:
prediction = self.poseModel([x1,x2])
# y = tf.cast(y,dtype=prediction.dtype)
loss = loss + loss_fn(y,prediction)
gradients = tape.gradient(loss,self.poseModel.trainable_variables)
self.optm.apply_gradients(zip(gradients,self.poseModel.trainable_variables))
index = index + 1
sys.stdout.write('--------train loss is %.5f-----'%(loss/float(index)))
sys.stdout.write('\r')
sys.stdout.flush()
index_val = 0
for (x1,x2),y in self.dataset.testdata:
prediction = self.poseModel([x1,x2])
val_loss = val_loss + loss_fn(y,prediction)
index_val = index_val + 1
tf.print("The loss is %s,the learning rate is : %s, test loss is %s]"%(np.array(loss/float(index)),lr,val_loss/float(index_val)))
K.backend.set_value(self.optm.lr,K.backend.get_value(lr*0.99))
if step%40==0:
self.save_model()
def save_model(self):
"""Save with model.save(): it can write an h5 file or a SavedModel directory.
The directory form is recommended; it can then be converted with tf2onnx:
>>> python -m tf2onnx.convert --saved-model kerasTempModel --output "model.onnx" --opset 14
"""
self.poseModel.save("model.h5")
self.poseModel.save(self.dataset.model_path)
# self.poseModel.save_weights("model.h5") # saves only the weights, not the architecture
# tf.saved_model.save(self.poseModel,'tf2TempModel') # this saving style is no longer used
def test(dataset):
im1 = cv.imread("imagesNDI/0.jpg")
im1 = cv.resize(im1,(512,512))
im1 = np.array(im1,np.float32).reshape((1,512,512,3))/255.0
im2 = cv.imread("imagesNDI/1.jpg")
im2 = cv.resize(im2,(512,512))
im2 = np.array(im2,np.float32).reshape((1,512,512,3))/255.0
posemodel = K.models.load_model(dataset.model_path,compile=False)
pose = posemodel([im1,im2])
print(np.array(pose))
if __name__ == "__main__":
dataset = datasets("images")
posenet = Posenet(dataset)
posenet.train_fit()
# posenet.train_gradient() # alternatively, train with apply_gradients
posenet.save_model()
test(dataset)
Configure onnxruntime in the project's CMakeLists.txt:
#******onnxruntime*****
set(ONNXRUNTIME_ROOT_PATH /path to your onnxruntime-master)
set(ONNXRUNTIME_INCLUDE_DIRS ${ONNXRUNTIME_ROOT_PATH}/include/onnxruntime
${ONNXRUNTIME_ROOT_PATH}/onnxruntime
${ONNXRUNTIME_ROOT_PATH}/include/onnxruntime/core/session/)
set(ONNXRUNTIME_LIB ${ONNXRUNTIME_ROOT_PATH}/build/Linux/Release/libonnxruntime.so)
Headers needed in the C++ main.cpp:
#include <core/session/onnxruntime_cxx_api.h>
#include <core/providers/cuda/cuda_provider_factory.h>
#include <core/session/onnxruntime_c_api.h>
#include <core/providers/tensorrt/tensorrt_provider_factory.h>
3. The model inference workflow
Overall, a run of ONNXRuntime can be divided into three phases (a minimal sketch of the corresponding C++ API calls follows the list):
- Session construction;
- model loading and initialization;
- running.
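As a rough orientation (my sketch, not from the original post): in the C++ API the first two phases are both triggered by constructing the Ort::Session, and the third phase corresponds to Session::Run. Assuming a model file model.onnx with a single float input named 'input' of shape [1,20] and one output named 'output':
#include <core/session/onnxruntime_cxx_api.h>
#include <array>
#include <vector>

int main()
{
    // Phase 1: construct the environment and session options (the "empty shell" InferenceSession).
    Ort::Env env(OrtLoggingLevel::ORT_LOGGING_LEVEL_WARNING, "demo");
    Ort::SessionOptions session_options;
    // Phase 2: the Session constructor loads the model and runs Initialize()
    // (graph optimization, kernel registration, partitioning, execution plan).
    Ort::Session session(env, "model.onnx", session_options);
    // Phase 3: run one batch.
    Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    std::array<int64_t, 2> shape{1, 20};                  // assumed input shape
    std::vector<float> data(1 * 20, 0.f);                 // dummy input data
    Ort::Value input = Ort::Value::CreateTensor<float>(memory_info, data.data(), data.size(),
                                                       shape.data(), shape.size());
    const char* input_names[]  = {"input"};               // assumed node names
    const char* output_names[] = {"output"};
    auto outputs = session.Run(Ort::RunOptions{nullptr}, input_names, &input, 1, output_names, 1);
    return 0;
}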
Phase 1: Session construction
The construction phase creates an InferenceSession object. When a Session is built from the python front end, python calls the C++ InferenceSession constructor through onnxruntime_pybind_state.cc and obtains an InferenceSession object.
The InferenceSession constructor initializes its members: the KernelRegistryManager responsible for OpKernel management, the SessionOptions object holding the session configuration, the GraphTransformerManager responsible for graph partitioning, the LoggingManager for logging, and so on. At this point the InferenceSession is still an empty shell; only the initial construction of its member objects has been completed.
Phase 2: Model loading and initialization
After the InferenceSession object has been constructed, the onnx model is loaded into it and further initialized.
2.1 Model loading
During loading, the C++ backend calls the corresponding Load() function. InferenceSession provides eight Load overloads, covering reading the ModelProto from a url, from a ModelProto, from void* model data, from a model istream, and so on. The InferenceSession parses the ModelProto and then holds the corresponding Model member.
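At the API level these overloads show up as the different Ort::Session constructors. A small illustrative sketch (my addition, not from the original post), showing a session built from a file path and one built from a byte buffer that already holds the serialized model:
#include <core/session/onnxruntime_cxx_api.h>
#include <fstream>
#include <iterator>
#include <vector>

// Sketch: two common ways of handing the model to the InferenceSession.
void loadExamples(Ort::Env &env, Ort::SessionOptions &options)
{
    // 1) from a file path
    Ort::Session from_file(env, "model.onnx", options);

    // 2) from a buffer that already contains the serialized ModelProto
    std::ifstream f("model.onnx", std::ios::binary);
    std::vector<char> bytes((std::istreambuf_iterator<char>(f)), std::istreambuf_iterator<char>());
    Ort::Session from_memory(env, bytes.data(), bytes.size(), options);
}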
2.2 Provider registration
After Load() finishes, InferenceSession calls two functions: RegisterExecutionProviders() and sess->Initialize().
RegisterExecutionProviders() registers the ExecutionProviders. An ExecutionProvider is how ONNXRuntime represents a runtime device, for example the CUDAProvider. As of ONNXRuntime v1.0, seven providers are supported, including CPU, CUDA, TensorRT and MKL. Through sess->RegisterExecutionProvider(), the InferenceSession holds a list of the ExecutionProviders supported in the current environment.
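From the user's side, which providers end up registered is decided through the SessionOptions before the session is created. A hedged sketch (whether the CUDA provider is available depends on how your onnxruntime binary was built):
#include <core/session/onnxruntime_cxx_api.h>
#include <iostream>

// Sketch: prefer the CUDA provider when the build supports it, otherwise stay on the default CPU provider.
Ort::SessionOptions makeSessionOptions()
{
    Ort::SessionOptions options;
    try {
        OrtCUDAProviderOptions cuda_options{};            // device_id defaults to 0
        options.AppendExecutionProvider_CUDA(cuda_options);
    } catch (const Ort::Exception &e) {
        std::cerr << "CUDA provider unavailable, using CPU: " << e.what() << std::endl;
    }
    return options;
}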
2.3 InferenceSession initialization
This is sess->Initialize(). The InferenceSession now initializes itself further from the model and execution providers it holds (in phase 1 only empty member shells were created). This step is the core of the initialization: memory allocation, model partitioning, kernel registration and other key operations all happen here. A small user-facing sketch of the knobs that influence this phase follows the list.
- First, the session registers the graph optimization transformers according to the optimization level and holds them in the GraphTransformerManager.
- Next, the OpKernels are registered. An OpKernel is the computation logic of a node on a particular runtime device. The kernels defined by each held ExecutionProvider for every node are registered into the session, which holds and manages them through the KernelRegistryManager.
- The session then applies graph transformations, such as inserting copy nodes and cast nodes.
- Next comes model partition, i.e. splitting the graph by runtime device and deciding which provider each node runs on.
- Finally, an execution plan is created for each node; the plan mainly covers the execution order of the ops, memory allocation and memory reuse management.
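The optimization level mentioned in the first bullet is the one you set from user code, and a higher log verbosity makes the transformations and the partitioning visible. A small sketch (my addition; the file names are placeholders):
#include <core/session/onnxruntime_cxx_api.h>

// Sketch: make the Initialize() phase observable.
Ort::Session buildSession(Ort::Env &env)   // pass an Env created with ORT_LOGGING_LEVEL_VERBOSE to see the steps
{
    Ort::SessionOptions options;
    // decides which graph-optimization transformers get registered
    options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_EXTENDED);
    // dump the optimized graph so the result of this phase can be inspected
    options.SetOptimizedModelFilePath("model_optimized.onnx");
    // model loading and Initialize() (kernel registration, partitioning, execution plan) run inside this constructor
    return Ort::Session(env, "model.onnx", options);
}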
Phase 3: Running the model
Running the model means that the InferenceSession reads one batch of data at a time, computes, and produces the model's final output. In fact, most of the work has already been done during initialization; a look at the source shows that the Run phase mostly just calls each node's OpKernel in order.
4. Deploying the model
Like all other mainstream frameworks, ONNXRuntime is most often driven from python, while the part that actually executes the framework is C++.
Below is the C++ usage of a .onnx model through onnxruntime, written for the multi-input / multi-output case following the official samples and FAQ; for the remaining parameters consult the samples or the official API documentation.
1. Model initialization and setup
//model path
string model_path = "../model.onnx";
//initialize the ONNXRUNTIME environment
Ort::Env env(OrtLoggingLevel::ORT_LOGGING_LEVEL_WARNING, "PoseEstimate");
Ort::SessionOptions session_options;
//enable TensorRT and CUDA acceleration
OrtSessionOptionsAppendExecutionProvider_Tensorrt(session_options, 0); //tensorRT
OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, 0);
session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
Ort::AllocatorWithDefaultOptions allocator;
//load the ONNX model
Ort::Session session(env, model_path.c_str(), session_options);
Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
Print the model information with the printModelInfo function:
void printModelInfo(Ort::Session &session, Ort::AllocatorWithDefaultOptions &allocator)
{
    //number of input and output nodes
    size_t num_input_nodes = session.GetInputCount();
    size_t num_output_nodes = session.GetOutputCount();
    cout<<"Number of input node is:"<<num_input_nodes<<endl;
    cout<<"Number of output node is:"<<num_output_nodes<<endl;
    //get the input and output dimensions
    for(auto i = 0; i<num_input_nodes;i++)
    {
        std::vector<int64_t> input_dims = session.GetInputTypeInfo(i).GetTensorTypeAndShapeInfo().GetShape();
        cout<<endl<<"input "<<i<<" dim is: ";
        for(auto j=0; j<input_dims.size();j++)
            cout<<input_dims[j]<<" ";
    }
    for(auto i = 0; i<num_output_nodes;i++)
    {
        std::vector<int64_t> output_dims = session.GetOutputTypeInfo(i).GetTensorTypeAndShapeInfo().GetShape();
        cout<<endl<<"output "<<i<<" dim is: ";
        for(auto j=0; j<output_dims.size();j++)
            cout<<output_dims[j]<<" ";
    }
    //input and output node names
    cout<<endl; //newline
    for(auto i = 0; i<num_input_nodes;i++)
        cout<<"The input op-name "<<i<<" is:"<<session.GetInputName(i, allocator)<<endl;
    for(auto i = 0; i<num_output_nodes;i++)
        cout<<"The output op-name "<<i<<" is:"<<session.GetOutputName(i, allocator)<<endl;
    //input_dims_2[0] = input_dims_1[0] = output_dims[0] = 1; //batch size = 1
}
Using it:
//print the model information
printModelInfo(session,allocator);
Output:
Number of input node is:2
Number of output node is:1
input 0 dim is: -1 512 512 3
input 1 dim is: -1 512 512 3
output 0 dim is: -1 6
The input op-name 0 is:input1
The input op-name 1 is:input2
The output op-name 0 is:Output
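A side note that is not in the original post: in newer onnxruntime releases (roughly 1.13 and later) GetInputName/GetOutputName are deprecated in favour of the allocated variants, so the same printout would be written like this:
#include <iostream>

// Sketch for newer onnxruntime versions.
void printNodeNames(Ort::Session &session)
{
    Ort::AllocatorWithDefaultOptions allocator;
    for (size_t i = 0; i < session.GetInputCount(); i++) {
        Ort::AllocatedStringPtr name = session.GetInputNameAllocated(i, allocator);
        std::cout << "The input op-name " << i << " is:" << name.get() << std::endl;
    }
    for (size_t i = 0; i < session.GetOutputCount(); i++) {
        Ort::AllocatedStringPtr name = session.GetOutputNameAllocated(i, allocator);
        std::cout << "The output op-name " << i << " is:" << name.get() << std::endl;
    }
}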
If you do not know the network beforehand, the printed information is what you rely on; with it the global variables can now be defined:
//input dimensions of the network
static constexpr const int width = 512;
static constexpr const int height = 512;
static constexpr const int channel = 3;
std::array<int64_t, 4> input_shape_{ 1,height, width,channel};
2. Building the inference
2.1 Steps of the inference function computePoseDNN()
- Resize the Mat images passed in from OpenCV:
Mat Input_1,Input_2;
resize(img_1,Input_1,Size(512,512));
resize(img_2,Input_2,Size(512,512));
- Specify the input and output node names; they could also be global variables, but for convenience they are defined inside the function:
std::vector<const char*> input_node_names = {"input1","input2"};
std::vector<const char*> output_node_names = {"Output"};
- Allocate buffers for image_ref and image_cur as float arrays of length 512 * 512 * 3. A cv::Mat cannot be fed to the session directly, so the image data is first copied into plain arrays and then wrapped into ONNXRUNTIME's own tensor type:
std::array<float, width * height *channel> input_image_1{};
std::array<float, width * height *channel> input_image_2{};
float* input_1 = input_image_1.data();
float* input_2 = input_image_2.data();
The float type here depends on your own network; it might also be double. The network's element type can be printed with:
cout<<session.GetInputTypeInfo(i).GetTensorTypeAndShapeInfo().GetElementType();
The C++ line above prints an index that maps to the following data types:
typedef enum ONNXTensorElementDataType {
ONNX_TENSOR_ELEMENT_DATA_TYPE_UNDEFINED,
ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT, // maps to c type float
ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT8, // maps to c type uint8_t
ONNX_TENSOR_ELEMENT_DATA_TYPE_INT8, // maps to c type int8_t
ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT16, // maps to c type uint16_t
ONNX_TENSOR_ELEMENT_DATA_TYPE_INT16, // maps to c type int16_t
ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32, // maps to c type int32_t
ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64, // maps to c type int64_t
ONNX_TENSOR_ELEMENT_DATA_TYPE_STRING, // maps to c++ type std::string
ONNX_TENSOR_ELEMENT_DATA_TYPE_BOOL,
ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT16,
ONNX_TENSOR_ELEMENT_DATA_TYPE_DOUBLE, // maps to c type double
ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT32, // maps to c type uint32_t
ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT64, // maps to c type uint64_t
ONNX_TENSOR_ELEMENT_DATA_TYPE_COMPLEX64, // complex with float32 real and imaginary components
ONNX_TENSOR_ELEMENT_DATA_TYPE_COMPLEX128, // complex with float64 real and imaginary components
ONNX_TENSOR_ELEMENT_DATA_TYPE_BFLOAT16 // Non-IEEE floating-point format based on IEEE754 single-precision
} ONNXTensorElementDataType;
For example, if cout prints 1, the network's element type is float.
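As a small defensive check (my addition), the element type can also be verified programmatically before the float buffers are filled:
ONNXTensorElementDataType type = session.GetInputTypeInfo(0).GetTensorTypeAndShapeInfo().GetElementType();
if (type != ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT)
    cout << "unexpected input element type: " << type << endl;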
- Fill the float arrays in a loop, in either CHW or HWC order. If the data was normalized during training (for example divided by 255.0), the same normalization has to be applied here:
for (int i = 0; i < Input_1.rows; i++) {
for (int j = 0; j < Input_1.cols; j++) {
for (int c = 0; c < 3; c++)
//NHWC 格式
if(c==0)
input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+2]/255.0;
if(c==1)
input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+1]/255.0;
if(c==2)
input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+0]/255.0;
//NCHW 格式
// if (c == 0)
// input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 2]/255.0;
// if (c == 1)
// input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 1]/255.0;
// if (c == 2)
// input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 0]/255.0;
for (int i = 0; i < Input_2.rows; i++) {
for (int j = 0; j < Input_2.cols; j++) {
for (int c = 0; c < 3; c++)
//NHWC 格式
if(c==0)
input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+2]/255.0;
if(c==1)
input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+1]/255.0;
if(c==2)
input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+0]/255.0;
- Since a network may have several input nodes and several output nodes, the Ort tensors are kept in a std::vector; two tensors are created here from the two input buffers:
std::vector<Ort::Value> input_tensors;
input_tensors.push_back(Ort::Value::CreateTensor<float>(
memory_info, input_1, input_image_1.size(), input_shape_.data(), input_shape_.size()));
input_tensors.push_back(Ort::Value::CreateTensor<float>(
memory_info, input_2, input_image_2.size(), input_shape_.data(), input_shape_.size()));
where input_shape_ is the input shape: std::array<int64_t, 4> input_shape_{ 1,512, 512,3};
- Forward inference: the outputs are likewise defined as a vector of tensors, which keeps the code generic:
std::vector<Ort::Value> output_tensors;
output_tensors = session.Run(Ort::RunOptions { nullptr },
input_node_names.data(), //input node names
input_tensors.data(), //input tensors
input_tensors.size(), //2
output_node_names.data(), //output node names
output_node_names.size()); //1
- Fetching the result: since this example has only one output, output_tensors[0] is all that is needed to obtain it:
float* output = output_tensors[0].GetTensorMutableData<float>();
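If the output shape were not known in advance, it could be queried from the returned tensor itself (my addition):
auto out_info = output_tensors[0].GetTensorTypeAndShapeInfo();
std::vector<int64_t> out_shape = out_info.GetShape();   // {1, 6} for this network
size_t out_count = out_info.GetElementCount();          // number of floats behind the output pointer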
The pose is then reconstructed from the output:
Eigen::Vector3d t(output[0],output[1],output[2]);
Eigen::Vector3d r(output[3],output[4],output[5]);
// build the rotation from the axis-angle components
Eigen::AngleAxisd R_z(r[2], Eigen::Vector3d(0,0,1));
Eigen::AngleAxisd R_y(r[1], Eigen::Vector3d(0,1,0));
Eigen::AngleAxisd R_x(r[0], Eigen::Vector3d(1,0,0));
// convert to a rotation matrix, in x-y-z order
Eigen::Matrix3d R_matrix_xyz = R_z.toRotationMatrix()*R_y.toRotationMatrix()*R_x.toRotationMatrix();
return Sophus::SE3(R_matrix_xyz,t);
2.2 The full function
Sophus::SE3 computePoseDNN(Mat img_1, Mat img_2, Ort::Session &session, Ort::MemoryInfo &memory_info)
{
    Mat Input_1,Input_2;
    resize(img_1,Input_1,Size(512,512));
    resize(img_2,Input_2,Size(512,512));
    std::vector<const char*> input_node_names = {"input1","input2"};
    std::vector<const char*> output_node_names = {"Output"};
    //copy the images into the float arrays, BGR--->RGB
    std::array<float, width * height *channel> input_image_1{};
    std::array<float, width * height *channel> input_image_2{};
    float* input_1 = input_image_1.data();
    float* input_2 = input_image_2.data();
    for (int i = 0; i < Input_1.rows; i++) {
        for (int j = 0; j < Input_1.cols; j++) {
            for (int c = 0; c < 3; c++)
            {
                //NHWC layout
                if(c==0)
                    input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+2]/255.0;
                if(c==1)
                    input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+1]/255.0;
                if(c==2)
                    input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+0]/255.0;
                //NCHW layout
                // if (c == 0)
                //     input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 2]/255.0;
                // if (c == 1)
                //     input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 1]/255.0;
                // if (c == 2)
                //     input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 0]/255.0;
            }
        }
    }
    for (int i = 0; i < Input_2.rows; i++) {
        for (int j = 0; j < Input_2.cols; j++) {
            for (int c = 0; c < 3; c++)
            {
                //NHWC layout
                if(c==0)
                    input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+2]/255.0;
                if(c==1)
                    input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+1]/255.0;
                if(c==2)
                    input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+0]/255.0;
            }
        }
    }
    std::vector<Ort::Value> input_tensors;
    input_tensors.push_back(Ort::Value::CreateTensor<float>(
            memory_info, input_1, input_image_1.size(), input_shape_.data(), input_shape_.size()));
    input_tensors.push_back(Ort::Value::CreateTensor<float>(
            memory_info, input_2, input_image_2.size(), input_shape_.data(), input_shape_.size()));
    std::vector<Ort::Value> output_tensors;
    output_tensors = session.Run(Ort::RunOptions { nullptr },
                        input_node_names.data(),   //input node names
                        input_tensors.data(),      //input tensors
                        input_tensors.size(),      //2
                        output_node_names.data(),  //output node names
                        output_node_names.size()); //1
    // cout<<output_tensors.size()<<endl; //number of output tensors
    float* output = output_tensors[0].GetTensorMutableData<float>();
    Eigen::Vector3d t(output[0],output[1],output[2]);
    Eigen::Vector3d r(output[3],output[4],output[5]);
    // build the rotation from the axis-angle components
    Eigen::AngleAxisd R_z(r[2], Eigen::Vector3d(0,0,1));
    Eigen::AngleAxisd R_y(r[1], Eigen::Vector3d(0,1,0));
    Eigen::AngleAxisd R_x(r[0], Eigen::Vector3d(1,0,0));
    // convert to a rotation matrix
    Eigen::Matrix3d R_matrix_xyz = R_z.toRotationMatrix()*R_y.toRotationMatrix()*R_x.toRotationMatrix();
    return Sophus::SE3(R_matrix_xyz,t);
}
5. Example application
#include <core/session/onnxruntime_cxx_api.h>
#include <core/providers/cuda/cuda_provider_factory.h>
#include <core/session/onnxruntime_c_api.h>
#include <core/providers/tensorrt/tensorrt_provider_factory.h>
#include <opencv2/opencv.hpp>
#include <sophus/se3.h>
#include <iostream>
Sophus::SE3 computePoseDNN(Mat img_1, Mat img_2, Ort::Session &session, Ort::MemoryInfo &memory_info);
//input dimensions of the network
static constexpr const int width = 512;
static constexpr const int height = 512;
static constexpr const int channel = 3;
std::array<int64_t, 4> input_shape_{ 1,height, width,channel};
using namespace cv;
using namespace std;
int main()
{
    //model path
    string model_path = "../model.onnx";
    Ort::Env env(OrtLoggingLevel::ORT_LOGGING_LEVEL_WARNING, "PoseEstimate");
    Ort::SessionOptions session_options;
    //enable TensorRT and CUDA acceleration
    OrtSessionOptionsAppendExecutionProvider_Tensorrt(session_options, 0); //tensorRT
    OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, 0);
    session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
    Ort::AllocatorWithDefaultOptions allocator;
    //load the ONNX model
    Ort::Session session(env, model_path.c_str(), session_options);
    Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
    //print the model information
    printModelInfo(session,allocator);
    Mat img_1 = imread("/path_to_your_img1",IMREAD_COLOR);
    Mat img_2 = imread("/path_to_your_img2",IMREAD_COLOR);
    Sophus::SE3 pose = computePoseDNN(img_1,img_2,session,memory_info);
    return 0;
}
Sophus::SE3 computePoseDNN(Mat img_1, Mat img_2, Ort::Session &session,Ort::MemoryInfo &memory_info)
Mat Input_1,Input_2;
resize(img_1,Input_1,Size(512,512));
resize(img_2,Input_2,Size(512,512));
std::vector<const char*> input_node_names = {"input1","input2"};
std::vector<const char*> output_node_names = {"Output"};
//copy the images into the float arrays, BGR--->RGB
std::array<float, width * height *channel> input_image_1{};
std::array<float, width * height *channel> input_image_2{};
float* input_1 = input_image_1.data();
float* input_2 = input_image_2.data();
for (int i = 0; i < Input_1.rows; i++) {
for (int j = 0; j < Input_1.cols; j++) {
for (int c = 0; c < 3; c++)
//NHWC layout
if(c==0)
input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+2]/255.0;
if(c==1)
input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+1]/255.0;
if(c==2)
input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+0]/255.0;
//NCHW layout
// if (c == 0)
// input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 2]/255.0;
// if (c == 1)
// input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 1]/255.0;
// if (c == 2)
// input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 0]/255.0;
for (int i = 0; i < Input_2.rows; i++) {
for (int j = 0; j < Input_2.cols; j++) {
for (int c = 0; c < 3; c++)
//NHWC layout
if(c==0)
input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+2]/255.0;
if(c==1)
input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+1]/255.0;
if(c==2)
input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+0]/255.0;
std::vector<Ort::Value> input_tensors;
input_tensors.push_back(Ort::Value::CreateTensor<float>(
memory_info, input_1, input_image_1.size(), input_shape_.data(), input_shape_.size()));
input_tensors.push_back(Ort::Value::CreateTensor<float>(
memory_info, input_2, input_image_2.size(), input_shape_.data(), input_shape_.size()));
std::vector<Ort::Value> output_tensors;
output_tensors = session.Run(Ort::RunOptions { nullptr },
input_node_names.data(), //input node names
input_tensors.data(), //input tensors
input_tensors.size(), //2
output_node_names.data(), //output node names
output_node_names.size()); //1
// cout<<output_tensors.size()<<endl;//输出的维度
float* output = output_tensors[0].GetTensorMutableData<float>();
Eigen::Vector3d t(output[0],output[1],output[2]);
Eigen::Vector3d r(output[3],output[4],output[5]);
// initialize the rotation vectors: rotations about the z, y and x axes
Eigen::AngleAxisd R_z(r[2], Eigen::Vector3d(0,0,1));
Eigen::AngleAxisd R_y(r[1], Eigen::Vector3d(0,1,0));
Eigen::AngleAxisd R_x(r[0], Eigen::Vector3d(1,0,0));
// convert to a rotation matrix
Eigen::Matrix3d R_matrix_xyz = R_z.toRotationMatrix()*R_y.toRotationMatrix()*R_x.toRotationMatrix();
return Sophus::SE3(R_matrix_xyz,t);
void printModelInfo(Ort::Session &session, Ort::AllocatorWithDefaultOptions &allocator)
//number of input and output nodes
size_t num_input_nodes = session.GetInputCount();
size_t num_output_nodes = session.GetOutputCount();
cout<<"Number of input node is:"<<num_input_nodes<<endl;
cout<<"Number of output node is:"<<num_output_nodes<<endl;
//get the input and output dimensions
for(auto i = 0; i<num_input_nodes;i++)
std::vector<int64_t> input_dims = session.GetInputTypeInfo(i).GetTensorTypeAndShapeInfo().GetShape();
cout<<endl<<"input "<<i<<" dim is: ";
for(auto j=0; j<input_dims.size();j++)
cout<<input_dims[j]<<" ";
for(auto i = 0; i<num_output_nodes;i++)
std::vector<int64_t> output_dims = session.GetOutputTypeInfo(i).GetTensorTypeAndShapeInfo().GetShape();
cout<<endl<<"output "<<i<<" dim is: ";
for(auto j=0; j<output_dims.size();j++)
cout<<output_dims[j]<<" ";
//input and output node names
cout<<endl; //newline
for(auto i = 0; i<num_input_nodes;i++)
cout<<"The input op-name "<<i<<" is:"<<session.GetInputName(i, allocator)<<endl;
for(auto i = 0; i<num_output_nodes;i++)
cout<<"The output op-name "<<i<<" is:"<<session.GetOutputName(i, allocator)<<endl;
References:
C++ 上用 ONNXruntime 部署自己的模型 (机器人学渣的博客, CSDN)
tensorflow keras 搭建相机位姿估计网络--例 (机器人学渣的博客, CSDN)
ONNXRuntime学习笔记(四) (Lee-zq, 博客园)
onnxruntime c++ 代码搜集 (落花逐流水的博客, CSDN)
onnxruntime调用AI模型的python和C++编程 (Arnold-FY-Chen的博客, CSDN)