
In 2017 Microsoft, Facebook and others introduced ONNX, an open format standard for deep learning and machine learning models, aimed at unifying model formats and making deployment easier. Today most deep learning frameworks support exporting to ONNX and provide the corresponding export interfaces.

ONNX Runtime is Microsoft's inference framework for the ONNX (Open Neural Network Exchange) model format; with it you can run an onnx model very conveniently. ONNXRuntime supports multiple execution backends, including CPU, GPU (CUDA), TensorRT, DML and others. It can be considered the most native support for ONNX models: once you know how to export a model, you can deploy models from different frameworks and improve development efficiency.

With onnx and onnxruntime, a model trained in a deep learning framework such as PyTorch can be served through C++ inference, which is considerably faster than running inference in Python.

This post deploys a model with the C++ ONNXRuntime API. A network built with Keras is used as the example: it is converted to an onnx file and deployed in C++; TensorRT can additionally be used for acceleration.

GitHub repository: https://github.com/zouyuelin/SLAM_Learning_notes/tree/main/PoseEstimation

Prebuilt onnxruntime packages can be downloaded from GitHub:

Releases · microsoft/onnxruntime · GitHub

The downloaded onnxruntime is a prebuilt library; just put it in a folder of your choice and add onnxruntime's header and library paths to CMakeLists.txt:

# add the header directory
include_directories(......../onnxruntime/include)
# add the library directory
link_directories(......../onnxruntime/lib)

I. Model preparation

1. Exporting an .onnx model from PyTorch

First, export the .onnx model file with PyTorch's built-in torch.onnx module (see the corresponding section of the PyTorch official documentation). The main steps are as follows:

import torch
checkpoint = torch.load(model_path)
model = ModelNet(params)
model.load_state_dict(checkpoint['model'])
model.eval()
input_x_1 = torch.randn(10,20)
input_x_2 = torch.randn(1,20,5)
output, mask = model(input_x_1, input_x_2)
torch.onnx.export(model,
                 (input_x_1, input_x_2),
                 'model.onnx',
                 input_names = ['input','input_mask'],
                 output_names = ['output','output_mask'],
                 opset_version=11,
                 verbose = True,
                 dynamic_axes={'input':{1:'seqlen'}, 'input_mask':{1:'seqlen',2:'time'},'output_mask':{0:'time'}})

All of torch.onnx.export's parameters are described in the documentation; the opset_version matters. dynamic_axes marks which input/output dimensions are allowed to vary at run time; without it, the shapes of the input and output tensors are fixed to those used at export. If your inputs have fixed shapes you can omit it.

Whether the exported model works can first be checked in Python:

import onnxruntime as ort
import numpy as np
ort_session = ort.InferenceSession('model.onnx')
outputs = ort_session.run(None,{'input':np.random.randn(10,20).astype(np.float32),'input_mask':np.random.randn(1,20,5).astype(np.float32)})
# because dynamic_axes was set, the marked dimensions are allowed to change
outputs = ort_session.run(None,{'input':np.random.randn(10,5).astype(np.float32),'input_mask':np.random.randn(1,26,2).astype(np.float32)})
# outputs is a list containing 'output' and 'output_mask'
import onnx
model = onnx.load('model.onnx')
onnx.checker.check_model(model)

If no exception is raised, the exported model is fine. torch.onnx.export currently only recognizes a subset of tensor operations (see the list of supported operators); basic models, including transformers, are no problem. If you run into ATen (or similar unsupported-operator) errors, you need to rework the unsupported tensor operations, otherwise they will break the model when it is used from C++.

2. Exporting an .onnx model from TensorFlow Keras

Build and train the network (see the article "tensorflow keras 搭建相机位姿估计网络--例" in the references).
The network's inputs and outputs are:

Network inputs: [image_ref , image_cur]
Network outputs: [tx , ty , tz , roll , pitch , yaw]

The trained model is saved under kerasTempModel\. Be sure to save it with model.save(); do not use save_weights() or other saving APIs.
onnxruntime needs an onnx model, so the Keras model has to be converted to onnx.

Install the conversion tool:

pip install tf2onnx

After installation, run:

python -m tf2onnx.convert --saved-model kerasTempModel --output "model.onnx" --opset 14

Tip: in my tests, opset 14 gives the best optimization here; inference is faster than with opset 11 or 12.
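
The same conversion can also be driven from Python with tf2onnx's API, which is convenient inside a training script. This is a minimal sketch, assuming a recent tf2onnx and the kerasTempModel/ SavedModel directory produced above; the input names and shapes follow the network in this post:

import tensorflow as tf
import tf2onnx

# load the SavedModel directory written by model.save()
model = tf.keras.models.load_model("kerasTempModel/", compile=False)

# describe the two image inputs; None leaves the batch dimension dynamic
spec = (tf.TensorSpec((None, 512, 512, 3), tf.float32, name="input1"),
        tf.TensorSpec((None, 512, 512, 3), tf.float32, name="input2"))

# convert and write model.onnx with opset 14
tf2onnx.convert.from_keras(model, input_signature=spec, opset=14,
                           output_path="model.onnx")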

When the conversion finishes, the end of the terminal output tells you the model's inputs and outputs:

2022-01-21 15:48:00,766 - INFO - 
2022-01-21 15:48:00,766 - INFO - Successfully converted TensorFlow model kerasTempModel to ONNX
2022-01-21 15:48:00,766 - INFO - Model inputs: ['input1', 'input2']
2022-01-21 15:48:00,766 - INFO - Model outputs: ['Output']
2022-01-21 15:48:00,766 - INFO - ONNX model is saved at model.onnx

The model has two inputs, named ['input1', 'input2'], and one output node named ['Output'].

You do not strictly need to know the node names in advance; in onnxruntime you can simply print the model's input and output node names.
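
For example, a minimal sketch with the onnxruntime Python API (assuming the model.onnx converted above):

import onnxruntime as ort

sess = ort.InferenceSession("model.onnx")
# each entry exposes the node name, shape (with dynamic dims) and element type
for i in sess.get_inputs():
    print("input :", i.name, i.shape, i.type)
for o in sess.get_outputs():
    print("output:", o.name, o.shape, o.type)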

2.1 Dataset processing

Dataset format:

# image_ref  image_cur  tx  ty  tz  roll(x) pitch(y) yaw(z)
0 images/0.jpg images/1.jpg 0.000999509 -0.00102794 0.00987293 0.00473228 -0.0160252 -0.0222079 
1 images/1.jpg images/2.jpg -0.00544488 -0.00282174 0.00828871 -0.00271557 -0.00770117 -0.0195182 
2 images/2.jpg images/3.jpg -0.0074375 -0.00368121 0.0114751 -0.00721246 -0.0103843 -0.0171883 
3 images/3.jpg images/4.jpg -0.00238111 -0.00371362 0.0120466 -0.0081171 -0.0149111 -0.0198595 
4 images/4.jpg images/5.jpg 0.000965841 -0.00520437 0.0135452 -0.0141721 -0.0126401 -0.0182697 
5 images/5.jpg images/6.jpg -0.00295753 -0.00340146 0.0144557 -0.013633 -0.00463747 -0.0143332 

load_image maps each dataset entry to TF tensors:

class datasets:
    def __init__(self, datasetsPath:str):
        self.dataPath = datasetsPath
        self.dim = 512
        self.epochs = 40
        self.batch_size = 8
        self.train_percent = 0.92
        self.learning_rate = 2e-4
        self.model_path = 'kerasTempModel/'
        self.posetxt = os.path.join(self.dataPath,'pose.txt') 
        self.GetTheImagesAndPose()
        self.buildTrainData()
    def GetTheImagesAndPose(self):
        self.poselist = []
        with open(self.posetxt,'r') as f:
            for line in f.readlines():
                line = line.strip()
                line = line.split(' ')
                line.remove(line[0])
                self.poselist.append(line)
                # im1 im2 tx,ty,tz,roll,pitch,yaw
        # shuffle the dataset
        length = np.shape(self.poselist)[0]
        train_num =int(length * self.train_percent) 
        test_num = length - train_num
        randomPoses = np.array(random.sample(self.poselist,length)) # sample all entries in random order
        self.train_pose_list = randomPoses[0:train_num,:]
        self.test_pose_list = randomPoses[train_num:length+1,:]
        print(f"The size of train pose list is : {np.shape(self.train_pose_list)[0]}")
        print(f"The size of test pose list is : {np.shape(self.test_pose_list)[0]}")
    def load_image(self,index:tf.Tensor):
        img_ref = tf.io.read_file(index[0])
        img_ref = tf.image.decode_jpeg(img_ref) # jpeg images here
        img_ref = tf.image.resize(img_ref,(self.dim,self.dim))/255.0
        #img = tf.reshape(img,[self.dim,self.dim,3])
        img_ref = tf.cast(img_ref,tf.float32)
        img_cur = tf.io.read_file(index[1])
        img_cur = tf.image.decode_jpeg(img_cur) # jpeg images here
        img_cur = tf.image.resize(img_cur,(self.dim,self.dim))/255.0
        #img = tf.reshape(img,[self.dim,self.dim,3])
        img_cur = tf.cast(img_cur,tf.float32)
        pose = tf.strings.to_number(index[2:8],tf.float32)
        return (img_ref,img_cur),(pose)
    def buildTrainData(self):
        '''
        for example:
        >>> poses = dataset.y_train.take(20)
        >>> imgs = dataset.x1_train.take(40)
        >>> print(np.array(list(imgs.as_numpy_iterator()))[39])
        >>> imgs = dataset.x2_train.take(40)
        >>> print(np.array(list(imgs.as_numpy_iterator()))[39])
        >>> print(np.array(list(poses.as_numpy_iterator()))[19])
        '''
        self.traindata = tf.data.Dataset.from_tensor_slices(self.train_pose_list) \
           .map(self.load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
           .shuffle(500)\
           .repeat(10)\
           .batch(self.batch_size) \
           .prefetch(tf.data.experimental.AUTOTUNE)#.cache() 
        self.testdata = tf.data.Dataset.from_tensor_slices(self.test_pose_list) \
           .map(self.load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
           .shuffle(500)\
           .repeat(10)\
           .batch(self.batch_size) \
           .prefetch(tf.data.experimental.AUTOTUNE)

2.2 Building the network model

A simple model:

def model(dim):
    First = K.layers.Input(shape=(dim,dim,3),name="input1")
    Second = K.layers.Input(shape=(dim,dim,3),name="input2")
    x1 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(First)
    x1 = K.layers.Conv2D(512,kernel_size=(3,3), strides=2,padding='same')(x1)
    x1 = K.layers.BatchNormalization()(x1)
    x1 = K.layers.ReLU()(x1)
    x1 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x1)
    x1 = K.layers.BatchNormalization()(x1)
    x1 = K.layers.ReLU()(x1)
    x1 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(x1)
    x2 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(Second)
    x2 = K.layers.Conv2D(512,kernel_size=(3,3), strides=2,padding='same')(x2)
    x2 = K.layers.BatchNormalization()(x2)
    x2 = K.layers.ReLU()(x2)
    x2 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x2)
    x2 = K.layers.BatchNormalization()(x2)
    x2 = K.layers.ReLU()(x2)
    x2 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(x2)
    x = K.layers.concatenate([x1,x2])
    x = K.layers.Conv2D(256,kernel_size=(3,3), strides=1,padding='same', activation='relu')(x)
    x = K.layers.BatchNormalization()(x)
    x = K.layers.ReLU()(x)
    x = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(x)
    x = K.layers.Conv2D(128,kernel_size=(3,3), strides=1,padding='same', activation='relu')(x)
    x = K.layers.BatchNormalization()(x)
    x = K.layers.ReLU()(x)
    x = K.layers.Flatten()(x)
    x = K.layers.Dense(1024)(x)
    x = K.layers.Dense(6,name='Output')(x)
    poseModel = K.Model([First,Second],x)
    return poseModel
# loss function
def loss_fn(y_true,y_pre):
    loss_value = K.backend.mean(K.backend.square(y_true-y_pre))
    return loss_value
# learning-rate decay callback
class learningDecay(K.callbacks.Callback):
    def __init__(self,schedule=None,alpha=1,verbose=0):
        super().__init__()
        self.schedule = schedule
        self.verbose = verbose
        self.alpha = alpha
    def on_epoch_begin(self, epoch, logs=None):
        lr = float(K.backend.get_value(self.model.optimizer.lr))
        if self.schedule != None:
            lr = self.schedule(epoch,lr)
        else:
            if epoch != 0:
                lr = lr*self.alpha
        K.backend.set_value(self.model.optimizer.lr,K.backend.get_value(lr))
        if self.verbose > 0:
            print(f"Current learning rate is {lr}")
# learning-rate schedule
def scheduler(epoch, lr):
    if epoch < 10:
        return lr
    else:
        return lr * tf.math.exp(-0.1)

ResNet-34 as the backbone:

#-------resnet 34-------------
def conv_block(inputs, 
        neuron_num, 
        kernel_size,  
        use_bias, 
        padding= 'same',
        strides= (1, 1),
        with_conv_short_cut = False):
    conv1 = K.layers.Conv2D(
        neuron_num,
        kernel_size = kernel_size,
        activation= 'relu',
        strides= strides,
        use_bias= use_bias,
        padding= padding
    )(inputs)
    conv1 = K.layers.BatchNormalization(axis = 1)(conv1)
    conv2 = K.layers.Conv2D(
        neuron_num,
        kernel_size= kernel_size,
        activation= 'relu',
        use_bias= use_bias,
        padding= padding)(conv1)
    conv2 = K.layers.BatchNormalization(axis = 1)(conv2)
    if with_conv_short_cut:
        inputs = K.layers.Conv2D(
            neuron_num, 
            kernel_size= kernel_size,
            strides= strides,
            use_bias= use_bias,
            padding= padding
            )(inputs)
        return K.layers.add([inputs, conv2])
    else:
        return K.layers.add([inputs, conv2])
def ResNet34(inputs,namescope = ""):
    x = K.layers.ZeroPadding2D((3, 3))(inputs)
    # Define the convolutional block 1
    x = K.layers.Conv2D(64, kernel_size= (7, 7), strides= (2, 2), padding= 'valid')(x)
    x = K.layers.BatchNormalization(axis= 1)(x)
    x = K.layers.MaxPooling2D(pool_size= (3, 3), strides= (2, 2), padding= 'same')(x)
    # Define the convolutional block 2
    x = conv_block(x, neuron_num= 64, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 64, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 64, kernel_size= (3, 3), use_bias= True)
    # Define the convolutional block 3
    x = conv_block(x, neuron_num= 128, kernel_size= (3, 3), use_bias= True, strides= (2, 2), with_conv_short_cut= True)
    x = conv_block(x, neuron_num= 128, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 128, kernel_size= (3, 3), use_bias= True)
    # Define the convolutional block 4
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True, strides= (2, 2), with_conv_short_cut= True)
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
    # Define the convolutional block 5
    x = conv_block(x, neuron_num= 512, kernel_size= (3, 3), use_bias= True, strides= (2, 2), with_conv_short_cut= True)
    x = conv_block(x, neuron_num= 512, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 512, kernel_size= (3, 3), use_bias= True)
    x = K.layers.AveragePooling2D(pool_size=(7, 7))(x)
    return x
def model(dim_w,dim_h):
    First = K.layers.Input(shape=(dim_w,dim_h,3),name="input1")
    Second = K.layers.Input(shape=(dim_w,dim_h,3),name="input2")
    # x1 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(First)
    x1 = K.layers.Conv2D(128,kernel_size=(3,3), strides=2,padding='same')(First)
    x1 = K.layers.BatchNormalization()(x1)
    x1 = K.layers.LeakyReLU()(x1)
    # x1 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x1)
    # x1 = K.layers.BatchNormalization()(x1)
    # x1 = K.layers.ReLU()(x1)
    x1 = ResNet34(x1,"x1")
    # x2 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(Second)
    x2 = K.layers.Conv2D(128,kernel_size=(3,3), strides=2,padding='same')(Second)
    x2 = K.layers.BatchNormalization()(x2)
    x2 = K.layers.LeakyReLU()(x2)
    # x2 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x2)
    # x2 = K.layers.BatchNormalization()(x2)
    # x2 = K.layers.ReLU()(x2)
    x2 = ResNet34(x2,"x2")
    x = K.layers.concatenate([x1,x2])
    x = K.layers.Flatten()(x)
    x = K.layers.Dense(6,name='Output')(x)
    poseModel = K.Model([First,Second],x)
    return poseModel
def loss_fn(y_true,y_pre):
    loss_value_translation = K.backend.square(y_true[-1,0:3]-y_pre[-1,0:3])
    loss_value_rotation = 1/5.7*K.backend.square(y_true[-1,3:6]-y_pre[-1,3:6])
    loss_value = K.backend.mean(loss_value_translation + loss_value_rotation)
    # loss_value = K.backend.mean(K.backend.square(y_true-y_pre))
    # tf.print(y_pre)
    return loss_value

2.3 Training the model

build() compiles the model and sets up the callbacks;

train_fit() trains with Keras's fit() function;

train_gradient() trains with apply_gradients(), which lets you monitor the loss and the gradients in real time and is more flexible;

save_model() saves the model; there are several ways to save, and model.save() can write either an h5 file or a TF SavedModel.

class Posenet:
    def __init__(self,dataset:datasets):
        self.dataset = dataset
        self.build()
    def build(self):
        self.poseModel = model(self.dataset.dim)
        self.poseModel.summary()
        self.optm = K.optimizers.RMSprop(1e-4,momentum=0.9) #,decay=1e-5/self.dataset.epochs
        self.decayCallback = learningDecay(schedule = None,alpha = 0.99,verbose = 1)
        decayCallbackScheduler = K.callbacks.LearningRateScheduler(scheduler)
        self.callbacks = [decayCallbackScheduler]
        try:
            print("************************loading the model weights***********************************")
            self.poseModel.load_weights("model.h5")
        except:
            pass
    def train_fit(self):
        self.poseModel.compile(optimizer=self.optm,loss=loss_fn,metrics=['accuracy'])
        self.poseModel.fit(self.dataset.traindata,
                            validation_data=self.dataset.testdata,
                            epochs=self.dataset.epochs,
                            callbacks=[self.decayCallback],
                            verbose=1)
    def train_gradient(self):
        for step in range(self.dataset.epochs):
            loss = 0
            val_loss = 0
            lr = float(self.optm.lr)
            tf.print(">>> [Epoch is %s/%s]"%(step,self.dataset.epochs))
            for (x1,x2),y in self.dataset.traindata:
                with tf.GradientTape() as tape:
                    prediction = self.poseModel([x1,x2])
                    # y = tf.cast(y,dtype=prediction.dtype)
                    loss = loss_fn(y,prediction)
                gradients = tape.gradient(loss,self.poseModel.trainable_variables)
                self.optm.apply_gradients(zip(gradients,self.poseModel.trainable_variables))
            for (x1,x2),y in self.dataset.testdata:
                prediction = self.poseModel([x1,x2])
                val_loss = loss_fn(y,prediction)
            tf.print("The loss is %s,the learning rate is : %s, test loss is %s]"%(np.array(loss),lr,val_loss))
            K.backend.set_value(self.optm.lr,K.backend.get_value(lr*0.99))
    def save_model(self):
        '''
        Save with model.save(); it can write an h5 file or a SavedModel directory.
        The directory form is recommended, and can then be converted with tf2onnx:
        >>> python -m tf2onnx.convert --saved-model kerasTempModel --output "model.onnx" --opset 14
        '''
        self.poseModel.save("model.h5")
        self.poseModel.save(self.dataset.model_path)
        # self.poseModel.save_weights("model.h5") # only saves the weights, not the structure
        # tf.saved_model.save(self.poseModel,'tf2TempModel') # this saving method is no longer used
 
if __name__ == "__main__":
    dataset = datasets("images")
    posenet = Posenet(dataset)
    posenet.train_fit()
    # posenet.train_gradient() # train with apply_gradients instead
    posenet.save_model()

2.4 Loading the saved model

The saved model directory is kerasTempModel\:

model = K.models.load_model(dataset.model_path,compile=False)

Test the model:

output = model([img_ref,img_cur])

2.5 Full code

Full code for the simple model version:

import argparse
import tensorflow as tf
import tensorflow.keras as K
import numpy as np
import cv2 as cv
import os
import time
import random
from tensorflow.keras import optimizers
from tensorflow.keras import callbacks
class datasets:
    def __init__(self, datasetsPath:str):
        self.dataPath = datasetsPath
        self.dim = 512
        self.epochs = 40
        self.batch_size = 8
        self.train_percent = 0.92
        self.learning_rate = 2e-4
        self.model_path = 'kerasTempModel/'
        self.posetxt = os.path.join(self.dataPath,'pose.txt') 
        self.GetTheImagesAndPose()
        self.buildTrainData()
    def GetTheImagesAndPose(self):
        self.poselist = []
        with open(self.posetxt,'r') as f:
            for line in f.readlines():
                line = line.strip()
                line = line.split(' ')
                line.remove(line[0])
                self.poselist.append(line)
                # im1 im2 tx,ty,tz,roll,pitch,yaw
        # shuffle the dataset
        length = np.shape(self.poselist)[0]
        train_num =int(length * self.train_percent) 
        test_num = length - train_num
        randomPoses = np.array(random.sample(self.poselist,length)) # sample all entries in random order
        self.train_pose_list = randomPoses[0:train_num,:]
        self.test_pose_list = randomPoses[train_num:length+1,:]
        print(f"The size of train pose list is : {np.shape(self.train_pose_list)[0]}")
        print(f"The size of test pose list is : {np.shape(self.test_pose_list)[0]}")
    def load_image(self,index:tf.Tensor):
        img_ref = tf.io.read_file(index[0])
        img_ref = tf.image.decode_jpeg(img_ref) # jpeg images here
        img_ref = tf.image.resize(img_ref,(self.dim,self.dim))/255.0
        #img = tf.reshape(img,[self.dim,self.dim,3])
        img_ref = tf.cast(img_ref,tf.float32)
        img_cur = tf.io.read_file(index[1])
        img_cur = tf.image.decode_jpeg(img_cur) # jpeg images here
        img_cur = tf.image.resize(img_cur,(self.dim,self.dim))/255.0
        #img = tf.reshape(img,[self.dim,self.dim,3])
        img_cur = tf.cast(img_cur,tf.float32)
        pose = tf.strings.to_number(index[2:8],tf.float32)
        return (img_ref,img_cur),(pose)
    def buildTrainData(self):
        '''
        for example:
        >>> poses = dataset.y_train.take(20)
        >>> imgs = dataset.x1_train.take(40)
        >>> print(np.array(list(imgs.as_numpy_iterator()))[39])
        >>> imgs = dataset.x2_train.take(40)
        >>> print(np.array(list(imgs.as_numpy_iterator()))[39])
        >>> print(np.array(list(poses.as_numpy_iterator()))[19])
        '''
        self.traindata = tf.data.Dataset.from_tensor_slices(self.train_pose_list) \
           .map(self.load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
           .shuffle(500)\
           .repeat(10)\
           .batch(self.batch_size) \
           .prefetch(tf.data.experimental.AUTOTUNE)#.cache() 
        self.testdata = tf.data.Dataset.from_tensor_slices(self.test_pose_list) \
           .map(self.load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
           .shuffle(500)\
           .repeat(10)\
           .batch(self.batch_size) \
           .prefetch(tf.data.experimental.AUTOTUNE)
def model(dim):
    First = K.layers.Input(shape=(dim,dim,3),name="input1")
    Second = K.layers.Input(shape=(dim,dim,3),name="input2")
    x1 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(First)
    x1 = K.layers.Conv2D(512,kernel_size=(3,3), strides=2,padding='same')(x1)
    x1 = K.layers.BatchNormalization()(x1)
    x1 = K.layers.ReLU()(x1)
    x1 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x1)
    x1 = K.layers.BatchNormalization()(x1)
    x1 = K.layers.ReLU()(x1)
    x1 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(x1)
    x2 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(Second)
    x2 = K.layers.Conv2D(512,kernel_size=(3,3), strides=2,padding='same')(x2)
    x2 = K.layers.BatchNormalization()(x2)
    x2 = K.layers.ReLU()(x2)
    x2 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x2)
    x2 = K.layers.BatchNormalization()(x2)
    x2 = K.layers.ReLU()(x2)
    x2 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(x2)
    x = K.layers.concatenate([x1,x2])
    x = K.layers.Conv2D(256,kernel_size=(3,3), strides=1,padding='same',
                        activation='relu')(x)
    x = K.layers.BatchNormalization()(x)
    x = K.layers.ReLU()(x)
    x = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(x)
    x = K.layers.Conv2D(128,kernel_size=(3,3), strides=1,padding='same',
                        activation='relu')(x)
    x = K.layers.BatchNormalization()(x)
    x = K.layers.ReLU()(x)
    x = K.layers.Flatten()(x)
    x = K.layers.Dense(1024)(x)
    x = K.layers.Dense(6,name='Output')(x)
    poseModel = K.Model([First,Second],x)
    return poseModel
def loss_fn(y_true,y_pre):
    loss_value = K.backend.mean(K.backend.square(y_true-y_pre))
    return loss_value
class learningDecay(K.callbacks.Callback):
    def __init__(self,schedule=None,alpha=1,verbose=0):
        super().__init__()
        self.schedule = schedule
        self.verbose = verbose
        self.alpha = alpha
    def on_epoch_begin(self, epoch, logs=None):
        lr = float(K.backend.get_value(self.model.optimizer.lr))
        if self.schedule != None:
            lr = self.schedule(epoch,lr)
        else:
            if epoch != 0:
                lr = lr*self.alpha
        K.backend.set_value(self.model.optimizer.lr,K.backend.get_value(lr))
        if self.verbose > 0:
            print(f"Current learning rate is {lr}")
def scheduler(epoch, lr):
    if epoch < 10:
        return lr
    else:
        return lr * tf.math.exp(-0.1) 
class Posenet:
    def __init__(self,dataset:datasets):
        self.dataset = dataset
        self.build()
    def build(self):
        self.poseModel = model(self.dataset.dim)
        self.poseModel.summary()
        self.optm = K.optimizers.RMSprop(1e-4,momentum=0.9) #,decay=1e-5/self.dataset.epochs
        self.decayCallback = learningDecay(schedule = None,alpha = 0.99,verbose = 1)
        decayCallbackScheduler = K.callbacks.LearningRateScheduler(scheduler)
        self.callbacks = [decayCallbackScheduler]
        try:
            print("************************loading the model weights***********************************")
            self.poseModel.load_weights("model.h5")
        except:
            pass
    def train_fit(self):
        self.poseModel.compile(optimizer=self.optm,loss=loss_fn,metrics=['accuracy'])
        self.poseModel.fit(self.dataset.traindata,
                            validation_data=self.dataset.testdata,
                            epochs=self.dataset.epochs,
                            callbacks=[self.decayCallback],
                            verbose=1)
    def train_gradient(self):
        for step in range(self.dataset.epochs):
            loss = 0
            val_loss = 0
            lr = float(self.optm.lr)
            tf.print(">>> [Epoch is %s/%s]"%(step,self.dataset.epochs))
            for (x1,x2),y in self.dataset.traindata:
                with tf.GradientTape() as tape:
                    prediction = self.poseModel([x1,x2])
                    # y = tf.cast(y,dtype=prediction.dtype)
                    loss = loss_fn(y,prediction)
                gradients = tape.gradient(loss,self.poseModel.trainable_variables)
                self.optm.apply_gradients(zip(gradients,self.poseModel.trainable_variables))
            for (x1,x2),y in self.dataset.testdata:
                prediction = self.poseModel([x1,x2])
                val_loss = loss_fn(y,prediction)
            tf.print("The loss is %s,the learning rate is : %s, test loss is %s]"%(np.array(loss),lr,val_loss))
            K.backend.set_value(self.optm.lr,K.backend.get_value(lr*0.99))
    def save_model(self):
        '''
        Save with model.save(); it can write an h5 file or a SavedModel directory.
        The directory form is recommended, and can then be converted with tf2onnx:
        >>> python -m tf2onnx.convert --saved-model kerasTempModel --output "model.onnx" --opset 14
        '''
        self.poseModel.save("model.h5")
        self.poseModel.save(self.dataset.model_path)
        # self.poseModel.save_weights("model.h5") # only saves the weights, not the structure
        # tf.saved_model.save(self.poseModel,'tf2TempModel') # this saving method is no longer used
if __name__ == "__main__":
    dataset = datasets("images")
    posenet = Posenet(dataset)
    posenet.train_fit()
    # posenet.train_gradient() # train with apply_gradients instead
    posenet.save_model()
 
Full code for the ResNet-34 backbone version:

import argparse
import tensorflow as tf
import tensorflow.keras as K
import numpy as np
import cv2 as cv
import os
import time
import sys
import random
from tensorflow.keras import optimizers
from tensorflow.keras import callbacks
from tensorflow.python.keras.saving.save import save_model
class datasets:
    def __init__(self, datasetsPath:str):
        self.dataPath = datasetsPath
        self.dim_w = 512
        self.dim_h = 512
        self.epochs = 200
        self.batch_size = 8
        self.train_percent = 0.92
        self.learning_rate = 2e-4
        self.model_path = 'kerasTempModel/'
        self.posetxt = os.path.join(self.dataPath,'pose.txt') 
        self.GetTheImagesAndPose()
        self.buildTrainData()
    def GetTheImagesAndPose(self):
        self.poselist = []
        with open(self.posetxt,'r') as f:
            for line in f.readlines():
                line = line.strip()
                line = line.split(' ')
                line.remove(line[0])
                self.poselist.append(line)
                # im1 im2 tx,ty,tz,roll,pitch,yaw
        # shuffle the dataset
        length = np.shape(self.poselist)[0]
        train_num =int(length * self.train_percent) 
        test_num = length - train_num
        randomPoses = np.array(random.sample(self.poselist,length)) # sample all entries in random order
        self.train_pose_list = randomPoses[0:train_num,:]
        self.test_pose_list = randomPoses[train_num:length+1,:]
        print(f"The size of train pose list is : {np.shape(self.train_pose_list)[0]}")
        print(f"The size of test pose list is : {np.shape(self.test_pose_list)[0]}")
    def load_image(self,index:tf.Tensor):
        img_ref = tf.io.read_file(index[0])
        img_ref = tf.image.decode_jpeg(img_ref) # jpeg images here
        #img = tf.reshape(img,[self.dim,self.dim,3])
        img_ref = tf.image.resize(img_ref,(self.dim_w,self.dim_h))/255.0
        img_ref = tf.cast(img_ref,tf.float32)
        img_cur = tf.io.read_file(index[1])
        img_cur = tf.image.decode_jpeg(img_cur) # jpeg images here
        img_cur = tf.image.resize(img_cur,(self.dim_w,self.dim_h))/255.0
        #img = tf.reshape(img,[self.dim,self.dim,3])
        img_cur = tf.cast(img_cur,tf.float32)
        pose = tf.strings.to_number(index[2:8],tf.float32)
        return (img_ref,img_cur),(pose)
    def buildTrainData(self):
        '''
        for example:
        >>> poses = dataset.y_train.take(20)
        >>> imgs = dataset.x1_train.take(40)
        >>> print(np.array(list(imgs.as_numpy_iterator()))[39])
        >>> imgs = dataset.x2_train.take(40)
        >>> print(np.array(list(imgs.as_numpy_iterator()))[39])
        >>> print(np.array(list(poses.as_numpy_iterator()))[19])
        '''
        self.traindata = tf.data.Dataset.from_tensor_slices(self.train_pose_list) \
           .map(self.load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
           .shuffle(500)\
           .repeat(10)\
           .batch(self.batch_size) \
           .prefetch(tf.data.experimental.AUTOTUNE)#.cache() 
        self.testdata = tf.data.Dataset.from_tensor_slices(self.test_pose_list) \
           .map(self.load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
           .shuffle(500)\
           .repeat(10)\
           .batch(self.batch_size) \
           .prefetch(tf.data.experimental.AUTOTUNE)
#-------resnet 34-------------
def conv_block(inputs, 
        neuron_num, 
        kernel_size,  
        use_bias, 
        padding= 'same',
        strides= (1, 1),
        with_conv_short_cut = False):
    conv1 = K.layers.Conv2D(
        neuron_num,
        kernel_size = kernel_size,
        activation= 'relu',
        strides= strides,
        use_bias= use_bias,
        padding= padding
    )(inputs)
    conv1 = K.layers.BatchNormalization(axis = 1)(conv1)
    conv2 = K.layers.Conv2D(
        neuron_num,
        kernel_size= kernel_size,
        activation= 'relu',
        use_bias= use_bias,
        padding= padding)(conv1)
    conv2 = K.layers.BatchNormalization(axis = 1)(conv2)
    if with_conv_short_cut:
        inputs = K.layers.Conv2D(
            neuron_num, 
            kernel_size= kernel_size,
            strides= strides,
            use_bias= use_bias,
            padding= padding
            )(inputs)
        return K.layers.add([inputs, conv2])
    else:
        return K.layers.add([inputs, conv2])
def ResNet34(inputs,namescope = ""):
    x = K.layers.ZeroPadding2D((3, 3))(inputs)
    # Define the convolutional block 1
    x = K.layers.Conv2D(64, kernel_size= (7, 7), strides= (2, 2), padding= 'valid')(x)
    x = K.layers.BatchNormalization(axis= 1)(x)
    x = K.layers.MaxPooling2D(pool_size= (3, 3), strides= (2, 2), padding= 'same')(x)
    # Define the convolutional block 2
    x = conv_block(x, neuron_num= 64, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 64, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 64, kernel_size= (3, 3), use_bias= True)
    # Define the convolutional block 3
    x = conv_block(x, neuron_num= 128, kernel_size= (3, 3), use_bias= True, strides= (2, 2), with_conv_short_cut= True)
    x = conv_block(x, neuron_num= 128, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 128, kernel_size= (3, 3), use_bias= True)
    # Define the convolutional block 4
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True, strides= (2, 2), with_conv_short_cut= True)
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
    # Define the convolutional block 5
    x = conv_block(x, neuron_num= 512, kernel_size= (3, 3), use_bias= True, strides= (2, 2), with_conv_short_cut= True)
    x = conv_block(x, neuron_num= 512, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 512, kernel_size= (3, 3), use_bias= True)
    x = K.layers.AveragePooling2D(pool_size=(7, 7))(x)
    return x
def model(dim_w,dim_h):
    First = K.layers.Input(shape=(dim_w,dim_h,3),name="input1")
    Second = K.layers.Input(shape=(dim_w,dim_h,3),name="input2")
    # x1 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(First)
    x1 = K.layers.Conv2D(128,kernel_size=(3,3), strides=2,padding='same')(First)
    x1 = K.layers.BatchNormalization()(x1)
    x1 = K.layers.LeakyReLU()(x1)
    # x1 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x1)
    # x1 = K.layers.BatchNormalization()(x1)
    # x1 = K.layers.ReLU()(x1)
    x1 = ResNet34(x1,"x1")
    # x2 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(Second)
    x2 = K.layers.Conv2D(128,kernel_size=(3,3), strides=2,padding='same')(Second)
    x2 = K.layers.BatchNormalization()(x2)
    x2 = K.layers.LeakyReLU()(x2)
    # x2 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x2)
    # x2 = K.layers.BatchNormalization()(x2)
    # x2 = K.layers.ReLU()(x2)
    x2 = ResNet34(x2,"x2")
    x = K.layers.concatenate([x1,x2])
    x = K.layers.Flatten()(x)
    x = K.layers.Dense(6,name='Output')(x)
    poseModel = K.Model([First,Second],x)
    return poseModel
def loss_fn(y_true,y_pre):
    loss_value_translation = K.backend.square(y_true[-1,0:3]-y_pre[-1,0:3])
    loss_value_rotation = 1/5.7*K.backend.square(y_true[-1,3:6]-y_pre[-1,3:6])
    loss_value = K.backend.mean(loss_value_translation + loss_value_rotation)
    # loss_value = K.backend.mean(K.backend.square(y_true-y_pre))
    # tf.print(y_pre)
    return loss_value
class learningDecay(K.callbacks.Callback):
    def __init__(self,schedule=None,alpha=1,verbose=0):
        super().__init__()
        self.schedule = schedule
        self.verbose = verbose
        self.alpha = alpha
    def on_epoch_begin(self, epoch, logs=None):
        lr = float(K.backend.get_value(self.model.optimizer.lr))
        if self.schedule != None:
            lr = self.schedule(epoch,lr)
        else:
            if epoch >= 30:
                lr = lr*self.alpha
        K.backend.set_value(self.model.optimizer.lr,K.backend.get_value(lr))
        if self.verbose > 0:
            print(f"Current learning rate is {lr}")
        #save the model
        if epoch % 20 == 0 and epoch != 0:
            self.model.save("model.h5")
def scheduler(epoch, lr):
    if epoch < 10:
        return lr
    else:
        return lr * tf.math.exp(-0.1) 
class Posenet:
    def __init__(self,dataset:datasets):
        self.dataset = dataset
        self.build()
    def build(self):
        self.poseModel = model(self.dataset.dim_w,self.dataset.dim_h)
        self.poseModel.summary()
        self.optm = K.optimizers.RMSprop(1e-4,momentum=0.9) #,decay=1e-5/self.dataset.epochs
        # self.optm = K.optimizers.Adam(1e-4)
        self.decayCallback = learningDecay(schedule = None,alpha = 0.99,verbose = 1)
        decayCallbackScheduler = K.callbacks.LearningRateScheduler(scheduler)
        self.callbacks = [decayCallbackScheduler]
        try:
            print("************************loading the model weights***********************************")
            self.poseModel.load_weights("model.h5")
        except:
            pass
    def train_fit(self):
        self.poseModel.compile(optimizer=self.optm,loss=loss_fn,metrics=['accuracy'])
        self.poseModel.fit(self.dataset.traindata,
                            validation_data=self.dataset.testdata,
                            epochs=self.dataset.epochs,
                            callbacks=[self.decayCallback],
                            verbose=1)
    def train_gradient(self):
        for step in range(self.dataset.epochs):
            loss = 0
            val_loss = 0
            index = 0
            lr = float(self.optm.lr)
            tf.print(">>> [Epoch is %s/%s]"%(step,self.dataset.epochs))
            for (x1,x2),y in self.dataset.traindata:
                with tf.GradientTape() as tape:
                    prediction = self.poseModel([x1,x2])
                    # y = tf.cast(y,dtype=prediction.dtype)
                    loss = loss + loss_fn(y,prediction)
                gradients = tape.gradient(loss,self.poseModel.trainable_variables)
                self.optm.apply_gradients(zip(gradients,self.poseModel.trainable_variables))
                index = index + 1
                sys.stdout.write('--------train loss is %.5f-----'%(loss/float(index)))
                sys.stdout.write('\r')
                sys.stdout.flush()
            index_val = 0
            for (x1,x2),y in self.dataset.testdata:
                prediction = self.poseModel([x1,x2])
                val_loss = val_loss + loss_fn(y,prediction)
                index_val = index_val + 1
            tf.print("The loss is %s,the learning rate is : %s, test loss is %s]"%(np.array(loss/float(index)),lr,val_loss/float(index_val)))
            K.backend.set_value(self.optm.lr,K.backend.get_value(lr*0.99))
            if step%40==0:
                self.save_model()
    def save_model(self):
        '''
        Save with model.save(); it can write an h5 file or a SavedModel directory.
        The directory form is recommended, and can then be converted with tf2onnx:
        >>> python -m tf2onnx.convert --saved-model kerasTempModel --output "model.onnx" --opset 14
        '''
        self.poseModel.save("model.h5")
        self.poseModel.save(self.dataset.model_path)
        # self.poseModel.save_weights("model.h5") # only saves the weights, not the structure
        # tf.saved_model.save(self.poseModel,'tf2TempModel') # this saving method is no longer used
def test(dataset):
    im1 = cv.imread("imagesNDI/0.jpg")
    im1 = cv.resize(im1,(512,512))
    im1 = np.array(im1,np.float32).reshape((1,512,512,3))/255.0
    im2 = cv.imread("imagesNDI/1.jpg")
    im2 = cv.resize(im2,(512,512))
    im2 = np.array(im2,np.float32).reshape((1,512,512,3))/255.0
    posemodel = K.models.load_model(dataset.model_path,compile=False)
    pose = posemodel([im1,im2])
    print(np.array(pose))
if __name__ == "__main__":
    dataset = datasets("images")
    posenet = Posenet(dataset)
    posenet.train_fit()
    # posenet.train_gradient() # train with apply_gradients instead
    posenet.save_model()
    test(dataset)
 
II. Configuring onnxruntime in the C++ project

If onnxruntime was built from source, the CMake configuration looks like this:

#******onnxruntime*****
set(ONNXRUNTIME_ROOT_PATH /path to your onnxruntime-master)
set(ONNXRUNTIME_INCLUDE_DIRS ${ONNXRUNTIME_ROOT_PATH}/include/onnxruntime
                             ${ONNXRUNTIME_ROOT_PATH}/onnxruntime
                             ${ONNXRUNTIME_ROOT_PATH}/include/onnxruntime/core/session/)
set(ONNXRUNTIME_LIB ${ONNXRUNTIME_ROOT_PATH}/build/Linux/Release/libonnxruntime.so)

In the C++ main.cpp, the header files are:

#include <core/session/onnxruntime_cxx_api.h>
#include <core/providers/cuda/cuda_provider_factory.h>
#include <core/session/onnxruntime_c_api.h>
#include <core/providers/tensorrt/tensorrt_provider_factory.h>

III. Model inference workflow

Overall, running a model with ONNXRuntime can be divided into three stages:

  • Session construction;
  • Model loading and initialization;
  • Running;

1. Stage 1: Session construction

The construction stage creates an InferenceSession object. When a Session object is built from the Python front end, the Python side calls the C++ InferenceSession constructor through onnxruntime_pybind_state.cc and obtains an InferenceSession object.

During construction, InferenceSession initializes its members: the KernelRegistryManager responsible for OpKernel management, the SessionOptions object holding the session configuration, the GraphTransformerManager responsible for graph partitioning, the LoggingManager for logging, and so on. At this point the InferenceSession is still an empty shell; only the initial construction of its member objects has been completed.
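
From the Python front end, this whole stage is hidden behind the InferenceSession constructor. A minimal sketch (the option values are only illustrative):

import onnxruntime as ort

so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
so.intra_op_num_threads = 4  # illustrative value

# this single call covers session construction, model loading and initialization
sess = ort.InferenceSession("model.onnx", sess_options=so)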

2. Stage 2: Model loading and initialization

After the InferenceSession object has been constructed, the onnx model is loaded into it and further initialized.

2.1. Model loading

When the model is loaded, the corresponding Load() function is called on the C++ backend; InferenceSession provides eight Load overloads in total, which read the ModelProto from a URL, an existing ModelProto, raw model data (void*), a model istream, and so on. InferenceSession parses the ModelProto and then holds the corresponding Model member.
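
On the Python side this shows up as InferenceSession accepting either a file path or the serialized model bytes; a small sketch:

import onnxruntime as ort

# load from a path
sess_from_path = ort.InferenceSession("model.onnx")

# load from raw model bytes (e.g. read from a file, a database or a network stream)
with open("model.onnx", "rb") as f:
    model_bytes = f.read()
sess_from_bytes = ort.InferenceSession(model_bytes)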

2.2. Provider registration

After Load() finishes, InferenceSession calls two functions: RegisterExecutionProviders() and sess->Initialize().

RegisterExecutionProviders() registers the ExecutionProviders. A word about ExecutionProviders: ONNXRuntime uses a Provider to represent an execution device, for example CUDAProvider. As of ONNXRuntime v1.0, seven providers are supported, including CPU, CUDA, TensorRT and MKL. Through sess->RegisterExecutionProvider(), InferenceSession keeps a list of the ExecutionProviders supported in the current environment.
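
From Python, the available providers can be listed and selected explicitly; a sketch (which providers are available depends on how onnxruntime was built or installed):

import onnxruntime as ort

print(ort.get_available_providers())  # e.g. ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']

# providers are tried in order; CPU acts as the fallback
sess = ort.InferenceSession(
    "model.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"])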

2.3. InferenceSession initialization

This is sess->Initialize(). Here the InferenceSession performs further initialization based on the model and execution providers it now holds (in stage 1 only the empty member variables were constructed). This step is the core of InferenceSession initialization; key operations such as memory allocation, model partition and kernel registration all happen here.

  1. First, the session registers graph optimization transformers according to the optimization level, held by the GraphTransformerManager member.
  2. Next, the session registers OpKernels, i.e. the compute implementations of each node on the different execution devices. All kernels defined by the held ExecutionProviders are registered into the session, which holds and manages them through the KernelRegistryManager member.
  3. The session then applies graph transformations, such as inserting copy nodes and cast nodes.
  4. Next comes model partition: the graph is split according to the execution devices, deciding which provider each node will run on.
  5. Finally, an ExecutionPlan is created for each node; the execution plan covers the execution order of the ops, memory allocation management, memory reuse management, and so on.

3. Stage 3: Running the model

At run time, InferenceSession reads one batch of data at a time and computes the model's outputs. Most of the work, however, was already done during InferenceSession initialization; a look at the source shows that the run stage mainly calls each node's OpKernel in order.
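
For the pose network converted above, the run stage from Python looks like this; a sketch assuming the node names input1/input2/Output from this post, with random data standing in for real images:

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# one NHWC float32 batch per input, matching the exported shapes (-1, 512, 512, 3)
img1 = np.random.rand(1, 512, 512, 3).astype(np.float32)
img2 = np.random.rand(1, 512, 512, 3).astype(np.float32)

pose, = sess.run(["Output"], {"input1": img1, "input2": img2})
print(pose.shape)  # (1, 6): tx, ty, tz, roll, pitch, yaw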

IV. Model deployment

Like all other mainstream frameworks, ONNXRuntime is most often driven from Python, while the actual execution of the framework is implemented in C++.

Below is how an .onnx model is used from C++ through onnxruntime. Based on the official samples and FAQ, the code covers the multi-input, multi-output case; some parameters can be looked up in the samples or in the official API documentation.

1. Model initialization

	//model path
    string model_path = "../model.onnx";
	//initialize the ONNXRUNTIME environment
    Ort::Env env(OrtLoggingLevel::ORT_LOGGING_LEVEL_WARNING, "PoseEstimate");
    Ort::SessionOptions session_options;
    //enable TensorRT and CUDA acceleration
    OrtSessionOptionsAppendExecutionProvider_Tensorrt(session_options, 0); //tensorRT
    OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, 0);
    session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
    Ort::AllocatorWithDefaultOptions allocator;
    //load the ONNX model
    Ort::Session session(env, model_path.c_str(), session_options);
    Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);

Printing model information: the printModelInfo function

void printModelInfo(Ort::Session &session, Ort::AllocatorWithDefaultOptions &allocator)
{
    //number of input and output nodes
    size_t num_input_nodes = session.GetInputCount();
    size_t num_output_nodes = session.GetOutputCount();
    cout<<"Number of input node is:"<<num_input_nodes<<endl;
    cout<<"Number of output node is:"<<num_output_nodes<<endl;
    //input and output dimensions
    for(auto i = 0; i<num_input_nodes;i++)
    {
        std::vector<int64_t> input_dims = session.GetInputTypeInfo(i).GetTensorTypeAndShapeInfo().GetShape();
        cout<<endl<<"input "<<i<<" dim is: ";
        for(auto j=0; j<input_dims.size();j++)
            cout<<input_dims[j]<<" ";
    }
    for(auto i = 0; i<num_output_nodes;i++)
    {
        std::vector<int64_t> output_dims = session.GetOutputTypeInfo(i).GetTensorTypeAndShapeInfo().GetShape();
        cout<<endl<<"output "<<i<<" dim is: ";
        for(auto j=0; j<output_dims.size();j++)
            cout<<output_dims[j]<<" ";
    }
    //input and output node names
    cout<<endl;
    for(auto i = 0; i<num_input_nodes;i++)
        cout<<"The input op-name "<<i<<" is:"<<session.GetInputName(i, allocator)<<endl;
    for(auto i = 0; i<num_output_nodes;i++)
        cout<<"The output op-name "<<i<<" is:"<<session.GetOutputName(i, allocator)<<endl;
    //input_dims_2[0] = input_dims_1[0] = output_dims[0] = 1;//batch size = 1
}

Calling the function:

//print the model information
printModelInfo(session,allocator);

The output:

Number of input node is:2
Number of output node is:1
input 0 dim is: -1 512 512 3 
input 1 dim is: -1 512 512 3 
output 0 dim is: -1 6 
The input op-name 0 is:input1
The input op-name 1 is:input2
The output op-name 0 is:Output

If you do not know the network in advance, you can use the printed information to define the global variables:

//dimensions of the network input
static constexpr const int width = 512;
static constexpr const int height = 512;
static constexpr const int channel = 3;
std::array<int64_t, 4> input_shape_{ 1,height, width,channel};

2. Building the inference

2.1 Steps of the inference function computePoseDNN()

  1. Resize the OpenCV Mat images that are fed in:

        Mat Input_1,Input_2;
        resize(img_1,Input_1,Size(512,512));
        resize(img_2,Input_2,Size(512,512));
    
  2. Specify the input and output node names. They could also be global variables; here they are put inside the function for convenience:

        std::vector<const char*> input_node_names = {"input1","input2"};
        std::vector<const char*> output_node_names = {"Output"};
    
  3. Allocate buffers for image_ref and image_cur and access them through float pointers; the length here is 512 * 512 * 3. A Mat cannot be fed to the network directly, so plain arrays hold the image data and are then wrapped into ONNXRUNTIME's own tensor type:

        std::array<float, width * height *channel> input_image_1{};
        std::array<float, width * height *channel> input_image_2{};
        float* input_1 =  input_image_1.data();
        float* input_2 =  input_image_2.data();
    

    The float type here depends on your network (it could also be double). The element type can be printed with the following code:

    cout<<session.GetInputTypeInfo(i).GetTensorTypeAndShapeInfo().GetElementType();

    The C++ line above prints an index that corresponds to the following data types:

    typedef enum ONNXTensorElementDataType {
      ONNX_TENSOR_ELEMENT_DATA_TYPE_UNDEFINED,
      ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT,   // maps to c type float
      ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT8,   // maps to c type uint8_t
      ONNX_TENSOR_ELEMENT_DATA_TYPE_INT8,    // maps to c type int8_t
      ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT16,  // maps to c type uint16_t
      ONNX_TENSOR_ELEMENT_DATA_TYPE_INT16,   // maps to c type int16_t
      ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32,   // maps to c type int32_t
      ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64,   // maps to c type int64_t
      ONNX_TENSOR_ELEMENT_DATA_TYPE_STRING,  // maps to c++ type std::string
      ONNX_TENSOR_ELEMENT_DATA_TYPE_BOOL,
      ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT16,
      ONNX_TENSOR_ELEMENT_DATA_TYPE_DOUBLE,      // maps to c type double
      ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT32,      // maps to c type uint32_t
      ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT64,      // maps to c type uint64_t
      ONNX_TENSOR_ELEMENT_DATA_TYPE_COMPLEX64,   // complex with float32 real and imaginary components
      ONNX_TENSOR_ELEMENT_DATA_TYPE_COMPLEX128,  // complex with float64 real and imaginary components
      ONNX_TENSOR_ELEMENT_DATA_TYPE_BFLOAT16     // Non-IEEE floating-point format based on IEEE754 single-precision
    } ONNXTensorElementDataType;
    

    For example, if cout prints 1, the element type of the network is float.

  4. Fill the float arrays in a loop, in either CHW or HWC layout. If you normalized the data during training (for example dividing by 255.0), the same normalization has to be applied here (a Python cross-check of this preprocessing is sketched after this list):

        for (int i = 0; i < Input_1.rows; i++) {
            for (int j = 0; j < Input_1.cols; j++) {
                for (int c = 0; c < 3; c++) {
                    //NHWC layout (and BGR ---> RGB)
                    if(c==0)
                        input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+2]/255.0;
                    if(c==1)
                        input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+1]/255.0;
                    if(c==2)
                        input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+0]/255.0;
                    //NCHW layout
//                if (c == 0)
//                     input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 2]/255.0;
//                if (c == 1)
//                     input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 1]/255.0;
//                if (c == 2)
//                     input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 0]/255.0;
                }
            }
        }
        for (int i = 0; i < Input_2.rows; i++) {
            for (int j = 0; j < Input_2.cols; j++) {
                for (int c = 0; c < 3; c++) {
                    //NHWC layout
                    if(c==0)
                        input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+2]/255.0;
                    if(c==1)
                        input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+1]/255.0;
                    if(c==2)
                        input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+0]/255.0;
                }
            }
        }
    
  5. Since different networks may have several input and output nodes, the Ort tensors are kept in a std::vector; create two tensors from the two input buffers:
        std::vector<Ort::Value> input_tensors;
        input_tensors.push_back(Ort::Value::CreateTensor<float>(
                memory_info, input_1, input_image_1.size(), input_shape_.data(), input_shape_.size()));
        input_tensors.push_back(Ort::Value::CreateTensor<float>(
                memory_info, input_2, input_image_2.size(), input_shape_.data(), input_shape_.size()));
    
    where input_shape_ is the input shape:
    std::array<int64_t, 4> input_shape_{ 1,512, 512,3};
    
  6. Forward inference. The output tensors are likewise defined as a vector, to stay generic:

        std::vector<Ort::Value> output_tensors;
        output_tensors = session.Run(Ort::RunOptions { nullptr },
                                        input_node_names.data(), //input node names
                                        input_tensors.data(),     //input tensors
                                        input_tensors.size(),     //2
                                        output_node_names.data(), //output node names
                                        output_node_names.size()); //1
    
  7. Retrieve the output. Since this example has only one output node, output_tensors[0] is enough to get the result:
    float* output = output_tensors[0].GetTensorMutableData<float>();
    

    The pose is then reconstructed:

        Eigen::Vector3d t(output[0],output[1],output[2]);
        Eigen::Vector3d r(output[3],output[4],output[5]);
        // build the rotation from the Euler-angle components
        Eigen::AngleAxisd R_z(r[2], Eigen::Vector3d(0,0,1));
        Eigen::AngleAxisd R_y(r[1], Eigen::Vector3d(0,1,0));
        Eigen::AngleAxisd R_x(r[0], Eigen::Vector3d(1,0,0));
        // convert to a rotation matrix, x-y-z order
        Eigen::Matrix3d R_matrix_xyz  = R_z.toRotationMatrix()*R_y.toRotationMatrix()*R_x.toRotationMatrix();
        return Sophus::SE3(R_matrix_xyz,t);
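
As mentioned in step 4, preprocessing mismatches are the most common source of wrong results when moving from training to C++ deployment. The following Python sketch mirrors the C++ preprocessing (resize to 512x512, BGR to RGB, divide by 255.0, NHWC) so the C++ output can be cross-checked against onnxruntime in Python; the image paths are only examples:

import cv2 as cv
import numpy as np
import onnxruntime as ort

def preprocess(path):
    img = cv.imread(path)                      # OpenCV reads BGR, uint8
    img = cv.resize(img, (512, 512))
    img = cv.cvtColor(img, cv.COLOR_BGR2RGB)   # same BGR -> RGB swap as the C++ loop
    return (img.astype(np.float32) / 255.0)[None]  # normalize and add the batch dim (NHWC)

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
out, = sess.run(["Output"], {"input1": preprocess("images/0.jpg"),
                             "input2": preprocess("images/1.jpg")})
tx, ty, tz, roll, pitch, yaw = out[0]
print(tx, ty, tz, roll, pitch, yaw)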
    

2.2 Full function code

Sophus::SE3 computePoseDNN(Mat img_1, Mat img_2, Ort::Session &session, Ort::MemoryInfo &memory_info)
{
    Mat Input_1,Input_2;
    resize(img_1,Input_1,Size(512,512));
    resize(img_2,Input_2,Size(512,512));
    std::vector<const char*> input_node_names = {"input1","input2"};
    std::vector<const char*> output_node_names = {"Output"};
    //copy the images into the arrays, BGR--->RGB
    std::array<float, width * height *channel> input_image_1{};
    std::array<float, width * height *channel> input_image_2{};
    float* input_1 =  input_image_1.data();
    float* input_2 =  input_image_2.data();
    for (int i = 0; i < Input_1.rows; i++) {
        for (int j = 0; j < Input_1.cols; j++) {
            for (int c = 0; c < 3; c++) {
                //NHWC layout
                if(c==0)
                    input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+2]/255.0;
                if(c==1)
                    input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+1]/255.0;
                if(c==2)
                    input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+0]/255.0;
                //NCHW layout
//                if (c == 0)
//                     input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 2]/255.0;
//                if (c == 1)
//                     input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 1]/255.0;
//                if (c == 2)
//                     input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 0]/255.0;
            }
        }
    }
    for (int i = 0; i < Input_2.rows; i++) {
        for (int j = 0; j < Input_2.cols; j++) {
            for (int c = 0; c < 3; c++) {
                //NHWC layout
                if(c==0)
                    input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+2]/255.0;
                if(c==1)
                    input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+1]/255.0;
                if(c==2)
                    input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+0]/255.0;
            }
        }
    }
    std::vector<Ort::Value> input_tensors;
    input_tensors.push_back(Ort::Value::CreateTensor<float>(
            memory_info, input_1, input_image_1.size(), input_shape_.data(), input_shape_.size()));
    input_tensors.push_back(Ort::Value::CreateTensor<float>(
            memory_info, input_2, input_image_2.size(), input_shape_.data(), input_shape_.size()));
    std::vector<Ort::Value> output_tensors;
    output_tensors = session.Run(Ort::RunOptions { nullptr },
                                    input_node_names.data(), //input node names
                                    input_tensors.data(),     //input tensors
                                    input_tensors.size(),     //2
                                    output_node_names.data(), //output node names
                                    output_node_names.size()); //1
//    cout<<output_tensors.size()<<endl;//number of output tensors
    float* output = output_tensors[0].GetTensorMutableData<float>();
    Eigen::Vector3d t(output[0],output[1],output[2]);
    Eigen::Vector3d r(output[3],output[4],output[5]);
    // build the rotation from the Euler-angle components
    Eigen::AngleAxisd R_z(r[2], Eigen::Vector3d(0,0,1));
    Eigen::AngleAxisd R_y(r[1], Eigen::Vector3d(0,1,0));
    Eigen::AngleAxisd R_x(r[0], Eigen::Vector3d(1,0,0));
    // convert to a rotation matrix, x-y-z order
    Eigen::Matrix3d R_matrix_xyz  = R_z.toRotationMatrix()*R_y.toRotationMatrix()*R_x.toRotationMatrix();
    return Sophus::SE3(R_matrix_xyz,t);
}
    

V. Example application

    #include <core/session/onnxruntime_cxx_api.h>
    #include <core/providers/cuda/cuda_provider_factory.h>
    #include <core/session/onnxruntime_c_api.h>
    #include <core/providers/tensorrt/tensorrt_provider_factory.h>
    #include <opencv2/opencv.hpp>
    #include <sophus/se3.h>
    #include <iostream>
    Sophus::SE3 computePoseDNN(Mat img_1, Mat img_2, Ort::Session &session, Ort::MemoryInfo &memory_info);
    //输入网络的维度
    static constexpr const int width = 512;
    static constexpr const int height = 512;
    static constexpr const int channel = 3;
    std::array<int64_t, 4> input_shape_{ 1,height, width,channel};
    using namespace cv;
    using namespace std;
    int main()
    	//模型位置
        string model_path = "../model.onnx";
        Ort::Env env(OrtLoggingLevel::ORT_LOGGING_LEVEL_WARNING, "PoseEstimate");
        Ort::SessionOptions session_options;
        //CUDA加速开启
        OrtSessionOptionsAppendExecutionProvider_Tensorrt(session_options, 0); //tensorRT
        OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, 0);
        session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
        Ort::AllocatorWithDefaultOptions allocator;
        //加载ONNX模型
        Ort::Session session(env, model_path.c_str(), session_options);
        Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
        //打印模型的信息
        printModelInfo(session,allocator);
    	Mat img_1 = imread("/path_to_your_img1",IMREAD_COLOR);
        Mat img_2 = imread("/path_to_your_img2",IMREAD_COLOR);
        Sophus::SE3 pose = computePoseDNN(img_1,img_2,session,memory_info);
    Sophus::SE3 computePoseDNN(Mat img_1, Mat img_2, Ort::Session &session, Ort::MemoryInfo &memory_info)
    {
        Mat Input_1, Input_2;
        resize(img_1, Input_1, Size(512, 512));
        resize(img_2, Input_2, Size(512, 512));
        std::vector<const char*> input_node_names  = {"input1", "input2"};
        std::vector<const char*> output_node_names = {"Output"};

        // Copy the images into float arrays, converting BGR -> RGB and scaling to [0,1].
        // width = 512, height = 512, channel = 3 and input_shape_ = {1, 512, 512, 3} (NHWC,
        // matching the Keras model) are assumed to be defined earlier in the file.
        std::array<float, width * height * channel> input_image_1{};
        std::array<float, width * height * channel> input_image_2{};
        float* input_1 = input_image_1.data();
        float* input_2 = input_image_2.data();
        for (int i = 0; i < Input_1.rows; i++) {
            for (int j = 0; j < Input_1.cols; j++) {
                for (int c = 0; c < 3; c++) {
                    // NHWC layout
                    if (c == 0)
                        input_1[i*Input_1.cols*3 + j*3 + c] = Input_1.ptr<uchar>(i)[j*3+2] / 255.0;
                    if (c == 1)
                        input_1[i*Input_1.cols*3 + j*3 + c] = Input_1.ptr<uchar>(i)[j*3+1] / 255.0;
                    if (c == 2)
                        input_1[i*Input_1.cols*3 + j*3 + c] = Input_1.ptr<uchar>(i)[j*3+0] / 255.0;
                    // NCHW layout (use this instead if the model expects channels-first input)
                    // if (c == 0)
                    //     input_1[c*Input_1.rows*Input_1.cols + i*Input_1.cols + j] = Input_1.ptr<uchar>(i)[j*3+2] / 255.0;
                    // if (c == 1)
                    //     input_1[c*Input_1.rows*Input_1.cols + i*Input_1.cols + j] = Input_1.ptr<uchar>(i)[j*3+1] / 255.0;
                    // if (c == 2)
                    //     input_1[c*Input_1.rows*Input_1.cols + i*Input_1.cols + j] = Input_1.ptr<uchar>(i)[j*3+0] / 255.0;
                }
            }
        }
        for (int i = 0; i < Input_2.rows; i++) {
            for (int j = 0; j < Input_2.cols; j++) {
                for (int c = 0; c < 3; c++) {
                    // NHWC layout
                    if (c == 0)
                        input_2[i*Input_2.cols*3 + j*3 + c] = Input_2.ptr<uchar>(i)[j*3+2] / 255.0;
                    if (c == 1)
                        input_2[i*Input_2.cols*3 + j*3 + c] = Input_2.ptr<uchar>(i)[j*3+1] / 255.0;
                    if (c == 2)
                        input_2[i*Input_2.cols*3 + j*3 + c] = Input_2.ptr<uchar>(i)[j*3+0] / 255.0;
                }
            }
        }

        // Wrap the two float arrays as ONNX Runtime input tensors
        std::vector<Ort::Value> input_tensors;
        input_tensors.push_back(Ort::Value::CreateTensor<float>(
                memory_info, input_1, input_image_1.size(), input_shape_.data(), input_shape_.size()));
        input_tensors.push_back(Ort::Value::CreateTensor<float>(
                memory_info, input_2, input_image_2.size(), input_shape_.data(), input_shape_.size()));

        std::vector<Ort::Value> output_tensors;
        output_tensors = session.Run(Ort::RunOptions{nullptr},
                                     input_node_names.data(),   // input node names
                                     input_tensors.data(),      // input tensors
                                     input_tensors.size(),      // 2
                                     output_node_names.data(),  // output node names
                                     output_node_names.size()); // 1
        // cout << output_tensors.size() << endl;  // number of output tensors

        // The model outputs [tx, ty, tz, roll, pitch, yaw]
        float* output = output_tensors[0].GetTensorMutableData<float>();
        Eigen::Vector3d t(output[0], output[1], output[2]);
        Eigen::Vector3d r(output[3], output[4], output[5]);
        // Build the rotation from Euler angles: rotation about the z, y and x axes
        Eigen::AngleAxisd R_z(r[2], Eigen::Vector3d(0, 0, 1));
        Eigen::AngleAxisd R_y(r[1], Eigen::Vector3d(0, 1, 0));
        Eigen::AngleAxisd R_x(r[0], Eigen::Vector3d(1, 0, 0));
        // Convert to a rotation matrix: R = R_z * R_y * R_x
        Eigen::Matrix3d R_matrix_xyz = R_z.toRotationMatrix() * R_y.toRotationMatrix() * R_x.toRotationMatrix();
        return Sophus::SE3(R_matrix_xyz, t);
    }
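Before indexing output[0..5] it can be worth confirming that the returned tensor really contains 6 elements. A minimal sketch using the ONNX Runtime shape API, which could be placed right after session.Run inside computePoseDNN (assert requires <cassert>):

auto out_info = output_tensors[0].GetTensorTypeAndShapeInfo();
std::vector<int64_t> out_shape = out_info.GetShape();   // expected {1, 6} for [tx, ty, tz, roll, pitch, yaw]
size_t out_count = out_info.GetElementCount();
assert(out_count == 6);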
    void printModelInfo(Ort::Session &session, Ort::AllocatorWithDefaultOptions &allocator)
    {
        // Number of input and output nodes of the model
        size_t num_input_nodes  = session.GetInputCount();
        size_t num_output_nodes = session.GetOutputCount();
        cout << "Number of input nodes is:" << num_input_nodes << endl;
        cout << "Number of output nodes is:" << num_output_nodes << endl;

        // Input / output dimensions
        for (size_t i = 0; i < num_input_nodes; i++) {
            std::vector<int64_t> input_dims = session.GetInputTypeInfo(i).GetTensorTypeAndShapeInfo().GetShape();
            cout << endl << "input " << i << " dim is: ";
            for (size_t j = 0; j < input_dims.size(); j++)
                cout << input_dims[j] << " ";
        }
        for (size_t i = 0; i < num_output_nodes; i++) {
            std::vector<int64_t> output_dims = session.GetOutputTypeInfo(i).GetTensorTypeAndShapeInfo().GetShape();
            cout << endl << "output " << i << " dim is: ";
            for (size_t j = 0; j < output_dims.size(); j++)
                cout << output_dims[j] << " ";
        }

        // Input / output node names
        cout << endl;
        for (size_t i = 0; i < num_input_nodes; i++)
            cout << "The input op-name " << i << " is:" << session.GetInputName(i, allocator) << endl;
        for (size_t i = 0; i < num_output_nodes; i++)
            cout << "The output op-name " << i << " is:" << session.GetOutputName(i, allocator) << endl;
    }
    
