Front Neurorobot.
2023; 17: 1131392.
Automatic vessel plate number recognition for surface unmanned vehicles with marine applications
Renran Zhang, Lei Zhang, et al.
$$[p_0, q_0] = \phi_{in}(x), \qquad p_1 = f(q_0) \odot p_0, \qquad y = \phi_{out}(p_1) \tag{1}$$
where ϕ_in and ϕ_out represent linear convolution operations (1 × 1 convolutions) that perform channel mixing, and f indicates a depth-wise convolution. The formulation in Equation 1 introduces a 1-order interaction between the features p_0 and f(q_0) through a single element-wise multiplication. Similarly, the n-order form is formulated as:
$$[p_0, q_0, q_1, \dots, q_{n-1}] = \phi_{in}(x) \tag{2}$$
$$p_{k+1} = f_k(q_k) \odot g_k(p_k)/\alpha, \qquad k = 0, 1, \dots, n-1 \tag{3}$$
where the output of each order is scaled by 1/α for stable training, and the projections g_k are used to match the dimensions in different orders, as given in Equation 4:
$$g_k = \begin{cases} \text{Identity}, & k = 0,\\ \text{Linear}(C_{k-1}, C_k), & 1 \le k \le n-1. \end{cases} \tag{4}$$
From the recursive formula in Equation 3, we can see that the recursive gated convolution block achieves n-order spatial interactions. The channel dimension in each order is set as in Equation 5 to avoid extra computational overhead.
$$C_k = \frac{C}{2^{\,n-k-1}}, \qquad 0 \le k \le n-1 \tag{5}$$
where C indicates the number of channels.
This block can perform high-order spatial interactions to improve the learning capacity of the neural network without extra computation. The details of the model are shown in the figure below.
The structure of the recursive gated convolution block.
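As an illustration, the recursion in Equations 1–5 can be sketched in PyTorch as follows. The 7 × 7 depth-wise kernel, the default order n = 3, and the unit scaling α = 1 are illustrative assumptions, not the exact configuration of the proposed model; C must be divisible by 2^(n−1).

```python
import torch
import torch.nn as nn

class RecursiveGatedConv(nn.Module):
    """Sketch of an n-order recursive gated convolution (gnConv-style) block."""
    def __init__(self, dim, order=3):
        super().__init__()
        self.order = order
        # channel dims per order, C_k = C / 2^(n-k-1): smallest first (Equation 5)
        self.dims = [dim // 2 ** (order - 1 - k) for k in range(order)]
        self.proj_in = nn.Conv2d(dim, 2 * dim, kernel_size=1)            # phi_in
        self.dwconv = nn.Conv2d(sum(self.dims), sum(self.dims), kernel_size=7,
                                padding=3, groups=sum(self.dims))        # depth-wise f
        # g_k: 1x1 convs matching dimensions between successive orders (Equation 4)
        self.projs = nn.ModuleList(
            nn.Conv2d(self.dims[k], self.dims[k + 1], kernel_size=1)
            for k in range(order - 1))
        self.proj_out = nn.Conv2d(dim, dim, kernel_size=1)               # phi_out
        self.scale = 1.0  # the 1/alpha scaling; set to 1 here for simplicity

    def forward(self, x):
        fused = self.proj_in(x)
        # split into p_0 (dims[0] channels) and all q_k (sum(dims) channels)
        p, q = torch.split(fused, (self.dims[0], sum(self.dims)), dim=1)
        q = self.dwconv(q) * self.scale
        q_list = torch.split(q, self.dims, dim=1)
        p = p * q_list[0]                     # 1-order interaction (Equation 1)
        for k, proj in enumerate(self.projs):
            p = proj(p) * q_list[k + 1]       # higher-order interactions (Equation 3)
        return self.proj_out(p)
```

The output has the same shape as the input, so the block can be dropped into an existing backbone in place of a standard convolution stage.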
2.2. Decoupled head
In the original YOLOv5, classification and regression are completed simultaneously in the detect layer using the same input feature map. However, there is a conflict caused by spatial misalignment between classification and boundary regression, which may harm the performance of the detection model (Ge et al., 2021). To be specific, a detector can hardly reach a perfect trade-off if it accomplishes classification and regression from the same spatial point/anchor. Motivated by "Revisiting the Sibling Head in Object Detector," the decoupled head method is introduced, which decouples these two tasks in the spatial dimension through two disentangled proposals.
According to the observation above, a decoupled head is utilized to predict the class and localization instead of the original detect head in YOLOv5. Different from the method in "Rethinking Classification and Localization for Object Detection," we propose a lite decoupled head without fully connected layers to meet the real-time detection requirement of USVs. The decoupled head splits classification and bounding box regression into two convolution heads, which have identical structures with independent parameters. The details of the proposed structure are shown in the figure.
As shown in the figure, the number of channels is first adjusted to 256 by a convolution layer with a 1 × 1 kernel. Here the * indicates that the width and height stay the same as the input. The intermediate result is then fed into the prediction part, which is constructed with two parallel branches: one branch for classification and the other for regression. This operation resolves the coupling problem that exists between the two tasks, which effectively improves the performance of the detection model.
2.3. Reconstruction of bounding box loss
In the training phase, the parameters of the model are updated according to the result of the loss function. The loss of YOLOv5 is calculated from the objectness score, the class probability score, and the bounding box regression score; the objectness and class probability terms use Binary Cross Entropy, while the bounding box regression score is calculated with CIoU. The CIoU loss is expressed as follows:
$$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \beta v, \qquad v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$$
where w and h indicate the width and height of the box respectively, ρ²(b, b^{gt}) represents the squared Euclidean distance between the centers of the prediction box and the ground-truth box, and c² represents the squared diagonal length of the smallest enclosing rectangle covering the prediction box and the ground-truth box. β is the aspect-ratio trade-off factor, and v is the penalty term measuring the aspect-ratio consistency of the prediction box and the ground-truth box. It can be seen from the formula that CIoU takes into account the center distance, area overlap, and aspect ratio of the prediction box and the ground-truth box. Compared with the ordinary IoU, it reflects the similarity of the target box more effectively. Therefore, a loss design based on CIoU can make model training converge faster.
However, CIoU only takes the width-height ratio as the influence factor and does not explicitly consider the width and height values themselves (Zheng et al., 2020). For this reason, EIoU takes width and height influence factors as penalty terms, rather than the width-height ratio (Yang et al., 2021). The formula is as follows:
$$L_{EIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \frac{\rho^2(w, w^{gt})}{C_w^2} + \frac{\rho^2(h, h^{gt})}{C_h^2}$$
where ρ²(w, w^{gt})/C_w² and ρ²(h, h^{gt})/C_h² represent the width and height influence factors respectively, with C_w and C_h denoting the width and height of the smallest enclosing rectangle. Because EIoU directly uses the width and height of the target box as penalty terms, it theoretically brings a faster convergence speed to model training.
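Correspondingly, the EIoU loss can be sketched by swapping the aspect-ratio penalty for direct width and height terms; the (x1, y1, x2, y2) corner box format is again an assumption for illustration.

```python
import torch

def eiou_loss(pred, target, eps=1e-7):
    """EIoU loss sketch: CIoU's aspect-ratio term is replaced with direct
    width and height penalties (boxes in (x1, y1, x2, y2) format)."""
    ix1 = torch.max(pred[..., 0], target[..., 0])
    iy1 = torch.max(pred[..., 1], target[..., 1])
    ix2 = torch.min(pred[..., 2], target[..., 2])
    iy2 = torch.min(pred[..., 3], target[..., 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    w1, h1 = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w2, h2 = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    # width/height of the smallest enclosing rectangle (C_w, C_h)
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    # squared distance between box centers (rho^2)
    rho2 = ((pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) ** 2 +
            (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) ** 2) / 4
    # direct width and height penalties instead of the aspect-ratio term
    return (1 - iou
            + rho2 / (cw ** 2 + ch ** 2 + eps)
            + (w1 - w2) ** 2 / (cw ** 2 + eps)
            + (h1 - h2) ** 2 / (ch ** 2 + eps))
```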
2.4. Redesigning the sizes of anchor boxes
YOLOv5 is an anchor-based model, so the prior design of anchor sizes is very important. The anchor sizes of YOLOv5 are set according to the COCO dataset to cover large, medium, and small targets with different aspect ratios. However, there is a significant difference in size between ships and vessel plate numbers, so the anchor sizes should be recalculated. If we simply use the standard K-means algorithm for clustering, the results obtained are not necessarily optimal, because the random initial values of K-means have a large impact on the results and the robustness of the algorithm is poor.
Based on the above considerations, we adopted the K-means++ algorithm, hoping to obtain a more reasonable anchor size prior. The K-means++ algorithm process is as follows:
1. Choose one center uniformly at random among the data points.
2. For each data point x not chosen yet, compute D(x), the distance between x and the nearest center that has already been chosen.
3. Choose one new data point at random as a new center, using a weighted probability distribution in which a point x is chosen with probability proportional to D(x)².
4. Repeat Steps 2 and 3 until k centers have been chosen.
5. Now that the initial centers have been chosen, proceed using standard k-means clustering.
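The steps above can be sketched in Python with NumPy. Plain Euclidean distance on (width, height) pairs is used here for simplicity; anchor clustering implementations often substitute an IoU-based distance instead.

```python
import numpy as np

def kmeans_pp_init(boxes, k, seed=0):
    """k-means++ seeding over (width, height) pairs, following the steps above."""
    rng = np.random.default_rng(seed)
    centers = [boxes[rng.integers(len(boxes))]]           # step 1: uniform pick
    while len(centers) < k:                               # step 4: repeat
        # step 2: squared distance to the nearest already-chosen center
        d2 = np.min([np.sum((boxes - c) ** 2, axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()                             # step 3: weight by D(x)^2
        centers.append(boxes[rng.choice(len(boxes), p=probs)])
    return np.array(centers)

def kmeans_anchors(boxes, k, iters=50):
    """Step 5: standard k-means (Lloyd's algorithm) from the ++ seeds."""
    centers = kmeans_pp_init(boxes, k)
    for _ in range(iters):
        labels = np.argmin(((boxes[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):                       # keep empty clusters as-is
                centers[j] = boxes[labels == j].mean(axis=0)
    return centers
```

Running `kmeans_anchors` on the (width, height) pairs of the ground-truth boxes with k = 9 yields one anchor prior per detection scale and anchor slot.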
This method was tested on our own dataset, and the results are shown in the figure below. The default anchor sizes (blue) are (10, 16, 33, 30, 62, 59, 116, 156, 373), and the optimized results (red) are plotted against them. It can be seen that the optimized anchors are more consistent with the real data distribution, which can improve the performance of the model.
The comparison of anchor size.
3. Experiment and result analysis
To evaluate the performance of the proposed method, we conducted detection experiments on the computer carried by the USV. In particular, all experiments were conducted on a computer with an Intel(R) Core(TM) i5-9600K@3.7GHz CPU and an NVIDIA GeForce RTX2080Ti GPU. The code was written in Python using the PyTorch library and executed under Ubuntu 20.04.
3.1. Dataset description
Deep neural networks are trained and verified on datasets; however, there is no relevant public dataset for this task. In this paper, we establish a vessel plate number dataset from the USV perspective. All images were obtained from the electro-optical sensor carried by USVs. To increase the diversity of scenes, numbers and symbols are displayed on an LED board (1.5 m × 1.5 m) carried on the target boat (10 m), and the content on the LED board changes periodically, as shown in the figure.
The dataset contains 5,011 images and covers 16 types of objects, i.e., ships, buoys, single numbers, and symbols (star, rectangle, triangle). The dataset is divided into a training set and a verification set at a ratio of 8:2. To further improve ship detection results, we exploit data augmentation methods, e.g., horizontal flipping, random translation, and mosaic augmentation, to enlarge the original training dataset, as shown in the figure.
3.2. Evaluation criteria
To quantitatively evaluate the detection results, precision (P), recall (R), and mAP are utilized in this paper. In particular, P is the ratio of the number of true positives to the total number of positive predictions. R is the ratio of the number of true positives to the total number of actual (relevant) objects. The mAP computes the average precision over recall values, which indicates detection robustness and accuracy. The mAP is calculated with the following formula:
$$mAP = \frac{1}{N} \sum_{n=1}^{N} AP_n$$
where the average precision score AP_n is calculated for each of the N data folds.
In this paper, mAP@0.5:0.95 is adopted as the mAP criterion, which represents the mAP averaged over different IoU thresholds (from 0.5 to 0.95, in steps of 0.05).
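As an illustration, average precision over a precision-recall curve and the resulting mAP can be computed as follows. All-point interpolation is an assumption here; detection benchmarks differ in their interpolation scheme.

```python
import numpy as np

def average_precision(recall, precision):
    """Area under an interpolated precision-recall curve (all-point interpolation)."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # make precision monotonically non-increasing from right to left
    p = np.maximum.accumulate(p[::-1])[::-1]
    idx = np.where(r[1:] != r[:-1])[0]          # points where recall changes
    return np.sum((r[idx + 1] - r[idx]) * p[idx + 1])

def mean_average_precision(ap_values):
    """mAP = (1/N) * sum of per-class (or per-fold) AP scores."""
    return float(np.mean(ap_values))
```

For mAP@0.5:0.95, the AP would additionally be averaged over the ten IoU thresholds from 0.5 to 0.95.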
3.3. Model training
In the experiments, the input image size is 600 × 600, the number of training epochs is 300, the batch size is 16, the optimizer is SGD, and the initial learning rate is 0.01. To ensure stable convergence, a cosine annealing strategy is used to dynamically adjust the learning rate during training. The results are shown in the figures below.
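The schedule described above (SGD, initial learning rate 0.01, 300 epochs with cosine annealing) can be reproduced with PyTorch's built-in scheduler; the `Linear` module below is only a stand-in for the actual detector.

```python
import torch

model = torch.nn.Linear(10, 2)                             # stand-in for the detector
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # initial lr = 0.01
# cosine annealing over the 300 training epochs described above
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)

for epoch in range(300):
    # ... forward pass, loss, and backward pass on real training data go here ...
    optimizer.step()
    scheduler.step()   # learning rate decays along a cosine curve toward 0
```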
The training loss convergence curves, containing the bounding box loss, confidence loss, and classification loss, are shown in the figure. The loss functions tend to converge within the first 100 epochs, which indicates that the proposed method is stable and fast in convergence. The results indicate that the proposed model performs well on the vessel plate number recognition task.
3.4. Ablation experiments
As discussed in Section 2, the vessel plate number recognition model is proposed by taking several modules into consideration, e.g., recursive gated convolution (RGConv), the decoupled head (DH), EIOU, and adaptive anchor sizes (AAS). Therefore, ablation experiments are performed to determine which modules improve detection performance most effectively. The detailed results of the numerical experiments can be found in Table 1.
Table 1
The comparison results of the ablation experiments (✓ = module enabled, × = disabled).

| Methods  | RGConv | DH | AAS | EIOU | mAP (%) |
|----------|--------|----|-----|------|---------|
| Original | ×      | ×  | ×   | ×    | 64.12   |
| Proposed | ✓      | ×  | ×   | ×    | 67.51   |
| Proposed | ✓      | ✓  | ×   | ×    | 68.03   |
| Proposed | ✓      | ✓  | ✓   | ×    | 70.38   |
| Proposed | ✓      | ✓  | ✓   | ✓    | 70.35   |
It is shown that the accuracy is lowest for the original YOLOv5. The introduction of RGConv and DH enhances detection accuracy. The accuracy gain brought by EIOU appears to be small; however, it was found during the training phase that the method with EIOU converges more steadily than the original YOLOv5. Compared with the original YOLOv5, the proposed method improves the mAP by 6.23%. As a consequence, the introduction of RGConv, AAS, DH, and EIOU brings positive effects on vessel plate number recognition results.
3.5. The experiment on USV
The method proposed in this paper was tested in the South China Sea to verify its practicability. The "Tian Xing" USV platform used in the experiment is shown in the figure. The target boat is equipped with an LED board that displays the hull number. The visualization results are shown in the figures, including the software interface that displays environment perception information for the USV. It can be seen that the hull number of the target boat is correctly identified by the proposed method while guaranteeing real-time detection.
The software interface for perception system in USV.
4. Conclusion
In the practical application tasks of USVs, it is necessary to identify a vessel through its plate number. In this work, we proposed a method based on an object detection model for recognizing vessel plate numbers in complicated sea environments for USVs. The accuracy and stability of the model have been improved by the recursive gated convolution structure, the decoupled head, the reconstructed loss function, and the redesigned anchor sizes. To facilitate this research, a vessel plate number dataset was established. Furthermore, we conducted a field experiment with the "Tian Xing" platform in the South China Sea. Compared with the original YOLOv5, the proposed method can recognize both the ship and its plate number in real time with higher accuracy. This is of great significance in both the civilian and military sectors.
Although the proposed method has achieved good results in the recognition of vessel plate numbers, it still has room for improvement. In addition, this paper does not consider the impact of ocean climate on recognition accuracy. Changes in climate often result in the degradation of images which brings additional challenges for recognition. In the future, combining image enhancement algorithms to improve recognition accuracy would provide a promising research direction.
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Author contributions
RZ, LZ, and YS contributed to the conception and design of the study. RZ and QY organized the database. RZ performed the statistical analysis and wrote the first draft of the manuscript. RZ and GB wrote sections of the manuscript. LZ revised the article. All authors contributed to manuscript revision, read, and approved the submitted version.
Funding Statement
This research was funded by Heilongjiang Provincial Excellent Youth Fund (grant number YQ2021E013) and Central University Fund (grant number 3072022YY0101).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
- Bochkovskiy, A., Wang, C. Y., and Liao, H. Y. M. (2020). YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934. doi: 10.48550/arXiv.2004.10934
- Dobref, V., Popa, I., Popov, P., and Scurtu, I. C. (2018). "Unmanned surface vessel for marine data acquisition," in IOP Conference Series: Earth and Environmental Science, Vol. 172 (IOP Publishing), 012034.
- García-Silveira, D., Lopez-Ricaurte, L., and Hernández-Pliego, J. (2022). Long-range movements of common kestrels (Falco tinnunculus) in Southwestern Spain revealed by GPS tracking. J. Raptor Res. 3, 136. doi: 10.3356/JRR-21-136
- Ge, Z., Liu, S., Wang, F., Li, Z., and Sun, J. (2021). YOLOX: exceeding YOLO series in 2021. arXiv preprint arXiv:2107.08430. doi: 10.48550/arXiv.2107.08430
- Harati-Mokhtari, A., Wall, A., and Brooks, P. (2007). Automatic identification system (AIS): data reliability and human error implications. J. Navigat. 60, 373–389. doi: 10.1017/S0373463307004298
- He, S., Dong, C., Dai, S. L., and Zou, T. (2022). Cooperative deterministic learning and formation control for underactuated USVs with prescribed performance. Int. J. Robust Nonlin. Cont. 32, 2902–2924. doi: 10.1002/rnc.5871
- He, K., Gkioxari, G., and Dollár, P. (2017). "Mask R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, 2961–2969.
- Huang, S., Xu, H., Xia, X., and Zhang, Y. (2018). "End-to-end vessel plate number detection and recognition using deep convolutional neural networks and LSTMs," in 2018 11th International Symposium on Computational Intelligence and Design (ISCID), Vol. 1 (IEEE), 195–199. doi: 10.1109/ISCID.2018.00051
- Pouyaei, A., Choi, Y., and Jung, J. (2022). Investigating the long-range transport of particulate matter in East Asia: introducing a new Lagrangian diagnostic tool. Atmos. Environ. 278, 119096. doi: 10.1016/j.atmosenv.2022.119096
- Rao, Y., Zhao, W., Tang, Y., Zhou, J., Lim, S. N., and Lu, J. (2022). HorNet: efficient high-order spatial interactions with recursive gated convolutions. arXiv preprint arXiv:2207.14284. doi: 10.48550/arXiv.2207.14284
- Redmon, J., Divvala, S., and Girshick, R. (2016). "You only look once: unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 779–788. doi: 10.1109/CVPR.2016.91
- Redmon, J., and Farhadi, A. (2018). YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767. doi: 10.48550/arXiv.1804.02767
- Redmon, J., and Farhadi, A. (2017). "YOLO9000: better, faster, stronger," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7263–7271. doi: 10.1109/CVPR.2017.690
- Wawrzyniak, N., Hyla, T., and Bodus-Olkowska, I. (2022). Vessel identification based on automatic hull inscriptions recognition. PLoS ONE 17, e0270575. doi: 10.1371/journal.pone.0270575
- Yang, T. H., Hsiung, S. H., and Kuo, C. H. (2018). "Development of unmanned surface vehicle for water quality monitoring and measurement," in 2018 IEEE International Conference on Applied System Invention (ICASI) (IEEE), 566–569. doi: 10.1109/ICASI.2018.8394316
- Yang, Z., Wang, X., and Li, J. (2021). EIoU: an improved vehicle detection algorithm based on VehicleNet neural network. J. Phys. Conf. Ser. 1924, 012001. doi: 10.1088/1742-6596/1924/1/012001
- Zhang, W., Sun, H., Zhou, J., Liu, X., Zhang, Z., and Min, G. (2018). "Fully convolutional network based ship plate recognition," in 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC) (IEEE), 1803–1808. doi: 10.1109/SMC.2018.00312
- Zheng, Z., Wang, P., and Liu, W. (2020). Distance-IoU loss: faster and better learning for bounding box regression. Proceed. AAAI Conf. Artif. Intell. 34, 12993–13000. doi: 10.1609/aaai.v34i07.6999