Front Neurorobot.
2023; 17: 1131392.
Automatic vessel plate number recognition for surface unmanned vehicles with marine applications
Renran Zhang, Lei Zhang, et al.
$$[p_0, q_0] = \phi_{in}(x), \qquad p_1 = f(q_0) \odot p_0, \qquad y = \phi_{out}(p_1) \tag{1}$$
where ϕ_in and ϕ_out represent linear convolution operations (1 × 1 convolutions) that perform channel mixing, and f indicates a depth-wise convolution. The formulation in Equation 1 introduces a 1-order interaction between the features p_0 and f(q_0) through a single element-wise multiplication. Similarly, the n-order form is formulated as:
$$[p_0, q_0, q_1, \dots, q_{n-1}] = \phi_{in}(x) \tag{2}$$
$$p_{k+1} = f_k(q_k) \odot g_k(p_k)/\alpha, \qquad k = 0, 1, \dots, n-1 \tag{3}$$
where the output of each order is scaled by 1/α for stable training, and the projections g_k are used to match the dimensions in different orders, as given in Equation 4:
$$g_k = \begin{cases} \text{Identity}, & k = 0,\\ \text{Linear}(C_{k-1}, C_k), & 1 \le k \le n-1. \end{cases} \tag{4}$$
From the recursive formula in Equation 3, we can see that the recursive gated convolution block achieves n-order spatial interactions. The channel dimension in each order is set as in Equation 5 to avoid extra computational overhead.
$$C_k = \frac{C}{2^{\,n-k-1}}, \qquad 0 \le k \le n-1 \tag{5}$$
where C indicates the number of channels.
This block can perform high-order spatial interactions to improve the learning capacity of the neural network without extra computation. The details of the model are shown in the figure below.
The structure of the recursive gated convolution block.
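As an illustration, the recursion in Equations 1–5 can be sketched in PyTorch as follows. The 7 × 7 depth-wise kernel, the default order n = 3, and the unit scaling α = 1 are illustrative assumptions, not the exact configuration of the proposed model; C must be divisible by 2^(n−1).

```python
import torch
import torch.nn as nn

class RecursiveGatedConv(nn.Module):
    """Sketch of an n-order recursive gated convolution (gnConv-style) block."""
    def __init__(self, dim, order=3):
        super().__init__()
        self.order = order
        # channel dims per order, C_k = C / 2^(n-k-1): smallest first (Equation 5)
        self.dims = [dim // 2 ** (order - 1 - k) for k in range(order)]
        self.proj_in = nn.Conv2d(dim, 2 * dim, kernel_size=1)            # phi_in
        self.dwconv = nn.Conv2d(sum(self.dims), sum(self.dims), kernel_size=7,
                                padding=3, groups=sum(self.dims))        # depth-wise f
        # g_k: 1x1 convs matching dimensions between successive orders (Equation 4)
        self.projs = nn.ModuleList(
            nn.Conv2d(self.dims[k], self.dims[k + 1], kernel_size=1)
            for k in range(order - 1))
        self.proj_out = nn.Conv2d(dim, dim, kernel_size=1)               # phi_out
        self.scale = 1.0  # the 1/alpha scaling; set to 1 here for simplicity

    def forward(self, x):
        fused = self.proj_in(x)
        # split into p_0 (dims[0] channels) and all q_k (sum(dims) channels)
        p, q = torch.split(fused, (self.dims[0], sum(self.dims)), dim=1)
        q = self.dwconv(q) * self.scale
        q_list = torch.split(q, self.dims, dim=1)
        p = p * q_list[0]                     # 1-order interaction (Equation 1)
        for k, proj in enumerate(self.projs):
            p = proj(p) * q_list[k + 1]       # higher-order interactions (Equation 3)
        return self.proj_out(p)
```

The output has the same shape as the input, so the block can be dropped into an existing backbone in place of a standard convolution stage.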
2.2. Decoupled head
In the original YOLOv5, classification and regression are completed simultaneously in the detect layer using the same input feature map. However, there is a conflict caused by spatial misalignment between classification and boundary regression, which may harm the performance of the detection model (Ge et al., 2021). To be specific, a detector can hardly reach a perfect trade-off if it accomplishes classification and regression from the same spatial point/anchor. Motivated by "Revisiting the Sibling Head in Object Detector," the decoupled head method is introduced, which decouples these two tasks in the spatial dimension through two disentangled proposals.
According to the observation above, a decoupled head is utilized to predict the class and localization instead of the original detect head in YOLOv5. Different from the method in "Rethinking Classification and Localization for Object Detection," we propose a lite decoupled head without fully connected layers to meet the real-time detection requirement of USVs. The decoupled head splits classification and bounding box regression into two convolution heads, which have identical structures with independent parameters. The details of the proposed structure are shown in the figure.
As shown in the figure, the number of channels is first adjusted to 256 by a convolution layer with a 1 × 1 kernel. Here the * indicates that the width and height stay the same as the input. The intermediate result is then fed into the prediction part, which is constructed with two parallel branches: one branch for classification and the other for regression. This operation resolves the coupling problem that exists between the two tasks, which effectively improves the performance of the detection model.
2.3. Reconstruction of bounding box loss
In the training phase, the parameters of the model are updated according to the result of the loss function. The loss of YOLOv5 is calculated from the objectness score, the class probability score, and the bounding box regression score; the objectness and class probability terms use Binary Cross Entropy, while the bounding box regression score is calculated with CIoU. The CIoU loss is expressed as follows:
$$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \beta v, \qquad v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$$
where w and h indicate the width and height of the box respectively, ρ²(b, b^{gt}) represents the squared Euclidean distance between the centers of the prediction box and the ground-truth box, and c² represents the squared diagonal length of the smallest enclosing rectangle covering the prediction box and the ground-truth box. β is the aspect-ratio trade-off factor, and v is the penalty term measuring the aspect-ratio consistency of the prediction box and the ground-truth box. It can be seen from the formula that CIoU takes into account the center distance, area overlap, and aspect ratio of the prediction box and the ground-truth box. Compared with the ordinary IoU, it reflects the similarity of the target box more effectively. Therefore, a loss design based on CIoU can make model training converge faster.
However, CIoU only takes the width-height ratio as the influence factor and does not explicitly consider the width and height values themselves (Zheng et al., 2020). For this reason, EIoU takes width and height influence factors as penalty terms, rather than the width-height ratio (Yang et al., 2021). The formula is as follows:
$$L_{EIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \frac{\rho^2(w, w^{gt})}{C_w^2} + \frac{\rho^2(h, h^{gt})}{C_h^2}$$
where ρ²(w, w^{gt})/C_w² and ρ²(h, h^{gt})/C_h² represent the width and height influence factors respectively, with C_w and C_h denoting the width and height of the smallest enclosing rectangle. Because EIoU directly uses the width and height of the target box as penalty terms, it theoretically brings a faster convergence speed to model training.
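Correspondingly, the EIoU loss can be sketched by swapping the aspect-ratio penalty for direct width and height terms; the (x1, y1, x2, y2) corner box format is again an assumption for illustration.

```python
import torch

def eiou_loss(pred, target, eps=1e-7):
    """EIoU loss sketch: CIoU's aspect-ratio term is replaced with direct
    width and height penalties (boxes in (x1, y1, x2, y2) format)."""
    ix1 = torch.max(pred[..., 0], target[..., 0])
    iy1 = torch.max(pred[..., 1], target[..., 1])
    ix2 = torch.min(pred[..., 2], target[..., 2])
    iy2 = torch.min(pred[..., 3], target[..., 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    w1, h1 = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w2, h2 = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    # width/height of the smallest enclosing rectangle (C_w, C_h)
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    # squared distance between box centers (rho^2)
    rho2 = ((pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) ** 2 +
            (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) ** 2) / 4
    # direct width and height penalties instead of the aspect-ratio term
    return (1 - iou
            + rho2 / (cw ** 2 + ch ** 2 + eps)
            + (w1 - w2) ** 2 / (cw ** 2 + eps)
            + (h1 - h2) ** 2 / (ch ** 2 + eps))
```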
2.4. Redesigning the sizes of anchor boxes
YOLOv5 is an anchor-based model, so the prior design of anchor sizes is very important. The anchor sizes of YOLOv5 are set according to the COCO dataset to cover large, medium, and small targets with different aspect ratios. However, there is a significant difference in size between ships and vessel plate numbers, so the anchor sizes should be recalculated. If we simply use the standard K-means algorithm for clustering, the results obtained are not necessarily optimal, because the random initial values of K-means have a large impact on the results and the robustness of the algorithm is poor.
Based on the above considerations, we adopted the K-means++ algorithm, hoping to obtain a more reasonable anchor size prior. The K-means++ algorithm process is as follows:
1. Choose one center uniformly at random among the data points.
2. For each data point x not chosen yet, compute D(x), the distance between x and the nearest center that has already been chosen.
3. Choose one new data point at random as a new center, using a weighted probability distribution in which a point x is chosen with probability proportional to D(x)².
4. Repeat Steps 2 and 3 until k centers have been chosen.
5. Now that the initial centers have been chosen, proceed using standard k-means clustering.
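The steps above can be sketched in Python with NumPy. Plain Euclidean distance on (width, height) pairs is used here for simplicity; anchor clustering implementations often substitute an IoU-based distance instead.

```python
import numpy as np

def kmeans_pp_init(boxes, k, seed=0):
    """k-means++ seeding over (width, height) pairs, following the steps above."""
    rng = np.random.default_rng(seed)
    centers = [boxes[rng.integers(len(boxes))]]           # step 1: uniform pick
    while len(centers) < k:                               # step 4: repeat
        # step 2: squared distance to the nearest already-chosen center
        d2 = np.min([np.sum((boxes - c) ** 2, axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()                             # step 3: weight by D(x)^2
        centers.append(boxes[rng.choice(len(boxes), p=probs)])
    return np.array(centers)

def kmeans_anchors(boxes, k, iters=50):
    """Step 5: standard k-means (Lloyd's algorithm) from the ++ seeds."""
    centers = kmeans_pp_init(boxes, k)
    for _ in range(iters):
        labels = np.argmin(((boxes[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):                       # keep empty clusters as-is
                centers[j] = boxes[labels == j].mean(axis=0)
    return centers
```

Running `kmeans_anchors` on the (width, height) pairs of the ground-truth boxes with k = 9 yields one anchor prior per detection scale and anchor slot.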
This method was tested on our own dataset, and the results are shown in the figure below. The default anchor sizes (blue) are (10, 16, 33, 30, 62, 59, 116, 156, 373), and the optimized results (red) are plotted against them. It can be seen that the optimized anchors are more consistent with the real data distribution, which can improve the performance of the model.
The comparison of anchor size.
3. Experiment and result analysis
To evaluate the performance of the proposed method, we conducted detection experiments on the computer carried by the USV. In particular, all experiments were conducted on a computer with an Intel(R) Core(TM) i5-9600K@3.7GHz CPU and an NVIDIA GeForce RTX2080Ti GPU. The code was written in Python using the PyTorch library and executed under Ubuntu 20.04.
3.1. Dataset description
Deep neural networks are trained and verified on datasets; however, there is no relevant public dataset for this task. In this paper, we establish a vessel plate number dataset from the USV perspective. All images were obtained from the electro-optical sensor carried by USVs. To increase the diversity of scenes, numbers and symbols are displayed on an LED board (1.5 m × 1.5 m) carried on the target boat (10 m), and the content on the LED board changes periodically, as shown in the figure.
The dataset contains 5,011 images and covers 16 types of objects, i.e., ships, buoys, single numbers, and symbols (star, rectangle, triangle). The dataset is divided into a training set and a verification set at a ratio of 8:2. To further improve ship detection results, we exploit data augmentation methods, e.g., horizontal flipping, random translation, and mosaic augmentation, to enlarge the original training dataset, as shown in the figure.
3.2. Evaluation criteria
To quantitatively evaluate the detection results, precision (P), recall (R), and mAP are utilized in this paper. In particular, P is the ratio of the number of true positives to the total number of positive predictions. R is the ratio of the number of true positives to the total number of actual (relevant) objects. The mAP computes the average precision over recall values, which indicates detection robustness and accuracy. The mAP is calculated with the following formula:
$$mAP = \frac{1}{N} \sum_{n=1}^{N} AP_n$$
where the average precision score AP_n is calculated for each of the N data folds.
In this paper, mAP@0.5:0.95 is adopted as the mAP criterion, which represents the mAP averaged over different IoU thresholds (from 0.5 to 0.95, in steps of 0.05).
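As an illustration, average precision over a precision-recall curve and the resulting mAP can be computed as follows. All-point interpolation is an assumption here; detection benchmarks differ in their interpolation scheme.

```python
import numpy as np

def average_precision(recall, precision):
    """Area under an interpolated precision-recall curve (all-point interpolation)."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # make precision monotonically non-increasing from right to left
    p = np.maximum.accumulate(p[::-1])[::-1]
    idx = np.where(r[1:] != r[:-1])[0]          # points where recall changes
    return np.sum((r[idx + 1] - r[idx]) * p[idx + 1])

def mean_average_precision(ap_values):
    """mAP = (1/N) * sum of per-class (or per-fold) AP scores."""
    return float(np.mean(ap_values))
```

For mAP@0.5:0.95, the AP would additionally be averaged over the ten IoU thresholds from 0.5 to 0.95.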
3.3. Model training
In the experiments, the input image size is 600 × 600, the number of training epochs is 300, the batch size is 16, the optimizer is SGD, and the initial learning rate is 0.01. To ensure stable convergence, a cosine annealing strategy is used to dynamically adjust the learning rate during training. The results are shown in the figures below.
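The schedule described above (SGD, initial learning rate 0.01, 300 epochs with cosine annealing) can be reproduced with PyTorch's built-in scheduler; the `Linear` module below is only a stand-in for the actual detector.

```python
import torch

model = torch.nn.Linear(10, 2)                             # stand-in for the detector
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # initial lr = 0.01
# cosine annealing over the 300 training epochs described above
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)

for epoch in range(300):
    # ... forward pass, loss, and backward pass on real training data go here ...
    optimizer.step()
    scheduler.step()   # learning rate decays along a cosine curve toward 0
```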
The training loss convergence curves, containing the bounding box loss, confidence loss, and classification loss, are shown in the figure. The loss functions tend to converge within the first 100 epochs, which indicates that the proposed method is stable and fast in convergence. The results indicate that the proposed model performs well on the vessel plate number recognition task.
3.4. Ablation experiments
As discussed in Section 2, the vessel plate number recognition model is proposed by taking several modules into consideration, e.g., recursive gated convolution (RGConv), the decoupled head (DH), EIOU, and adaptive anchor sizes (AAS). Therefore, ablation experiments are performed to determine which modules improve detection performance most effectively. The detailed results of the numerical experiments can be found in Table 1.
Table 1
The comparison results of the ablation experiments (✓ = module enabled, × = disabled).

| Methods  | RGConv | DH | AAS | EIOU | mAP (%) |
|----------|--------|----|-----|------|---------|
| Original | ×      | ×  | ×   | ×    | 64.12   |
| Proposed | ✓      | ×  | ×   | ×    | 67.51   |
| Proposed | ✓      | ✓  | ×   | ×    | 68.03   |
| Proposed | ✓      | ✓  | ✓   | ×    | 70.38   |
| Proposed | ✓      | ✓  | ✓   | ✓    | 70.35   |
It is shown that the accuracy is lowest for the original YOLOv5. The introduction of RGConv and DH enhances detection accuracy. The accuracy gain brought by EIOU appears to be small; however, it was found during the training phase that the method with EIOU converges more steadily than the original YOLOv5. Compared with the original YOLOv5, the proposed method improves the mAP by 6.23%. As a consequence, the introduction of RGConv, AAS, DH, and EIOU brings positive effects on vessel plate number recognition results.
3.5. The experiment on USV
The method proposed in this paper was tested in the South China Sea to verify its practicability. The "Tian Xing" USV platform used in the experiment is shown in the figure. The target boat is equipped with an LED board that displays the hull number. The visualization results are shown in the figures, including the software interface that displays environment perception information for the USV. It can be seen that the hull number of the target boat is correctly identified by the proposed method while guaranteeing real-time detection.
The software interface for perception system in USV.
4. Conclusion
In the practical application tasks of USVs, it is necessary to identify a vessel through its plate number. In this work, we proposed a method based on an object detection model for recognizing vessel plate numbers in complicated sea environments for USVs. The accuracy and stability of the model have been improved by the recursive gated convolution structure, the decoupled head, the reconstructed loss function, and the redesigned anchor sizes. To facilitate this research, a vessel plate number dataset was established. Furthermore, we conducted a field experiment with the "Tian Xing" platform in the South China Sea. Compared with the original YOLOv5, the proposed method can recognize both the ship and its plate number in real time with higher accuracy. This is of great significance in both the civilian and military sectors.
Although the proposed method has achieved good results in the recognition of vessel plate numbers, it still has room for improvement. In addition, this paper does not consider the impact of ocean climate on recognition accuracy. Changes in climate often result in the degradation of images which brings additional challenges for recognition. In the future, combining image enhancement algorithms to improve recognition accuracy would provide a promising research direction.
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Author contributions
RZ, LZ, and YS contributed to the conception and design of the study. RZ and QY organized the database. RZ performed the statistical analysis and wrote the first draft of the manuscript. RZ and GB wrote sections of the manuscript. LZ revised the article. All authors contributed to manuscript revision, read, and approved the submitted version.
Funding Statement
This research was funded by Heilongjiang Provincial Excellent Youth Fund (grant number YQ2021E013) and Central University Fund (grant number 3072022YY0101).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
- Bochkovskiy, A., Wang, C. Y., and Liao, H. Y. M. (2020). YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934. doi: 10.48550/arXiv.2004.10934
- Dobref, V., Popa, I., Popov, P., and Scurtu, I. C. (2018). "Unmanned surface vessel for marine data acquisition," in IOP Conference Series: Earth and Environmental Science, Vol. 172 (IOP Publishing), 012034.
- García-Silveira, D., Lopez-Ricaurte, L., and Hernández-Pliego, J. (2022). Long-range movements of common kestrels (Falco tinnunculus) in Southwestern Spain revealed by GPS tracking. J. Raptor Res. 3, 136. doi: 10.3356/JRR-21-136
- Ge, Z., Liu, S., Wang, F., Li, Z., and Sun, J. (2021). YOLOX: exceeding YOLO series in 2021. arXiv preprint arXiv:2107.08430. doi: 10.48550/arXiv.2107.08430
- Harati-Mokhtari, A., Wall, A., and Brooks, P. (2007). Automatic identification system (AIS): data reliability and human error implications. J. Navigat. 60, 373–389. doi: 10.1017/S0373463307004298
- He, S., Dong, C., Dai, S. L., and Zou, T. (2022). Cooperative deterministic learning and formation control for underactuated USVs with prescribed performance. Int. J. Robust Nonlin. Cont. 32, 2902–2924. doi: 10.1002/rnc.5871
- He, K., Gkioxari, G., and Dollár, P. (2017). "Mask R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, 2961–2969.
- Huang, S., Xu, H., Xia, X., and Zhang, Y. (2018). "End-to-end vessel plate number detection and recognition using deep convolutional neural networks and LSTMs," in 2018 11th International Symposium on Computational Intelligence and Design (ISCID), Vol. 1 (IEEE), 195–199. doi: 10.1109/ISCID.2018.00051
- Pouyaei, A., Choi, Y., and Jung, J. (2022). Investigating the long-range transport of particulate matter in East Asia: introducing a new Lagrangian diagnostic tool. Atmos. Environ. 278, 119096. doi: 10.1016/j.atmosenv.2022.119096
- Rao, Y., Zhao, W., Tang, Y., Zhou, J., Lim, S. N., and Lu, J. (2022). HorNet: efficient high-order spatial interactions with recursive gated convolutions. arXiv preprint arXiv:2207.14284. doi: 10.48550/arXiv.2207.14284
- Redmon, J., Divvala, S., and Girshick, R. (2016). "You only look once: unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 779–788. doi: 10.1109/CVPR.2016.91
- Redmon, J., and Farhadi, A. (2018). YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767. doi: 10.48550/arXiv.1804.02767
- Redmon, J., and Farhadi, A. (2017). "YOLO9000: better, faster, stronger," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7263–7271. doi: 10.1109/CVPR.2017.690
- Wawrzyniak, N., Hyla, T., and Bodus-Olkowska, I. (2022). Vessel identification based on automatic hull inscriptions recognition. PLoS ONE 17, e0270575. doi: 10.1371/journal.pone.0270575
- Yang, T. H., Hsiung, S. H., and Kuo, C. H. (2018). "Development of unmanned surface vehicle for water quality monitoring and measurement," in 2018 IEEE International Conference on Applied System Invention (ICASI) (IEEE), 566–569. doi: 10.1109/ICASI.2018.8394316
- Yang, Z., Wang, X., and Li, J. (2021). EIoU: an improved vehicle detection algorithm based on VehicleNet neural network. J. Phys. Conf. Ser. 1924, 012001. doi: 10.1088/1742-6596/1924/1/012001
- Zhang, W., Sun, H., Zhou, J., Liu, X., Zhang, Z., and Min, G. (2018). "Fully convolutional network based ship plate recognition," in 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC) (IEEE), 1803–1808. doi: 10.1109/SMC.2018.00312
- Zheng, Z., Wang, P., and Liu, W. (2020). Distance-IoU loss: faster and better learning for bounding box regression. Proceed. AAAI Conf. Artif. Intell. 34, 12993–13000. doi: 10.1609/aaai.v34i07.6999