FAN-MCCD: Fast and Accurate Network for Multi-Scale Chinese Character Detection
Main authors: | , |
---|---|
Format: | article |
Language: | EN |
Published: | MDPI AG, 2021 |
Subjects: | |
Online access: | https://doaj.org/article/8676484a284b49e2abffb4c892ca66d6 |
Summary: | Inaccurate localization caused by scale variation during character detection is a widespread source of overconfident results in the document analysis community, particularly for historical and handwritten documents. In this work, we explored the performance of a state-of-the-art network with a simple pipeline that quickly and accurately detects handwritten Chinese characters in old documents. To localize characters of multiple scales more precisely, without pre-processing or intermediate steps, we used a network with multi-scale feature maps. On each feature map, pre-selected boxes of different scales and aspect ratios were then applied. Finally, the bounding boxes were pruned by non-maximum suppression to yield the final results. With a focus on a well-designed neural network architecture and a loss function that presents well-classified examples, our experiments on the Caoshu, Character, and Src-images datasets showed improved detection performance: a detection rate (DT) of 98.84%, a false positive per character (FPPC) value of 0.71, and an F-score of 97.64%. By comparison, SSD (single-shot detector) achieved a DT of 61.12%, an FPPC of 6.12, and an F-score of 60.33%. |
---|
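
The summary describes two generic steps: placing pre-selected boxes of several aspect ratios on a feature map, and pruning overlapping detections with non-maximum suppression. The following is a minimal sketch of both in plain Python, not the authors' implementation. The function names (`default_boxes`, `iou`, `nms`), the box scale of 0.2, the IoU threshold of 0.5, and the SSD-style sizing rule (width = scale × √ratio, height = scale / √ratio) are all assumptions made for illustration; the record does not state which values or rules FAN-MCCD uses.

```python
from itertools import product
from math import sqrt


def default_boxes(fmap_size, scale, aspect_ratios):
    """Return center-form boxes (cx, cy, w, h), normalized to [0, 1],
    for one square feature map. Sizing rule is an SSD-style assumption."""
    boxes = []
    for row, col in product(range(fmap_size), repeat=2):
        cx = (col + 0.5) / fmap_size  # box center sits at the cell center
        cy = (row + 0.5) / fmap_size
        for ar in aspect_ratios:
            boxes.append((cx, cy, scale * sqrt(ar), scale / sqrt(ar)))
    return boxes


def iou(a, b):
    """Intersection over union of two corner-form boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    Visits boxes from highest to lowest score and keeps a box only if it
    does not overlap any already-kept box by more than iou_threshold.
    Returns the indices of the kept boxes, highest score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes cover the same character.
detections = [
    (0.10, 0.10, 0.30, 0.30),
    (0.12, 0.11, 0.31, 0.32),
    (0.60, 0.60, 0.80, 0.80),
]
confidences = [0.90, 0.75, 0.80]
kept = nms(detections, confidences)  # indices of the surviving boxes
```

In this toy input the second box overlaps the first with an IoU of about 0.75, so it is suppressed and only the first and third boxes survive. In a multi-scale detector, `default_boxes` would be called once per feature map with a different `scale`, so that coarse maps cover large characters and fine maps cover small ones.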