Deep Learning in Time-Frequency Domain for Document Layout Analysis
Document layout analysis plays an important role in the area of Document Understanding. It is responsible for identifying and classifying the different components of digital documents. Currently, there is no universal algorithm that fits all types of digital documents. This work presents a novel app...
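Main authors: | Felipe Grijalva, Erick Santos, Byron Acuna, Juan Carlos Rodriguez, Julio Cesar Larco |
---|---|
Format: | article |
Language: | EN |
Published: | IEEE, 2021 |
Subjects: | Computer-vision; deep-learning; document layout analysis; feature engineering; Electrical engineering. Electronics. Nuclear engineering; TK1-9971 |
Online access: | https://doaj.org/article/9d207e9589d64e068014822efdf7aca1 |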
id | oai:doaj.org-article:9d207e9589d64e068014822efdf7aca1 |
---|---|
record_format | dspace |
Record datestamp | 2021-11-17T00:01:12Z |
Publication date | 2021-01-01T00:00:00Z |
ISSN | 2169-3536 |
DOI | 10.1109/ACCESS.2021.3125913 |
Full text | https://ieeexplore.ieee.org/document/9605682/ |
Journal table of contents | https://doaj.org/toc/2169-3536 |
Keywords | Computer-vision; deep-learning; document layout analysis; feature engineering; PDF |
Source | IEEE Access, Vol 9, Pp 151254-151265 (2021) |
institution | DOAJ |
collection | DOAJ |
language | EN |
topic | Computer-vision; deep-learning; document layout analysis; feature engineering; Electrical engineering. Electronics. Nuclear engineering; TK1-9971 |
description | Document layout analysis plays an important role in Document Understanding: it identifies and classifies the different components of digital documents. Currently, no universal algorithm fits all types of digital documents. This work presents a novel approach for identifying tables, figures, isolated equations, and text regions in scientific papers using deep learning and computer vision techniques. Our proposed approach is a three-stage system: (i) computing the spectrograms of the horizontal and vertical intensity histograms of segmented regions of interest; (ii) labeling the segmented regions of interest as text, tables, or figures with a deep convolutional neural network classifier; and (iii) identifying isolated equations within text regions using Bag of Visual Words (BOVW) with Zernike moments. We built a new dataset of 11,007 papers for the experiments and evaluated our model with two common segmentation metrics: (1) Adjusted Rand Index (ARI) and (2) Variation of Information (VI). The proposed document layout analysis system reached an overall accuracy of 96.2685%, outperforming prior art at a lower computational cost. |
format | article |
author | Felipe Grijalva; Erick Santos; Byron Acuna; Juan Carlos Rodriguez; Julio Cesar Larco |
title | Deep Learning in Time-Frequency Domain for Document Layout Analysis |
publisher | IEEE |
publishDate | 2021 |
url | https://doaj.org/article/9d207e9589d64e068014822efdf7aca1 |
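In stage (i) of the abstract, each segmented region is turned into two 1-D intensity histograms, which are then converted into time-frequency images. The record gives no window length, hop size, or window type, so the following is only a minimal sketch under assumed choices. It assumes a grayscale region with dark ink on a light background and a Hann-windowed short-time Fourier transform with a 32-sample window and an 8-sample hop. The helper names `projection_profiles` and `spectrogram` are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def projection_profiles(region: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return the horizontal and vertical intensity histograms of a region.

    Assumption: `region` is a 2-D uint8 grayscale image with dark ink on a
    light background, so intensities are inverted to make ink count positively.
    """
    ink = 255.0 - region.astype(np.float64)
    horizontal = ink.sum(axis=1)  # one value per pixel row
    vertical = ink.sum(axis=0)    # one value per pixel column
    return horizontal, vertical


def spectrogram(signal: np.ndarray, win: int = 32, hop: int = 8) -> np.ndarray:
    """Magnitude short-time Fourier transform, shape (win // 2 + 1, frames).

    Assumed parameters (not from the paper): Hann window of `win` samples,
    successive frames `hop` samples apart.
    """
    signal = np.asarray(signal, dtype=np.float64)
    signal = signal - signal.mean()  # drop the DC offset of the profile
    if signal.size < win:  # zero-pad short profiles to one full window
        signal = np.pad(signal, (0, win - signal.size))
    window = np.hanning(win)
    starts = range(0, signal.size - win + 1, hop)
    # Columns are windowed frames; the real FFT runs down each column.
    frames = np.stack([signal[s:s + win] * window for s in starts], axis=1)
    return np.abs(np.fft.rfft(frames, axis=0))


if __name__ == "__main__":
    # Hypothetical text-like region: 5-pixel dark bands repeating every 10 rows.
    region = np.full((200, 100), 255, dtype=np.uint8)
    region[(np.arange(200) % 10) < 5, 5:95] = 0
    h, v = projection_profiles(region)
    S_h, S_v = spectrogram(h), spectrogram(v)
    print(S_h.shape, S_v.shape, int(np.argmax(S_h.mean(axis=1))))
```

In this synthetic example the text lines repeat every 10 rows, so the horizontal profile is nearly periodic. Its spectrogram therefore concentrates energy in the frequency bin closest to the line pitch, bin 3 for a 32-sample window (32 / 10 ≈ 3.2). A classifier like the CNN in stage (ii) could pick up this kind of structure from the images. The sketch does not claim these are the features the paper actually relies on.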
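The abstract evaluates the model with two segmentation metrics that have standard definitions. The sketch below computes both in plain Python over two flat label sequences, for example one region label per pixel for the prediction and one for the ground truth. The record does not say at what granularity the authors computed ARI and VI, or which logarithm base they used. This sketch assumes natural logarithms, so VI is in nats. It also follows the common convention of returning 1.0 for degenerate ARI cases. The function names are illustrative only.

```python
from collections import Counter
from math import comb, log


def _counts(a, b):
    """Contingency counts shared by both metrics."""
    if len(a) != len(b):
        raise ValueError("label sequences must have the same length")
    return Counter(zip(a, b)), Counter(a), Counter(b), len(a)


def adjusted_rand_index(a, b) -> float:
    """ARI: pair-counting agreement corrected for chance (1.0 = identical)."""
    joint, count_a, count_b, n = _counts(a, b)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: nothing to compare
    index = sum(comb(c, 2) for c in joint.values())
    sum_a = sum(comb(c, 2) for c in count_a.values())
    sum_b = sum(comb(c, 2) for c in count_b.values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (index - expected) / (max_index - expected)


def variation_of_information(a, b) -> float:
    """VI = H(A) + H(B) - 2 I(A; B), in nats (0.0 = identical partitions)."""
    joint, count_a, count_b, n = _counts(a, b)
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    mutual = sum(
        c / n * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    return max(0.0, h_a + h_b - 2 * mutual)  # clamp tiny negative round-off
```

Consider a ground-truth labeling `[0, 0, 1, 1]` compared with a prediction `[0, 0, 1, 2]` that splits the second region in two. ARI drops from 1 to 4/7 ≈ 0.571, and VI rises from 0 to 0.5 ln 2 ≈ 0.347 nats. Relabeling alone, as in `[1, 1, 0, 0]`, leaves ARI at 1 and VI at 0, because both metrics compare partitions rather than label names.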