Deep Learning in Time-Frequency Domain for Document Layout Analysis

Bibliographic Details
Main authors: Felipe Grijalva, Erick Santos, Byron Acuna, Juan Carlos Rodriguez, Julio Cesar Larco
Format: Article
Language: English
Published: IEEE, 2021
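Subjects: Computer vision; Deep learning; Document layout analysis; Feature engineering; PDF; Electrical engineering. Electronics. Nuclear engineering (TK1-9971)
Online access: https://doaj.org/article/9d207e9589d64e068014822efdf7aca1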
Published in: IEEE Access, Vol. 9, pp. 151254-151265 (2021)
DOI: 10.1109/ACCESS.2021.3125913
ISSN: 2169-3536
Publisher's version: https://ieeexplore.ieee.org/document/9605682/
Indexed in: DOAJ

Abstract: Document layout analysis plays an important role in Document Understanding: it identifies and classifies the different components of digital documents. Currently, no single algorithm fits all types of digital documents. This work presents a novel approach for identifying tables, figures, isolated equations, and text regions in scientific papers using deep learning and computer vision techniques. Our proposed approach is a three-stage system: (i) computing the spectrograms of the horizontal and vertical intensity histograms of segmented regions of interest; (ii) labeling the segmented regions of interest as text, tables, or figures with a deep convolutional neural network classifier; and (iii) identifying isolated equations within text regions using a Bag of Visual Words (BOVW) model with Zernike moments. We built a new dataset of 11,007 papers for the experiments and evaluated our model with two common segmentation metrics: (1) the Adjusted Rand Index (ARI) and (2) the Variation of Information (VI). The proposed document layout analysis system reached an overall accuracy of 96.2685%, outperforming prior art at a lower computational cost.
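Stage (i) turns each segmented region into a time-frequency representation. The record does not specify the window type, window length, hop size, or histogram normalization, so the following is only an illustrative sketch under assumed choices. A region is stored as rows of 0-255 grayscale values with dark ink, and the values are inverted so that ink forms the peaks. The spectrogram is a Hann-windowed short-time DFT with hypothetical defaults `win=8` and `hop=4`. The code uses only the Python standard library and a direct DFT, which is slow but easy to check. `intensity_histograms` and `spectrogram` are hypothetical names, not the authors' code.

```python
import cmath
import math
from typing import List, Sequence, Tuple

# Assumed input format: rows of grayscale values, 0 = black ink, 255 = white paper.
Region = Sequence[Sequence[int]]


def intensity_histograms(region: Region) -> Tuple[List[float], List[float]]:
    """Return (horizontal, vertical) ink profiles of a segmented region.

    horizontal[r] sums the darkness of row r and vertical[c] sums column c;
    values are inverted (255 - p) so that ink, not paper, produces the peaks.
    """
    dark = [[255 - p for p in row] for row in region]
    horizontal = [float(sum(row)) for row in dark]
    vertical = [float(sum(col)) for col in zip(*dark)]
    return horizontal, vertical


def spectrogram(signal: Sequence[float], win: int = 8, hop: int = 4) -> List[List[float]]:
    """Power spectrogram computed with a Hann-windowed short-time DFT.

    Returns one row per frame, each holding win // 2 + 1 non-negative bins.
    Signals shorter than one window are zero-padded to a single frame.
    """
    samples = list(signal)
    if len(samples) < win:
        samples += [0.0] * (win - len(samples))
    # Periodic Hann window of length `win`.
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / win) for n in range(win)]
    frames = []
    for start in range(0, len(samples) - win + 1, hop):
        seg = [samples[start + n] * hann[n] for n in range(win)]
        power = []
        for k in range(win // 2 + 1):
            coeff = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / win) for n in range(win))
            power.append(abs(coeff) ** 2)
        frames.append(power)
    return frames
```

Consider a region whose text lines repeat every 4 rows. Its horizontal profile is periodic, and with `win=16` the strongest bin between DC and Nyquist in every frame is bin 4 (16 / 4 = 4 cycles per window), which matches the line pitch. This regularity is one plausible reason why text, tables, and figures separate well in a spectral representation. The convolutional classifier of stage (ii), which would consume such spectrograms, is not shown.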
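Stage (iii) describes the content of text regions with Zernike moments and a Bag of Visual Words. The record gives no details on patch extraction, moment orders, vocabulary size, codebook training, or the final classifier, so the following is a minimal sketch under stated assumptions:

- Input: square grayscale patches with ink as high values.
- Moments: the standard definition A_nm = (n+1)/π ∬ f(x, y) R_nm(ρ) e^(-jmθ) over the unit disk, approximated by a sum over pixel centers.
- Descriptor: the magnitudes |A_nm| up to a hypothetical `max_order=4`.
- Codebook: precomputed (for example, k-means centroids), not trained here.

`radial_poly`, `zernike_moment`, `zernike_descriptor`, and `bovw_histogram` are hypothetical helper names.

```python
import cmath
import math
from typing import List, Sequence

# Assumed input: square grayscale patch, ink = high values, background = 0.
Patch = Sequence[Sequence[float]]


def radial_poly(n: int, m: int, rho: float) -> float:
    """Zernike radial polynomial R_n^m(rho); requires |m| <= n and n - |m| even."""
    m = abs(m)
    total = 0.0
    for s in range((n - m) // 2 + 1):
        coeff = ((-1) ** s * math.factorial(n - s)
                 / (math.factorial(s)
                    * math.factorial((n + m) // 2 - s)
                    * math.factorial((n - m) // 2 - s)))
        total += coeff * rho ** (n - 2 * s)
    return total


def zernike_moment(patch: Patch, n: int, m: int) -> complex:
    """Discrete approximation of A_nm over the disk inscribed in the patch."""
    size = len(patch)
    area = (2.0 / size) ** 2  # area of one pixel in unit-disk coordinates
    acc = 0j
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            # Map the pixel center into [-1, 1] x [-1, 1].
            xn = (2 * x - (size - 1)) / size
            yn = (2 * y - (size - 1)) / size
            rho = math.hypot(xn, yn)
            if rho > 1.0 or value == 0:
                continue  # outside the unit disk, or no ink
            theta = math.atan2(yn, xn)
            acc += value * radial_poly(n, m, rho) * cmath.exp(-1j * m * theta)
    return (n + 1) / math.pi * acc * area


def zernike_descriptor(patch: Patch, max_order: int = 4) -> List[float]:
    """Rotation-invariant descriptor: |A_nm| for every valid (n, m >= 0) up to max_order."""
    return [abs(zernike_moment(patch, n, m))
            for n in range(max_order + 1)
            for m in range(n % 2, n + 1, 2)]


def bovw_histogram(descriptors: Sequence[Sequence[float]],
                   codebook: Sequence[Sequence[float]]) -> List[float]:
    """Assign each descriptor to its nearest visual word; return normalized counts."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])))
        counts[nearest] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]
```

Rotating a patch by 90 degrees only multiplies each A_nm by a unit-modulus phase factor, so the descriptor is unchanged up to rounding. This invariance is the usual motivation for using Zernike magnitudes. In a setup like the one described, the normalized histograms would then go to a classifier that decides whether a text region contains an isolated equation. That classifier is not shown.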
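Both evaluation metrics named in the abstract can be computed from a contingency table of two labelings. As an assumption about the evaluation setup, the sketch treats each pixel's region label in the ground truth and in the prediction as a cluster assignment.

- ARI uses the Hubert-Arabie adjustment for chance. It is 1.0 for identical partitions and close to 0.0 for random ones.
- VI = H(X|Y) + H(Y|X), measured in nats. It is 0.0 for identical partitions, and lower is better.

The function names are illustrative. scikit-learn's `adjusted_rand_score` should give the same ARI values.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def _pairs(count: int) -> int:
    """Number of unordered pairs among `count` items, C(count, 2)."""
    return count * (count - 1) // 2


def adjusted_rand_index(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Hubert-Arabie adjusted Rand index between two labelings of the same items."""
    if len(truth) != len(pred):
        raise ValueError("labelings must have the same length")
    n = len(truth)
    if n < 2:
        return 1.0  # a single item cannot be partitioned differently
    index = sum(_pairs(c) for c in Counter(zip(truth, pred)).values())
    sum_a = sum(_pairs(c) for c in Counter(truth).values())
    sum_b = sum(_pairs(c) for c in Counter(pred).values())
    expected = sum_a * sum_b / _pairs(n)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings are one cluster
        return 1.0
    return (index - expected) / (max_index - expected)


def variation_of_information(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """VI = H(truth | pred) + H(pred | truth), in nats."""
    if len(truth) != len(pred):
        raise ValueError("labelings must have the same length")
    n = len(truth)
    count_t, count_p = Counter(truth), Counter(pred)
    vi = 0.0
    for (t, p), joint in Counter(zip(truth, pred)).items():
        r = joint / n  # joint probability of (t, p)
        vi -= r * (math.log(r / (count_t[t] / n)) + math.log(r / (count_p[p] / n)))
    return vi
```

Neither measure depends on the label names themselves. Relabeling [0, 0, 1, 1] as [1, 1, 0, 0] gives ARI 1.0 and VI 0.0. Comparing [0, 0, 1, 1] with [0, 0, 1, 2] gives ARI 4/7 ≈ 0.571, and comparing [0, 0, 1, 1] with [0, 0, 0, 0] gives VI ln 2 ≈ 0.693.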