Efficient Use of GPU Memory for Large-Scale Deep Learning Model Training

To achieve high accuracy in deep learning, it is necessary to use a large-scale model. However, due to the limits of GPU memory, it is difficult to train such large models on a single GPU. NVIDIA introduced CUDA Unified Memory with CUDA 6 to overcome this limitation by virtually combining GPU memory and CPU memory, and CUDA 8 added memory advise options for using Unified Memory efficiently. In this work, we propose an optimized scheme based on CUDA Unified Memory that uses GPU memory efficiently by applying a different memory advise to each data type according to its access pattern in deep learning training. We apply CUDA Unified Memory to PyTorch to evaluate large-scale models with the expanded GPU memory, and we conduct comprehensive experiments on how to utilize Unified Memory efficiently by applying memory advises during training. When the data used in deep learning are divided into three types and a memory advise is applied to each type according to its access pattern, the deep learning execution time is reduced by 9.4% compared to default Unified Memory.
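The scheme this record describes rests on two CUDA runtime calls: `cudaMallocManaged`, which allocates Unified Memory reachable from both CPU and GPU, and `cudaMemAdvise`, which hints the driver about the expected access pattern of a region. A minimal sketch of that idea follows; it requires an NVIDIA GPU and the CUDA toolkit, and the mapping of data types to advise flags here is illustrative, not the authors' exact implementation:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const size_t n = 1 << 20;
    float *weights, *activations, *inputs;

    // Unified Memory: one pointer valid on both host and device;
    // pages migrate on demand, so allocations may exceed GPU memory.
    cudaMallocManaged(&weights,     n * sizeof(float));
    cudaMallocManaged(&activations, n * sizeof(float));
    cudaMallocManaged(&inputs,      n * sizeof(float));

    int device = 0;
    cudaGetDevice(&device);

    // Per-type advise (hypothetical split into three data types):
    // weights are used on the GPU throughout training -> pin their
    // preferred location to the device to avoid eviction.
    cudaMemAdvise(weights, n * sizeof(float),
                  cudaMemAdviseSetPreferredLocation, device);
    // inputs are written once by the CPU, then only read by the GPU
    // -> read-mostly lets the driver keep read-only copies.
    cudaMemAdvise(inputs, n * sizeof(float),
                  cudaMemAdviseSetReadMostly, device);
    // activations are touched by both sides -> map them into the
    // CPU's page tables as well to cut migration faults.
    cudaMemAdvise(activations, n * sizeof(float),
                  cudaMemAdviseSetAccessedBy, cudaCpuDeviceId);

    cudaFree(weights);
    cudaFree(activations);
    cudaFree(inputs);
    return 0;
}
```

These advise calls are hints, not guarantees: the driver may still migrate pages under memory pressure, which is why the paper measures which per-type assignment actually shortens training time.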


Bibliographic Details
Main Authors: Hyeonseong Choi, Jaehwan Lee
Format: article
Language: EN
Published: MDPI AG, 2021
Subjects: T
Online Access: https://doaj.org/article/86e2e525a3c74baf80b24bf608c75dbb
id oai:doaj.org-article:86e2e525a3c74baf80b24bf608c75dbb
record_format dspace
spelling oai:doaj.org-article:86e2e525a3c74baf80b24bf608c75dbb 2021-11-11T15:23:52Z
doi 10.3390/app112110377
issn 2076-3417
publishDate 2021-11-01
url https://www.mdpi.com/2076-3417/11/21/10377
url https://doaj.org/toc/2076-3417
citation Applied Sciences, Vol 11, Iss 10377, p 10377 (2021)
institution DOAJ
collection DOAJ
language EN
topic deep learning
large-scale model
CUDA Unified Memory
PyTorch
Technology
T
Engineering (General). Civil engineering (General)
TA1-2040
Biology (General)
QH301-705.5
Physics
QC1-999
Chemistry
QD1-999
description To achieve high accuracy in deep learning, it is necessary to use a large-scale model. However, due to the limits of GPU memory, it is difficult to train such large models on a single GPU. NVIDIA introduced CUDA Unified Memory with CUDA 6 to overcome this limitation by virtually combining GPU memory and CPU memory, and CUDA 8 added memory advise options for using Unified Memory efficiently. In this work, we propose an optimized scheme based on CUDA Unified Memory that uses GPU memory efficiently by applying a different memory advise to each data type according to its access pattern in deep learning training. We apply CUDA Unified Memory to PyTorch to evaluate large-scale models with the expanded GPU memory, and we conduct comprehensive experiments on how to utilize Unified Memory efficiently by applying memory advises during training. When the data used in deep learning are divided into three types and a memory advise is applied to each type according to its access pattern, the deep learning execution time is reduced by 9.4% compared to default Unified Memory.
format article
author Hyeonseong Choi
Jaehwan Lee
author_facet Hyeonseong Choi
Jaehwan Lee
author_sort Hyeonseong Choi
title Efficient Use of GPU Memory for Large-Scale Deep Learning Model Training
title_short Efficient Use of GPU Memory for Large-Scale Deep Learning Model Training
title_full Efficient Use of GPU Memory for Large-Scale Deep Learning Model Training
title_fullStr Efficient Use of GPU Memory for Large-Scale Deep Learning Model Training
title_full_unstemmed Efficient Use of GPU Memory for Large-Scale Deep Learning Model Training
title_sort efficient use of gpu memory for large-scale deep learning model training
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/86e2e525a3c74baf80b24bf608c75dbb
work_keys_str_mv AT hyeonseongchoi efficientuseofgpumemoryforlargescaledeeplearningmodeltraining
AT jaehwanlee efficientuseofgpumemoryforlargescaledeeplearningmodeltraining
_version_ 1718435376915283968