HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution

Heterogeneous systems with multiple different compute devices have come into common use recently, and the heterogeneity of the compute device is mainly reflected in three aspects: hardware architecture, instruction set architecture, and processing capability. Heterogeneous CPU-accelerator systems ha...

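Bibliographic Details
Main authors: Lanjun Wan, Weihua Zheng, Xinpan Yuan
Format: article
Language: EN
Published: IEEE 2021
Subjects: Communication optimization; cooperative execution; data-parallel applications; dynamic scheduling; heterogeneous systems; runtime system; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Online access: https://doaj.org/article/f511f88b91524831aba56e62d3a74ea7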
id oai:doaj.org-article:f511f88b91524831aba56e62d3a74ea7
record_format dspace
spelling oai:doaj.org-article:f511f88b91524831aba56e62d3a74ea7
2021-11-18T00:09:33Z
HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution
2169-3536
10.1109/ACCESS.2021.3124856
https://doaj.org/article/f511f88b91524831aba56e62d3a74ea7
2021-01-01T00:00:00Z
https://ieeexplore.ieee.org/document/9598836/
https://doaj.org/toc/2169-3536
Lanjun Wan
Weihua Zheng
Xinpan Yuan
IEEE
article
Communication optimization
cooperative execution
data-parallel applications
dynamic scheduling
heterogeneous systems
runtime system
Electrical engineering. Electronics. Nuclear engineering
TK1-9971
EN
IEEE Access, Vol 9, Pp 147264-147279 (2021)
institution DOAJ
collection DOAJ
language EN
topic Communication optimization
cooperative execution
data-parallel applications
dynamic scheduling
heterogeneous systems
runtime system
Electrical engineering. Electronics. Nuclear engineering
TK1-9971
description Heterogeneous systems with multiple different compute devices have come into common use recently, and the heterogeneity of the compute device is mainly reflected in three aspects: hardware architecture, instruction set architecture, and processing capability. Heterogeneous CPU-accelerator systems have attracted increasing attention especially. To make full use of multiple CPUs and accelerators to execute data-parallel applications, programmers may need to manually map computation and data to all available compute devices, which is tedious, error-prone, and difficult. Especially for some data-parallel applications, the inter-device communication could easily become the performance bottleneck of multi-device co-execution. Therefore, firstly, a runtime system is designed for supporting heterogeneous cooperative execution (HCE) of data-parallel applications, which can help programmers to automatically and efficiently map computation and data to multiple compute devices. Secondly, an incremental data transfer method is designed to avoid redundant data transfers between devices, and a three-way overlapping communication optimization method based on software pipelining is designed to effectively hide the inter-device communication overhead. Based on our previously proposed feedback-based dynamic and elastic task scheduling (FDETS) scheme and asynchronous-based dynamic and elastic task scheduling (ADETS) scheme, the modified FDETS that supports incremental data transfer and the modified ADETS that supports three-way overlapping communication optimization are proposed, which not only can effectively partition and balance the workload among multiple compute devices but also can significantly reduce data transfer overhead between devices. Thirdly, a prototype of the proposed runtime system is implemented, which provides a set of runtime APIs for task scheduling, device management, memory management, and transfer optimization. 
Our experimental results show that the communication overhead between devices is greatly reduced using the proposed inter-device communication optimization methods and the multi-device co-execution significantly outperforms the best single-device execution.
format article
author Lanjun Wan
Weihua Zheng
Xinpan Yuan
title HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution
publisher IEEE
publishDate 2021
url https://doaj.org/article/f511f88b91524831aba56e62d3a74ea7