HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution
Heterogeneous systems with multiple different compute devices have come into common use recently, and the heterogeneity of the compute device is mainly reflected in three aspects: hardware architecture, instruction set architecture, and processing capability. Heterogeneous CPU-accelerator systems ha...
Main Authors: | Lanjun Wan, Weihua Zheng, Xinpan Yuan |
---|---|
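Format: | article |
Language: | EN |
Published: | IEEE, 2021 |
Subjects: | Communication optimization; cooperative execution; data-parallel applications; dynamic scheduling; heterogeneous systems; runtime system |
Online Access: | https://doaj.org/article/f511f88b91524831aba56e62d3a74ea7 |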
id |
oai:doaj.org-article:f511f88b91524831aba56e62d3a74ea7 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:f511f88b91524831aba56e62d3a74ea7; 2021-11-18T00:09:33Z; HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution; ISSN 2169-3536; DOI 10.1109/ACCESS.2021.3124856; https://doaj.org/article/f511f88b91524831aba56e62d3a74ea7; 2021-01-01T00:00:00Z; https://ieeexplore.ieee.org/document/9598836/; https://doaj.org/toc/2169-3536; Lanjun Wan; Weihua Zheng; Xinpan Yuan; IEEE; article; Communication optimization; cooperative execution; data-parallel applications; dynamic scheduling; heterogeneous systems; runtime system; Electrical engineering. Electronics. Nuclear engineering; TK1-9971; EN; IEEE Access, Vol 9, Pp 147264-147279 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Communication optimization; cooperative execution; data-parallel applications; dynamic scheduling; heterogeneous systems; runtime system; Electrical engineering. Electronics. Nuclear engineering; TK1-9971 |
spellingShingle |
Communication optimization cooperative execution data-parallel applications dynamic scheduling heterogeneous systems runtime system Electrical engineering. Electronics. Nuclear engineering TK1-9971 Lanjun Wan Weihua Zheng Xinpan Yuan HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution |
description |
Heterogeneous systems with multiple different compute devices have recently come into common use. The heterogeneity of the compute devices is mainly reflected in three aspects: hardware architecture, instruction set architecture, and processing capability. Heterogeneous CPU-accelerator systems in particular have attracted increasing attention. To make full use of multiple CPUs and accelerators when executing data-parallel applications, programmers may need to manually map computation and data to all available compute devices, which is tedious, error-prone, and difficult. For some data-parallel applications in particular, inter-device communication can easily become the performance bottleneck of multi-device co-execution. This paper therefore makes three contributions. First, a runtime system is designed to support heterogeneous cooperative execution (HCE) of data-parallel applications, helping programmers map computation and data to multiple compute devices automatically and efficiently. Second, an incremental data transfer method is designed to avoid redundant data transfers between devices, and a three-way overlapping communication optimization method based on software pipelining is designed to hide the inter-device communication overhead. Building on our previously proposed feedback-based dynamic and elastic task scheduling (FDETS) scheme and asynchronous-based dynamic and elastic task scheduling (ADETS) scheme, a modified FDETS that supports incremental data transfer and a modified ADETS that supports three-way overlapping communication optimization are proposed; these not only partition and balance the workload effectively among multiple compute devices but also significantly reduce the data transfer overhead between devices. Third, a prototype of the proposed runtime system is implemented, providing a set of runtime APIs for task scheduling, device management, memory management, and transfer optimization. Our experimental results show that the proposed inter-device communication optimization methods greatly reduce the communication overhead between devices and that multi-device co-execution significantly outperforms the best single-device execution. |
format |
article |
author |
Lanjun Wan; Weihua Zheng; Xinpan Yuan |
author_facet |
Lanjun Wan; Weihua Zheng; Xinpan Yuan |
author_sort |
Lanjun Wan |
title |
HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution |
title_short |
HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution |
title_full |
HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution |
title_fullStr |
HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution |
title_full_unstemmed |
HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution |
title_sort |
hce: a runtime system for efficiently supporting heterogeneous cooperative execution |
publisher |
IEEE |
publishDate |
2021 |
url |
https://doaj.org/article/f511f88b91524831aba56e62d3a74ea7 |
work_keys_str_mv |
AT lanjunwan hcearuntimesystemforefficientlysupportingheterogeneouscooperativeexecution AT weihuazheng hcearuntimesystemforefficientlysupportingheterogeneouscooperativeexecution AT xinpanyuan hcearuntimesystemforefficientlysupportingheterogeneouscooperativeexecution |
_version_ |
1718425261760839680 |
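
The description above mentions a three-way overlapping communication optimization based on software pipelining. The record does not give the algorithm itself, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes the work is split into equal chunks and that host-to-device copy, kernel execution, and device-to-host copy can each proceed independently on their own engine. The function names and the per-chunk timings are hypothetical.

```python
def serial_makespan(n_chunks, t_in, t_kernel, t_out):
    """Total time when copy-in, kernel, and copy-out never overlap."""
    return n_chunks * (t_in + t_kernel + t_out)


def pipelined_makespan(n_chunks, t_in, t_kernel, t_out):
    """Total time when the three stages overlap across chunks.

    Assumption (not from the record): each stage is a separate resource
    that handles one chunk at a time, in chunk order.
    """
    in_done = kernel_done = out_done = 0.0
    for _ in range(n_chunks):
        # Copy-in engine starts the next chunk as soon as it is free.
        in_done = in_done + t_in
        # Kernel needs this chunk's input and a free compute engine.
        kernel_done = max(in_done, kernel_done) + t_kernel
        # Copy-out needs this chunk's result and a free copy-out engine.
        out_done = max(kernel_done, out_done) + t_out
    return out_done


if __name__ == "__main__":
    # Hypothetical per-chunk costs in arbitrary time units.
    print(serial_makespan(4, 2.0, 3.0, 1.0))
    print(pipelined_makespan(4, 2.0, 3.0, 1.0))
```

Under these assumptions the pipelined total is governed by the slowest stage plus the fill and drain time of the other two, which is why overlapping can hide most of the transfer cost when the kernel dominates.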