
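NIC-based hardware offloading mechanism for reduction computation in high-speed interconnection networks
Cite this article: CHANG Junsheng, XIONG Zeyu, XU Jinbo. NIC-based hardware offloading mechanism for reduction computation in high-speed interconnection networks[J]. Journal of National University of Defense Technology, 2022, 44(5): 171-179
Authors: CHANG Junsheng  XIONG Zeyu  XU Jinbo
Affiliation: College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, Hunan, China
Funding: National Natural Science Foundation of China (61201336, 41301490)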
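Abstract: Collective communication is widely used in high-performance computing research and engineering. In large-scale scientific and engineering computing, collective communication accounts for a large share of the overhead, sometimes as much as 80% of the total message-passing overhead, which makes it a performance bottleneck of high-performance computing systems. A NIC-based hardware offloading mechanism for reduction computation is therefore proposed: reduction logic embedded in the NIC computes on the data while it is in transit, which lightens the load on the CPU and lowers communication latency. A 16-node reduction experiment was implemented on an FPGA platform, and reduction operations at different node scales were simulated with the xNetSimPlus simulator. The experiments show that the offloading mechanism effectively shortens reduction operations in collective communication, and that the proposed NIC offloading mechanism speeds up reduction operations by up to 2.71 times.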

Keywords: collective communication  reduction operation  hardware offload
Received: 2020-09-01

NIC-based offloading mechanism supporting reduction operation on high-speed interconnection system
CHANG Junsheng,XIONG Zeyu,XU Jinbo. NIC-based offloading mechanism supporting reduction operation on high-speed interconnection system[J]. Journal of National University of Defense Technology, 2022, 44(5): 171-179
Authors:CHANG Junsheng  XIONG Zeyu  XU Jinbo
Affiliation:College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
Abstract: Collective communication is widely used in high-performance computing research and engineering. In large-scale scientific and engineering computing, collective communication accounts for a large share of the overhead, sometimes as much as 80% of the total messaging overhead, which makes it a performance bottleneck of high-performance computing systems. A NIC-based offloading mechanism supporting reduction operations was proposed. By embedding reduction logic in the NIC, data was computed on while in transit, which reduced both the burden on the CPU and the communication delay. A 16-node reduction experiment was implemented on an FPGA (field programmable gate array) platform, and reduction operations at different node scales were simulated with the xNetSimPlus simulator. Experiments show that the method effectively shortens reduction operations in collective communication, and that the proposed NIC offloading mechanism can accelerate all-reduce operations by up to 2.71 times.
Keywords:collective communication   all-reduce operation   hardware offload
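
The abstract describes reduction performed on data while it is in transit rather than on the host CPU. As a rough illustration only, the sketch below combines per-node values level by level in a tree, with each combining step standing in for what a reduction-capable NIC might do. The function name `tree_reduce`, the `fanout` parameter, and the tree layout are assumptions made for this example and are not taken from the paper.

```python
from operator import add


def tree_reduce(values, op=add, fanout=2):
    """Combine per-node values level by level in a tree.

    Each combining step stands in for a reduction done in the network
    (hypothetical model, not the paper's hardware design).
    Returns (reduced_value, number_of_tree_levels).
    """
    if not values:
        raise ValueError("need at least one node value")
    level = list(values)
    levels = 0
    while len(level) > 1:
        next_level = []
        # Each group of `fanout` neighbours is reduced into one partial result.
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            acc = group[0]
            for v in group[1:]:
                acc = op(acc, v)
            next_level.append(acc)
        level = next_level
        levels += 1
    return level[0], levels


# 16 nodes, each contributing its own rank, reduced with addition.
result, levels = tree_reduce(list(range(16)))
print(result, levels)  # prints "120 4"
```

With 16 nodes and a fan-out of 2, the partial results pass through 4 combining levels, which is the kind of step count a tree-structured reduction aims for regardless of where the combining is done.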