Optimized design and implementation of the processor memory access path for graph computing applications
Cite this article: ZHANG Xu, CHANG Yisong, ZHANG Ke, CHEN Mingyu. Optimized design and implementation of the processor memory access path for graph computing applications[J]. Journal of National University of Defense Technology, 2020, 42(2): 13-22
Authors: ZHANG Xu  CHANG Yisong  ZHANG Ke  CHEN Mingyu
Affiliation: Research Center for Advanced Computer Systems, Institute of Computing Technology, Chinese Academy of Sciences
Funding: National Key Research and Development Program of China (No. 2017YFB1001602); Youth Innovation Promotion Association of the Chinese Academy of Sciences (No. 2017143)
Abstract: Targeting the memory access characteristics of graph computing applications, a High Concurrency and high Performance Fetcher (HCPF) that supports highly concurrent, out-of-order, and asynchronous memory access is proposed and implemented. Through a hardware-software co-design approach, HCPF can handle 192 concurrent memory access requests of 8 types in total, with user-definable access granularity, satisfying the demand of graph computing applications for massive, low-latency, fine-grained data accesses. In addition, HCPF extends a custom memory-semantic interconnect across computing nodes to support fine-grained direct access to remote memory, providing a technical foundation for a future distributed graph computing framework. Combining these two core research elements, a RISC-V System-on-Chip (SoC) architecture supporting HCPF is designed and implemented on top of a pipelined RISC-V processor core, an FPGA-based prototype platform is built, and a preliminary performance evaluation of HCPF is conducted with in-house test programs. Experimental results show that, compared with the original memory access path, HCPF improves the performance of the two random memory access patterns, array-based and random-address-based, by up to 3.5x and 2.7x, respectively, and the latency of directly accessing 4 bytes of remote memory is only 1.63 μs.
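
The asynchronous issue/poll access style described above can be pictured with a short sketch. The C program below is purely illustrative and assumes a hypothetical software-defined interface: hcpf_req_t, hcpf_issue, and hcpf_poll are invented names, and their stub bodies simply model the behavior in software, since the paper does not publish HCPF's actual programming interface.

    /* Hypothetical sketch only: the paper does not publish the HCPF
     * programming interface, so all names here (hcpf_req_t, hcpf_issue,
     * hcpf_poll) are invented to illustrate the asynchronous issue/poll
     * pattern the abstract describes. The stubs perform the copy
     * immediately so the file compiles and runs; in the real SoC they
     * would be backed by the HCPF hardware. */
    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define HCPF_MAX_INFLIGHT 192          /* abstract: 192 concurrent requests */

    typedef struct {
        const void *src;                   /* source address (local or remote)  */
        void       *dst;                   /* completion buffer                 */
        uint32_t    bytes;                 /* user-defined access granularity   */
        uint8_t     type;                  /* one of the 8 request types        */
    } hcpf_req_t;

    /* Placeholder software model: copy synchronously and report completion. */
    static int hcpf_issue(const hcpf_req_t *req)
    {
        memcpy(req->dst, req->src, req->bytes);
        return 0;                          /* real hardware would return a tag  */
    }

    static bool hcpf_poll(int tag)
    {
        (void)tag;
        return true;                       /* real hardware completes asynchronously */
    }

    int main(void)
    {
        uint32_t table[1024], out[HCPF_MAX_INFLIGHT];
        for (size_t i = 0; i < 1024; i++) table[i] = (uint32_t)i;

        /* Issue many independent fine-grained reads, then wait for them all,
         * so their memory latencies can overlap inside the fetcher. */
        int tags[HCPF_MAX_INFLIGHT];
        for (size_t i = 0; i < HCPF_MAX_INFLIGHT; i++) {
            hcpf_req_t r = { &table[(i * 37) % 1024], &out[i], sizeof(uint32_t), 0 };
            tags[i] = hcpf_issue(&r);
        }
        for (size_t i = 0; i < HCPF_MAX_INFLIGHT; i++)
            while (!hcpf_poll(tags[i])) { }    /* completions may be out of order */

        printf("out[0]=%u out[191]=%u\n", out[0], out[191]);
        return 0;
    }

In the actual design, the issue and poll operations would presumably map onto the HCPF hardware so that up to 192 fine-grained requests stay in flight and complete out of order, which is how the memory latency gets hidden.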

Keywords: memory-level parallelism; memory access path; graph computing applications
Received: 2019-09-19
Revised: 2019-11-26

Design and implementation of a novel off-chip memory access path for graph computing
ZHANG Xu, CHANG Yisong, ZHANG Ke, CHEN Mingyu. Design and implementation of a novel off-chip memory access path for graph computing[J]. Journal of National University of Defense Technology, 2020, 42(2): 13-22
Authors:ZHANG Xu  CHANG Yisong  ZHANG Ke  CHEN Mingyu
Affiliation:Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;University of Chinese Academy of Sciences, Beijing 100049, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;University of Chinese Academy of Sciences, Beijing 100049, China;Peng Cheng Laboratory, Shenzhen 518000, China
Abstract: Graph computing plays a significant role in the big data and artificial intelligence era. Graph computing generates massive off-chip memory access requests that exhibit high concurrency, poor locality, and fine granularity. Due to the increasing performance gap between memory and general-purpose processors, long memory access latency becomes an essential performance bottleneck in graph computing. Exploiting memory-level parallelism to hide memory access latency is one of the effective ways to speed up graph computing applications. In this paper, we focus on how to redesign the off-chip memory access path to exploit memory-level parallelism. We propose a novel asynchronous memory access path that supports highly concurrent and out-of-order off-chip memory requests. To satisfy the requirements of graph applications, we implement a software-defined interface in the proposed memory access path that handles up to 192 concurrent off-chip memory requests of eight user-defined types with arbitrary granularity via a hardware-software co-design methodology. We also design a custom memory-semantic interconnect for fine-grained remote memory access among computing nodes, which can be leveraged in future distributed graph processing scenarios. Finally, we integrate the proposed memory access path into a RISC-V instruction set architecture-based System-on-Chip (SoC) architecture and implement an FPGA prototype. Based on our custom random-access microbenchmarks, preliminary evaluation results show that the performance of array-based and random-address-based off-chip memory accesses is improved by up to 3.5x and 2.7x, respectively, with the proposed asynchronous memory access path, and that accessing 4 bytes of data from remote memory takes only 1.63 microseconds.
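
The random-access microbenchmarks mentioned above are custom, in-house programs and are not published. The C sketch below merely illustrates, on a conventional memory path, the two access patterns the abstract names: array-based random access (loads driven by an index array) and random-address-based access (addresses generated on the fly); the buffer size, PRNG, and timing method are assumptions of this sketch, not details from the paper.

    /* Illustrative only: a minimal model of the two random-access patterns
     * named in the abstract, run on an ordinary memory path. */
    #include <stdint.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <time.h>

    #define N (1u << 20)                       /* 1M 64-bit elements (assumed size) */

    static uint64_t xorshift(uint64_t *s)      /* simple PRNG for random indices */
    {
        *s ^= *s << 13; *s ^= *s >> 7; *s ^= *s << 17;
        return *s;
    }

    int main(void)
    {
        uint64_t *buf = malloc(N * sizeof *buf);
        uint32_t *idx = malloc(N * sizeof *idx);
        uint64_t seed = 88172645463325252ULL, sum = 0;
        for (uint32_t i = 0; i < N; i++) {
            buf[i] = i;
            idx[i] = (uint32_t)(xorshift(&seed) % N);
        }

        /* Pattern 1: array-based random access (loads via an index array). */
        clock_t t0 = clock();
        for (uint32_t i = 0; i < N; i++) sum += buf[idx[i]];
        clock_t t1 = clock();

        /* Pattern 2: random-address-based access (address computed on the fly). */
        for (uint32_t i = 0; i < N; i++) sum += buf[xorshift(&seed) % N];
        clock_t t2 = clock();

        printf("array-based: %.3fs, random-address: %.3fs (sum=%llu)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC,
               (unsigned long long)sum);
        free(buf); free(idx);
        return 0;
    }

Both loops issue dependent fine-grained loads with poor locality, which is exactly the kind of traffic the proposed asynchronous access path is designed to overlap.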
Keywords: memory-level parallelism; memory access path; graph computing