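A customized multi-grain matrix register file for SIMD processors
Cite this article:ZHANG Kai, CHEN Shuming, WANG Yaohua, CHEN Haiyan, LI Zhentao. A customized multi-grain matrix register file for SIMD processors[J]. Journal of National University of Defense Technology, 2013, 35(4): 156-160.
Authors:ZHANG Kai  CHEN Shuming  WANG Yaohua  CHEN Haiyan  LI Zhentao
Affiliation:College of Computer, National University of Defense Technology (all five authors)
Funding:National Natural Science Foundation of China (General Program, Key Program, Major Program)
Abstract:Mapping matrix operations onto SIMD processors introduces a large number of data rearrangement operations, which degrade system performance. This paper proposes a customized multi-grain matrix register file (MMRF) to eliminate these rearrangement operations. The MMRF supports multi-grained parallel row and column access, improving the performance of matrix operations. It can be dynamically configured into different parallel access modes, in each of which one or more sub-matrices can be processed in parallel. Experimental results show that, compared with a traditional vector register file (VRF) and a matrix register file (MRF), the MMRF delivers average performance improvements of 2.21x and 1.6x respectively, with area increases of 14.3% and 3.7% and power increases of 14.6% and 2.2%. Compared with the TMS320C64x+ processor, the SIMD-based FT-Matrix processor achieves a 5.65x to 7.71x performance improvement after adopting the MMRF. Through hierarchical full-custom design techniques, the area and critical path of the MMRF are reduced by 17.9% and 39.1% respectively.

Keywords:SIMD  matrix operation  multi-grain  matrix register file
Received:2012/12/1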

A customized multi-grain matrix register file for SIMD processors
ZHANG Kai, CHEN Shuming, WANG Yaohua, CHEN Haiyan and LI Zhentao. A customized multi-grain matrix register file for SIMD processors[J]. Journal of National University of Defense Technology, 2013, 35(4): 156-160.
Authors:ZHANG Kai  CHEN Shuming  WANG Yaohua  CHEN Haiyan and LI Zhentao
Institution:College of Computer, National University of Defense Technology, Changsha 410073, China (all five authors)
Abstract:Mapping matrix operations onto SIMD processors introduces a large amount of data rearrangement, which degrades system performance. This paper proposes a customized Multi-Grain Matrix Register File (MMRF) that supports multi-grained parallel row-wise and column-wise access to eliminate this data rearrangement and improve the performance of matrix operations. The MMRF can be configured into different parallel access modes in which one or several sub-matrices can be accessed in parallel. Experimental results show that, compared with the traditional Vector Register File (VRF) and the Matrix Register File (MRF), the MMRF achieves average performance improvements of about 2.21x and 1.6x respectively, while its area increases by 14.3% and 3.7% and its power by 14.6% and 2.2%. Compared with the TMS320C64x+, our SIMD processor, FT-Matrix, achieves about 5.65x to 7.71x performance improvement by employing the MMRF. Using hierarchical full-custom design techniques, we reduce the area and critical-path delay of the MMRF by 17.9% and 39.1% respectively.
Keywords:SIMD  Matrix Operation  Multi-Grain  Matrix Register File  
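
Illustrative sketch (not from the paper): the abstract does not describe how the MMRF is organized internally, so the Python model below is only a toy behavioral illustration of what "multi-grained row-wise and column-wise access" could mean. The class name `ToyMatrixRegisterFile`, its methods, and the assumed layout (a `lanes` x `lanes` array viewed as a grid of `grain` x `grain` sub-matrices) are all hypothetical. A row read returns one physical row; a column read with a given grain returns the same column of every sub-matrix in one block-row, so each access still fills all lanes without a separate shuffle step.

```python
class ToyMatrixRegisterFile:
    """Toy behavioral model of a register file readable by row or by column.

    Hypothetical layout: a lanes x lanes array of elements, viewed as a grid
    of grain x grain sub-matrices when a column is read.
    """

    def __init__(self, lanes):
        self.lanes = lanes
        self.cells = [[0] * lanes for _ in range(lanes)]

    def write_row(self, row, values):
        """Write one full-width row (one element per lane)."""
        if len(values) != self.lanes:
            raise ValueError("a row must hold exactly one element per lane")
        self.cells[row] = list(values)

    def read_row(self, row):
        """Row-wise access: return one physical row."""
        return list(self.cells[row])

    def read_col(self, block_row, col, grain):
        """Column-wise access at a given grain.

        Returns column `col` of every grain x grain sub-matrix located in
        block-row `block_row`, concatenated left to right. With
        grain == lanes this is an ordinary full column.
        """
        if self.lanes % grain != 0:
            raise ValueError("grain must divide the number of lanes")
        if not 0 <= col < grain:
            raise ValueError("col must index a column inside a sub-matrix")
        if not 0 <= block_row < self.lanes // grain:
            raise ValueError("block_row out of range")
        base = block_row * grain
        out = []
        for k in range(self.lanes // grain):   # each sub-matrix in the block-row
            for r in range(grain):             # each row of that sub-matrix
                out.append(self.cells[base + r][k * grain + col])
        return out


# Fill a 4-lane file so that element (r, c) holds 10*r + c.
mrf = ToyMatrixRegisterFile(lanes=4)
for r in range(4):
    mrf.write_row(r, [10 * r + c for c in range(4)])

full_column = mrf.read_col(block_row=0, col=1, grain=4)  # one 4x4 matrix
two_blocks = mrf.read_col(block_row=0, col=0, grain=2)   # two 2x2 sub-matrices
```

In this toy layout, `full_column` is `[1, 11, 21, 31]` (column 1 of the whole matrix) and `two_blocks` is `[0, 10, 2, 12]` (column 0 of each of the two 2x2 sub-matrices in the top block-row). Reading successive columns this way yields transposed sub-matrices directly, which is the kind of rearrangement the abstract says the MMRF avoids doing with separate instructions.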
This article is indexed in CNKI and other databases.