Efficient vectorization method of triangular matrix multiplication supporting in-place calculation

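Cite this article:	LIU Zhong, TIAN Xi, CHEN Lei. Efficient vectorization method of triangular matrix multiplication supporting in-place calculation[J]. Journal of National University of Defense Technology, 2014, 36(6): 7-11, 47.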

Authors:	LIU Zhong, TIAN Xi, CHEN Lei

Affiliation:	College of Computer, National University of Defense Technology, Changsha, Hunan 410073, China

Funding:	National Natural Science Foundation of China (61133007)

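Abstract:	Mapping algorithms onto vector hardware is a difficult problem for vector processors. This paper proposes an efficient vectorization method for triangular matrix multiplication that supports in-place calculation. First, L1D is configured in SRAM mode, and double buffering in a ping-pong arrangement smooths data transfer across the multilevel storage hierarchy, so that kernel computation and DMA data movement overlap completely and the kernel runs at peak speed throughout, yielding the best computational efficiency. Second, the irregular triangular matrix multiplication workload is distributed evenly across the vector processing units, fully exploiting the multilevel parallelism of the vector processor. Third, the result matrix is stored in the multiplier matrix, which realizes in-place calculation and saves storage space. Experimental results show that the proposed method brings triangular matrix multiplication performance to 1053.7 GFLOPS, an efficiency of 91.47%.
Keywords:	triangular matrix multiplication; in-place calculation; vectorization; vector processor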
Received:	2014-04-22
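
The abstract states that the irregular triangular workload is spread evenly over the vector processing units but does not say how. The sketch below illustrates one simple balancing scheme and is not the authors' method: it assumes a lower-triangular n x n matrix whose row i holds i + 1 nonzeros, pairs row i with row n - 1 - i so that every pair costs the same, and deals the pairs out round-robin. The names `assign_rows` and `work` are hypothetical.

```python
def work(rows):
    """Multiply-add count per result column for the given rows.

    Assumption: row i of a lower-triangular matrix holds i + 1 nonzeros.
    """
    return sum(r + 1 for r in rows)


def assign_rows(n, num_units):
    """Assign rows of an n x n lower-triangular matrix to processing units.

    Pairing row k with row n - 1 - k gives every pair the same cost (n + 1),
    so dealing pairs out round-robin keeps the units close to balanced.
    """
    assignment = [[] for _ in range(num_units)]
    for k in range(n // 2):
        assignment[k % num_units].extend([k, n - 1 - k])
    if n % 2 == 1:
        # The middle row has no partner; give it to the least-loaded unit.
        lightest = min(range(num_units), key=lambda u: work(assignment[u]))
        assignment[lightest].append(n // 2)
    return assignment


if __name__ == "__main__":
    units = assign_rows(16, 4)
    print([work(rows) for rows in units])  # prints [34, 34, 34, 34]
```

With 16 rows and 4 units, each unit receives two pairs of cost 17, so the load is exactly equal. When the number of pairs is not a multiple of the number of units, the loads differ by at most one pair.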
Efficient vectorization method of triangular matrix multiplication supporting in-place calculation

LIU Zhong, TIAN Xi and CHEN Lei. Efficient vectorization method of triangular matrix multiplication supporting in-place calculation[J]. Journal of National University of Defense Technology, 2014, 36(6): 7-11, 47.

Authors:	LIU Zhong, TIAN Xi and CHEN Lei

Institution:	College of Computer, National University of Defense Technology, Changsha 410073, China

Abstract:	Mapping algorithms onto vector processors is a critical problem in vectorization. An efficient vectorization method for triangular matrix multiplication that supports in-place calculation is presented. L1D is configured as SRAM, and a ping-pong scheme with double buffering smooths data transfers across the multilevel storage structure, so that kernel computation fully overlaps DMA data transfer and the kernel runs at peak speed throughout, achieving optimal computational efficiency. The irregular computation of triangular matrix multiplication is distributed evenly across all vector processing elements to fully exploit the multiple levels of parallelism in the vector processor. The result matrix is stored in the multiplier matrix, which achieves in-place calculation and saves memory. Experimental results show that the presented method reaches 1053.7 GFLOPS for triangular matrix multiplication, with an efficiency of 91.47%.

Keywords:	triangular matrix multiplication; in-place calculation; vectorization; vector processor
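
The in-place calculation mentioned in the abstract can be illustrated with a minimal scalar sketch. It assumes the triangular factor is a lower-triangular matrix L and that the multiplier is a dense matrix B that gets overwritten with L·B. The function name `trmm_inplace` is hypothetical, and the sketch does not model the paper's vector processor, its SRAM-mode L1D, or its DMA double buffering.

```python
def trmm_inplace(L, B):
    """Overwrite B with L @ B.

    L is an n x n lower-triangular matrix and B is an n x m matrix,
    both given as lists of lists. No second n x m buffer is allocated.
    """
    n = len(L)
    m = len(B[0])
    # Work from the bottom row up: new row i depends only on old rows 0..i,
    # and those are still intact while only rows below i have been overwritten.
    for i in range(n - 1, -1, -1):
        for j in range(m):
            acc = 0
            for k in range(i + 1):  # entries of L to the right of the diagonal are zero
                acc += L[i][k] * B[k][j]
            B[i][j] = acc  # old B[i][j] is no longer needed for column j
    return B


if __name__ == "__main__":
    L = [[1, 0, 0],
         [2, 3, 0],
         [4, 5, 6]]
    B = [[1, 2],
         [3, 4],
         [5, 6]]
    trmm_inplace(L, B)
    print(B)  # prints [[1, 2], [11, 16], [49, 64]]
```

The bottom-up row order is what makes the overwrite safe here. For an upper-triangular factor the dependency runs the other way, so the rows would be processed top-down instead.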
This article is indexed by CNKI, Wanfang Data, and other databases.