
Prediction algorithm for failed batch jobs in co-located cloud
Cite this article: LIN Weiwei, SHI Fang, LI Yurui, LIU Fagui, LIU Jie, PENG Shaoliang, WANG James Z. Prediction algorithm for failed batch jobs in co-located cloud[J]. Journal of National University of Defense Technology, 2022, 44(5): 71-79.
Authors: LIN Weiwei, SHI Fang, LI Yurui, LIU Fagui, LIU Jie, PENG Shaoliang, WANG James Z
Institution: School of Computer Science & Engineering, South China University of Technology, Guangzhou 510006, China; Peng Cheng Laboratory, Shenzhen 518066, China; College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; School of Computing, Clemson University, Clemson 29634, USA
Funding: National Natural Science Foundation of China, General Program (62072187, 61872084); Key-Area Research and Development Program of Guangdong Province (2021B0101420002); Major Key Project of Peng Cheng Laboratory (PCL2021A09); Guangdong Basic and Applied Basic Research Foundation (2019B030302002); Guangzhou Development Zone International Cooperation Project (2020GH10)
Received: 2020-11-23

Abstract: To reduce the risk posed by failed batch jobs in a co-located cloud, the K-means clustering algorithm was used to divide batch jobs into four categories. Building on this classification, a two-layer nested classification model (TLNM) was proposed, and a prediction algorithm based on TLNM was implemented. Experimental results on the Ali Trace 2018 data set show that the receiver operating characteristic (ROC) curve of the algorithm is clearly better than those of other commonly used classifiers, and that the area under the ROC curve (AUC) reaches 0.978, indicating good classification performance. The recall reaches 0.951, and the confusion matrix shows that the TLNM algorithm accurately predicts the batch jobs that fail to execute.
Keywords: cloud computing; co-location; failed job prediction; resource utilization
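
The abstract reports results in terms of AUC, recall, and a confusion matrix. As a reading aid only, the following minimal Python sketch shows how those three quantities are conventionally computed for a binary "job failed" label. It is not the TLNM algorithm and does not reproduce the paper's results: the function names (`confusion_counts`, `recall`, `roc_auc`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fn, fp, tn), treating label 1 as 'job failed'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def recall(tp, fn):
    """Recall = TP / (TP + FN): the share of truly failed jobs that were flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win; this pairwise form equals the area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up data (NOT from Ali Trace 2018): 1 = failed job, 0 = successful job.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]  # hypothetical failure scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed threshold of 0.5

tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print("confusion (tp, fn, fp, tn):", (tp, fn, fp, tn))
print("recall:", recall(tp, fn))
print("AUC:", roc_auc(y_true, scores))
```

Recall depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is presumably why the abstract reports both.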