Details

基于PPO的Serverless平台自动伸缩策略研究    

Research on Auto-scaling Strategy for Serverless Platforms Based on PPO

Document type: Journal article

Chinese title: 基于PPO的Serverless平台自动伸缩策略研究

English title: Research on Auto-scaling Strategy for Serverless Platforms Based on PPO

Authors: 龙诺亚; 李子鹏; 张猛; 郑元伟; 张菡; 童勇; 王喜宾

First author: 龙诺亚

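Affiliations: [1] Guizhou Power Grid Co., Ltd., Guiyang, Guizhou 550002; [2] College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou 550025; [3] Unicom (Guizhou) Industrial Internet Co., Ltd., Guiyang, Guizhou 550003; [4] School of Big Data, Guizhou Institute of Technology, Guiyang, Guizhou 550025

First affiliation: Guizhou Power Grid Co., Ltd., Guiyang, Guizhou 550002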

Year: 2026

Volume: 44

Issue: 3

Pages: 102-110

Chinese journal name: 机械与电子

English journal name: Machinery & Electronics

Language: Chinese

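Chinese keywords: Serverless; 自动伸缩 (auto-scaling); 近端策略优化 (proximal policy optimization); 马尔科夫决策过程 (Markov decision process)

English keywords: Serverless; automatic scaling; proximal policy optimization (PPO); Markov decision process (MDP)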

Abstract: To improve the resource efficiency and service-quality stability of auto-scaling on Serverless platforms, an auto-scaling strategy based on proximal policy optimization (PPO) is proposed. First, drawing on the Knative elastic scaling architecture, the auto-scaling problem is modeled as a Markov decision process (MDP): the state space combines multi-dimensional cluster resource states with load characteristics, the composite reward function integrates throughput, response time, and resource-utilization thresholds, and the action space is continuous to match Knative's parameter configuration. Next, a PPO algorithm is designed on the Actor-Critic framework; policy-gradient optimization and importance sampling stabilize training and address the insufficient control precision of traditional reinforcement learning methods in continuous action spaces. Finally, the strategy is implemented on the Knative platform, where environment state data collected in real time are used to update the model parameters and to dynamically adjust resource allocation and instance counts. Experimental results show that, compared with the Q-Learning-based auto-scaling strategy and the platform's default strategy, KPA, the PPO-based strategy improves average throughput by 19.3% and 106.1%, reduces average response latency by 12 ms and 108 ms, and reduces P90 response latency by 50 ms and 223 ms, respectively. In concurrent scenarios, it provides better quality of service for Serverless cloud computing platforms.
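The two building blocks named in the abstract, a composite reward and an importance-sampled policy update, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the reward formula, so every weight, reference value, and threshold in `composite_reward` is a hypothetical placeholder. `clipped_surrogate` shows the standard clipped PPO objective, and the record does not say whether the paper uses this variant or another one.

```python
import math


def composite_reward(throughput, latency_ms, cpu_util,
                     tp_ref=100.0, latency_ref_ms=100.0,
                     util_low=0.3, util_high=0.8, penalty=1.0):
    """Hypothetical composite reward: all constants are illustrative assumptions.

    Rewards throughput, penalizes response time, and subtracts a fixed
    penalty when utilization leaves the [util_low, util_high] band.
    """
    reward = throughput / tp_ref - latency_ms / latency_ref_ms
    if cpu_util < util_low or cpu_util > util_high:
        reward -= penalty
    return reward


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped PPO surrogate objective (a quantity to maximize).

    logp_new / logp_old are log-probabilities of the sampled actions under
    the current policy and the data-collecting policy.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # importance sampling ratio
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes any incentive to push the ratio
        # outside the clip range in the direction the advantage favors.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In a full training loop the surrogate would be maximized by gradient ascent on the Actor's parameters, with advantages estimated from the Critic. Both steps are omitted here.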

