Markov Decision Process (MDP)

Outline: Markov process; Markov reward process (MRP); Markov decision process (MDP); solving an MDP; an MDP example; an application of MDPs.

Markov process

A Markov process is a stochastic process (X_t, t \in I) with the Markov property: the future depends on the past only through the present state,

    P(X_{t+1} | X_t, X_{t-1}, X_{t-2}, ...) = P(X_{t+1} | X_t).

Once X_t is known, the earlier history X_{t-1}, X_{t-2}, ... carries no additional information about X_{t+1}. A small chain of numbered states is the standard picture, with P(4 | 3) denoting the probability of jumping from state 3 to state 4; a random walk is another example of a Markov process.

A second-order Markov process, in which

    P(X_{t+1} | X_t, X_{t-1}, X_{t-2}, ...) = P(X_{t+1} | X_t, X_{t-1}),

can be reduced to a first-order one by augmenting the state: define S_t = (X_t, X_{t-1}). With two underlying values s and r, S_t takes values in {(s,s), (s,r), (r,s), (r,r)}, and the augmented process (S_t) is again an ordinary Markov process.

Markov reward process (MRP)

An MRP attaches rewards to a Markov process: MRP = Markov process + reward/utility function. The running example is a small chain (states 0 to 5) in which each state S carries a reward u(S), with values such as 20, 5 and 0 shown and u(S=3), u(S=4) left symbolic at first, and the arrows carry transition probabilities 0.1, 0.9, 0.2, 0.8, 1.0 and 1.0.
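A minimal sketch of the state-augmentation trick in Python; the two values "s" and "r" and all transition probabilities below are illustrative assumptions, not numbers from the slides.

    import numpy as np

    # Illustrative second-order chain over two values "s" and "r":
    # P(X_{t+1} = "s" | X_t, X_{t-1}) depends on the last two observations.
    p_next_s = {("s", "s"): 0.8, ("s", "r"): 0.6, ("r", "s"): 0.4, ("r", "r"): 0.2}

    rng = np.random.default_rng(0)

    def step(x_t, x_tm1):
        """Sample X_{t+1} given the last two values X_t and X_{t-1}."""
        return "s" if rng.random() < p_next_s[(x_t, x_tm1)] else "r"

    # State augmentation: S_t = (X_t, X_{t-1}) is a first-order Markov state,
    # because the next pair (X_{t+1}, X_t) depends only on the current pair.
    S = ("s", "s")
    for t in range(10):
        x_next = step(*S)
        S = (x_next, S[0])
        print(t, S)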

Formally, an MRP is a tuple (\mathcal{S}, P, U, \gamma): the state set \mathcal{S}, the state transition probabilities P, the reward (utility) function U, and the discount factor \gamma.

In the fully specified example the six states carry rewards 20, 5, 0, 6, 2 and 9, and the transition probabilities are 0.1, 0.9, 0.2, 0.8, 1.0 and 1.0; in particular, state 3 has reward 6 and moves to state 4 (reward 2) with probability 0.2 and to state 5 (reward 9) with probability 0.8, and states 4 and 5 end the chain. The reward u(S) is the immediate reward collected in state S. What we actually want is the expected total discounted reward H(S) obtained when "starting from here", i.e. from state S onward.

One way to compute H is backward induction: work backwards from the end of the chain. For the terminal states,

    H(S=4) = u(S=4) = 2,    H(S=5) = u(S=5) = 9.

One step earlier,

    H(S=3) = u(S=3) + \gamma [0.2 H(S=4) + 0.8 H(S=5)] = 6 + \gamma (0.2 \cdot 2 + 0.8 \cdot 9),

where \gamma \in [0, 1) is the discount factor; for instance, with \gamma = 0.9 this gives H(S=3) = 6 + 0.9 \cdot 7.6 = 12.84.

Continuing backwards in the same way yields H(S=2), H(S=1) and so on, each step reusing the downstream values already computed, exactly as in the H(S=3) computation above. In general the expected return satisfies the recursion

    H(S_t) = E[ u(S_t) + \gamma H(S_{t+1}) ],

or, written out over successor states,

    H(S) = u(S) + \gamma \sum_{S' \in \mathcal{S}} P(S, S') H(S').

Backward induction is applicable when the chain has absorbing states to start the computation from, as in the variant of the example in which the last states return to themselves with probability 1.0.
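Stacking this recursion over all states gives the linear system H = u + \gamma P H, so H can also be obtained in one shot as H = (I - \gamma P)^{-1} u. Below is a small sketch for the 3 -> {4, 5} fragment of the example; the value \gamma = 0.9 and the extra zero-reward terminal state "end" (so that states 4 and 5 contribute only their immediate rewards, as on the backward-induction slide) are assumptions made for the sketch.

    import numpy as np

    # Solve H = u + gamma * P @ H directly: H = (I - gamma * P)^{-1} u.
    # State order [3, 4, 5, end]; "end" is an assumed zero-reward absorbing state.
    gamma = 0.9
    P = np.array([[0.0, 0.2, 0.8, 0.0],   # state 3 -> 4 (prob 0.2) or 5 (prob 0.8)
                  [0.0, 0.0, 0.0, 1.0],   # state 4 -> end
                  [0.0, 0.0, 0.0, 1.0],   # state 5 -> end
                  [0.0, 0.0, 0.0, 1.0]])  # end -> end
    u = np.array([6.0, 2.0, 9.0, 0.0])    # u(3) = 6, u(4) = 2, u(5) = 9

    H = np.linalg.solve(np.eye(4) - gamma * P, u)
    print(dict(zip(["H(3)", "H(4)", "H(5)", "H(end)"], np.round(H, 2))))
    # H(4) = 2, H(5) = 9, H(3) = 6 + 0.9 * (0.2*2 + 0.8*9) = 12.84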

When there is no convenient end of the chain to start from, the MRP can instead be solved by value iteration: initialize H(S) <- 0 for all S \in \mathcal{S}, then repeatedly apply the update

    H(S) <- u(S) + \gamma \sum_{S' \in \mathcal{S}} P(S, S') H(S')

until H(S) converges.
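A short sketch of this fixed-point iteration; the usage example reuses the same assumed chain fragment (and the assumed \gamma = 0.9) as the closed-form sketch above.

    import numpy as np

    def mrp_value_iteration(P, u, gamma, tol=1e-10):
        """Iterate H <- u + gamma * P @ H from H = 0 until the change is below tol."""
        H = np.zeros(len(u))
        while True:
            H_new = u + gamma * P @ H
            if np.max(np.abs(H_new - H)) < tol:
                return H_new
            H = H_new

    # Same assumed fragment as above (state order [3, 4, 5, end]):
    P = np.array([[0.0, 0.2, 0.8, 0.0],
                  [0.0, 0.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0, 1.0]])
    u = np.array([6.0, 2.0, 9.0, 0.0])
    print(np.round(mrp_value_iteration(P, u, gamma=0.9), 2))   # [12.84  2.    9.    0.  ]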

Markov decision process (MDP)

An MDP adds decisions on top of an MRP: MDP = Markov process + actions + reward functions. In the introductory example the current state is 1 and the possible future states are 2 and 3. Taking action A1 yields reward 20 and taking action A2 yields reward 5, and each action induces its own transition probabilities over the possible future states (0.1 and 0.9 in the figure). The decision problem is which of the two actions to take.

Formally, an MDP is a tuple (\mathcal{S}, \mathcal{A}, P, U, \gamma): the state set \mathcal{S}, the action set \mathcal{A}, the state transition probabilities P(S, A, S'), the reward function u(S, A), and the discount factor \gamma. Common variants include the constrained MDP (CMDP) and the partially observable MDP (POMDP).

Compared with a Markov process or an MRP, the new ingredient is the action (decision): both the transition probabilities and the rewards now depend on the action taken as well as on the state, so the decision maker can influence how the process evolves. What we look for in an MDP is a "policy" that tells the decision maker what to do in each state.
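A minimal container for such a tuple, as one might set it up in Python; the array shapes chosen here (P of shape (|A|, |S|, |S|), u of shape (|S|, |A|)) are a convention for this sketch, not something prescribed by the slides.

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class MDP:
        """The tuple (S, A, P, U, gamma) stored as plain arrays."""
        P: np.ndarray      # shape (|A|, |S|, |S|): P[a, s, s'] = P(S'=s' | S=s, A=a)
        u: np.ndarray      # shape (|S|, |A|): immediate reward u(S, A)
        gamma: float       # discount factor in [0, 1)

        @property
        def n_actions(self):
            return self.P.shape[0]

        @property
        def n_states(self):
            return self.P.shape[1]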

A policy is a mapping from states to actions, \pi : \mathcal{S} \to \mathcal{A}.

Bellman equation

The Bellman equation generalizes the MRP recursion. For an MRP,

    H(S) = u(S) + \gamma \sum_{S' \in \mathcal{S}} P(S, S') H(S').

For an MDP,

    H(S, A) = u(S, A) + \gamma \sum_{S' \in \mathcal{S}} P(S, A, S') U(S'),
    U(S) = \max_{A \in \mathcal{A}} H(S, A),
    \pi(S) = \arg\max_{A \in \mathcal{A}} H(S, A).

How is the Bellman equation solved? Backward induction, as used for the MRP, applies only when the chain has absorbing states, i.e. when there is a known "end" to start from. The general method is the value iteration algorithm: start from values of 0 and repeatedly apply the Bellman equation.

Value iteration initializes U_0(S) <- 0 for all S \in \mathcal{S} and then repeats the two steps

    H_{n+1}(S, A) = u(S, A) + \gamma \sum_{S' \in \mathcal{S}} P(S, A, S') U_n(S'),
    U_{n+1}(S) = \max_{A \in \mathcal{A}} H_{n+1}(S, A),

until U(S) stops changing.

Value iteration algorithm:
    For each state S: U_0(S) <- 0
    Repeat until convergence:
        For each state S:
            For each action A: compute H_{n+1}(S, A) = u(S, A) + \gamma \sum_{S'} P(S, A, S') U_n(S')
            Compute and store \pi_{n+1}(S) = \arg\max_A H_{n+1}(S, A)
            Compute and store U_{n+1}(S) = \max_{A \in \mathcal{A}} H_{n+1}(S, A)
    Return \pi(S), U(S) for all S \in \mathcal{S}

A second way to solve the Bellman equation is the policy iteration algorithm. Where value iteration iterates the Bellman equation on values, policy iteration starts from an initial policy \pi_0(S) for all S \in \mathcal{S} and repeatedly produces an improved policy \pi_{n+1} : \mathcal{S} \to \mathcal{A} until the policy no longer changes.

Both algorithms rest on the Bellman equation and the principle of optimality. One sweep costs on the order of O(|\mathcal{A}| |\mathcal{S}|^2) operations, with |\mathcal{A}| actions and |\mathcal{S}| states, and both methods compute a fixed point, i.e. a solution of an equation of the form f(x) = x.
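A runnable sketch of both algorithms following the pseudocode above; the array shapes (P of shape (|A|, |S|, |S|), u of shape (|S|, |A|)) and the stopping tolerance are choices made for this sketch rather than part of the slides.

    import numpy as np

    def value_iteration(P, u, gamma, tol=1e-8):
        """U_0 = 0, then U_{n+1}(S) = max_A [u(S,A) + gamma * sum_{S'} P(S,A,S') U_n(S')]."""
        n_actions, n_states, _ = P.shape
        U = np.zeros(n_states)                               # U_0(S) <- 0
        while True:
            # H_{n+1}(S, A) for all states and actions at once
            H = u + gamma * np.stack([P[a] @ U for a in range(n_actions)], axis=1)
            U_new = H.max(axis=1)                            # U_{n+1}(S) = max_A H_{n+1}(S, A)
            if np.max(np.abs(U_new - U)) < tol:              # repeat until converge
                return U_new, H.argmax(axis=1)               # pi(S) = argmax_A H(S, A)
            U = U_new

    def policy_iteration(P, u, gamma):
        """Evaluate the current policy exactly, then improve it greedily, until stable."""
        n_actions, n_states, _ = P.shape
        pi = np.zeros(n_states, dtype=int)                   # pi_0(S): arbitrary initial policy
        while True:
            # Policy evaluation: solve H = u_pi + gamma * P_pi @ H for the current policy.
            P_pi = P[pi, np.arange(n_states), :]
            u_pi = u[np.arange(n_states), pi]
            H = np.linalg.solve(np.eye(n_states) - gamma * P_pi, u_pi)
            # Policy improvement: act greedily with respect to H.
            Q = u + gamma * np.stack([P[a] @ H for a in range(n_actions)], axis=1)
            pi_new = Q.argmax(axis=1)
            if np.array_equal(pi_new, pi):
                return H, pi
            pi = pi_new

Value iteration performs many cheap sweeps, while policy iteration performs fewer but heavier rounds, since each evaluation step solves a linear system.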

MDP example

A small corridor MDP: four states 0, 1, 2, 3 arranged in a line, so \mathcal{S} = {0, 1, 2, 3} and \mathcal{A} = {Left, Right}. The reward is -1 for every step moved and the discount factor is 0.5. The per-action transition matrices (rows indexed by the current state 0 to 3, columns by the next state) are

    P(A = Left)  = [[1, 0, 0, 0],
                    [1, 0, 0, 0],
                    [0, 1, 0, 0],
                    [0, 0, 1, 0]]

    P(A = Right) = [[1, 0, 0, 0],
                    [0, 0, 1, 0],
                    [0, 0, 0, 1],
                    [0, 0, 0, 1]]

Left moves the agent one state down (state 0 stays at 0) and Right moves it one state up (state 3 stays at 3); state 0 is absorbing.
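A self-contained sketch that runs value iteration on exactly this MDP. The only added assumption is that the absorbing state 0 yields reward 0 while every other state costs -1 per step, which is consistent with the trace shown next.

    import numpy as np

    # 4-state corridor MDP: states 0..3, actions Left/Right, discount 0.5.
    P_left = np.array([[1, 0, 0, 0],
                       [1, 0, 0, 0],
                       [0, 1, 0, 0],
                       [0, 0, 1, 0]], dtype=float)
    P_right = np.array([[1, 0, 0, 0],
                        [0, 0, 1, 0],
                        [0, 0, 0, 1],
                        [0, 0, 0, 1]], dtype=float)
    u = np.array([0.0, -1.0, -1.0, -1.0])    # assumed: 0 in state 0, -1 per step elsewhere
    gamma = 0.5

    U = np.zeros(4)
    for period in range(1, 4):
        print(f"Period {period}: H = {U}")
        H_left = u + gamma * P_left @ U      # H(S, Left)
        H_right = u + gamma * P_right @ U    # H(S, Right)
        U = np.maximum(H_left, H_right)      # U(S) = max_A H(S, A)
    # Prints the same three rows as the trace below.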

Running value iteration on this MDP gives, period by period,

    Period 1:  H = [ 0.0,  0.0,  0.0,  0.0]
    Period 2:  H = [ 0.0, -1.0, -1.0, -1.0]
    Period 3:  H = [ 0.0, -1.0, -1.5, -1.5]

together with the action selected in each state at each period.

An application of MDPs

The remaining slides apply the MDP framework to mobile wireless charging. The enabling technology is RF (radio-frequency) energy transmission and reception, governed by the Friis formula and improved by beamforming.

Example hardware includes the Powercaster Tx and Rx. The system model has three roles:

- Electricity chargers (charging stations): installed at different fixed locations, e.g. power outlets or base stations.
- End users of energy: those who need energy but are not covered by the chargers.
- Mobile energy gateway: a device that moves around, charging from the chargers and transferring energy wirelessly to the users.

The gateway buys and sells energy. It buys from the chargers (charging), and each charger asks a certain price when charging it. It sells to the end users (transferring): more users mean more payments, and a nearer user receives more energy and therefore pays more. The mobile energy gateway thus sits between the chargers and the end users of energy, moving energy between them by RF transfer.

MDP formulation: the state is S = (L, E, N, P), collecting the gateway's location L, its energy level E, the number of end users N, and a component P that decides the end-user payment; the action set is A = {0, 1, 2}.
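One practical step before handing such a model to a tabular solver is flattening the composite state (L, E, N, P) into a single index. A sketch of that bookkeeping follows; the value ranges are placeholders, not the ones used in the slides.

    import itertools
    import numpy as np

    # Placeholder ranges for each state component; a real model would take these
    # from the scenario (locations, energy levels, user counts, payment levels).
    L_vals, E_vals, N_vals, P_vals = range(5), range(4), range(3), range(2)

    states = list(itertools.product(L_vals, E_vals, N_vals, P_vals))
    index = {s: i for i, s in enumerate(states)}       # (L, E, N, P) -> flat index

    n_states, n_actions = len(states), 3               # action set A = {0, 1, 2}
    P = np.zeros((n_actions, n_states, n_states))      # transition probabilities, to be filled in
    R = np.zeros((n_states, n_actions))                # expected payments, to be filled in
    print(n_states, "states,", n_actions, "actions")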

The expected payment is built from the geometry of the users. The distance l from the gateway to the n-th nearest of the N end users follows a density f(n, l | N), expressed in the slides through Beta functions B(n + 2/3, N - n + 1) and B(N - n + 1, n) and the normalized distance l^3 / R^3, where R is the coverage radius. The payment collected from the n-th user is

    R(n, E_S) = \int_0^{\bar{R}} f(n, l | N) \, r(e D_n) \, dl + \int_{\bar{R}}^{R} f(n, l | N) \, r(g E_S / l^2) \, dl,

a near-range part for distances up to the threshold \bar{R} plus a far-range part in which the received energy g E_S / l^2 decays with the square of the distance; summing over n gives the overall payment from all users. The state transition probabilities are specified per action; the slides show the matrices P(A = 1) (entries such as 0.3 and 0.7) and P(A = 0) (entries such as 1.0 and 0.0) only partially.

The resulting MDP is solved with the Bellman equation and the value iteration algorithm, using pymdptoolbox in Python or the mdptoolbox package in Matlab. The MDP policy is compared against simpler baseline schemes:

- Greedy scheme (GRDY): maximizing immediate utility.
- Random scheme (RND): randomly taking any action (i.e., 0, 1, 2) from the action set.
- Location-aware scheme (LOCA): charging at a charger, transferring ...
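A minimal sketch of how such a model is handed to pymdptoolbox once the arrays are filled in; the tiny 2-state, 2-action P and R below are placeholders, not the energy-gateway model from the slides.

    import numpy as np
    import mdptoolbox.mdp   # pip install pymdptoolbox

    # Placeholder arrays: P has shape (|A|, |S|, |S|) with row-stochastic P[a],
    # R has shape (|S|, |A|). Replace both with the model's own arrays.
    P = np.array([[[0.5, 0.5],
                   [0.8, 0.2]],
                  [[0.0, 1.0],
                   [0.1, 0.9]]])
    R = np.array([[5.0, 10.0],
                  [-1.0, 2.0]])

    vi = mdptoolbox.mdp.ValueIteration(P, R, 0.9)   # discount factor 0.9
    vi.run()
    print(vi.policy)   # greedy action for each state
    print(vi.V)        # value of each state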
