Foreign-Language Literature on Approximate Dynamic Programming, with Translation
Foreign-language source text:
Adaptive Dynamic Programming: An Introduction
Abstract:
In this article, we introduce some recent research trends within the field of adaptive/approximate dynamic programming (ADP), including variations on the structure of ADP schemes, the development of ADP algorithms, and applications of ADP schemes. For ADP algorithms, the point of focus is that iterative ADP algorithms can be sorted into two classes: one class is the iterative algorithm with an initial stable policy; the other is the one without the requirement of an initial stable policy. It is generally believed that the latter requires less computation, at the cost of losing the guarantee of system stability during the iteration process. In addition, many recent papers have provided convergence analysis associated with the algorithms developed. Furthermore, we point out some topics for future studies.
Introduction
As is well known, there are many methods for designing stable control for nonlinear systems. However, stability is only a bare minimum requirement in a system design. Ensuring optimality guarantees the stability of the nonlinear system. Dynamic programming is a very useful tool in solving optimization and optimal control problems by employing the principle of optimality. In [16], the principle of optimality is expressed as: “An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.” Dynamic programming can be considered along several spectrums. One can consider discrete-time systems or continuous-time systems, linear systems or nonlinear systems, time-invariant systems or time-varying systems, deterministic systems or stochastic systems, etc.
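We first take a look at nonlinear discrete-time (time-varying) dynamical (deterministic) systems. Time-varying nonlinear systems cover most of the application areas, and discrete time is the basic consideration for digital computation. Suppose that one is given a discrete-time nonlinear (time-varying) dynamical system

$$x(k+1) = F[x(k), u(k), k], \quad k = 0, 1, \ldots \qquad (1)$$

where $x \in \mathbb{R}^n$ represents the state vector of the system, $u \in \mathbb{R}^m$ denotes the control action, and $F$ is the system function. Suppose that one associates with this system the performance index (or cost)

$$J[x(i), i] = \sum_{k=i}^{\infty} \gamma^{k-i}\, U[x(k), u(k), k] \qquad (2)$$

where $U$ is called the utility function and $\gamma$ is the discount factor with $0 < \gamma \le 1$. Note that the function $J$ is dependent on the initial time $i$ and the initial state $x(i)$, and it is referred to as the cost-to-go of state $x(i)$. The objective of the dynamic programming problem is to choose a control sequence $u(k)$, $k = i, i+1, \ldots$, so that the function $J$ (i.e., the cost) in (2) is minimized. According to Bellman, the optimal cost from time $k$ is equal to

$$J^*[x(k), k] = \min_{u(k)} \left\{ U[x(k), u(k), k] + \gamma\, J^*[x(k+1), k+1] \right\} \qquad (3)$$

The optimal control $u^*(k)$ at time $k$ is the $u(k)$ which achieves this minimum, i.e.,

$$u^*(k) = \arg\min_{u(k)} \left\{ U[x(k), u(k), k] + \gamma\, J^*[x(k+1), k+1] \right\} \qquad (4)$$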
Equation (3) is the principle of optimality for discrete-time systems. Its importance lies in the fact that it allows one to optimize over only one control vector at a time by working backward in time.
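The recursion behind Equation (3) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: a time-invariant toy problem with five integer states, three controls, the system function `F(x, u) = clip(x + u)`, the utility `U(x, u) = x^2 + u^2`, and a discount factor of 0.9. Because the toy problem is time-invariant with an infinite horizon, the sketch uses a fixed-point iteration (value iteration) instead of a finite-horizon backward sweep; each pass still optimizes over only one control at a time, as the principle of optimality allows.

```python
# Toy illustration of the discrete-time optimality recursion.
# All problem data below (states, controls, F, U, gamma) are hypothetical.

N_STATES = 5            # states are the integers 0, 1, ..., 4
CONTROLS = (-1, 0, 1)   # admissible control actions
GAMMA = 0.9             # discount factor, 0 < gamma <= 1


def F(x, u):
    """Assumed system function: move by u, clipped to the state range."""
    return min(max(x + u, 0), N_STATES - 1)


def U(x, u):
    """Assumed utility (one-step cost): quadratic in state and control."""
    return float(x * x + u * u)


def one_step_cost(J, x, u):
    """Utility now plus discounted cost-to-go of the successor state."""
    return U(x, u) + GAMMA * J[F(x, u)]


def value_iteration(tol=1e-12, max_iter=10_000):
    """Iterate the optimality recursion until the cost-to-go stops changing."""
    J = [0.0] * N_STATES
    for _ in range(max_iter):
        J_new = [min(one_step_cost(J, x, u) for u in CONTROLS)
                 for x in range(N_STATES)]
        delta = max(abs(a - b) for a, b in zip(J_new, J))
        J = J_new
        if delta < tol:
            break
    # Greedy control: the u that attains the minimum for each state.
    policy = [min(CONTROLS, key=lambda u, x=x: one_step_cost(J, x, u))
              for x in range(N_STATES)]
    return J, policy


J_star, u_star = value_iteration()
```

In this toy setting the resulting policy steps toward state 0 from every other state and stays there once it arrives. The table `J_star` has one entry per state, which is exactly what stops scaling when the state space becomes large or continuous.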
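In the nonlinear continuous-time case, the system can be described by

$$\dot{x}(t) = F[x(t), u(t), t], \quad t \ge t_0 \qquad (5)$$

The cost in this case is defined as

$$J[x(t), t] = \int_{t}^{\infty} U[x(\tau), u(\tau), \tau]\, d\tau \qquad (6)$$

For continuous-time systems, Bellman’s principle of optimality can be applied, too. The optimal cost $J^*(x_0) = \min J(x_0, u(t))$ will satisfy the Hamilton-Jacobi-Bellman equation

$$-\frac{\partial J^*[x(t), t]}{\partial t} = \min_{u(t)} \left\{ U[x(t), u(t), t] + \left( \frac{\partial J^*[x(t), t]}{\partial x(t)} \right)^{T} F[x(t), u(t), t] \right\} \qquad (7)$$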
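As a small worked check of the Hamilton-Jacobi-Bellman condition, the following Python sketch uses a hypothetical scalar, time-invariant linear-quadratic problem that does not appear in the article: dynamics `dx/dt = a*x + b*u` and utility `q*x^2 + r*u^2`. Assuming a candidate optimal cost of the form `J*(x) = p*x^2`, the time derivative vanishes, so the condition reduces to requiring that the minimum over `u` of `U + (dJ*/dx) * F` be zero. Minimizing over `u` gives `u* = -(b*p/r)*x`, and substituting back yields the scalar Riccati equation `q + 2*a*p - (b^2/r)*p^2 = 0`.

```python
import math

# Hypothetical scalar LQ problem: dx/dt = a*x + b*u, utility q*x^2 + r*u^2.
# The quadratic form J*(x) = p*x^2 is an assumed ansatz for this illustration.


def scalar_lq_solution(a, b, q, r):
    """Positive root p of q + 2*a*p - (b**2 / r) * p**2 = 0, and gain k = b*p/r."""
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    k = b * p / r
    return p, k


def hjb_bracket(a, b, q, r, p, x, u):
    """Value of U(x, u) + (dJ*/dx) * F(x, u) for J*(x) = p * x**2."""
    return q * x * x + r * u * u + 2.0 * p * x * (a * x + b * u)


a, b, q, r = 1.0, 1.0, 1.0, 1.0       # illustrative parameter values
p, k = scalar_lq_solution(a, b, q, r)

x = 0.7                                # arbitrary test state
u_opt = -k * x                         # linear state feedback u* = -k * x
residual_at_optimum = hjb_bracket(a, b, q, r, p, x, u_opt)
residual_off_optimum = hjb_bracket(a, b, q, r, p, x, u_opt + 0.1)
closed_loop_pole = a - b * k           # negative means the closed loop is stable
```

With these parameter values the bracket is zero at `u_opt` and strictly positive for any other control, and the closed-loop pole is negative even though the open-loop system (`a = 1`) is unstable. For nonlinear dynamics or nonquadratic utilities no such closed form is generally available.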
Equations (3) and (7) are called the optimality equations of dynamic programming, which are the basis for the implementation of dynamic programming. In the above, if the function F in (1) or (5) and the cost function J in (2) or (6) are known, the solution of u(k) becomes a simple optimization problem. If the system is modeled by linear dynamics and the cost function to be minimized is quadratic in the state and control, then the optimal control is a linear feedback of the states, where the gains are obtained by solving a standard Riccati equation [47]. On the other hand, if the system is modeled by nonlinear dynamics or the cost function is nonquadratic, the optimal state feedback control will depend upon solutions to the Hamilton-Jacobi-Bellman (HJB) equation [48], which is generally a nonlinear partial differential equation or difference equation. However, it is often computationally untenable to run true dynamic programming due to the backward numerical process required for its solutions, i.e., as a result of the well-known “curse of dimensionality” [16], [28]. In [69], three curses are displayed in resource management and control problems to show that the cost function J, which is the theoretical solution of the Hamilton-Jacobi-Bellman equation, is very difficult to obtain, except for systems satisfying some very favorable conditions. Over the years, progress has been made to circumvent the “curse of dimensionality” by building a system, called “critic”, to approximate the cost function in dynamic programming (cf. [10], [60], [61], [63], [70], [78], [92], [94], [95]). The idea is to approximate dynamic programming solutions by