Foreign-Language Literature on Approximate Dynamic Programming, with Translation


Foreign-language source:

Adaptive Dynamic Programming: An Introduction

Abstract:

In this article, we introduce some recent research trends within the field of adaptive/approximate dynamic programming (ADP), including the variations on the structure of ADP schemes, the development of ADP algorithms and applications of ADP schemes. For ADP algorithms, the point of focus is that iterative algorithms of ADP can be sorted into two classes: one class is the iterative algorithm with initial stable policy; the other is the one without the requirement of initial stable policy. It is generally believed that the latter one has less computation at the cost of missing the guarantee of system stability during iteration process. In addition, many recent papers have provided convergence analysis associated with the algorithms developed. Furthermore, we point out some topics for future studies.

Introduction

As is well known, there are many methods for designing stable control for nonlinear systems. However, stability is only a bare minimum requirement in a system design. Ensuring optimality guarantees the stability of the nonlinear system. Dynamic programming is a very useful tool in solving optimization and optimal control problems by employing the principle of optimality. In [16], the principle of optimality is expressed as: “An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.” Dynamic programming can be studied along several axes: one can consider discrete-time systems or continuous-time systems, linear systems or nonlinear systems, time-invariant systems or time-varying systems, deterministic systems or stochastic systems, etc.

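We first take a look at nonlinear discrete-time (time-varying) dynamical (deterministic) systems. Time-varying nonlinear systems cover most of the application areas, and discrete time is the basic consideration for digital computation. Suppose that one is given a discrete-time nonlinear (time-varying) dynamical system

$$x(k+1) = F[x(k), u(k), k], \quad k = 0, 1, \ldots \tag{1}$$

where $x \in \mathbb{R}^n$ represents the state vector of the system, $u \in \mathbb{R}^m$ denotes the control action, and $F$ is the system function. Suppose that one associates with this system the performance index (or cost)

$$J[x(i), i] = \sum_{k=i}^{\infty} \gamma^{k-i}\, U[x(k), u(k), k] \tag{2}$$

where $U$ is called the utility function and $\gamma$ is the discount factor with $0 < \gamma \le 1$. Note that the function $J$ is dependent on the initial time $i$ and the initial state $x(i)$, and it is referred to as the cost-to-go of state $x(i)$. The objective of the dynamic programming problem is to choose a control sequence $u(k)$, $k = i, i+1, \ldots$, so that the function $J$ (i.e., the cost) in (2) is minimized. According to Bellman, the optimal cost from time $k$ is equal to

$$J^*[x(k), k] = \min_{u(k)} \left\{ U[x(k), u(k), k] + \gamma\, J^*[x(k+1), k+1] \right\}. \tag{3}$$

The optimal control $u^*(k)$ at time $k$ is the $u(k)$ which achieves this minimum, i.e.,

$$u^*(k) = \arg\min_{u(k)} \left\{ U[x(k), u(k), k] + \gamma\, J^*[x(k+1), k+1] \right\}. \tag{4}$$

Equation (3) is the principle of optimality for discrete-time systems. Its importance lies in the fact that it allows one to optimize over only one control vector at a time by working backward in time.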
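The recursion in Equation (3) can be illustrated on a toy problem. The following is a minimal sketch, not the article's algorithm: it assumes a time-invariant system with a small finite state and control set, so that the optimal cost can be found by repeatedly applying the Bellman update until it stops changing. The five-state "walk to the origin" system, the cost `U(x, u) = x + 0.5 * |move|`, and all function names are hypothetical choices made only for illustration.

```python
import numpy as np

def q_values(J, F, U, n_states, n_controls, gamma):
    """Q[x, u] = U(x, u) + gamma * J(F(x, u)), the bracketed term of the Bellman equation."""
    return np.array([[U(x, u) + gamma * J[F(x, u)] for u in range(n_controls)]
                     for x in range(n_states)])

def value_iteration(F, U, n_states, n_controls, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Apply J(x) <- min_u { U(x, u) + gamma * J(F(x, u)) } until J stops changing."""
    J = np.zeros(n_states)
    for _ in range(max_iter):
        J_new = q_values(J, F, U, n_states, n_controls, gamma).min(axis=1)
        converged = np.max(np.abs(J_new - J)) < tol
        J = J_new
        if converged:
            break
    # The minimizing control at each state is the (approximately) optimal policy.
    policy = q_values(J, F, U, n_states, n_controls, gamma).argmin(axis=1)
    return J, policy

# Hypothetical toy system: states 0..4 on a line; controls 0 = left, 1 = stay, 2 = right.
N_STATES, N_CONTROLS = 5, 3

def F(x, u):
    """System function: move by (u - 1), clipped to the state range."""
    return min(max(x + (u - 1), 0), N_STATES - 1)

def U(x, u):
    """Utility: pay the distance from the origin plus 0.5 for any attempted move."""
    return x + 0.5 * abs(u - 1)

J, policy = value_iteration(F, U, N_STATES, N_CONTROLS, gamma=0.9)
print("optimal cost-to-go:", J)
print("optimal policy (0 = left, 1 = stay, 2 = right):", policy)
```

Each sweep minimizes over a single control per state, which is the one-control-at-a-time property noted above. The table `J` has one entry per state, so this direct approach only works when the state space is small.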

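In the nonlinear continuous-time case, the system can be described by

$$\dot{x}(t) = F[x(t), u(t), t], \quad t \ge t_0. \tag{5}$$

The cost in this case is defined as

$$J[x(t), t] = \int_{t}^{\infty} U[x(\tau), u(\tau), \tau]\, d\tau. \tag{6}$$

For continuous-time systems, Bellman’s principle of optimality can be applied, too. The optimal cost $J^*(x_0) = \min_{u(t)} J(x_0, u(t))$ will satisfy the Hamilton-Jacobi-Bellman equation

$$-\frac{\partial J^*[x(t), t]}{\partial t} = \min_{u(t)} \left\{ U[x(t), u(t), t] + \left( \frac{\partial J^*[x(t), t]}{\partial x(t)} \right)^{T} F[x(t), u(t), t] \right\}. \tag{7}$$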

Equations (3) and (7) are called the optimality equations of dynamic programming, and they are the basis for implementing dynamic programming. In the above, if the function $F$ in (1) or (5) and the cost function $J$ in (2) or (6) are known, the solution of $u(k)$ becomes a simple optimization problem. If the system is modeled by linear dynamics and the cost function to be minimized is quadratic in the state and control, then the optimal control is a linear feedback of the states, where the gains are obtained by solving a standard Riccati equation [47]. On the other hand, if the system is modeled by nonlinear dynamics or the cost function is nonquadratic, the optimal state feedback control will depend upon solutions to the Hamilton-Jacobi-Bellman (HJB) equation [48], which is generally a nonlinear partial differential equation or difference equation. However, it is often computationally untenable to run true dynamic programming due to the backward numerical process required for its solutions, i.e., as a result of the well-known “curse of dimensionality” [16], [28]. In [69], three curses are displayed in resource management and control problems to show that the cost function $J$, which is the theoretical solution of the Hamilton-Jacobi-Bellman equation, is very difficult to obtain, except for systems satisfying some very good conditions. Over the years, progress has been made to circumvent the “curse of dimensionality” by building a system, called “critic”, to approximate the cost function in dynamic programming (cf. [10], [60], [61], [63], [70], [78], [92], [94], [95]). The idea is to approximate dynamic programming solutions by
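The linear-quadratic special case mentioned in the paragraph above can be made concrete. The sketch below is an illustration under stated assumptions, not the article's method: it assumes a discrete-time, time-invariant system x(k+1) = A x(k) + B u(k) with undiscounted utility x'Qx + u'Ru, and it solves the corresponding Riccati equation by plain fixed-point iteration. The matrices for the double-integrator example and the function name `lqr_riccati` are hypothetical.

```python
import numpy as np

def lqr_riccati(A, B, Q, R, tol=1e-12, max_iter=10_000):
    """Iterate P <- Q + A'PA - A'PB (R + B'PB)^{-1} B'PA until P stops changing.

    Returns the cost matrix P (optimal cost is x'Px) and the gain K (optimal u = -K x).
    """
    P = Q.copy()
    for _ in range(max_iter):
        # Gain associated with the current cost estimate P.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K

# Scalar check: A = B = Q = R = 1 gives P^2 - P - 1 = 0, so P is the golden ratio.
one = np.array([[1.0]])
P_scalar, K_scalar = lqr_riccati(one, one, one, one)

# Hypothetical double integrator (position, velocity) with unit weights.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
P, K = lqr_riccati(A, B, Q, R)

# The closed loop x(k+1) = (A - B K) x(k) should be stable (spectral radius below 1).
spectral_radius = np.max(np.abs(np.linalg.eigvals(A - B @ K)))
print("scalar P:", P_scalar[0, 0])
print("feedback gain K:", K)
print("closed-loop spectral radius:", spectral_radius)
```

Here the quadratic form of the cost collapses the Bellman recursion into a matrix iteration, so no table over states is needed. For nonlinear dynamics or nonquadratic costs no such closed form exists in general, which is where the approximation ideas discussed in the text come in.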
