Flappy Bird Based on Deep Reinforcement Learning

SHANGHAI JIAO TONG UNIVERSITY

 

Project Title: Playing the Game of Flappy Bird with Deep Reinforcement Learning

Group Number: G-07

Group Members:

Wang Wenqing 116032910080

Gao Xiaoning 116032910032

Qian Chen 116032910073

 

Playing the Game of Flappy Bird with Deep Reinforcement Learning

Abstract

Letting machines play games has become one of the popular topics in AI. Using game theory and search algorithms to play games requires specific domain knowledge and therefore lacks scalability. In this project, we utilize a convolutional neural network to represent the environment of games, updating its parameters with Q-learning, a reinforcement learning algorithm. We call this overall algorithm deep reinforcement learning or Deep Q-learning Network (DQN). Moreover, we use only the raw images of the game Flappy Bird as the input of DQN, which makes the approach scalable to other games. After training with some tricks, DQN can greatly outperform human beings.

1 Introduction

Flappy Bird has been a popular game worldwide in recent years. The player's goal is to guide the bird on screen through the gap formed by two pipes by tapping the screen. If the player taps the screen, the bird jumps up; if the player does nothing, the bird falls at a constant rate. The game ends when the bird crashes into a pipe or the ground, and the score increases by one each time the bird passes through a gap. Figure 1 shows three different states of the bird: (a) the normal flight state, (b) the crash state, and (c) the passing state.

Figure 1: (a) normal flight state, (b) crash state, (c) passing state
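The rules above can be sketched as a single transition function. This is a minimal illustration, not the game's actual implementation. The names (`Bird`, `step`), the constants, the simplified geometry, and the fixed gap position are all assumptions chosen for the example. The text itself states only that a tap makes the bird jump, that it otherwise falls at a constant rate, that a crash into a pipe or the ground ends the game, and that passing a gap adds one to the score.

```python
from dataclasses import dataclass

# All constants are hypothetical values chosen for illustration only.
FALL_SPEED = 1             # height lost per step when the player does nothing
JUMP_HEIGHT = 3            # height gained per step when the player taps
GROUND = 0                 # height of the ground
PIPE_SPACING = 10          # horizontal distance between consecutive pipes
GAP_LOW, GAP_HIGH = 8, 14  # vertical extent of the gap (identical for every pipe)


@dataclass
class Bird:
    x: int = 0          # horizontal position, advances by one each step
    y: int = 10         # height above the ground
    score: int = 0      # number of gaps passed so far
    alive: bool = True  # False once the bird has crashed


def step(bird: Bird, tap: bool) -> Bird:
    """Advance the toy game by one time step and return the new state."""
    if not bird.alive:
        return bird  # the game is over; nothing changes
    x = bird.x + 1
    # A tap makes the bird jump; otherwise it falls at a constant rate.
    y = bird.y + JUMP_HEIGHT if tap else bird.y - FALL_SPEED
    at_pipe = x % PIPE_SPACING == 0
    hit_pipe = at_pipe and not (GAP_LOW <= y <= GAP_HIGH)
    crashed = y <= GROUND or hit_pipe
    # The score increases by one when the bird passes through a gap.
    score = bird.score + (1 if at_pipe and not crashed else 0)
    return Bird(x=x, y=y, score=score, alive=not crashed)
```

For example, `step(Bird(x=9, y=10), tap=False)` moves the bird to the pipe at `x = 10` at height 9, which lies inside the assumed gap, so the score becomes 1 and the bird stays alive.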

Our goal in this paper is to design an agent that plays Flappy Bird automatically with the same input as a human player, which means that we use only raw images and rewards to teach our agent how to play this game. Inspired by [1], we propose a deep reinforcement learning architecture to learn and play this game.

In recent years, a huge amount of work has been done on deep learning in computer vision [6]. Deep learning extracts high-dimensional features from raw images. Therefore, it is natural to ask whether deep learning can be used in reinforcement learning. However, there are four challenges in using deep learning. Firstly, most successful deep learning applications to date have required large amounts of hand-labelled training data. RL algorithms, on the other hand, must be able to learn from a scalar reward signal that is frequently sparse, noisy and delayed. Secondly, the delay between actions and resulting rewards, which can be thousands of time steps long, seems particularly daunting when compared to the direct association between inputs and targets found in supervised learning. The third issue is that most deep learning algorithms assume the data samples to be independent, while in reinforcement learning one typically encounters sequences of highly correlated states. Furthermore, in RL the data distribution changes as the algorithm learns new behaviors, which can be problematic for deep learning methods that assume a fixed underlying distribution.

This paper demonstrates that a Convolutional Neural Network (CNN) can overcome the challenges mentioned above and learn successful control policies from raw image data in the game Flappy Bird. This network is trained with a variant of the Q-learning algorithm [6]. Using the Deep Q-learning Network (DQN), we construct an agent that makes the right decisions in Flappy Bird based solely on consecutive raw images.

2 Deep Q-learning Network

Recent breakthroughs in computer vision have relied on efficiently training deep neural networks on very large training sets. By feeding sufficient data into deep neural networks, it is often possible to learn better representations than handcrafted features [2][3]. These successes motivate us to connect a reinforcement learning algorithm to a deep neural network that operates directly on raw images and efficiently updates its parameters using stochastic gradient descent.

In the following section, we describe the Deep Q-learning Network (DQN) algorithm and how its model is parameterized.

2.1 Q-learning

2.1.1 Reinforcement Learning Problem

Q-learning is a specific reinforcement learning (RL) algorithm. As Figure 2 shows, an agent interacts with its environment in discrete time steps. At each time $t$, the agent receives a state $s_t$ and a reward $r_t$. It then chooses an action $a_t$ from the set of actions available, which is subsequently sent to the environment. The environment moves to a new state $s_{t+1}$, and the reward $r_{t+1}$ associated with the transition $(s_t, a_t, s_{t+1})$ is determined [4].

Figure 2: Traditional Reinforcement Learning scenario
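The interaction shown in Figure 2 can be sketched as a loop. This is a minimal sketch under stated assumptions: `CountdownEnv`, `random_policy`, and `run_episode` are hypothetical names introduced for illustration, the environment is a trivial stand-in rather than Flappy Bird, and the policy simply picks an action at random.

```python
import random
from typing import Callable, List, Tuple

State = int
Action = int
Transition = Tuple[State, Action, float, State]  # (s_t, a_t, r_{t+1}, s_{t+1})


class CountdownEnv:
    """Hypothetical stand-in environment: the state counts down to zero.

    Action 1 decrements the state and yields reward 1; action 0 leaves the
    state unchanged and yields reward 0. The episode ends at state 0.
    """

    actions: Tuple[Action, ...] = (0, 1)

    def __init__(self, start: State = 3) -> None:
        self.start = start
        self.state = start

    def reset(self) -> State:
        self.state = self.start
        return self.state

    def step(self, action: Action) -> Tuple[State, float, bool]:
        reward = 0.0
        if action == 1:
            self.state -= 1
            reward = 1.0
        return self.state, reward, self.state == 0


Policy = Callable[[State, Tuple[Action, ...], random.Random], Action]


def random_policy(state: State, actions: Tuple[Action, ...],
                  rng: random.Random) -> Action:
    """Ignore the state and choose uniformly among the available actions."""
    return rng.choice(actions)


def run_episode(env: CountdownEnv, policy: Policy, rng: random.Random,
                max_steps: int = 100) -> List[Transition]:
    """Run the agent-environment loop and return the observed transitions."""
    transitions: List[Transition] = []
    s = env.reset()                      # the agent receives the initial state
    for _ in range(max_steps):
        a = policy(s, env.actions, rng)  # the agent chooses an action
        s_next, r, done = env.step(a)    # the environment moves to a new state
        transitions.append((s, a, r, s_next))
        s = s_next
        if done:
            break
    return transitions
```

Calling `run_episode(CountdownEnv(3), random_policy, random.Random(0))` returns the list of transitions for one episode; summing the third element of each tuple gives the total reward the agent collected.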

The goal of an agent is to collect as much reward as possible. The agent can choose any action as a function of the history and it can even randomize its action selection. Note that in order to
