CNNWord格式.docx
《CNNWord格式.docx》由会员分享,可在线阅读,更多相关《CNNWord格式.docx(15页珍藏版)》请在冰豆网上搜索。
Design
ACNNconsistsofaninputandanoutputlayer,aswellasmultiplehiddenlayers.ThehiddenlayersofaCNNtypicallyconsistofconvolutionallayers,poolinglayers,fullyconnectedlayersandnormalization(正规化)layers.
Convolutional
Convolutionallayersapplyaconvolutionoperationtotheinput,passingtheresulttothenextlayer.Theconvolutionemulates【模仿】theresponseofanindividualneurontovisualstimuli.[7]
Eachconvolutionalneuronprocessesdataonlyforitsreceptive【接纳的】field[clarificationneeded].TilingallowsCNNstotolerate
translation
oftheinputimage(e.g.translation,rotation【旋转】,perspectivedistortion【失真】)[clarificationneeded].
Although
fullyconnectedfeedforwardneuralnetworks
canbeusedtolearnfeaturesaswellasclassifydata,itisnotpracticaltoapplythisarchitecturetoimages.Averyhighnumberofneuronswouldbenecessary,eveninashallow(oppositeofdeep)architecture[citationneeded],duetotheverylargeinputsizesassociatedwithimages,whereeachpixelisarelevantdatapoint.Theconvolutionoperationbringsasolutiontothisproblemasitreducesthenumberoffreeparameters,allowingthenetworktobedeeperwithfewerparameters.[8]
Inotherwords,itresolvesthevanishing【消失】orexploding【爆炸】gradientsproblemintrainingtraditionalmulti-layerneuralnetworkswithmanylayersbyusing
backpropagation【反向传播】[citationneeded].
Pooling【池化】
Convolutionalnetworksmayincludelocalorglobalpoolinglayers[clarificationneeded],whichcombinetheoutputsofneuronclustersatonelayerintoasingleneuroninthenextlayer.[9][10]Forexample,maxpoolingusesthemaximumvaluefromeachofaclusterofneuronsatthepriorlayer.[11]Anotherexampleisaveragepooling,whichusestheaveragevaluefromeachofaclusterofneuronsatthepriorlayer[citationneeded].
Fullyconnected
Fullyconnectedlayersconnecteveryneuroninonelayertoeveryneuroninanotherlayer.Itisinprinciplethesameasthetraditionalmulti-layerperceptronneuralnetwork(MLP).
Weights[edit]
CNNsshareweightsinconvolutionallayers,whichmeansthatthesamefilter【卷积核】(weightsbank[clarificationneeded])isusedforeachreceptive【接纳】field[clarificationneeded]inthelayer;
thisreducesmemoryfootprintandimprovesperformance
Timedelayneuralnetworks
Timedelayneuralnetworkswereintroducedintheearly1980s.Theyconcentratedondevelopinganeuralnetworkarchitecturewhichcouldbeappliedtospeechsignalstime-invariantly.[12]CNNsuseasimilararchitecture,especiallythoseforimagerecognitionorclassificationtasks,sincethetilingofneuronoutputscanbedoneintimedstages,inamannerusefulforanalysisofimages.[13]
History
CNNdesignfollowsvisionprocessinginlivingorganisms[citationneeded].
Receptivefields
WorkbyHubelandWieselinthe1950sand1960sshowedthatcatandmonkeyvisualcortexescontainneuronsthatindividuallyrespondtosmallregionsofthevisualfield.Providedtheeyesarenotmoving,theregionofvisualspacewithinwhichvisualstimuliaffectthefiringofasingleneuronisknownasitsreceptivefield[citationneeded].Neighboringcellshavesimilarandoverlappingreceptivefields[citationneeded].Receptivefieldsizeandlocationvariessystematicallyacrossthecortextoformacompletemapofvisualspace[citationneeded].Thecortexineachhemisphererepresentsthecontralateralvisualfield[citationneeded].
Their1968paper[14]identifiedtwobasicvisualcelltypesinthebrain:
simplecells,whoseoutputismaximizedbystraightedgeshavingparticularorientationswithintheirreceptivefield
complexcells,whichhavelargerreceptivefields,whoseoutputisinsensitivetotheexactpositionoftheedgesinthefield.
Neocognitron[edit]
Theneocognitron[15]wasintroducedin1980.[11][16]Theneocognitrondoesnotrequireunitslocatedatmultiplenetworkpositionstohavethesametrainableweights.Thisideaappearsin1986inthebookversionoftheoriginalbackpropagationpaper[17](Figure14).Neocognitronsweredevelopedin1988fortemporalsignals.[clarificationneeded][18]Theirdesignwasimprovedin1998,[19]generalizedin2003[20]andsimplifiedinthesameyear.[21]
LeNet-5[edit]
LeNet-5,apioneering7-levelconvolutionalnetworkbyLeCunetal.[19]thatclassifiesdigits,wasappliedbyseveralbankstorecognisehand-writtennumbersonchecks(cheques)digitizedin32x32pixelimages.Theabilitytoprocesshigherresolutionimagesrequireslargerandmoreconvolutionallayers,sothistechniqueisconstrainedbytheavailabilityofcomputingresources.
Shift-invariantneuralnetwork
Similarly,ashiftinvariantneuralnetworkwasproposedforimagecharacterrecognitionin1988.[2][3]Thearchitectureandtrainingalgorithmweremodifiedin1991[22]andappliedformedicalimageprocessing[23]andautomaticdetectionofbreastcancerinmammograms.[24]
Adifferentconvolution-baseddesignwasproposedin1988[25]forapplicationtodecomposition【分解】ofone-dimensionalelectromyography【肌电图】=convolvedsignalsviade-convolution.Thisdesignwasmodifiedin1989tootherde-convolution-baseddesigns.[26][27]
Neuralabstractionpyramid【神经抽象金字塔】
Thefeed-forwardarchitectureofconvolutionalneuralnetworkswasextendedintheneuralabstractionpyramid[28]bylateralandfeedbackconnections.Theresultingrecurrentconvolutionalnetworkallowsfortheflexibleincorporationofcontextualinformationtoiterativelyresolvelocalambiguities.Incontrasttopreviousmodels,image-likeoutputsatthehighestresolutionweregenerated.
GPUimplementations[edit]
Followingthe2005paperthatestablishedthevalueofGPGPUformachinelearning,[29]severalpublicationsdescribedmoreefficientwaystotrainconvolutionalneuralnetworksusingGPUs.[30][31][32][33]In2011,theywererefinedandimplementedonaGPU,withimpressiveresults.[9]In2012,Ciresanetal.significantlyimprovedonthebestperformanceintheliteratureformultipleimagedatabases,includingtheMNISTdatabase,theNORBdatabase,theHWDB1.0dataset(Chinesecharacters),theCIFAR10dataset(datasetof6000032x32labeledRGBimages),[11]andtheImageNetdataset.[34]
Distinguishingfeatures[edit]
Whiletraditionalmultilayerperceptron(MLP)modelsweresuccessfullyusedforimagerecognition[examplesneeded],duetothefullconnectivitybetweennodestheysufferfromthecurseofdimensionality,andthusdonotscalewelltohigherresolutionimages.
CNNlayersarrangedin3dimensions
Forexample,inCIFAR-10,imagesareonlyofsize32x32x3(32wide,32high,3colorchannels),soasinglefullyconnectedneuroninafirsthiddenlayerofaregularneuralnetworkwouldhave32*32*3=3,072weights.A200x200image,however,wouldleadtoneuronsthathave200*200*3=120,000weights.
Also,suchnetworkarchitecturedoesnottakeintoaccountthespatial【空间的结构】structureofdata,treatinginputpixelswhicharefarapartthesameaspixelsthatareclosetogether[citationneeded].Thus,fullconnectivityofneuronsiswastefulforthepurposeofimagerecognition[clarificationneeded].
Convolutionalneuralnetworksarebiologicallyinspiredvariantsofmultilayerperceptrons,designedtoemulate【模仿】thebehaviourofavisualcortex[citationneeded].Thesemodelsmitigate【缓和】thechallengesposedbytheMLParchitecturebyexploitingthestrongspatiallylocalcorrelationpresentinnaturalimages.AsopposedtoMLPs,CNNshavethefollowingdistinguishingfeatures:
3Dvolumesofneurons.ThelayersofaCNNhaveneuronsarrangedin3dimensions:
width,heightanddepth.Theneuronsinsidealayerareconnectedtoonlyasmallregionofthelayerbeforeit,calledareceptivefield.Distincttypesoflayers,bothlocallyandcompletelyconnected,arestackedtoformaCNNarchitecture.
Localconnectivity:
followingtheconceptofreceptivefields,CNNsexploitspatiallocalitybyenforcingalocalconnectivitypatternbetweenneuronsofadjacentlayers.Thearchitecturethusensuresthatthelearnt"
filters"
producethestrongestresponsetoaspatiallylocalinputpattern.Stackingmanysuchlayersleadstonon-linear"
thatbecomeincreasingly"
global"
(i.e.responsivetoalargerregionofpixelspace).Thisallowsthenetworktofirstcreaterepresentationsofsmallpartsoftheinput,thenfromthemassemblerepresentationsoflargerareas.
Sharedweights:
InCNNs,eachfilterisreplicatedacrosstheentirevisualfield.Thesereplicatedunitssha