CNNWord格式.docx

资源描述

CNNWord格式.docx

《CNNWord格式.docx》由会员分享，可在线阅读，更多相关《CNNWord格式.docx（15页珍藏版）》请在冰豆网上搜索。

CNNWord格式.docx

Design

ACNNconsistsofaninputandanoutputlayer,aswellasmultiplehiddenlayers.ThehiddenlayersofaCNNtypicallyconsistofconvolutionallayers,poolinglayers,fullyconnectedlayersandnormalization（正规化）layers.

Convolutional

Convolutionallayersapplyaconvolutionoperationtotheinput,passingtheresulttothenextlayer.Theconvolutionemulates【模仿】theresponseofanindividualneurontovisualstimuli.[7]

Eachconvolutionalneuronprocessesdataonlyforitsreceptive【接纳的】field[clarificationneeded].TilingallowsCNNstotolerate

translation

oftheinputimage（e.g.translation,rotation【旋转】,perspectivedistortion【失真】）[clarificationneeded].

Although

fullyconnectedfeedforwardneuralnetworks

canbeusedtolearnfeaturesaswellasclassifydata,itisnotpracticaltoapplythisarchitecturetoimages.Averyhighnumberofneuronswouldbenecessary,eveninashallow（oppositeofdeep）architecture[citationneeded],duetotheverylargeinputsizesassociatedwithimages,whereeachpixelisarelevantdatapoint.Theconvolutionoperationbringsasolutiontothisproblemasitreducesthenumberoffreeparameters,allowingthenetworktobedeeperwithfewerparameters.[8]

Inotherwords,itresolvesthevanishing【消失】orexploding【爆炸】gradientsproblemintrainingtraditionalmulti-layerneuralnetworkswithmanylayersbyusing

backpropagation【反向传播】[citationneeded].

Pooling【池化】

Convolutionalnetworksmayincludelocalorglobalpoolinglayers[clarificationneeded],whichcombinetheoutputsofneuronclustersatonelayerintoasingleneuroninthenextlayer.[9][10]Forexample,maxpoolingusesthemaximumvaluefromeachofaclusterofneuronsatthepriorlayer.[11]Anotherexampleisaveragepooling,whichusestheaveragevaluefromeachofaclusterofneuronsatthepriorlayer[citationneeded].

Fullyconnected

Fullyconnectedlayersconnecteveryneuroninonelayertoeveryneuroninanotherlayer.Itisinprinciplethesameasthetraditionalmulti-layerperceptronneuralnetwork（MLP）.

Weights[edit]

CNNsshareweightsinconvolutionallayers,whichmeansthatthesamefilter【卷积核】（weightsbank[clarificationneeded]）isusedforeachreceptive【接纳】field[clarificationneeded]inthelayer;

thisreducesmemoryfootprintandimprovesperformance

Timedelayneuralnetworks

Timedelayneuralnetworkswereintroducedintheearly1980s.Theyconcentratedondevelopinganeuralnetworkarchitecturewhichcouldbeappliedtospeechsignalstime-invariantly.[12]CNNsuseasimilararchitecture,especiallythoseforimagerecognitionorclassificationtasks,sincethetilingofneuronoutputscanbedoneintimedstages,inamannerusefulforanalysisofimages.[13]

History

CNNdesignfollowsvisionprocessinginlivingorganisms[citationneeded].

Receptivefields

WorkbyHubelandWieselinthe1950sand1960sshowedthatcatandmonkeyvisualcortexescontainneuronsthatindividuallyrespondtosmallregionsofthevisualfield.Providedtheeyesarenotmoving,theregionofvisualspacewithinwhichvisualstimuliaffectthefiringofasingleneuronisknownasitsreceptivefield[citationneeded].Neighboringcellshavesimilarandoverlappingreceptivefields[citationneeded].Receptivefieldsizeandlocationvariessystematicallyacrossthecortextoformacompletemapofvisualspace[citationneeded].Thecortexineachhemisphererepresentsthecontralateralvisualfield[citationneeded].

Their1968paper[14]identifiedtwobasicvisualcelltypesinthebrain:

simplecells,whoseoutputismaximizedbystraightedgeshavingparticularorientationswithintheirreceptivefield

complexcells,whichhavelargerreceptivefields,whoseoutputisinsensitivetotheexactpositionoftheedgesinthefield.

Neocognitron[edit]

Theneocognitron[15]wasintroducedin1980.[11][16]Theneocognitrondoesnotrequireunitslocatedatmultiplenetworkpositionstohavethesametrainableweights.Thisideaappearsin1986inthebookversionoftheoriginalbackpropagationpaper[17]（Figure14）.Neocognitronsweredevelopedin1988fortemporalsignals.[clarificationneeded][18]Theirdesignwasimprovedin1998,[19]generalizedin2003[20]andsimplifiedinthesameyear.[21]

LeNet-5[edit]

LeNet-5,apioneering7-levelconvolutionalnetworkbyLeCunetal.[19]thatclassifiesdigits,wasappliedbyseveralbankstorecognisehand-writtennumbersonchecks（cheques）digitizedin32x32pixelimages.Theabilitytoprocesshigherresolutionimagesrequireslargerandmoreconvolutionallayers,sothistechniqueisconstrainedbytheavailabilityofcomputingresources.

Shift-invariantneuralnetwork

Similarly,ashiftinvariantneuralnetworkwasproposedforimagecharacterrecognitionin1988.[2][3]Thearchitectureandtrainingalgorithmweremodifiedin1991[22]andappliedformedicalimageprocessing[23]andautomaticdetectionofbreastcancerinmammograms.[24]

Adifferentconvolution-baseddesignwasproposedin1988[25]forapplicationtodecomposition【分解】ofone-dimensionalelectromyography【肌电图】=convolvedsignalsviade-convolution.Thisdesignwasmodifiedin1989tootherde-convolution-baseddesigns.[26][27]

Neuralabstractionpyramid【神经抽象金字塔】

Thefeed-forwardarchitectureofconvolutionalneuralnetworkswasextendedintheneuralabstractionpyramid[28]bylateralandfeedbackconnections.Theresultingrecurrentconvolutionalnetworkallowsfortheflexibleincorporationofcontextualinformationtoiterativelyresolvelocalambiguities.Incontrasttopreviousmodels,image-likeoutputsatthehighestresolutionweregenerated.

GPUimplementations[edit]

Followingthe2005paperthatestablishedthevalueofGPGPUformachinelearning,[29]severalpublicationsdescribedmoreefficientwaystotrainconvolutionalneuralnetworksusingGPUs.[30][31][32][33]In2011,theywererefinedandimplementedonaGPU,withimpressiveresults.[9]In2012,Ciresanetal.significantlyimprovedonthebestperformanceintheliteratureformultipleimagedatabases,includingtheMNISTdatabase,theNORBdatabase,theHWDB1.0dataset（Chinesecharacters）,theCIFAR10dataset（datasetof6000032x32labeledRGBimages）,[11]andtheImageNetdataset.[34]

Distinguishingfeatures[edit]

Whiletraditionalmultilayerperceptron（MLP）modelsweresuccessfullyusedforimagerecognition[examplesneeded],duetothefullconnectivitybetweennodestheysufferfromthecurseofdimensionality,andthusdonotscalewelltohigherresolutionimages.

CNNlayersarrangedin3dimensions

Forexample,inCIFAR-10,imagesareonlyofsize32x32x3（32wide,32high,3colorchannels）,soasinglefullyconnectedneuroninafirsthiddenlayerofaregularneuralnetworkwouldhave32*32*3=3,072weights.A200x200image,however,wouldleadtoneuronsthathave200*200*3=120,000weights.

Also,suchnetworkarchitecturedoesnottakeintoaccountthespatial【空间的结构】structureofdata,treatinginputpixelswhicharefarapartthesameaspixelsthatareclosetogether[citationneeded].Thus,fullconnectivityofneuronsiswastefulforthepurposeofimagerecognition[clarificationneeded].

Convolutionalneuralnetworksarebiologicallyinspiredvariantsofmultilayerperceptrons,designedtoemulate【模仿】thebehaviourofavisualcortex[citationneeded].Thesemodelsmitigate【缓和】thechallengesposedbytheMLParchitecturebyexploitingthestrongspatiallylocalcorrelationpresentinnaturalimages.AsopposedtoMLPs,CNNshavethefollowingdistinguishingfeatures:

3Dvolumesofneurons.ThelayersofaCNNhaveneuronsarrangedin3dimensions:

width,heightanddepth.Theneuronsinsidealayerareconnectedtoonlyasmallregionofthelayerbeforeit,calledareceptivefield.Distincttypesoflayers,bothlocallyandcompletelyconnected,arestackedtoformaCNNarchitecture.

Localconnectivity:

followingtheconceptofreceptivefields,CNNsexploitspatiallocalitybyenforcingalocalconnectivitypatternbetweenneuronsofadjacentlayers.Thearchitecturethusensuresthatthelearnt"

filters"

producethestrongestresponsetoaspatiallylocalinputpattern.Stackingmanysuchlayersleadstonon-linear"

thatbecomeincreasingly"

global"

（i.e.responsivetoalargerregionofpixelspace）.Thisallowsthenetworktofirstcreaterepresentationsofsmallpartsoftheinput,thenfromthemassemblerepresentationsoflargerareas.

Sharedweights:

InCNNs,eachfilterisreplicatedacrosstheentirevisualfield.Thesereplicatedunitssha

展开阅读全文