深度学习大数据分析中英文外文文献翻译Word格式.docx
《深度学习大数据分析中英文外文文献翻译Word格式.docx》由会员分享,可在线阅读,更多相关《深度学习大数据分析中英文外文文献翻译Word格式.docx(18页珍藏版)》请在冰豆网上搜索。
)
标题:
PrototypingaGPGPUNeuralNetworkforDeep-LearningBigDataAnalysis
作者:
AlcidesFonseca, BrunoCabral
期刊:
BigDataResearch,卷8:
50-56页年份:
2017
原文
Abstract
BigDataconcernswithlarge-volumecomplexgrowingdata.Giventhefastdevelopmentofdatastorageandnetwork,organizationsarecollectinglargeever-growingdatasetsthatcanhaveusefulinformation.Inordertoextractinformationfromthesedatasetswithinusefultime,itisimportanttousedistributedandparallelalgorithms.Onecommonusageofbigdataismachinelearning,inwhichcollecteddataisusedtopredictfuturebehavior.Deep-LearningusingArtificialNeuralNetworksisoneofthepopularmethodsforextractinginformationfromcomplexdatasets.Deep-learningiscapableofmorecreatingcomplexmodelsthantraditionalprobabilisticmachinelearningtechniques.
Thisworkpresentsastep-by-stepguideonhowtoprototypeaDeep-LearningapplicationthatexecutesbothonGPUandCPUclusters.PythonandRedisarethecoresupportingtoolsofthisguide.ThistutorialwillallowthereadertounderstandthebasicsofbuildingadistributedhighperformanceGPUapplicationinafewhours.Sincewedonotdependonanydeep-learningapplicationorframework—weuselow-levelbuilding
blocks—thistutorialcanbeadjustedforanyotherparallelalgorithmthereadermightwanttoprototypeonBigData.Finally,wewilldiscusshowtomovefromaprototypetoafullyblownproductionapplication.
Keywords:
Big-data;
Deep-learning;
Prototyping;
GPGPU;
Cluster;
Parallelprogramming
Introduction
DeepLearningreferstotheusageofArtificialNeuralNetworks(ANNorNN)withseveralhiddenlayersusedfordatawithahighdimensionality.AcommonexampleandbenchmarkforDeepLearningisimageclassificationfromtheImageNetdataset.ANNscanbeusedforclassificationtasks,withseveralapplicationsinindustry,businessandscience.Examplesofapplicationsincludecharacterrecognitioninscanneddocuments,predictingbankruptcyorhealthcomplications.AutonomousdrivingalsomakesheavyuseofANNs.AnANNbeginswithrandomweights,practicallydecidingeverythingatrandom.BytrainingtheANNwithseveralexistinginstancesoftheproblem,onecanevaluatetheerrorproduced.Weightsarethenadjusted,takingintoaccountifitoverlyorunderlyestimatedthefinalvalue.
Inordertopredictvalues,ANNsarebuiltconnectinglayersofneurons.ANNsusethefirstlayerofneuronsforeachinputfeature,andthe
finallayerfortheclassificationoutput.Fig.1showsanexampleofanANNwithfourinputneurons,fourneuronsinthehiddenlayerandtwooutputneurons.Allneuronsinonelayerareconnectedtoalltheneuronsinthefollowinglayer.
Whenthenumberoffeaturesincreases(highdimensionality),thenumberofneuronsinthehiddenlayersincreasesaswell,inordertocompensateforthepossibleinteractionsofinputneurons.However,aruleofthumbistouseonlyonehiddenlayerwiththesamenumberofhiddenneuronsasthereareinputneurons.ThesecondscalabilityissuewithANNsisthatforahighaccuracy,theyhavetobetrainedwithalargedataset.Typically,toachieveagoodaccuracyscore,thenumberofinstancesshouldbethreeordersofmagnitudehigherthanthenumberoffeatures.Thus,wereachapointinwhichweneedtotrainanANNoverseveraliterations,usingahighnumberoffeaturesandinstances.Intheseconditions,traininganANNbecomesacomputationallyintensiveoperation,highlydemandingintermsofprocessing,memoryanddiskusage.Astheamountofdataavailablefortraininggoesaboveaterabyte,itbecomesBigDataproblem.
ThesolutionforBigDataprocessingistodistributethecomputationacrossdifferentmachines,splittingdataamongthemandmergingresultsafterwards.InMap-Reduceapproaches,itispossibletodividethecomputationintoindependentsub-problemsthatcanbecombinedto
produceafinalresult.HadoopandSparkarethemostusedframeworksforBigDataprocessing.
ANNsaredescribedbythefollowingcharacteristics:
layout(thenumberoflayersandneuronsoneachlayer)andtheweightsofconnectionsbetweenneurons(thesecondattributeisdependentonthefirst).WhentraininganANNforaspecificproblemdataset,theweightsarebeingadjustedtominimizetheoutputerror.BecausethepredictionofANNscanbedescribedasmatrixoperations(wearemultiplyingthesameweightstoaeachrowoffeaturesofprobleminstances),graphicalprocessingunits(GPUs)areusuallyagoodsolutionforimprovingperformanceandreducetrainingtimes.GPUsweredesignedtoperformmatrixoperationsinthecontextofvideoprocessing,buthave