Undergraduate Graduation Project (Thesis)
Chinese-English Parallel Translation
Title:
Prototyping a GPGPU Neural Network for Deep-Learning Big Data Analysis
Authors:
Alcides Fonseca, Bruno Cabral
Journal:
Big Data Research, Volume 8, pp. 50-56
Year:
2017
Original Text
Prototyping a GPGPU Neural Network for Deep-Learning Big Data Analysis
Alcides Fonseca, Bruno Cabral
Abstract
Big Data concerns large-volume, complex, growing data. Given the fast development of data storage and networks, organizations are collecting large, ever-growing datasets that can contain useful information. In order to extract information from these datasets within a useful time, it is important to use distributed and parallel algorithms. One common usage of big data is machine learning, in which collected data is used to predict future behavior. Deep-Learning using Artificial Neural Networks is one of the popular methods for extracting information from complex datasets. Deep-learning is capable of creating more complex models than traditional probabilistic machine learning techniques.
This work presents a step-by-step guide on how to prototype a Deep-Learning application that executes both on GPU and CPU clusters. Python and Redis are the core supporting tools of this guide. This tutorial will allow the reader to understand the basics of building a distributed high-performance GPU application in a few hours. Since we do not depend on any deep-learning application or framework (we use low-level building blocks), this tutorial can be adjusted for any other parallel algorithm the reader might want to prototype on Big Data. Finally, we will discuss how to move from a prototype to a fully blown production application.
Keywords:
Big-data; Deep-learning; Prototyping; GPGPU; Cluster; Parallel programming
Introduction
Deep Learning refers to the usage of Artificial Neural Networks (ANN or NN) with several hidden layers, used for data with a high dimensionality. A common example and benchmark for Deep Learning is image classification from the ImageNet dataset. ANNs can be used for classification tasks, with several applications in industry, business and science. Examples of applications include character recognition in scanned documents and predicting bankruptcy or health complications. Autonomous driving also makes heavy use of ANNs. An ANN begins with random weights, practically deciding everything at random. By training the ANN with several existing instances of the problem, one can evaluate the error produced. Weights are then adjusted, taking into account whether the network overestimated or underestimated the final value.
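The training cycle described above (random initial weights, error evaluation, weight adjustment) can be illustrated with a minimal sketch. It is not the implementation from this guide: it assumes a single linear neuron, a plain per-sample error-correction update, and a hypothetical four-instance dataset generated from 2*x1 - x2 + 0.5. The names `predict`, `train` and `mean_squared_error`, the learning rate and the epoch count are all chosen for illustration only.

```python
import random

def predict(weights, bias, features):
    """Weighted sum of the input features plus a bias term."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def mean_squared_error(samples, weights, bias):
    """Average squared difference between predictions and targets."""
    return sum((predict(weights, bias, f) - t) ** 2 for f, t in samples) / len(samples)

def train(samples, epochs=500, learning_rate=0.1, seed=0):
    """Illustrative training loop for one linear neuron (an assumption, not the paper's code)."""
    rng = random.Random(seed)
    # Start from random weights, so the first predictions are essentially arbitrary.
    weights = [rng.uniform(-1.0, 1.0) for _ in samples[0][0]]
    bias = rng.uniform(-1.0, 1.0)
    for _ in range(epochs):
        for features, target in samples:
            # Positive error means overestimation, negative means underestimation.
            error = predict(weights, bias, features) - target
            # Move every weight against the error, scaled by its input value.
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

# Hypothetical training instances following target = 2*x1 - x2 + 0.5.
samples = [([0.0, 0.0], 0.5), ([0.0, 1.0], -0.5), ([1.0, 0.0], 2.5), ([1.0, 1.0], 1.5)]
weights, bias = train(samples)
print(weights, bias, mean_squared_error(samples, weights, bias))
```

A full ANN applies the same idea to every layer, propagating the output error backwards through the hidden layers.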
In order to predict values, ANNs are built by connecting layers of neurons. ANNs use the first layer of neurons for the input features, one neuron per feature, and the final layer for the classification output. Fig. 1 shows an example of an ANN with four input neurons, four neurons in the hidden layer and two output neurons. All neurons in one layer are connected to all the neurons in the following layer.
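As a rough illustration of the fully connected layout with four input, four hidden and two output neurons, the sketch below computes one prediction in plain Python. The sigmoid activation, the zero biases, the random seed and the sample input are assumptions made for the example; the text above does not specify them, and the helper names are hypothetical.

```python
import math
import random

def sigmoid(value):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))

def random_layer(n_inputs, n_outputs, rng):
    """One weight per (input neuron, output neuron) pair, plus one bias per output neuron."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)] for _ in range(n_outputs)]
    biases = [0.0] * n_outputs
    return weights, biases

def dense_layer(inputs, weights, biases):
    """Every output neuron is connected to every neuron of the previous layer."""
    return [sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
            for neuron_weights, b in zip(weights, biases)]

def forward(inputs, layers):
    """Feed the activations through each layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

rng = random.Random(42)
# Layout assumed from the description: 4 inputs -> 4 hidden -> 2 outputs.
layers = [random_layer(4, 4, rng), random_layer(4, 2, rng)]
outputs = forward([0.2, 0.7, 0.1, 0.9], layers)
n_connections = sum(len(row) for weights, _ in layers for row in weights)
print(outputs, n_connections)
```

With untrained random weights the two outputs carry no meaning yet; the point is only the shape of the computation, which involves 4*4 + 4*2 = 24 connection weights.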
When the number of features increases (high dimensionality), the number of neurons in the hidden layers increases as well, in order to compensate for the possible interactions of input neurons. However, a rule of thumb is to use only one hidden layer with the same number of hidden neurons as there are input neurons. The second scalability issue with ANNs is that, for a high accuracy, they have to be trained with a large dataset. Typically, to achieve a good accuracy score, the number of instances should be three orders of magnitude higher than the number of features. Thus, we reach a point in which we need to train an ANN over several iterations, using a high number of features and instances. In these conditions, training an ANN becomes a computationally intensive operation, highly demanding in terms of processing, memory and disk usage. As the amount of data available for training goes above a terabyte, it becomes a Big Data problem.
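To get a feeling for how quickly these rules of thumb lead to terabyte-scale data, the following back-of-the-envelope sketch may help. It assumes a dense dataset stored as 4-byte floating-point values, two output neurons, no bias terms, and a decimal terabyte (10^12 bytes); none of these figures come from the text above, and `training_set_estimate` is a hypothetical helper.

```python
def training_set_estimate(n_features, n_outputs=2, bytes_per_value=4):
    """Rough sizing under the two rules of thumb quoted above (illustrative assumptions only)."""
    n_instances = 1000 * n_features   # three orders of magnitude more instances than features
    n_hidden = n_features             # one hidden layer as wide as the input layer
    n_weights = n_features * n_hidden + n_hidden * n_outputs  # bias terms ignored
    dataset_bytes = n_instances * n_features * bytes_per_value
    return {
        "instances": n_instances,
        "weights": n_weights,
        "dataset_bytes": dataset_bytes,
        "above_one_terabyte": dataset_bytes > 10 ** 12,
    }

small = training_set_estimate(4)
large = training_set_estimate(16_000)
print(small)
print(large)
```

Because the number of stored values grows with the square of the feature count under these assumptions, a dataset with a few thousand features stays in the gigabyte range, while one with roughly sixteen thousand features already crosses the terabyte mark.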
The solution for Big Data processing is to distribute the computation across different machines, splitting data among them and merging results afterwards. In Map-Reduce approaches, it is possible to divide the computation into independent sub-problems that can be combined to produce a final result. Hadoop and Spark are the most used frameworks for Big Data processing.
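The split, process and merge pattern can be sketched on a single machine as follows. This is only an illustration of the Map-Reduce idea, not Hadoop or Spark code: the chunks are processed sequentially here, whereas a cluster would send each chunk to a different worker. The function names and the sample records, given as (prediction, target) pairs, are hypothetical.

```python
from functools import reduce

def split(records, n_parts):
    """Split the records into n_parts contiguous chunks of nearly equal size."""
    size, remainder = divmod(len(records), n_parts)
    chunks, start = [], 0
    for index in range(n_parts):
        end = start + size + (1 if index < remainder else 0)
        chunks.append(records[start:end])
        start = end
    return chunks

def map_partial_error(chunk):
    """Map step: each chunk independently yields (sum of squared errors, record count)."""
    return sum((prediction - target) ** 2 for prediction, target in chunk), len(chunk)

def merge(left, right):
    """Reduce step: partial results combine by simple addition."""
    return left[0] + right[0], left[1] + right[1]

def distributed_mse(records, n_parts):
    """Mean squared error computed from independent per-chunk partial results."""
    partials = map(map_partial_error, split(records, n_parts))
    total, count = reduce(merge, partials, (0.0, 0))
    return total / count

# Hypothetical (prediction, target) pairs.
records = [(1.0, 0.0), (2.0, 2.0), (3.0, 5.0), (0.0, 1.0), (4.0, 4.0)]
print(distributed_mse(records, n_parts=2))
```

The approach works because the partial sums do not depend on each other, so the merged result is the same regardless of how many chunks the data is split into.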
ANNs are described by the following characteristics: layout (the number of layers and neurons on each layer) and the weights of connections between neurons (the second attribute is dependent on the first). When training an ANN for a specific problem dataset, the weights are adjusted to minimize the output error. Because the prediction of ANNs can be described as matrix operations (we are multiplying the same weights by each row of features of problem instances), graphical processing units (GPUs) are usually a good solution for improving performance and reducing training times. GPUs were designed to perform matrix operations in the context of video processing, but have