深度学习大数据分析中英文外文文献翻译.docx

资源描述

深度学习大数据分析中英文外文文献翻译.docx

《深度学习大数据分析中英文外文文献翻译.docx》由会员分享，可在线阅读，更多相关《深度学习大数据分析中英文外文文献翻译.docx（18页珍藏版）》请在冰豆网上搜索。

深度学习大数据分析中英文外文文献翻译.docx

本科毕业设计（论文）

中英文对照翻译

（此文档为word格式,下载后您可任意修改编辑!

）

标题：

PrototypingaGPGPUNeuralNetworkforDeep-LearningBigDataAnalysis

作者：

AlcidesFonseca, BrunoCabral

期刊：

BigDataResearch，卷8：

50-56页年份：

2017

原文

PrototypingaGPGPUNeuralNetworkforDeep-LearningBigDataAnalysis

AlcidesFonseca, BrunoCabral

Abstract

BigDataconcernswithlarge-volumecomplexgrowingdata.Giventhefastdevelopmentofdatastorageandnetwork,organizationsarecollectinglargeever-growingdatasetsthatcanhaveusefulinformation.Inordertoextractinformationfromthesedatasetswithinusefultime,itisimportanttousedistributedandparallelalgorithms.Onecommonusageofbigdataismachinelearning,inwhichcollecteddataisusedtopredictfuturebehavior.Deep-LearningusingArtificialNeuralNetworksisoneofthepopularmethodsforextractinginformationfromcomplexdatasets.Deep-learningiscapableofmorecreatingcomplexmodelsthantraditionalprobabilisticmachinelearningtechniques.

Thisworkpresentsastep-by-stepguideonhowtoprototypeaDeep-LearningapplicationthatexecutesbothonGPUandCPUclusters.PythonandRedisarethecoresupportingtoolsofthisguide.ThistutorialwillallowthereadertounderstandthebasicsofbuildingadistributedhighperformanceGPUapplicationinafewhours.Sincewedonotdependonanydeep-learningapplicationorframework—weuselow-levelbuilding

blocks—thistutorialcanbeadjustedforanyotherparallelalgorithmthereadermightwanttoprototypeonBigData.Finally,wewilldiscusshowtomovefromaprototypetoafullyblownproductionapplication.

Keywords:

Big-data; Deep-learning; Prototyping; GPGPU; Cluster; Parallelprogramming

Introduction

DeepLearningreferstotheusageofArtificialNeuralNetworks（ANNorNN）withseveralhiddenlayersusedfordatawithahighdimensionality.AcommonexampleandbenchmarkforDeepLearningisimageclassificationfromtheImageNetdataset.ANNscanbeusedforclassificationtasks,withseveralapplicationsinindustry,businessandscience.Examplesofapplicationsincludecharacterrecognitioninscanneddocuments,predictingbankruptcyorhealthcomplications.AutonomousdrivingalsomakesheavyuseofANNs.AnANNbeginswithrandomweights,practicallydecidingeverythingatrandom.BytrainingtheANNwithseveralexistinginstancesoftheproblem,onecanevaluatetheerrorproduced.Weightsarethenadjusted,takingintoaccountifitoverlyorunderlyestimatedthefinalvalue.

Inordertopredictvalues,ANNsarebuiltconnectinglayersofneurons.ANNsusethefirstlayerofneuronsforeachinputfeature,andthe

finallayerfortheclassificationoutput.Fig.1showsanexampleofanANNwithfourinputneurons,fourneuronsinthehiddenlayerandtwooutputneurons.Allneuronsinonelayerareconnectedtoalltheneuronsinthefollowinglayer.

Whenthenumberoffeaturesincreases（highdimensionality）,thenumberofneuronsinthehiddenlayersincreasesaswell,inordertocompensateforthepossibleinteractionsofinputneurons.However,aruleofthumbistouseonlyonehiddenlayerwiththesamenumberofhiddenneuronsasthereareinputneurons.ThesecondscalabilityissuewithANNsisthatforahighaccuracy,theyhavetobetrainedwithalargedataset.Typically,toachieveagoodaccuracyscore,thenumberofinstancesshouldbethreeordersofmagnitudehigherthanthenumberoffeatures.Thus,wereachapointinwhichweneedtotrainanANNoverseveraliterations,usingahighnumberoffeaturesandinstances.Intheseconditions,traininganANNbecomesacomputationallyintensiveoperation,highlydemandingintermsofprocessing,memoryanddiskusage.Astheamountofdataavailablefortraininggoesaboveaterabyte,itbecomesBigDataproblem.

ThesolutionforBigDataprocessingistodistributethecomputationacrossdifferentmachines,splittingdataamongthemandmergingresultsafterwards.InMap-Reduceapproaches,itispossibletodividethecomputationintoindependentsub-problemsthatcanbecombinedto

produceafinalresult.HadoopandSparkarethemostusedframeworksforBigDataprocessing.

ANNsaredescribedbythefollowingcharacteristics:

layout（thenumberoflayersandneuronsoneachlayer）andtheweightsofconnectionsbetweenneurons（thesecondattributeisdependentonthefirst）.WhentraininganANNforaspecificproblemdataset,theweightsarebeingadjustedtominimizetheoutputerror.BecausethepredictionofANNscanbedescribedasmatrixoperations（wearemultiplyingthesameweightstoaeachrowoffeaturesofprobleminstances）,graphicalprocessingunits（GPUs）areusuallyagoodsolutionforimprovingperformanceandreducetrainingtimes.GPUsweredesignedtoperformmatrixoperationsinthecontextofvideoprocessing,buthave

展开阅读全文