Mini-Apps: Chinese-English Foreign Literature Translation
Undergraduate Graduation Project (Thesis)
Chinese-English Parallel Translation
Foreign Literature Translation: Original Text and Translation
Title:
ENHANCING APPLICATION PERFORMANCE USING MINI-APPS:
COMPARISON OF HYBRID PARALLEL PROGRAMMING PARADIGMS
Authors:
Gary Lawson, Michael Poteat, Masha Sosonkina, Robert Baurle
Journal:
Computer Science
Year:
2016
Original Text
ENHANCING APPLICATION PERFORMANCE USING MINI-APPS:
COMPARISON OF HYBRID PARALLEL PROGRAMMING PARADIGMS
Gary Lawson, Michael Poteat, Masha Sosonkina, Robert Baurle
ABSTRACT
In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, but only 10 was measured for MPI+OpenMP.
Keywords: Mini-apps, Performance, VULCAN, Shared Memory, MPI, OpenMP
1 INTRODUCTION
In many fields, real-world applications have already been developed. For established applications to stay up-to-date, new parallel strategies must be explored to determine which may yield the best performance, especially with advances in computing hardware. However, restructuring or modifying a real-world application incurs increased cost depending on the size of the code and changes to be made. A mini-app may be created to quickly explore such options without modifying the entire code. Mini-apps reduce the overhead of applying new strategies, thus various strategies may be implemented and compared. This work presents the authors' experiences when following this strategy for a real-world application developed by NASA.
VULCAN (Viscous Upwind Algorithm for Complex Flow Analysis) is a turbulent, nonequilibrium, finite-rate chemical kinetics, Navier-Stokes flow solver for structured, cell-centered, multiblock grids that is maintained and distributed by the Hypersonic Air Breathing Propulsion Branch of the NASA Langley Research Center (NASA 2016). The mini-app developed in this work uses the Householder Reflector kernel for solving systems of linear equations. This kernel is used often by different workloads, and is a good candidate to decide what strategy type to apply to VULCAN.
VULCAN is built on a single layer of MPI and the code has been optimized to obtain perfect vectorization, therefore two levels of parallelism are currently used. This work investigates two flavors of shared-memory parallelism, OpenMP and Shared MPI, which will provide the third level of parallelism for the application. A third level of parallelism increases performance, which decreases the time-to-solution.
MPI has extended the standard to MPI version 3.0, which includes the Shared Memory (SHM) model (Mikhail B. (Intel) 2015, Message Passing Interface Forum 2012), known in this work as Shared MPI (SMPI). This extension allows MPI to create memory windows that are shared between MPI tasks on the same physical node. In this way, MPI tasks are equivalent to threads, except Shared MPI is more difficult for a programmer to implement. OpenMP is the most common shared-memory library used to date because of its ease-of-use (OpenMP 2016). In most cases, only a few OpenMP pragmas are required to parallelize a loop; however, OpenMP is subject to increased overhead, which may decrease performance if not properly tuned.
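To make the OpenMP half of this comparison concrete, here is a minimal C sketch of a loop parallelized with a single pragma. The function name `dot_product` and the choice of a dot product are assumptions for illustration; the source does not say which loops were parallelized. No OpenMP header is needed here, so the code also compiles serially when OpenMP is disabled and the pragma is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: dot product of two length-n vectors.
 * The single pragma asks OpenMP to split the iterations across threads
 * and to combine the per-thread partial sums into `sum`. */
double dot_product(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);

    double sum = 0.0;
    /* A signed loop index keeps the loop in OpenMP's canonical form
     * even for older OpenMP versions. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

With GCC or Clang the pragma takes effect when the file is compiled with `-fopenmp`. The fork/join cost of the parallel region is paid on every call, which is one source of the overhead mentioned above when the loop is short.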
As early as the year 2000, the authors in (Cappello and Etiemble 2000) found that latency sensitive codes seem to benefit from pure MPI implementations whereas bandwidth sensitive codes benefit from hybrid MPI+OpenMP. Also, the authors found that faster processors will benefit hybrid MPI+OpenMP codes if data movement is not an overwhelming bottleneck (Cappello and Etiemble 2000). Since this time, hybrid MPI+OpenMP implementations have improved, but not without difficulties. In (Drosinos and Koziris 2004, Chorley and Walker 2010), it was found that OpenMP incurs many performance reductions, including: overhead (fork/join, atomics, etc), false sharing, imbalanced message passing, and a sensitivity to processor mapping. However, OpenMP overhead may be hidden when using more threads. In (Rabenseifner, Hager, and Jost 2009), the authors found that simply using OpenMP could incur performance penalties because the compiler avo