The Hadoop Distributed File System: Architecture and Design
Chinese Translation
Name: XXXX
Student ID: 200708202137
April 8, 2013
English Original
The Hadoop Distributed File System: Architecture and Design
Source: http://hadoop.apache.org/docs/r0.18.3/hdfs_design.html
Introduction
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is part of the Apache Hadoop Core project. The project URL is http://hadoop.apache.org/core/.
Assumptions and Goals
Hardware Failure
Hardware failure is the norm rather than the exception. An HDFS instance may consist of hundreds or thousands of server machines, each storing part of the file system’s data. The fact that there are a huge number of components and that each component has a non-trivial probability of failure means that some component of HDFS is always non-functional. Therefore, detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS.
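The claim that some component is always non-functional can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that components fail independently and with the same probability; neither that assumption nor the sample numbers come from the HDFS documentation.

```python
def prob_any_failed(num_components: int, p_fail: float) -> float:
    """Probability that at least one of num_components is down,
    assuming independent failures with identical probability p_fail."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    # P(at least one down) = 1 - P(all up) = 1 - (1 - p)^n
    return 1.0 - (1.0 - p_fail) ** num_components


if __name__ == "__main__":
    # Hypothetical figure: each component is down 0.1% of the time.
    for n in (10, 1_000, 10_000):
        print(n, round(prob_any_failed(n, 0.001), 4))
```

Even with a small per-component failure probability, the chance that at least one component is down approaches 1 as the component count grows, which is why fault detection and automatic recovery are treated as core goals rather than edge cases.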
Streaming Data Access
Applications that run on HDFS need streaming access to their data sets. They are not general-purpose applications that typically run on general-purpose file systems. HDFS is designed more for batch processing than for interactive use by users. The emphasis is on high throughput of data access rather than low latency of data access. POSIX imposes many hard requirements that are not needed for applications that are targeted for HDFS. POSIX semantics in a few key areas have been traded away to increase data throughput rates.
Large Data Sets
Applications that run on HDFS have large data sets. A typical file in HDFS is gigabytes to terabytes in size. Thus, HDFS is tuned to support large files. It should provide high aggregate data bandwidth and scale to hundreds of nodes in a single cluster. It should support tens of millions of files in a single instance.
Simple Coherency Model
HDFS applications need a write-once-read-many access model for files. A file once created, written, and closed need not be changed. This assumption simplifies data coherency issues and enables high throughput data access. A Map/Reduce application or a web crawler application fits perfectly with this model. There is a plan to support appending-writes to files in the future.
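The write-once-read-many discipline can be illustrated with a toy in-memory file. This is only a sketch of the access model, not HDFS code: the class name `WriteOnceFile` and its methods are hypothetical, and the rule that reads are allowed only after `close()` is a simplification chosen for this example.

```python
class WriteOnceFile:
    """Toy in-memory file following a write-once-read-many discipline."""

    def __init__(self) -> None:
        self._chunks: list[bytes] = []
        self._closed = False

    def write(self, data: bytes) -> None:
        # Writes are accepted only until the file is closed.
        if self._closed:
            raise PermissionError("file is closed and can no longer be modified")
        self._chunks.append(data)

    def close(self) -> None:
        # After close() the content is frozen for good.
        self._closed = True

    def read(self) -> bytes:
        # Simplification for this sketch: readers only see closed files.
        if not self._closed:
            raise RuntimeError("file must be closed before it is read")
        return b"".join(self._chunks)
```

Because the content cannot change after `close()`, any number of readers can share it without locks or cache invalidation, which is the coherency simplification the paragraph describes.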
“Moving Computation is Cheaper than Moving Data”
A computation requested by an application is much more efficient if it is executed near the data it operates on. This is especially true when the size of the data set is huge. This minimizes network congestion and increases the overall throughput of the system. The assumption is that it is often better to migrate the computation closer to where the data is located rather than moving the data to where the application is running. HDFS provides interfaces for applications to move themselves closer to where the data is located.
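A scheduler that follows this principle might prefer nodes that already hold the data. The sketch below is a hypothetical illustration: `pick_node`, the block identifiers, and the node names are invented for the example and are not part of any HDFS interface.

```python
def pick_node(block_locations: dict[str, list[str]], block_id: str,
              free_nodes: list[str]) -> tuple[str, bool]:
    """Return (node, is_local) for the task that processes block_id.

    Prefers a free node that already stores the block; otherwise falls
    back to the first free node, which would have to fetch the data
    over the network.
    """
    if not free_nodes:
        raise ValueError("no free nodes available")
    holders = set(block_locations.get(block_id, []))
    for node in free_nodes:
        if node in holders:
            return node, True
    return free_nodes[0], False


if __name__ == "__main__":
    # Hypothetical placement of two blocks across three nodes.
    locations = {"blk_1": ["node-a", "node-c"], "blk_2": ["node-b"]}
    print(pick_node(locations, "blk_1", ["node-b", "node-c"]))
    print(pick_node(locations, "blk_2", ["node-a"]))
```

The first call can run where the block already lives; the second cannot, so the data would have to move. Minimizing the second kind of placement is what reduces network congestion.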
Portability Across Heterogeneous Hardware and Software Platforms
HDFS has been designed to be easily portable from one platform to another. This facilitates widespread adoption of HDFS as a platform of choice for a large set of applications.
NameNode and DataNodes
HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes. The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
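The division of labor described above can be sketched with a toy metadata server. Everything here is an assumption made for illustration: the class `ToyNameNode`, the `blk_N` naming, the caller-supplied block size, and the round-robin placement are not taken from HDFS, and replication is left out entirely.

```python
class ToyNameNode:
    """Keeps namespace metadata only: file -> blocks -> DataNodes."""

    def __init__(self, datanodes: list[str], block_size: int) -> None:
        if not datanodes or block_size <= 0:
            raise ValueError("need at least one DataNode and a positive block size")
        self.datanodes = datanodes
        self.block_size = block_size
        self.files: dict[str, list[str]] = {}   # path -> block ids
        self.block_map: dict[str, str] = {}     # block id -> DataNode
        self._next_block = 0

    def create(self, path: str, size: int) -> list[str]:
        """Register a file of `size` bytes and assign its blocks round-robin."""
        if path in self.files:
            raise FileExistsError(path)
        # Ceiling division; even an empty file gets one block in this sketch.
        num_blocks = max(1, -(-size // self.block_size))
        blocks = []
        for _ in range(num_blocks):
            block_id = f"blk_{self._next_block}"
            node = self.datanodes[self._next_block % len(self.datanodes)]
            self.block_map[block_id] = node
            self._next_block += 1
            blocks.append(block_id)
        self.files[path] = blocks
        return blocks

    def rename(self, old: str, new: str) -> None:
        """Namespace operation: only metadata changes, blocks stay put."""
        if new in self.files:
            raise FileExistsError(new)
        self.files[new] = self.files.pop(old)

    def locate(self, path: str) -> list[tuple[str, str]]:
        """Return (block id, DataNode) pairs a client would contact to read."""
        return [(b, self.block_map[b]) for b in self.files[path]]
```

Note that the toy NameNode never touches file contents. It answers "which blocks, on which DataNodes", and a client would then read or write the block data by talking to those DataNodes directly.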
The NameNode and DataNode are pieces of software designed to run on commodity machines. These machines typically run a GNU/Linux operating system (OS). HDFS is built using the Java language; any machine that supports Java can run the NameNode or the DataNode software. Usage of the highly portable J