Design and Implementation of a Full-Text Search Engine: Foreign Literature Translation (Word download).doc

Architecture and Design

Chinese translation: Hadoop Distributed File System: Architecture and Design

Name: XXXX

Student ID: 200708202137

April 8, 2013

English original

The Hadoop Distributed File System: Architecture and Design

Source: http://hadoop.apache.org/docs/r0.18.3/hdfs_design.html

Introduction

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is part of the Apache Hadoop Core project. The project URL is http://hadoop.apache.org/core/.

Assumptions and Goals

Hardware Failure

Hardware failure is the norm rather than the exception. An HDFS instance may consist of hundreds or thousands of server machines, each storing part of the file system’s data. The fact that there are a huge number of components and that each component has a non-trivial probability of failure means that some component of HDFS is always non-functional. Therefore, detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS.
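The claim that some component is always non-functional can be made concrete with a little arithmetic. The sketch below assumes that components fail independently and that each has the same probability of being down at a given moment. Both are simplifying assumptions, and the 0.001 figure is purely illustrative rather than a measured HDFS failure rate.

```python
def prob_any_failed(n_components, p_fail):
    """Probability that at least one of n independent components is down.

    Assumes independent failures with identical probability p_fail,
    which is an idealization used only for illustration.
    """
    return 1.0 - (1.0 - p_fail) ** n_components


# Illustrative numbers only: each component is down 0.1% of the time.
for n in (10, 1_000, 10_000):
    print(f"{n:>6} components: P(at least one down) = {prob_any_failed(n, 0.001):.4f}")
```

Under these assumptions the probability approaches 1 as the component count grows, which is why fault detection and automatic recovery are treated as a design goal rather than an afterthought.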

Streaming Data Access

Applications that run on HDFS need streaming access to their data sets. They are not general purpose applications that typically run on general purpose file systems. HDFS is designed more for batch processing than for interactive use by users. The emphasis is on high throughput of data access rather than low latency of data access. POSIX imposes many hard requirements that are not needed for applications that are targeted for HDFS. POSIX semantics in a few key areas have been traded to increase data throughput rates.

Large Data Sets

Applications that run on HDFS have large data sets. A typical file in HDFS is gigabytes to terabytes in size. Thus, HDFS is tuned to support large files. It should provide high aggregate data bandwidth and scale to hundreds of nodes in a single cluster. It should support tens of millions of files in a single instance.
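To see why aggregate bandwidth matters for files of this size, consider a back-of-the-envelope estimate. The sketch below assumes the data is spread evenly over the nodes, that every node reads in parallel at the same rate, and that there is no network or coordination overhead. The 100 MB/s per-node rate is a hypothetical figure chosen for illustration, not a number from the HDFS documentation.

```python
def ideal_scan_seconds(total_bytes, nodes, bytes_per_second_per_node):
    """Idealized time to read a data set spread evenly over `nodes` machines.

    Ignores network, scheduling and skew; real clusters will be slower.
    """
    if nodes < 1:
        raise ValueError("need at least one node")
    return total_bytes / (nodes * bytes_per_second_per_node)


TB = 10 ** 12
MB = 10 ** 6

# Hypothetical per-node read rate of 100 MB/s, scanning a 1 TB data set.
for nodes in (1, 10, 100):
    seconds = ideal_scan_seconds(1 * TB, nodes, 100 * MB)
    print(f"{nodes:>3} node(s): about {seconds:,.0f} s")
```

The point of the estimate is only the scaling: in this idealized model, scan time falls in proportion to the number of nodes that serve the data.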

Simple Coherency Model

HDFS applications need a write-once-read-many access model for files. A file once created, written, and closed need not be changed. This assumption simplifies data coherency issues and enables high throughput data access. A Map/Reduce application or a web crawler application fits perfectly with this model. There is a plan to support appending-writes to files in the future.
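The write-once-read-many model can be illustrated with a toy in-memory object. This is a minimal sketch, not the HDFS client API: the class name `WriteOnceFile` and its methods are hypothetical and exist only to show the rule that contents become immutable once the file is closed.

```python
class WriteOnceFile:
    """Toy in-memory file: writable until closed, immutable afterwards."""

    def __init__(self):
        self._chunks = []
        self._closed = False

    def write(self, data):
        # Writes are only accepted while the file is still open.
        if self._closed:
            raise PermissionError("file is closed; its contents are immutable")
        self._chunks.append(data)

    def close(self):
        self._closed = True

    def read(self):
        # Any number of readers may call this; nothing can change underneath them
        # once the file has been closed.
        return b"".join(self._chunks)


f = WriteOnceFile()
f.write(b"crawl-results-part-1;")
f.write(b"crawl-results-part-2;")
f.close()
print(f.read())
```

Because a closed file never changes, readers need no locking or cache invalidation, which is the simplification of coherency the text describes.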

“Moving Computation is Cheaper than Moving Data”

A computation requested by an application is much more efficient if it is executed near the data it operates on. This is especially true when the size of the data set is huge. This minimizes network congestion and increases the overall throughput of the system. The assumption is that it is often better to migrate the computation closer to where the data is located rather than moving the data to where the application is running. HDFS provides interfaces for applications to move themselves closer to where the data is located.
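A scheduler that follows this principle prefers to run a task on a machine that already holds the data. The sketch below is a simplified illustration of that idea. The function `pick_node` and the node names are hypothetical, and it is not the interface HDFS exposes for locality.

```python
def pick_node(block_locations, free_nodes):
    """Choose where to run a task that reads one block.

    block_locations: names of nodes that store the block (hypothetical input).
    free_nodes: set of node names with spare capacity.
    Returns (node_name, is_local).
    """
    if not free_nodes:
        raise ValueError("no free nodes available")
    # Prefer a free node that already stores the block: no data transfer needed.
    for node in block_locations:
        if node in free_nodes:
            return node, True
    # Otherwise fall back to any free node; the block must cross the network.
    return sorted(free_nodes)[0], False


print(pick_node(["dn2", "dn3"], {"dn1", "dn3"}))
print(pick_node(["dn2"], {"dn1", "dn4"}))
```

In the first call the task lands on a node that stores the block, so only the (small) computation moves. In the second call no such node is free, so the block itself has to be shipped.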

Portability Across Heterogeneous Hardware and Software Platforms

HDFS has been designed to be easily portable from one platform to another. This facilitates widespread adoption of HDFS as a platform of choice for a large set of applications.

NameNode and DataNodes

HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes. The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
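The division of labor described above can be sketched as a toy model: the NameNode keeps only metadata (the namespace and the block-to-DataNode mapping), while DataNodes hold block contents and serve them to clients. All class and function names below are hypothetical. The round-robin placement and the caller-supplied block size are assumptions made for brevity, and replication is omitted entirely.

```python
class ToyDataNode:
    """Stores block contents; knows nothing about files or paths."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


class ToyNameNode:
    """Holds only metadata: the namespace and the block-to-DataNode mapping."""

    def __init__(self, datanodes):
        self.datanodes = {dn.name: dn for dn in datanodes}
        self.namespace = {}  # path -> ordered list of block ids
        self.block_map = {}  # block id -> DataNode name

    def allocate_block(self, path):
        block_id = len(self.block_map)
        names = sorted(self.datanodes)
        target = names[block_id % len(names)]  # round-robin placement: an assumption
        self.block_map[block_id] = target
        self.namespace.setdefault(path, []).append(block_id)
        return block_id, self.datanodes[target]

    def locate(self, path):
        return [(b, self.datanodes[self.block_map[b]]) for b in self.namespace[path]]

    def rename(self, old_path, new_path):
        # A pure namespace operation: no block data moves.
        self.namespace[new_path] = self.namespace.pop(old_path)


def write_file(namenode, path, data, block_size):
    """Client side: ask the NameNode where each block goes, then write to that DataNode."""
    namenode.namespace.setdefault(path, [])
    for offset in range(0, len(data), block_size):
        block_id, datanode = namenode.allocate_block(path)
        datanode.write_block(block_id, data[offset:offset + block_size])


def read_file(namenode, path):
    """Client side: look up block locations, then fetch each block from its DataNode."""
    return b"".join(dn.read_block(b) for b, dn in namenode.locate(path))


nodes = [ToyDataNode("dn1"), ToyDataNode("dn2")]
nn = ToyNameNode(nodes)
write_file(nn, "/logs/day1", b"abcdefghij", block_size=4)
nn.rename("/logs/day1", "/logs/archive")
print(read_file(nn, "/logs/archive"))
```

Note that in this sketch file contents never pass through the NameNode object: the client writes to and reads from DataNodes directly, and the rename touches only the namespace dictionary.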

The NameNode and DataNode are pieces of software designed to run on commodity machines. These machines typically run a GNU/Linux operating system (OS). HDFS is built using the Java language; any machine that supports Java can run the NameNode or the DataNode software. Usage of the highly portable J
