Hadoop Security Design
Owen O’Malley, Kan Zhang, Sanjay Radia,
Ram Marti, and Christopher Harrell
Yahoo!
{owen,kan,sradia,rmarti,cnh}@yahoo-
October 2009
Contents
1 Overview
1.1 Security risks
1.2 Requirements
1.3 Design considerations
2 Use Cases
2.1 Assumptions
2.2 High Level Use Cases
2.3 Unsupported Use Cases
2.4 Detailed Use Cases
3 RPC
4 HDFS
4.1 Delegation Token
4.1.1 Overview
4.1.2 Design
4.2 Block Access Token
4.2.1 Requirements
4.2.2 Design
5 MapReduce
5.1 Job Submission
5.2 Task
5.2.1 Job Token
5.3 Shuffle
5.4 Web UI
6 Higher Level Services
6.1 Oozie
7 Token Secrets Summary
7.1 Delegation Token
7.2 Job Token
7.3 Block Access Token
8 API and Environment Changes
1 Overview
1.1 Security risks
We have identified the following security risks, among others, to be addressed
first.
1. Hadoop services do not authenticate users or other services. As a result,
Hadoop is subject to the following security risks.
(a) A user can access an HDFS or MapReduce cluster as any other user.
This makes it impossible to enforce access control in an uncooperative
environment. For example, file permission checking on HDFS can be
easily circumvented.
(b) An attacker can masquerade as Hadoop services. For example, user
code running on a MapReduce cluster can register itself as a new
TaskTracker.
2. DataNodes do not enforce any access control on accesses to their data blocks.
This makes it possible for an unauthorized client to read a data block as
long as she can supply its block ID. It’s also possible for anyone to write
arbitrary data blocks to DataNodes.
1.2 Requirements
1. Users are only allowed to access HDFS files that they have permission to
access.
2. Users are only allowed to access or modify their own MapReduce jobs.
3. User-to-service mutual authentication, to prevent unauthorized NameNodes,
DataNodes, JobTrackers, or TaskTrackers.
4. Service-to-service mutual authentication, to prevent unauthorized services
from joining a cluster’s HDFS or MapReduce service.
5. The acquisition and use of Kerberos credentials will be transparent to
the user and applications, provided that the operating system acquired a
Kerberos Ticket Granting Ticket (TGT) for the user at login.
6. The degradation of GridMix performance should be no more than 3%.
1.3 Design considerations
We choose to use Kerberos for authentication (we also complement it with a
second mechanism as explained later). Another widely used mechanism is SSL.
We choose Kerberos over SSL for the following reasons.
1. Better performance. Kerberos uses symmetric key operations, which are
orders of magnitude faster than the public key operations used by SSL.
2. Simpler user management. For example, revoking a user can be done
by simply deleting the user from the centrally managed Kerberos KDC
(key distribution center). In SSL, by contrast, a new certificate revocation
list has to be generated and propagated to all servers.
2 Use Cases
2.1 Assumptions
1. For backwards compatibility and single-user clusters, it will be possible to
configure the cluster with the current style of security.
2. Hadoop itself does not issue user credentials or create accounts for users.
Hadoop depends on external user credentials (e.g., OS login or Kerberos
credentials). Users are expected to acquire those credentials from
Kerberos at operating system login. Hadoop services should also be configured
with suitable credentials, depending on the cluster setup, to authenticate
with each other.
3. Each cluster is set up and configured independently. To access multiple
clusters, a client needs to authenticate to each cluster separately. However,
a single sign-on that acquires a Kerberos ticket will work on all appropriate
clusters.
4. Users will not have access to root accounts on the cluster or on the
machines that are used to launch jobs.
5. HDFS and MapReduce communication will not travel on untrusted
networks.
6. A Hadoop job will run for no longer than 7 days (configurable) on a
MapReduce cluster; beyond that limit, accessing HDFS from the job will fail.
7. Kerberos tickets will not be stored in MapReduce jobs and will not be
available to the job’s tasks. Access to HDFS will be authorized via
delegation tokens as explained in section 4.1.
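Assumptions 6 and 7 together imply that a job's access to HDFS is bounded in time. The following is a minimal sketch of that idea, assuming the 7-day limit is enforced as a maximum lifetime on the delegation token; section 4.1 describes the actual design. The DelegationToken class, its fields, and token_is_usable are hypothetical names used only for illustration and are not Hadoop APIs.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60
# Assumption 6 gives 7 days as the (configurable) default limit.
MAX_LIFETIME_SECONDS = 7 * SECONDS_PER_DAY


@dataclass(frozen=True)
class DelegationToken:
    """Hypothetical stand-in for a delegation token (illustrative fields only)."""
    owner: str
    issued_at: float  # seconds since the epoch


def token_is_usable(token, now, max_lifetime=MAX_LIFETIME_SECONDS):
    """Return True while the token's age is within the maximum lifetime."""
    age = now - token.issued_at
    return 0 <= age < max_lifetime
```

Under this assumption, a task that is still running after the limit would present a token that no longer passes the check, which is why its HDFS accesses would fail.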
2.2 High Level Use Cases
1. Applications accessing files on HDFS clusters. Non-MapReduce
applications, including hadoop fs, access files stored on one or more HDFS
clusters. An application should only be able to access the files and services
it is authorized to access. See figure 1. Variations:
(a) Access HDFS directly using the HDFS protocol.
(b) Access HDFS indirectly through HDFS proxy servers via the HFTP
FileSystem or HTTP get.
[Figure 1: HDFS High-level Dataflow. Components: Application, MapReduce Task, NameNode, DataNode. Credential labels: kerb(joe), kerb(hdfs), delg(joe), block token.]
2. Applications accessing third-party (non-Hadoop) services. Non-MapReduce
applications and MapReduce tasks access files or operations
supported by third-party services. An application should only be
able to access the services it is authorized to access. Examples of
third-party services:
(a) Access NFS files
(b) Access ZooKeeper
3. User submitting jobs to MapReduce clusters. A user submits jobs to
one or more MapReduce clusters. Jobs can only be submitted to queues
the user is authorized to use. The user can disconnect after job submission
and may re-connect to get job status. Jobs may need to access files stored
on HDFS clusters as the user, as described in case 1. The user needs
to specify the list of HDFS clusters for a job at job submission. Jobs
should be able to access only those HDFS files or third-party services
authorized for the submitting user. See figure 2. Variations:
(a) Job is submitted via the JobClient protocol
(b) Job is submitted via the Web Services protocol (Phase 2)
[Figure 2: MapReduce High-level Dataflow. Components: Application, JobTracker, TaskTracker, Task, HDFS, NFS, Other Service. Credential labels: kerb(joe), kerb(mapreduce), job token, delg(joe), trust, other credential.]
4. User submitting workflows to Oozie. A user submits a workflow to
Oozie. The user is authenticated via a pluggable mechanism. Oozie uses
Kerberos-based RPC to access the JobTracker and NameNode by
authenticating as the Oozie service. The JobTracker and NameNode are
configured to allow the Oozie principal to act as a super-user and work on
behalf of other users, as in figure 3.
[Figure 3: Oozie High-level Dataflow. Components: Application, Oozie, MapReduce, HDFS. Credential labels: oozie(joe), kerb(oozie) for joe.]
5. Headless account doing use cases 1, 2, 3, and 4. The only difference
between a headless account and other accounts is that the headless
accounts will have their keys accessible via a keytab instead of using the
user’s password.
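The super-user arrangement in use case 4 can be sketched as a check performed by the service (JobTracker or NameNode) after Kerberos authentication has established the caller's principal. This is only an illustration under stated assumptions: PROXY_SUPERUSERS and effective_user are hypothetical names, and how the real services are configured is not specified in this section.

```python
# Hypothetical configuration: principals allowed to act on behalf of other users.
PROXY_SUPERUSERS = frozenset({"oozie"})


def effective_user(authenticated_principal, requested_user=None,
                   proxy_superusers=PROXY_SUPERUSERS):
    """Decide which user a request runs as.

    authenticated_principal: identity proven by Kerberos (e.g. "oozie").
    requested_user: the user the caller asks to act for (e.g. "joe"), or None.
    """
    # Ordinary case: callers act as themselves.
    if requested_user is None or requested_user == authenticated_principal:
        return authenticated_principal
    # Only configured super-users may work on behalf of someone else.
    if authenticated_principal in proxy_superusers:
        return requested_user
    raise PermissionError(
        f"{authenticated_principal} may not act on behalf of {requested_user}"
    )
```

In this sketch, a request authenticated as oozie and made for joe runs as joe, matching the "kerb(oozie) for joe" label in figure 3, while any other principal asking to act as joe is rejected.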
[Figure 4: Authentication Dataflow. Components: User Process, Browser, Oozie, JobTracker, TaskTracker, Task, NameNode, DataNode, NFS, ZooKeeper. Legend: HTTP plug auth, HTTP HMAC, RPC Kerberos, RPC DIGEST, Block Access, Third Party.]
2.3 Unsupported Use Cases
The following use cases will not be supported by the first security release of
Hadoop.
1. Using the HFTP/HSFTP protocol to access HDFS clusters without an HDFS
delegation token.
2. Accessing the Hadoop services securely across the untrusted internet.
1.Userprincipalsetup.
(a)Userprincipalsaretiedtotheoperatingsystemloginaccount.
(b)Headlessaccountsarecreatedforproductionjobs.
2.ClustersetupAdmincreatesandconfiguresaHadoopclusterinthe
Grid
(a)ConfiguresclusterforKerberosauthentication.
(b)Adminadds/changesspecificusersandgroupstoacluster’sservice
authorizationlist
i.Onlytheseusers/groupswillhaveaccesstotheclusterregardless
ofwhetherfileorjobqueuepermissionallowsaccess.
ii.Adminaddshimselfandagrouptothesuperuserandsuper-
group.
(c) Admin creates job queues and their ACLs.
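The layering in step 2 (a cluster-wide service authorization list that is checked regardless of queue permissions) can be sketched as two successive checks. This is a minimal illustration, not the actual Hadoop implementation: may_submit and its parameters are hypothetical, and for brevity the queue ACL is modeled as a set of user names only.

```python
def may_submit(user, groups, service_users, service_groups, queue_users):
    """Return True if `user` may submit a job to a queue.

    service_users / service_groups: the cluster's service authorization list.
    queue_users: users named in the target queue's ACL (simplified).
    """
    # Gate 1: the service authorization list applies to every request.
    on_service_list = user in service_users or bool(
        set(groups) & set(service_groups)
    )
    if not on_service_list:
        return False  # queue permissions are never consulted
    # Gate 2: the per-queue ACL.
    return user in queue_users
```

With this ordering, a user who appears in a queue's ACL but not on the service authorization list is still denied, which is the behavior step 2(b)i describes.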
3. User runs application which accesses HDFS files and third-party
services. User runs non-MapReduce applications; these applications can
access
(a) HDFS files in the local cluster.
(b) HDFS files in remote HDFS clusters.
(c) Third-party services using sensitive third-party credentials. Users
should be able to securely store those third-party credentials on the
submitting machines.
i. ZooKeeper
4. User submits a workflow job to a