Hadoop Cloud Computing: Foreign-Language Literature Translation
Original text:
Meet Hadoop
In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, they didn’t try to grow a larger ox. We shouldn’t be trying for bigger computers, but for more systems of computers.
-- Grace Hopper
Data!
We live in the data age. It’s not easy to measure the total volume of data stored electronically, but an IDC estimate put the size of the “digital universe” at 0.18 zettabytes in 2006, and forecast a tenfold growth by 2011 to 1.8 zettabytes. A zettabyte is 10^21 bytes, or equivalently one thousand exabytes, one million petabytes, or one billion terabytes. That’s roughly the same order of magnitude as one disk drive for every person in the world.
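The unit arithmetic above can be checked with a short sketch. The unit table follows the decimal definitions given in the text (a zettabyte is 10^21 bytes). The world population figure is an assumption introduced here for illustration only, not a number from the text.

```python
# Decimal byte units as defined in the text: 1 zettabyte = 10**21 bytes.
UNITS = {
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
}

# ASSUMPTION (not from the text): a rough world population of 6.5 billion.
ASSUMED_WORLD_POPULATION = 6_500_000_000


def convert(value, from_unit, to_unit):
    """Convert a quantity from one decimal byte unit to another."""
    return value * UNITS[from_unit] / UNITS[to_unit]


def gigabytes_per_person(total_zettabytes, population=ASSUMED_WORLD_POPULATION):
    """Spread a total volume evenly over a population, in gigabytes each."""
    return convert(total_zettabytes, "zettabyte", "gigabyte") / population


if __name__ == "__main__":
    print(convert(1, "zettabyte", "exabyte"))   # exabytes in one zettabyte
    print(convert(1, "zettabyte", "petabyte"))  # petabytes in one zettabyte
    print(convert(1, "zettabyte", "terabyte"))  # terabytes in one zettabyte
    print(gigabytes_per_person(0.18))           # 2006 estimate, per person
    print(gigabytes_per_person(1.8))            # 2011 forecast, per person
```

Under the assumed population, the per-person share comes out in the tens to hundreds of gigabytes, which is consistent with the text's "one disk drive for every person" comparison.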
This flood of data is coming from many sources. Consider the following:
• The New York Stock Exchange generates about one terabyte of new trade data per day.
• Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.
• Ancestry.com, the genealogy site, stores around 2.5 petabytes of data.
• The Internet Archive stores around 2 petabytes of data, and is growing at a rate of 20 terabytes per month.
• The Large Hadron Collider near Geneva, Switzerland, will produce about 15 petabytes of data per year.
So there’s a lot of data out there. But you are probably wondering how it affects you. Most of the data is locked up in the largest web properties (like search engines), or scientific or financial institutions, isn’t it? Does the advent of “Big Data,” as it is being called, affect smaller organizations or individuals?
I argue that it does. Take photos, for example. My wife’s grandfather was an avid photographer, and took photographs throughout his adult life. His entire corpus of medium format, slide, and 35mm film, when scanned in at high resolution, occupies around 10 gigabytes. Compare this to the digital photos that my family took last year, which take up about 5 gigabytes of space. My family is producing photographic data at 35 times the rate my wife’s grandfather’s did, and the rate is increasing every year as it becomes easier to take more and more photos. More generally, the digital streams that individuals are producing are growing apace.
Microsoft Research’s MyLifeBits project gives a glimpse of archiving of personal information that may become commonplace in the near future. MyLifeBits was an experiment where an individual’s interactions (phone calls, emails, documents) were captured electronically and stored for later access. The data gathered included a photo taken every minute, which resulted in an overall data volume of one gigabyte a month. When storage costs come down enough to make it feasible to store continuous audio and video, the data volume for a future MyLifeBits service will be many times that.
The trend is for every individual’s data footprint to grow, but perhaps more importantly the amount of data generated by machines will be even greater than that generated by people. Machine logs, RFID readers, sensor networks, vehicle GPS traces, retail transactions: all of these contribute to the growing mountain of data.
The volume of data being made publicly available increases every year too. Organizations no longer have to merely manage their own data: success in the future will be dictated to a large extent by their ability to extract value from other organizations’ data. Initiatives such as Public Data Sets on Amazon Web Services, Infochimps.org, and theinfo.org exist to foster the “information commons,” where data can be freely (or in the case of AWS, for a modest price) shared for anyone to download and analyze. Mashups between different information sources make for unexpected and hitherto unimaginable applications.
Take, for example, the Astrometry.net project, which watches the Astrometry group on Flickr for new photos of the night sky. It analyzes each image, and identifies which part of the sky it is from, and any interesting celestial bodies, such as stars or galaxies. Although it’s still a new and experimental service, it shows the kind of things that are possible when data (in this case, tagged photographic images) is made available and used for something (image analysis) that was not anticipated by the creator.
It has been said that “More data usually beats better algorithms,” which is to say that for some problems (such as recommending movies or music based on past preferences), however fiendish your algorithms are, they can often be beaten simply by having more data (and a less sophisticated algorithm).
The good news is that Big Data is here. The bad news is that we are struggling to store and analyze it.
Data Storage and Analysis
The problem is simple: while the storage capacities of hard drives have increased massively over the years, access speeds (the rate at which data can be read from drives) have not kept up. One typical drive