Supplementary Methods
DNA extraction, PCR and 454 pyrosequencing
The genomic DNA was extracted from each tailings subsample with a modified indirect DNA extraction protocol as described previously (Tan et al., 2008). Briefly, cells were recovered from about 20 g of tailings by centrifugation at 900 × g at 4 °C for 10 min, using 20 mL sodium pyrophosphate (pH 3.0 or pH 7.0) as the dispersal reagent (Duarte et al., 1998), and the supernatant was collected. This recovery step was repeated twice. The collected supernatant was centrifuged at 10,000 × g at 4 °C for 15 min to pellet the cells, and the supernatant was removed. The cell pellets were treated with 20 mL of 0.3 M ammonium oxalate (pH 3.0 or pH 7.0) for 20 min to dissolve most of the iron precipitate (McKeague and Day, 1966), followed by centrifugation at 10,000 × g at 4 °C to pellet the cells; the supernatant was removed, and this step was repeated until the supernatant turned colorless. DNA from the cell pellets was extracted with a FastDNA Kit for soil (Qbiogene Inc., Carlsbad, CA) following the manufacturer's instructions. The universal primer set 515F/806R (Bates et al., 2010) was used to amplify the bacterial and archaeal 16S rRNA genes simultaneously, with an 8-bp barcode specific to each tailings subsample on the primer 806R. The primer sequences were as follows:
(i) CGTATCGCCTCCCTCGCGCCATCAGCAGTGCCAGCMGCCGCGGTAA, which consists, 5′ to 3′, of the Link Primer Sequence, the two-base protecting sequence 'CA' on the forward primer, and the primer 515F;
(ii) CTATGCGCCTTGCCAGCCCGCTCAGAACGAACGTCGGACTACVSGGGTATCTAAT, which consists, 5′ to 3′, of the Link Primer Sequence, the 8-bp barcode sequence specific to each tailings subsample (see Table S2 for all the barcodes), the two-base protecting sequence 'TC' on the reverse primer, and the primer 806R.
PCR reactions (30 µL) contained 0.75 units Ex Taq DNA polymerase (TaKaRa, Dalian, China), 1× Ex Taq loading buffer (TaKaRa, Dalian, China), 0.2 mM dNTP mix (TaKaRa, Dalian, China), 0.2 µM of each primer and about 100 ng template DNA. PCR amplification was conducted as follows: initial denaturation at 95 °C for 3 min; 35 cycles of denaturation at 94 °C for 30 s, primer annealing at 50 °C for 1 min and extension at 72 °C for 1 min; and a final extension of 10 min at 72 °C. For each tailings subsample, the PCR reaction was conducted in triplicate and the products were pooled to mitigate PCR amplification biases. The composite sample for pyrosequencing was created by combining equimolar ratios of amplification products from individual subsamples as described by Fierer et al. (2008), followed by gel purification using a QIAquick Gel Extraction Kit (Qiagen, Chatsworth, CA). The purified composite DNA sample was sent to Macrogen Inc. (Seoul, Korea) for pyrosequencing on a 454 GS FLX Titanium pyrosequencer (Roche 454 Life Sciences, Branford, CT, USA).
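The layout of the two fusion primers can be checked with a short script. The sketch below is illustrative only: it assumes the layout adapter + optional barcode + two-base linker + gene-specific primer, and it assumes that 515F and 806R are the trailing 19 and 20 bases of the sequences given above. The function name `split_fusion_primer` is hypothetical.

```python
# Fusion primers as given in the text (5' -> 3').
FORWARD = "CGTATCGCCTCCCTCGCGCCATCAGCAGTGCCAGCMGCCGCGGTAA"
REVERSE = "CTATGCGCCTTGCCAGCCCGCTCAGAACGAACGTCGGACTACVSGGGTATCTAAT"

# Gene-specific primers; assumed to be the 3' segment of each fusion primer.
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"
PRIMER_806R = "GGACTACVSGGGTATCTAAT"


def split_fusion_primer(fusion, primer, linker, barcode_len=0):
    """Split a fusion primer into adapter, barcode, linker and gene primer.

    Assumes the layout adapter + [barcode] + linker + primer, read 5' -> 3'.
    """
    if not fusion.endswith(primer):
        raise ValueError("gene-specific primer not found at the 3' end")
    head = fusion[: len(fusion) - len(primer)]
    if not head.endswith(linker):
        raise ValueError("linker not found directly upstream of the primer")
    head = head[: len(head) - len(linker)]
    cut = len(head) - barcode_len
    return {
        "adapter": head[:cut],      # Link Primer Sequence
        "barcode": head[cut:],      # empty string when barcode_len == 0
        "linker": linker,           # two-base protecting sequence
        "primer": primer,
    }


forward_parts = split_fusion_primer(FORWARD, PRIMER_515F, "CA")
reverse_parts = split_fusion_primer(REVERSE, PRIMER_806R, "TC", barcode_len=8)
```

Under these assumptions, the 8 bases between the adapter and 'TC' in the reverse primer are the example barcode; the other subsamples would differ only in that segment.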
Processing of 454 pyrosequencing data
Pyrosequencing data analysis was performed with version 1.26 of the mothur software package (Schloss et al., 2009) as described by Schloss et al. (2011). Because 454 pyrosequencing errors inflate biodiversity estimates (Kunin et al., 2010), the sequences were denoised using the commands 'shhh.flows' (a translation of the PyroNoise algorithm; Quince et al., 2009) and 'pre.cluster' (Huse et al., 2010). Additionally, chimeric sequences were identified and removed using UCHIME (Edgar et al., 2011). We also removed sequences with: (i) a length <280 bp; and/or (ii) homopolymers of eight or more bases; and/or (iii) one or more ambiguous bases. The OTUs were identified at a sequence identity level of 97% using the 'cluster' command with the average clustering algorithm (Huse et al., 2010). Subsequently, a representative sequence was selected from each OTU and taxonomic assignment was achieved using the Ribosomal Database Project (RDP) Classifier (Wang et al., 2007) with a minimum confidence of 80%. The microbial alpha diversity of the 18 tailings subsamples was estimated with the abundance-based indices Chao1, Shannon and Simpson. For these estimates, 5,000 quality sequences were randomly sampled (10 iterations) from each of the 18 tailings subsamples, and the value for each tailings sample was calculated as the average of its three subsamples.
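The subsampling and index calculation described above can be sketched as follows. This is not mothur's code: the formulas are the commonly used forms of the three indices (bias-corrected Chao1, Shannon with the natural logarithm, and Simpson's D), and both the choice of these forms and sampling without replacement are assumptions. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson index D = sum(n_i * (n_i - 1)) / (N * (N - 1))."""
    total = sum(counts)
    return sum(c * (c - 1) for c in counts) / (total * (total - 1))


def chao1(counts):
    """Bias-corrected Chao1 richness from singleton and doubleton OTUs."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def rarefied_indices(otu_labels, depth=5000, iterations=10, seed=0):
    """Average each index over repeated random subsamples of `depth` reads.

    `otu_labels` holds one OTU label per quality sequence of a subsample.
    """
    rng = random.Random(seed)
    sums = {"chao1": 0.0, "shannon": 0.0, "simpson": 0.0}
    for _ in range(iterations):
        # Draw `depth` reads without replacement and tally them per OTU.
        counts = list(Counter(rng.sample(otu_labels, depth)).values())
        sums["chao1"] += chao1(counts)
        sums["shannon"] += shannon(counts)
        sums["simpson"] += simpson(counts)
    return {name: value / iterations for name, value in sums.items()}
```

The per-sample value would then be the mean of `rarefied_indices` over the three subsamples of that sample.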
Metagenomic sequencing and analysis
Library construction and random shotgun sequencing. For the T2 and T6 tailings samples, genomic DNA extracted from the three subsamples of each sample was pooled and purified by gel electrophoresis. The purified DNA samples were then sent to BGI Inc. (Shenzhen, China) for shotgun library construction and Illumina sequencing. For both samples, whole-genome shotgun sequencing libraries with an insert size of 180 bp were generated and then paired-end sequenced (90 bp × 2) on Illumina's HiSeq 2000 platform.
Artifact filtering and quality control. The raw Illumina sequence data (2 GB for each metagenome) were passed through several filtering and control steps to obtain clean sequence data, as follows: (i) reads with adapter contamination were identified and removed; (ii) duplicates were identified and removed; (iii) among the non-duplicate reads, reads containing more than 18 N were identified and removed; and (iv) the retained reads were trimmed at the 3′ end to remove bases with a quality score of <20, and reads with over 20% of low-quality (quality score <20) bases were also removed. The resulting clean reads were used for further analysis.
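Steps (ii) to (iv) can be illustrated with a minimal sketch. It is not the pipeline that was actually run: adapter removal is omitted, duplicates are assumed to be exact sequence matches, the N threshold is read as a count, and the low-quality fraction is assumed to be measured after 3′ trimming. The function names are hypothetical, and qualities are plain integer Phred scores.

```python
def trim_3prime(seq, quals, min_q=20):
    """Remove bases from the 3' end while their quality is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def clean_reads(reads, max_n=18, min_q=20, max_low_frac=0.20):
    """Apply steps (ii) to (iv) to an iterable of (sequence, qualities) pairs."""
    seen = set()
    kept = []
    for seq, quals in reads:
        if seq in seen:                      # (ii) drop exact duplicates
            continue
        seen.add(seq)
        if seq.upper().count("N") > max_n:   # (iii) more than max_n N bases
            continue
        seq, quals = trim_3prime(seq, quals, min_q)  # (iv) 3' quality trimming
        if not seq:                          # nothing left after trimming
            continue
        low = sum(1 for q in quals if q < min_q)
        if low / len(quals) > max_low_frac:  # (iv) over 20% low-quality bases
            continue
        kept.append((seq, quals))
    return kept
```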
Whole metagenome assembly. The clean reads were assembled de novo using Velvet (version 1.1.04) (Zerbino and Birney, 2008) with the options ins_length=180 and exp_cov=auto. Both metagenomes were assembled with k-mer values from 21 to 55, and the best assembly for each was selected based on the N50 contig length and the length of the longest contig. As a result, the best k-mer value for the T2 metagenome was 45 (N50 contig: 522 bp; longest contig: 60233 bp), and that for the T6 metagenome was 51 (N50 contig: 955 bp; longest contig: 40620 bp).
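For reference, the N50 statistic used to compare assemblies can be computed as in the sketch below. The text does not say how N50 and the longest contig were weighed against each other, so the ranking in `pick_best_k` (N50 first, longest contig as tie-breaker) is an assumption, and both function names are hypothetical.

```python
def n50(lengths):
    """Return the contig length L such that contigs of length >= L
    contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


def pick_best_k(assemblies):
    """Pick the k-mer value whose assembly has the largest N50.

    `assemblies` maps each k to the list of contig lengths it produced.
    Ties on N50 are broken by the longest contig (an assumed criterion).
    """
    return max(assemblies, key=lambda k: (n50(assemblies[k]), max(assemblies[k])))
```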
Microbial community composition analysis. Two strategies were employed to reveal the microbial composition of the T2 and T6 metagenomes: (i) 16S rRNA genes were identified among all the contigs using BLASTn against the RDP database (release 10) (Cole et al., 2009) (e-value threshold = 10⁻⁵), and the identified 16S rRNA sequences with anchors ≥100 bp were taxonomically assigned using the RDP Classifier with a minimum confidence of 80%; and (ii) the contigs (≥300 bp) were compared against the National Center for Biotechnology Information (NCBI) non-redundant (nr) database (e-value threshold = 10⁻⁵), and the contigs were then classified into taxonomic groups with the lowest common ancestor algorithm in MEGAN (Huson et al., 2011) with default parameters (minimum score, 35; minimum support, 1; top percent, 10%).
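The idea behind lowest-common-ancestor assignment can be sketched in a few lines. This is a simplified illustration, not MEGAN's implementation: it assumes that the minimum score is a bit-score cutoff, that top percent keeps hits within that percentage of the best score, and that each hit carries a root-to-leaf lineage. The minimum support filter, which acts across contigs, is left out. The function name and the example lineages are hypothetical.

```python
def lca_assign(hits, min_score=35.0, top_percent=10.0):
    """Assign one query to the deepest taxon shared by its best hits.

    `hits` is a list of (bit_score, lineage) pairs, where lineage is a
    list of taxon names from the root downwards. Returns the shared
    lineage prefix, or None when no hit reaches min_score.
    """
    kept = [(score, lineage) for score, lineage in hits if score >= min_score]
    if not kept:
        return None
    best = max(score for score, _ in kept)
    cutoff = best * (1 - top_percent / 100)
    lineages = [lineage for score, lineage in kept if score >= cutoff]
    common = []
    for ranks in zip(*lineages):    # walk down rank by rank
        if len(set(ranks)) != 1:    # stop where the lineages diverge
            break
        common.append(ranks[0])
    return common


example_hits = [
    (100.0, ["Bacteria", "Proteobacteria", "GenusA"]),
    (95.0, ["Bacteria", "Proteobacteria", "GenusB"]),
    (50.0, ["Archaea", "Euryarchaeota", "GenusC"]),
]
assignment = lca_assign(example_hits)
```

In this example the third hit falls outside the top 10% of scores, so the query is placed at the phylum shared by the first two hits rather than at the root.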
Gene prediction and functional annotation. The contigs that had reliable NCBI-nr hits, as indicated by MEGAN, were extracted for further analysis. These contigs were subjected to gene prediction using GeneMark with default parameters (Zhu et al., 2010), which yielded 51981 and 49538 putative protein-coding genes for the T2 and T6 metagenomes, respectively (Table S5). We then compared these putative protein-coding genes against the NCBI-nr database, and those with NCBI-nr hits were further compared against the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and the Clusters of Orthologous Groups of proteins (COG) database using BLASTx (e-value threshold = 10⁻⁵).
Genome binning. Based on the contig BLAST results and the MEGAN analysis (minimum score, 35; top percent, 10%), the dominant genera in the T2 and T6 metagenomes were binned. Information on the largest bins is shown in Table S6.
Contig coverage estimate. To estimate contig coverage, the clean reads used for assembly were aligned to the contigs using SOAPaligner (Li et al., 2009) in three steps: (i) an index was built from all the contigs in the assembly results (2bwt-builder); (ii) the clean reads were aligned against the contig-based index (soap); and (iii) SOAP.COVERAGE (Li et al., 2009) was used to parse the output file of SOAPaligner. The coverage estimates of the contigs are shown in Fig. S7.
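As a rough illustration of what a per-contig coverage estimate expresses, the sketch below computes mean depth as aligned bases divided by contig length. It does not parse SOAPaligner output and is not the SOAP.COVERAGE calculation; the input structures and the function name are hypothetical.

```python
from collections import defaultdict


def mean_coverage(contig_lengths, alignments):
    """Mean depth per contig = total aligned read bases / contig length.

    `contig_lengths` maps contig name -> length in bp.
    `alignments` is an iterable of (contig name, aligned read length) pairs.
    """
    aligned_bases = defaultdict(int)
    for contig, read_len in alignments:
        aligned_bases[contig] += read_len
    return {
        contig: aligned_bases[contig] / length
        for contig, length in contig_lengths.items()
    }


# Twenty 90-bp reads on a 900-bp contig give a mean depth of 2.0.
depths = mean_coverage({"c1": 900, "c2": 450}, [("c1", 90)] * 20)
```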
The functional abundance profile analysis of COG catalogues and COG categories
Based on the COG BLAST results, the predicted genes with reliable COG BLAST hits were assigned to COG catalogues and COG categories (if available). To determine whether a specific COG catalogue or COG category was enriched in our metagenomes, the odds ratio for a specific COG catalogue or COG category against that in all sequenced bacteria and archaea was calculated as follows.
Where:
A = No. of genes assigned to a specific COG catalogue (or COG category) in metagenome T2 (or T6)
B = No. of genes assigned to all other COG catalogues (or COG categories) in metagenome T2 (or T6)
C = No. of genes assigned to a specific COG catalogue (or COG category) in all sequenced bacteria and archaea
D = No. of genes assigned to all other COG catalogues (or COG categories) in all sequenced bacteria and archaea
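With A, B, C and D as defined above, the conventional odds ratio for a 2 × 2 table would take the following form. This is the standard definition, offered as an assumption rather than a transcription of the authors' equation.

```latex
\[
\text{odds ratio} = \frac{A / B}{C / D} = \frac{A \times D}{B \times C}
\]
```

Under this form, a value above 1 would indicate that the COG catalogue (or category) is over-represented in the metagenome relative to all sequenced bacteria and archaea, and a value below 1 that it is under-represented.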
The values for 'C' and 'D' were obtained from the Integrated Microbial Genomes (IMG) system (http://img.jgi.doe.gov/cgi-bin/w/main.cgi; Markowitz et al., 2012). The P-value w