RNA-seq Data Processing and Analysis
RNA-seq Data Handling and Analysis
Kevin Childs
Statistical genetics/genomics journal club
中国测序论坛 (China Sequencing Forum)

Overview
Fasta/Fastq file formats
NCBI SRA
Data preparation
Bowtie/Tophat/Cufflinks
Velvet/Oases
Trinity

Fasta File Format
#FASTA
>gi|1800214|gb|U56729.1|SBU56729 Sorghum bicolor phytochrome A
CGCATCCTTCCGCGCCGGGCATGGGCACCGCGTCGGCGCGCGCCCCTACCCAGTCGTCGACTTGATGCTGCTCACTCGCACTCGTCGCAGCGCCCCACGCCCCGCTATTTATGCGTACTTGCTTGCCGGGAGAGTCGCTGGAGGTGGGCGTCCTCCTCCCGCTCCAGAGCTCGCTGCTTCGCTCCACCCACCCTTAAGCAGGAGTGATATCTGGTGGTTTTTCAAAAGAAGACAAAAATGTCTTCCTCGAGGCCTGCCCACTCTTCCAGTTCATCCAGTAGGACTCGCCAGAGCTCCCAGGCAAGGATATTAGCACAAACAACCCTTGATGCTGAACTCAATGCAGAGTATGAAGAATCTGGTGATTCCTTTGATTACTCCAAGTTGGTTGAAGCACAGCGGAGCACTCCATCTGAGCAGCAAGGGCGATCAGGAAAGGTCATAGCCTACTTGCAGCATATTCAAAGAGGAAAGCTAATCCAACCATTTGGTTGCTTGTTGGCCCTTGACGAGAAGAGCTTCAGGGTCATTGCATTCAGTGAGAATGCACCTGAAATGCTCACAACGGTCAGCCATGCTGTGCCAAACGTTGATGATCCCCCAAAGCTAGGAATTGGTACCAATGTGCGCTCCCTTTTCACTGACCCTGGTGCTACAGCACTGCAGAAGGCACTAGGATTTGCTGATGTTTCTTTGCTGAATCCTATCCTAGTTCAATGCAAGACCTCAGGCAAGCCATTCTATGCCATTGTTCATAGGGCAACTGGTTGTCTGGTGGTTGATTTTGAGCCTGTGAAGCCTACAGAATTTCCTGCCACTGCTGCTGGGGCTTTGCAGTCT

Fastq File Format
Read Name
Sequence
Quality
Quality scores are ASCII characters representing coded Phred scores.
ASCII codes start at ASCII 33 or ASCII 64.
All SRA codes are converted to ASCII 33.
These scores give the likelihood that the base was called incorrectly.
10: 1 in 10 chance the base call is incorrect
20: 1 in 100 chance the base call is incorrect
30: 1 in 1000 chance the base call is incorrect

High Throughput Sequencing Platforms
Illumina HiSeq 1000 and HiSeq 2000
Illumina Genome Analyzer IIx*
Life Sciences/Roche 454 pyrosequencing
ABI SOLiD Sequencing System*
Pacific Biosciences*
Ion Torrent
Cambridge Nanopore (late 2012?)

High Throughput Sequencing: HiSeq 2000
Highly parallel sequencing by synthesis
Single and paired-end reads between 50 bp and 100 bp
187 million single-end or 374 million paired-end reads per lane
High error rate at the 3' end

NCBI SRA
SRA toolkit
fastq-dump
/opt/sratoolkit/fastq-dump SRR373821.lite.sra
/opt/sratoolkit/fastq-dump --split-files SRR329070.lite.sra

Read Quality with the FASTX-Toolkit
http://hannonlab.cshl.edu/fastx_toolkit/

Read Quality with the FASTX-Toolkit
[Figure panels: Bad Sequence, Good Sequence]

FASTX Toolkit
fastx_quality_stats -Q33 -i initial_fastq_file.fastq -o stats.txt
fastx_quality_boxplot_graph.sh -Q33 -i stats.txt -t Title -o quality.png
fastx_clipper -Q33 -v -a AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT -i initial_fastq_file.fastq -o fastq_file_clipped.fastq
fastx_artifacts_filter -Q33 -v -i fastq_file_clipped.fastq -o fastq_file_artifact_filtered.fastq
fastq_quality_trimmer -Q33 -v -t 20 -l 30 -i fastq_file_artifact_filtered.fastq -o fastq_file_cleaned.fastq
-Q is an undocumented parameter indicating that quality values use ASCII 33 encoding.

FastQC
http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
A quality control tool for high throughput sequence data.

Samtools
Package of programs for manipulating sam and bam files
sam: sequence alignment map
bam: binary alignment map, compressed form of sam file
http:
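
The Phred encoding described under Fastq File Format can be illustrated with a short Python sketch. It assumes the standard Phred relationship (error probability = 10^(-Q/10)), which matches the 10/20/30 examples on that slide. The function names and the sample quality strings are hypothetical and are not part of any toolkit named in these slides.

```python
# Minimal sketch: decode FASTQ quality characters into Phred scores
# and convert each score into a base-call error probability.
# Function names here are illustrative assumptions, not a library API.

def phred_scores(quality, offset=33):
    """Return integer Phred scores for a FASTQ quality string.

    offset is 33 for ASCII 33 encoded files (as produced from SRA data)
    or 64 for ASCII 64 encoded files.
    """
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Probability that a base with Phred score q was called incorrectly."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    quality = "+5?I"  # made-up quality string, ASCII 33 encoding
    for q in phred_scores(quality):
        print(q, error_probability(q))
```

Passing offset=64 decodes ASCII 64 encoded files in the same way; choosing the wrong offset shifts every score by 31, which is why the encoding has to be known before quality filtering.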
Tuxedo Suite
Bowtie: fast and quality-aware short read aligner for aligning DNA and RNA sequence reads
TopHat: fast splice junction mapper for RNA-Seq reads, built on the Bowtie aligner
Cufflinks: assembles transcripts, estimates their abundances, and tests for differential expression and regulation using the alignments from Bowtie and TopHat

Bowtie
Aligns short reads to large genomes
Forms the basis for TopHat, Cufflinks, Crossbow, and Myrna
Unless you are working with short reads derived from genomic DNA, you will not use Bowtie directly, except for running bowtie-build to create a genomic sequence index file

TopHat
Built on Bowtie and uses the same genome index
Used for alignment of RNA-Seq reads to a genome
Optimized for paired-end Illumina sequence reads 70 bp

TopHat
[Figure]

Cufflinks
Quantification of gene expression using RNA-seq reads
Tests for differential expression
Uses output from bowtie/tophat
Assembles read alignments into transcripts
Uses cufflinks-predicted transcripts or user-supplied gene models for quantification
Estimates transcript abundance balanced across transcript isoforms

Cufflinks
[Figure]

Bowtie/Tophat/Cufflinks
bowtie-build pseudomolecule.fa pseudomolecule.index
tophat -p 6 --solexa1.3-quals -i 5 -I 1000 -r 100 --no-novel-juncs --GTF pseudomolecule.gtf -o /output/directory pseudomolecule.index purified_reads.fastq
samtools sort tophat_output_pairs.bam tophat_output_pairs_sorted
samtools view -o tophat_output_pairs_sorted.sam tophat_output_pairs_sorted.bam
cufflinks -q -o /output/directory/ -p 4 -G pseudomolecule_corrected.gtf tophat_output_pairs_sorted.sam

Velvet/Oases
Genome/transcriptome assembly package
Velveth/velvetg work well for genomes but produce fragmented transcriptome assemblies.
Its modules explicitly assume linearity and uniform covera