“Bioinformatics” homework for undergraduates (2016)

#1 How many nucleotide sequences from maize (Zea mays) have been stored in public DNA databases (such as GenBank)? How many Waxy (granule-bound starch synthase) gene sequences from maize are in the database?
Answer: On Sunday, November 6, 2016, I accessed NCBI (https://www.ncbi.nlm.nih.gov/) and searched the Nucleotide database for zea mays, with the Species filter set to Plants and the Molecule types filter set to genomic DNA/RNA. The search returned 446,964 Zea mays nucleotide sequences stored in NCBI. Searching with (zea mays[Organism]) AND waxy[Gene name] returned 175 maize Waxy gene sequences. The detailed steps and results are shown in the figure below:

#2 A sequence was generated by a suppression subtractive hybridization (SSH) experiment. Please find the best hit(s) of the unknown sequence in the public database and predict its potential function.
An unknown sequence:
CCTCGGAGATCTTCATGGGGGGCAAGAGCACCATCGTGCTgCACAACACCTGCGAGGACTCGCTCCTCGCTGCACCCATCATTCTTGATCTGGTGCTCCTGGCGGAGCTCAGCACCAGGATTCAGCTGAAGGC
CGAGGGAGAGGTAAGAGTCTGACGAGATATGTTGCTAGTCTACTCTGTAGTCGAGATATACTTTGGGAGCCAAACTGAAGATTTCGCTGCTCCACTTGCATTTGTGCAGGACAAGTTCCATTCCTTCCATCCGGTTGCCACCATCCTGAGCTACCTCACCAAGGCACCCCTGGTAAGAAACAATTCTCGACTGTTTGCTCTAAATAACCTATAGATAAATAAAGACGATTAACTGACGTGCCACTGAATTCCTCTGTTAACAGGTTCCTCCTGGCACGCCGGTGGTGAACGCCCTGGCGA
AGCAAAGGGCGATGCTGGAGAACATCATGAGGGCGTGTGTCGGCCTGGCGCCCGAAAACAACATGATCCTGGAGTACAAGTGAGGAGCGTGGCCCAAGCTCGCGGAGCCGAGAGCGACCGTACGTACGTAGCAAGTGGCGAGGGGCGACGGGAGGGCAGGACGAAGAAGAAGGCGAGATCGGCTGTGGAATTATTTGGCGGCTTGTCTTTAGTTTCCTTTGCGAATCTTTCCCTGGTTAAGTTTACCCCAGTGAGTGTGTGTCCTTGCGAGAAAAG
Answer: I ran BLAST at NCBI (http://blast.ncbi.nlm.nih.gov/Blast.cgi): I selected Blastx, pasted the sequence above into the query box, kept the default parameters and started the search. The best hit was inositol-3-phosphate synthase from Dichanthelium oligosanthes. Running Blastx in the same way at EMBL-EBI (http://www.ebi.ac.uk/Tools/sss/ncbiblast/), again with default parameters, also gave inositol-3-phosphate synthase as the best hit. Because both searches agree, the unknown sequence is predicted to have the same function as inositol-3-phosphate synthase.

#3 Use the dynamic programming method (the Needleman-Wunsch algorithm) to perform a global alignment of the sequences:
P1 = HEAGAWGHEP
P2 = EPAWHEAEAG
Scoring system: BLOSUM50 scoring matrix with a gap penalty of 8.
BLOSUM50 (partial):
     A   E   G   H   P   W
A    5  -1   0  -2  -1  -3
E        6  -3   0  -1  -3
G            8  -2  -2  -3
H               10  -2  -3
P                   10  -4
W                       15
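The fill-and-traceback procedure asked for here can be sketched in a few lines of Python. This is a minimal illustration, not a method prescribed by the assignment: it fills the matrix with a linear gap penalty of 8 using only the partial BLOSUM50 entries listed above, then traces back from the bottom-right cell. The names `BLOSUM50_PARTIAL`, `score` and `needleman_wunsch` are our own, and the traceback tie-break order (diagonal, then up, then left) is an assumption; where cells tie, another order could return a different alignment with the same score. Gaps are printed as `-` rather than `_`.

```python
# Minimal Needleman-Wunsch sketch with a linear gap penalty.
# Only the BLOSUM50 entries given in the question are included; any other
# residue pair raises KeyError.
BLOSUM50_PARTIAL = {
    ("A", "A"): 5, ("A", "E"): -1, ("A", "G"): 0, ("A", "H"): -2, ("A", "P"): -1, ("A", "W"): -3,
    ("E", "E"): 6, ("E", "G"): -3, ("E", "H"): 0, ("E", "P"): -1, ("E", "W"): -3,
    ("G", "G"): 8, ("G", "H"): -2, ("G", "P"): -2, ("G", "W"): -3,
    ("H", "H"): 10, ("H", "P"): -2, ("H", "W"): -3,
    ("P", "P"): 10, ("P", "W"): -4,
    ("W", "W"): 15,
}


def score(a: str, b: str) -> int:
    """Symmetric lookup: the table stores only the upper triangle."""
    if (a, b) in BLOSUM50_PARTIAL:
        return BLOSUM50_PARTIAL[(a, b)]
    return BLOSUM50_PARTIAL[(b, a)]


def needleman_wunsch(p1: str, p2: str, gap: int = 8):
    """Global alignment; columns follow p1 and rows follow p2.

    Returns (F, aligned_p1, aligned_p2); F[-1][-1] is the optimal score.
    """
    n, m = len(p2), len(p1)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        F[0][j] = -gap * j  # leading gaps in p2
    for i in range(1, n + 1):
        F[i][0] = -gap * i  # leading gaps in p1
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(p2[i - 1], p1[j - 1]),  # residue vs residue
                F[i - 1][j] - gap,                              # p2 residue vs gap
                F[i][j - 1] - gap,                              # p1 residue vs gap
            )

    # Traceback from the bottom-right cell; ties broken diagonal > up > left.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(p2[i - 1], p1[j - 1]):
            top.append(p1[j - 1])
            bottom.append(p2[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - gap:
            top.append("-")
            bottom.append(p2[i - 1])
            i -= 1
        else:
            top.append(p1[j - 1])
            bottom.append("-")
            j -= 1
    return F, "".join(reversed(top)), "".join(reversed(bottom))


F, a1, a2 = needleman_wunsch("HEAGAWGHEP", "EPAWHEAEAG", gap=8)
print("P1:", a1)            # P1: HEAGAWGH-EP-
print("P2:", a2)            # P2: -EP-AWHEAEAG
print("Score:", F[-1][-1])  # Score: -4
```

Keeping the whole matrix `F` (instead of only the final score) makes it easy to compare individual cells with a hand-filled table.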
Answer: Fill the score matrix F with the recurrence
F(i, j) = max{ F(i-1, j-1) + s(P2[i], P1[j]), F(i-1, j) - 8, F(i, j-1) - 8 },
starting from F(0, 0) = 0, F(i, 0) = -8i and F(0, j) = -8j, where s is the BLOSUM50 score. The filled matrix is shown below (columns follow P1, rows follow P2); cells on the traceback path of the optimal alignment are marked with *.
                H     E     A     G     A     W     G     H     E     P
          0*   -8*  -16   -24   -32   -40   -48   -56   -64   -72   -80
E        -8     0    -2*  -10   -18   -26   -34   -42   -50   -58   -66
P       -16    -8    -1    -3*  -11*  -19   -27   -35   -43   -51   -48
A       -24   -16    -9     4    -3    -6*  -14   -22   -30   -38   -46
W       -32   -24   -17    -4     1    -6     9*    1    -7   -15   -23
H       -40   -22   -24   -12    -6    -1     1     7*   11     3    -5
E       -48   -30   -16   -20   -14    -7    -4    -1     7*   17     9
A       -56   -38   -24   -11   -19    -9   -10    -4    -1*    9    16
E       -64   -46   -32   -19   -14   -17   -12   -12    -4     5*    8
A       -72   -54   -40   -27   -19    -9   -17   -12   -12    -3     4*
G       -80   -62   -48   -35   -19   -17   -12    -9   -14   -11    -4*
The optimal global alignment is shown below; underscores (_) denote gaps:
P1: HEAGAWGH_EP_
P2: _EP_AWHEAEAG
Score: -8 + 6 - 1 - 8 + 5 + 15 - 2 + 0 - 8 + 6 - 1 - 8 = -4

#4 Please find genes in a genomic segment of bamboo (Download).
Answer: Open the online gene-prediction program (the output below identifies it as FGENESH 2.6), choose Monocot plants as the Organism (the list has no option for bamboo), and run it. The output below lists the predicted genes and exons, followed by the nucleotide sequence of each predicted mRNA and the amino-acid sequence of the protein it encodes.
FGENESH 2.6 Prediction of potential genes in Monocot genomic DNA
Time : Sun Nov 6 04:05:56 2016
Seq name: test sequence
Length of sequence: 49600
Number of predicted genes 10: in +chain 2, in -chain 8.
Number of predicted exons 22: in +chain 10, in -chain 12.
Positions of predicted genes and exons: Variant 1 from 1, Score:299.051538
 G Str Feature Start     End  Score  ORF            Len
 1  -     PolA  6777           0.44
 1  -   1 CDSo  6884 -  7057   4.33   6884 -  7057  174
 1  -      TSS  8311          -1.78
 2  +      TSS 17591          -4.18
 2  +   1 CDSf 17742 - 17811  16.41  17742 - 17810   69
 2  +   2 CDSl 19834 - 20792  87.86  19836 - 20792  957
 2  +     PolA 21649           0.44
 3  +      TSS 21801          -7.58
 3  +   1 CDSf 22009 - 22085  19.33  22009 - 22083   75
 3  +   2 CDSi 22583 - 22652   3.65  22584 - 22652   69
 3  +   3 CDSi 23070 - 23145   6.56  23070 - 23144   75
 3  +   4 CDSi 23236 - 23353  18.37  23238 - 23351  114
 3  +   5 CDSi 24144 - 24233   8.37  24145 - 24231   87
 3  +   6 CDSi 24306 - 24381   6.47  24307 - 24381   75
 3  +   7 CDSi 24523 - 24650   5.26  24523 - 24648  126
 3  +   8 CDSl 24731 - 24800   8.36  24732 - 24800   69
 3  +     PolA 25006           0.44
 4  -     PolA 26777          -1.06
 4  -   1 CDSl 27135 - 28019  59.89  27135 - 28019  885
 4  -   2 CDSf 28097 - 28504  40.08  28097 - 28504  408
 4  -      TSS 28623          -6.38
 5  -     PolA 30964           0.44
 5  -   1 CDSl 30993 - 31177   8.88  30993 - 31175  183
 5  -   2 CDSi 31212 - 31431  -7.39  31213 - 31431  219
 5  -   3 CDSf 31504 - 31548   8.15  31504 - 31548   45
 5  -      TSS 31608          -1.28
 6  -     PolA 33364           0.44
 6  -   1 CDSo 33766 - 33954   4.42  33766 - 33954  189
 6  -      TSS 34021          -3.18
 7  -     PolA 34094          -1.06
 7  -   1 CDSo 34700 - 34975  17.27  34700 - 34975  276
 7  -      TSS 35444          -6.08
 8  -     PolA 35848           0.44
 8  -   1 CDSo 36075 - 36458  20.75  36075 - 36458  384
 8  -      TSS 37019          -5.38
 9  -     PolA 40341          -1.06
 9  -   1 CDSo 40879 - 41067   9.51  40879 - 41067  189
 9  -      TSS 41777          -5.68
10  -     PolA 43349          -1.06
10  -   1 CDSl 44280 - 44545   7.17  44280 - 44543  264
10  -   2 CDSf 46131 - 46686  33.96  46132 - 46686  555
10  -      TSS 46908          -8.18
Predicted protein(s):
FGENESH:mRNA 1 1 exon (s) 6884 - 7057 174 bp, chain -
ATGGGGGTGAATATGAAGGGTAAGCAGCACATGCCGCGGCCATGTGCGTCGGTGGTTCACTGGTTCAGTTTCCACGTCCACGAGTGGCCTCGCACTGTCGATAGCGATCGAATGAACGTTCTTTGCTGCTGCACGGCGGGAGCTTCTCCGGAACAGTCAGGGCTGATTG
GTTAG
FGENESH: 1 1 exon (s) 6884 - 7057 57 aa, chain -
MGVNMKGKQHMPRPCASVVHWFSFHVHEWPRTVDSDRMNVLCCCTAGASPEQSGLIG
FGENESH:mRNA 2 2 exon (s) 17742 - 20792 1029 bp, chain +
ATGCGCCGGGTAGCGCTGTTGCTGCTGCTCGTCTGCGCGGCGGCGCGCGCCGCCGCGGTCGTCACCGACGGGCTTCTTCCGAACGGCAACTTCGAGGATGGCCCGCCCAAGTCGGCGCTGGTGAACGGCACTGT
GGTGTCGGGCGCCAACGCCATCCCTAGCTGGGAGACCTCCGGCTTCGTGGAGTACATCGAGTCGGGGCACAAGCAGGGCGACATGCTCCTGGTGGTGCCCCAGGGCGCCCACGCCGTGCGCCTGGGCAACGAGGCCTCCATCCGGCAGCGCCTCTCCGTCACCCGGGGCGCCTACTACTCCATCACCTTCAGCGCGGCGCGCACCTGCGCGCAGGCCGAGCGCCTCAACGTCTCCGTGTCCCCCGAGTGGGGCGTCCTCCCGATGCAGACCATCTACGGCAGCAACGGGTGGGACTCGTA
CGCCTGGGCCTTCAAGGCCAAGCTGGACACGGTGACGCTCGTCCTCCACAACCCCGGCGTCGAGGAGGACCCGGCCTGCGGCCCGCTCATCGACGGCGTCGCCATCCGGGCCCTGTACCCGCCCACGCTGGCCCGCGGCGGCAACATGCTCAAGAACGGCGGCTTCGAGGAGGGGCCCTACTTTTTACCCAACGCGTCGTGGGGCGTGCTCGTGCCGCCCAACATCGAGGACGACCACTCCCCGCTCCCGGCCTGGATGATCGTGTCCTCCAAGGCCGTCAAGTACGTGGACGCCGCGCA
CTTTAAGGTCCCCAGGGCGCGGCGCGCCGTGGAGCCTGGTGGCCCCGGGGAGGGAAGCGGCTGGTGCAGGAGGTGGCGCCACCGTGCGGTGGAGCTACCACCCTGGCCTTCGCCGTGGGGGACGCCGCCGACGGGTGCGAGGGGTCGCATGGTGGGGCCGAGGCGTACACCGGCGCGGCCCACCCGTGAAGGTGGGCCGTACGAGTCCCAAGGGGACGGGAACTTCCTTTTTTCTTCTTCACGGCCATCGCCAGCCGCACCCGGGTCGTGTTCCAGAGCACCTTCTACCACATGA
FGENESH: 2 2 exon (s) 17742 - 20792 342 aa, chain +
MRRVALLLLLVCAAARAAAVVTDGLLPNGNFEDGPPKSALVNGTVVSGANAIPSWETSGFVEYIESGHKQGDMLLVVPQGAHAVRLGNEASIRQRLSVTRGAYYSITFSAARTCAQAERLNVSVSPEWGVLPMQTIYGSNGWDSYAWAFKAKLDTVTLVLHNPGVEEDPACGPLIDGVAIRALYPPTLARGGNMLKNGGFEEGPYFLPNASWGVLVPPNIEDDHSPLPAWMIVSSKAVKYVDAAHFKVPRARRA
VEPGGPGEGSGWCRRWRHRAVELPPWPSPWGTPPTGARGRMVGPRRTPARPTREGGPYESQGDGNFLFSSSRPSPAAPGSCSRAPSTT
FGENESH:mRNA 3 8 exon (s) 22009 - 24800 705 bp, chain +
ATGCGGCTGCTCCTGCTCCTCCTCGCCGGCGCCGCCGCCCGCGCCTCCGACGACCCCTTCCTCTCCGGCGGACGGCGGCGCTCCCCAATCAGCAGACGGTGGACTACCCCAGCTTCAAGCTCGTCATCGTCGGCGATGGTGGCACAGTCGTCTCTGC
ATCTTGTAGGCAAAACCACCTTTGTGAAGAGGCATCTGACTGGTGAGTTTGAGAAGAAGTATGAACCCACCATTGGTGTTGAGGTTCATCCCCTGGACTTCTACACCAACCGCGGGAAGATCCGGTTCTACTGCTGGGACACTGCAGGGCAGGAGAAGTTTGGTGGGCTCAGGGATGGATACTACGTCCATGGACAGTGTGGGATCATTATGTTTGATGTAACCTCACGGCTGAGTTACAAGAATGTTCCAACTTGGCACCGTGATTTATCCAGGGTCTGTGACAACATCCCAATTGTGC
TTTGTGGGAACAAGGTCGACGTGAAGAACAGGCAGGTCAAGGCAAAGCAGCAACCTATTTATTGGACGTGGGTAAACCAACCCCTTTTTTGTTGTGACAGTGATGCCAATCTCCACTTTGTTGAAAGCCCTGCTCTCGTTCCTCCAGATGTCACAATTGACATGGTCGCCCAGCAGCAGCATGAAGCTGAGCTGTTAATCGCTGTAGCCCAACCACTGCCTGATGATGACGATGACCTCATCGAGTAG
FGENESH: 3 8 exon (s) 22009 - 24800 234 aa, chain +
MRLLLLLLAGAAARASDDPFLSGGRRRSPISRRWTTPASSSSSSAMVAQSSLHLVGKTTFVKRHLTGEFEKKYEPTIGVEVHPLDFYTNRGKIRFYCWDTAGQEKFGGLRDGYYVHGQCGIIMFDVTSRLSYKNVPTWHRDLSRVCDNIPIVLCGNKVDVKNRQVKAKQQPIYWTWVNQPLFCCDSDANLHFVESPALVPPDVTIDMVAQQQHEAELLIAVAQPLPDDDDDLIE
FGENESH:mRNA 4 2 exon (s) 27135 - 28504 1293 bp, chain -
ATGCGGATCAG
GAAAGGGAGTCATGTGGAGGTGTGGACGCAGGACGCGGCGTCGCCGGTGGGCGCGTGGCGCGTCGGGGAGGTCACCTGGGGCAACGGCCACTCGTACACCATGCGGTGGCACGACGGCGGCGGCGAGGTCTCCGGCCGCATCTCGAGGAAGTCGGTCCGCCCCCGCCCGCCGCCCGCCCCCGTGCCGCGGGACCTCGACGCCGGGGACATGGTCGAGGTGTTCGACCACGACGACTGCCTCTGGAAGTGCGCCGAGGTCAAGGGCGCCGCCGCCGACGACGACCGCCGCTTCGTCGTCAA
GGTCGTCGGCGCCACCAATGTCCTGACGGTCCCGCCGCAGAGGCTCCGCATCCGGCAGGTTCTCAGGGACGACGACGTCTGGGTCGCGCTCCACAAGAGCTCGTTTCCTGACACCTCGCCGTGGTTCTTTGCTTCTCAGGACAACCAGATCGCCGTCCCTAGCGCGACGCCGCCGTTCCACGCCTACGGCGGAGGCGCTGGCATGGGCATCGGCAGAACCAAAGGCGGCCATAAGCCCATGGCGCCAGGCTTCACGCCGCTGCTGCAGAAGAGGAGCCCGCTGCTGCAGAAGAGAAGCTT
CGGTATGCTGGGTTCGAGCACAATAACCCCCAATGGCAAGAGATTCGACGACACCGCCAAGAGGATTTGTGCCAAGGAAGAGCCCAGATATGAAGTAGAAGTGGTCGTCCCAAACGTGCGCCTGAACAAGCAAGACGAGATGAGCGGCGAAGATGTTGACGTGCTTGGGACACGCAGTGATTCCGATGATGATCATCATCAGCAGCAGCAGCAGCACGAGGACGAGGATGACGATGACGATAGTGATGATTCTGCATCATCATCCTCGGATGATGACAGCAGCAGTGACAGCAGTAACAG
CGACAGCAGAACCAGGAGCACCGGAGCCGGCAAGAATTGCACGGCAGCTCTCGCAAGCAGGCCTTGTAACGATCAGAAGGCCGATCAGCTGCAACCCAGCGAGAAAGAACATCGTGACGACATATCTGAATCGCATCACGAGACCCTGAACGATGAGAAGGCGGCGGTGGTGCAGGAACACATCCACCGTCTGGAGCTGGAGGCCTACACTAATCTGATGAAGGCGTTCCATGCATGTGGCAAAGCGCTGAGCTGGGAGAAGGCCGAACTGCTCACTGACCTCCGCGTGCATCTCCATAT
CTCTAACGATGAGCACCTGCGGGTGCTTAACATGATCTTGAACCGCAAGGGCAGATTTGGAGGATCACATGCAAATTCTTAA
FGENESH: 4 2 exon (s) 27135 - 28504 430 aa, chain -
MRIRKGSHVEVWTQDAASPVGAWRVGEVTWGNGHSYTMRWHDGGGEVSGRISRKSVRPRPPPAPVPRDLDAGDMVEVFDHDDCLWKCAEVKGAAADDDRRFVVKVVGATNVLTVPPQRLRIRQVLRDDDVWVALHKSSFPDTSPWFFASQDNQIAVPSATPPFHAYGGGAGMGIGRTKGGHKPMAPGFTPLLQKRSPLLQKRSFGMLGSSTITPNGKRFDDTAKRICAKEEPRYEVEVVVPNVRLNKQDEMSGEDVDVLGTRSDSDDDHHQQQQQHEDEDDDDDSDDSASSSSDD
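One way to sanity-check the FGENESH output is to translate a predicted mRNA and compare the result with the reported protein. The sketch below is our own illustration (the names `CODON_TABLE`, `translate`, `MRNA_1` and `PROTEIN_1` are hypothetical), and it assumes the standard genetic code (NCBI translation table 1). It uses gene 1, whose 174-bp mRNA and 57-aa protein are copied from the output above. Although gene 1 lies on the minus chain, the listed mRNA already reads in the coding orientation (it starts with ATG and ends with the stop codon TAG), so no reverse complement is taken.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# the first base varying slowest, in T, C, A, G order; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# FGENESH gene 1 (1 exon, 6884-7057, chain -), copied from the output above.
MRNA_1 = (
    "ATGGGGGTGAATATGAAGGGTAAGCAGCACATG"
    "CCGCGGCCATGTGCGTCGGTGGTTCACTGG"
    "TTCAGTTTCCACGTCCACGAGTGGCCTCGCACT"
    "GTCGATAGCGATCGAATGAACGTTCTTTGCTGCTGCACG"
    "GCGGGAGCTTCTCCGGAACAGTCAGGGCTGATTGGTTAG"
)
PROTEIN_1 = "MGVNMKGKQHMPRPCASVVHWFSFHVHEWPRTVDSDRMNVLCCCTAGASPEQSGLIG"

protein = translate(MRNA_1)
print(len(MRNA_1), len(protein), protein == PROTEIN_1)  # 174 57 True
```

The same check can be repeated for the other genes by pasting their mRNA and protein sequences; a mismatch would point to a copying error or a non-standard reading frame.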