Bioinformaticshomework.docx

上传人:b****7 文档编号:10269886 上传时间:2023-02-09 格式:DOCX 页数:50 大小:2.25MB
下载 相关 举报
Bioinformaticshomework.docx_第1页
第1页 / 共50页
Bioinformaticshomework.docx_第2页
第2页 / 共50页
Bioinformaticshomework.docx_第3页
第3页 / 共50页
Bioinformaticshomework.docx_第4页
第4页 / 共50页
Bioinformaticshomework.docx_第5页
第5页 / 共50页
点击查看更多>>
下载资源
资源描述

Bioinformaticshomework.docx

《Bioinformaticshomework.docx》由会员分享,可在线阅读,更多相关《Bioinformaticshomework.docx(50页珍藏版)》请在冰豆网上搜索。

Bioinformaticshomework.docx

Bioinformaticshomework

“Bioinformatics”homeworkforundergraduate(2016)

#1

Howmanynucleotidesequencesfrommaize(Zeamays)havebeenstoredinthepublicDNAdatabase(suchasGenBank)?

HowmanyWaxy(granule-boundstarchsynthase)genesequencesfrommaizeinthedatabase?

答:

2016年11月6日星期日,访问NCBI(网址为https:

//www.ncbi.nlm.nih.gov/),在Nucleotide数据库中搜索zeamays,Species选择Plants,Moleculetypes选择genomicDNA/RNA,最终结果显示被存储在NCBI中的Zeamays的nucleotidesequences数量为446964。

在搜索框中输入(zeamays[Organism])ANDwaxy[Genename],然后搜索得到玉米中Waxy基因序列在数据库中的数目为175。

具体操作及结果如下图所示:

#2

Asequencewasgeneratedbyasuppressionsubtractivehybridization(SSH)experiment.Pleasefindthebesthit(s)oftheunknownsequenceinthepublicdatabaseandpredictitspotentialfunction.

>anunknownsequence

CCTCGGAGATCTTCATGGGGGGCAAGAGCACCATCGTGCTgCACAACACCTGCGAGGACTCGCTCCTCGCTGCACCCATCATTCTTGATCTGGTGCTCCTGGCGGAGCTCAGCACCAGGATTCAGCTGAAGGCCGAGGGAGAGGTAAGAGTCTGACGAGATATGTTGCTAGTCTACTCTGTAGTCGAGATATACTTTGGGAGCCAAACTGAAGATTTCGCTGCTCCACTTGCATTTGTGCAGGACAAGTTCCATTCCTTCCATCCGGTTGCCACCATCCTGAGCTACCTCACCAAGGCACCCCTGGTAAGAAACAATTCTCGACTGTTTGCTCTAAATAACCTATAGATAAATAAAGACGATTAACTGACGTGCCACTGAATTCCTCTGTTAACAGGTTCCTCCTGGCACGCCGGTGGTGAACGCCCTGGCGAAGCAAAGGGCGATGCTGGAGAACATCATGAGGGCGTGTGTCGGCCTGGCGCCCGAAAACAACATGATCCTGGAGTACAAGTGAGGAGCGTGGCCCAAGCTCGCGGAGCCGAGAGCGACCGTACGTACGTAGCAAGTGGCGAGGGGCGACGGGAGGGCAGGACGAAGAAGAAGGCGAGATCGGCTGTGGAATTATTTGGCGGCTTGTCTTTAGTTTCCTTTGCGAATCTTTCCCTGGTTAAGTTTACCCCAGTGAGTGTGTGTCCTTGCGAGAAAAG

答:

进入NCBI做blast,具体网址为http:

//blast.ncbi.nlm.nih.gov/Blast.cgi,选择Blastx,将上述序列复制到查询框中,参数选择默认参数,直接Blast,得到最佳联配结果为Inositol-3-phosphatesynthase[Dichantheliumoligosanthes]。

进入EMBL做blast,具体网址为http:

//www.ebi.ac.uk/Tools/sss/ncbiblast/,选择Blastx,将上述序列复制到查询框中,参数选择默认参数,直接Blast,得到最佳联配结果为Inositol-3-phosphatesynthase。

根据两处的联配结果可以推测这个未知序列可能的功能与Inositol-3-phosphatesynthase相同。

 

 

 

 

 

#3

Usedynamicprogrammingmethod,theNeedleman-Wunschalgorithm,toperformglobalalignmentofthesequences:

P1=HEAGAWGHEP

P2=EPAWHEAEAG

Scoringsystem:

BLOSUM50scoringmatrixwithgappenalty8.

BLOSUM50(partial)

A

E

G

H

P

W

A

5

-1

0

-2

-1

-3

E

6

-3

0

-1

-3

G

8

-2

-2

-3

H

10

-2

-3

P

10

-4

W

15

答:

具体每一步动态规划的计算过程如下图所示,以黄颜色突出的部分表示达到最优联配所需经过的每一步。

P1

H

E

A

G

A

W

G

H

E

P

P2

0

-8

-16

-24

-32

-40

-48

-56

-64

-72

-80

E

-8

0

-2

-10

-18

-26

-34

-42

-50

-58

-66

P

-16

-8

-1

-3

-11

-19

-27

-36

-44

-51

-48

A

-24

-16

-9

4

-3

-6

-14

-22

-30

-38

-46

W

-32

-24

-17

-4

1

-6

9

1

-7

-15

-23

H

-40

-22

-24

-12

-6

-1

1

7

11

3

-5

E

-48

-30

-16

-20

-14

-7

-4

-1

7

17

9

A

-56

-38

-24

-17

-20

-9

-10

-4

-1

9

16

E

-64

-46

-32

-25

-20

-17

-12

-12

-4

5

8

A

-72

-54

-40

-27

-25

-15

-20

-12

-12

-3

4

G

-80

-72

-48

-35

-19

-23

-18

-12

-14

-11

-4

最终可以得到最佳的联配方式如下所示,其中下划线表示空位

P1:

HEAGAWGH_EP_

P2:

_EP_AWHEAEAG

Score:

-8+6-1-8+5+15-2+0-8+6-1-8=-4

#4

Pleasefindgenesinagenomicsegmentofbamboo(Download).

答:

打开如下网址,Organism选择Monocotplants(因为里面没有竹子对应的选项),然后运行在线程序,最后得到结果如下,它给出了可能的基因及它们编码的蛋白质的碱基序列。

FGENESH2.6PredictionofpotentialgenesinMonocotgenomicDNA

Time:

SunNov604:

05:

562016

Seqname:

testsequence

Lengthofsequence:

49600

Numberofpredictedgenes10:

in+chain2,in-chain8.

Numberofpredictedexons22:

in+chain10,in-chain12.

Positionsofpredictedgenesandexons:

Variant1from1,Score:

299.051538

GStrFeatureStartEndScoreORFLen

1-PolA67770.44

1-1CDSo6884-70574.336884-7057174

1-TSS8311-1.78

2+TSS17591-4.18

2+1CDSf17742-1781116.4117742-1781069

2+2CDSl19834-2079287.8619836-20792957

2+PolA216490.44

3+TSS21801-7.58

3+1CDSf22009-2208519.3322009-2208375

3+2CDSi22583-226523.6522584-2265269

3+3CDSi23070-231456.5623070-2314475

3+4CDSi23236-2335318.3723238-23351114

3+5CDSi24144-242338.3724145-2423187

3+6CDSi24306-243816.4724307-2438175

3+7CDSi24523-246505.2624523-24648126

3+8CDSl24731-248008.3624732-2480069

3+PolA250060.44

4-PolA26777-1.06

4-1CDSl27135-2801959.8927135-28019885

4-2CDSf28097-2850440.0828097-28504408

4-TSS28623-6.38

5-PolA309640.44

5-1CDSl30993-311778.8830993-31175183

5-2CDSi31212-31431-7.3931213-31431219

5-3CDSf31504-315488.1531504-3154845

5-TSS31608-1.28

6-PolA333640.44

6-1CDSo33766-339544.4233766-33954189

6-TSS34021-3.18

7-PolA34094-1.06

7-1CDSo34700-3497517.2734700-34975276

7-TSS35444-6.08

8-PolA358480.44

8-1CDSo36075-3645820.7536075-36458384

8-TSS37019-5.38

9-PolA40341-1.06

9-1CDSo40879-410679.5140879-41067189

9-TSS41777-5.68

10-PolA43349-1.06

10-1CDSl44280-445457.1744280-44543264

10-2CDSf46131-4668633.9646132-46686555

10-TSS46908-8.18

Predictedprotein(s):

>FGENESH:

[mRNA]11exon(s)6884-7057174bp,chain-

ATGGGGGTGAATATGAAGGGTAAGCAGCACATGCCGCGGCCATGTGCGTCGGTGGTTCAC

TGGTTCAGTTTCCACGTCCACGAGTGGCCTCGCACTGTCGATAGCGATCGAATGAACGTT

CTTTGCTGCTGCACGGCGGGAGCTTCTCCGGAACAGTCAGGGCTGATTGGTTAG

>FGENESH:

11exon(s)6884-705757aa,chain-

MGVNMKGKQHMPRPCASVVHWFSFHVHEWPRTVDSDRMNVLCCCTAGASPEQSGLIG

>FGENESH:

[mRNA]22exon(s)17742-207921029bp,chain+

ATGCGCCGGGTAGCGCTGTTGCTGCTGCTCGTCTGCGCGGCGGCGCGCGCCGCCGCGGTC

GTCACCGACGGGCTTCTTCCGAACGGCAACTTCGAGGATGGCCCGCCCAAGTCGGCGCTG

GTGAACGGCACTGTGGTGTCGGGCGCCAACGCCATCCCTAGCTGGGAGACCTCCGGCTTC

GTGGAGTACATCGAGTCGGGGCACAAGCAGGGCGACATGCTCCTGGTGGTGCCCCAGGGC

GCCCACGCCGTGCGCCTGGGCAACGAGGCCTCCATCCGGCAGCGCCTCTCCGTCACCCGG

GGCGCCTACTACTCCATCACCTTCAGCGCGGCGCGCACCTGCGCGCAGGCCGAGCGCCTC

AACGTCTCCGTGTCCCCCGAGTGGGGCGTCCTCCCGATGCAGACCATCTACGGCAGCAAC

GGGTGGGACTCGTACGCCTGGGCCTTCAAGGCCAAGCTGGACACGGTGACGCTCGTCCTC

CACAACCCCGGCGTCGAGGAGGACCCGGCCTGCGGCCCGCTCATCGACGGCGTCGCCATC

CGGGCCCTGTACCCGCCCACGCTGGCCCGCGGCGGCAACATGCTCAAGAACGGCGGCTTC

GAGGAGGGGCCCTACTTTTTACCCAACGCGTCGTGGGGCGTGCTCGTGCCGCCCAACATC

GAGGACGACCACTCCCCGCTCCCGGCCTGGATGATCGTGTCCTCCAAGGCCGTCAAGTAC

GTGGACGCCGCGCACTTTAAGGTCCCCAGGGCGCGGCGCGCCGTGGAGCCTGGTGGCCCC

GGGGAGGGAAGCGGCTGGTGCAGGAGGTGGCGCCACCGTGCGGTGGAGCTACCACCCTGG

CCTTCGCCGTGGGGGACGCCGCCGACGGGTGCGAGGGGTCGCATGGTGGGGCCGAGGCGT

ACACCGGCGCGGCCCACCCGTGAAGGTGGGCCGTACGAGTCCCAAGGGGACGGGAACTTC

CTTTTTTCTTCTTCACGGCCATCGCCAGCCGCACCCGGGTCGTGTTCCAGAGCACCTTCT

ACCACATGA

>FGENESH:

22exon(s)17742-20792342aa,chain+

MRRVALLLLLVCAAARAAAVVTDGLLPNGNFEDGPPKSALVNGTVVSGANAIPSWETSGF

VEYIESGHKQGDMLLVVPQGAHAVRLGNEASIRQRLSVTRGAYYSITFSAARTCAQAERL

NVSVSPEWGVLPMQTIYGSNGWDSYAWAFKAKLDTVTLVLHNPGVEEDPACGPLIDGVAI

RALYPPTLARGGNMLKNGGFEEGPYFLPNASWGVLVPPNIEDDHSPLPAWMIVSSKAVKY

VDAAHFKVPRARRAVEPGGPGEGSGWCRRWRHRAVELPPWPSPWGTPPTGARGRMVGPRR

TPARPTREGGPYESQGDGNFLFSSSRPSPAAPGSCSRAPSTT

>FGENESH:

[mRNA]38exon(s)22009-24800705bp,chain+

ATGCGGCTGCTCCTGCTCCTCCTCGCCGGCGCCGCCGCCCGCGCCTCCGACGACCCCTTC

CTCTCCGGCGGACGGCGGCGCTCCCCAATCAGCAGACGGTGGACTACCCCAGCTTCAAGC

TCGTCATCGTCGGCGATGGTGGCACAGTCGTCTCTGCATCTTGTAGGCAAAACCACCTTT

GTGAAGAGGCATCTGACTGGTGAGTTTGAGAAGAAGTATGAACCCACCATTGGTGTTGAG

GTTCATCCCCTGGACTTCTACACCAACCGCGGGAAGATCCGGTTCTACTGCTGGGACACT

GCAGGGCAGGAGAAGTTTGGTGGGCTCAGGGATGGATACTACGTCCATGGACAGTGTGGG

ATCATTATGTTTGATGTAACCTCACGGCTGAGTTACAAGAATGTTCCAACTTGGCACCGT

GATTTATCCAGGGTCTGTGACAACATCCCAATTGTGCTTTGTGGGAACAAGGTCGACGTG

AAGAACAGGCAGGTCAAGGCAAAGCAGCAACCTATTTATTGGACGTGGGTAAACCAACCC

CTTTTTTGTTGTGACAGTGATGCCAATCTCCACTTTGTTGAAAGCCCTGCTCTCGTTCCT

CCAGATGTCACAATTGACATGGTCGCCCAGCAGCAGCATGAAGCTGAGCTGTTAATCGCT

GTAGCCCAACCACTGCCTGATGATGACGATGACCTCATCGAGTAG

>FGENESH:

38exon(s)22009-24800234aa,chain+

MRLLLLLLAGAAARASDDPFLSGGRRRSPISRRWTTPASSSSSSAMVAQSSLHLVGKTTF

VKRHLTGEFEKKYEPTIGVEVHPLDFYTNRGKIRFYCWDTAGQEKFGGLRDGYYVHGQCG

IIMFDVTSRLSYKNVPTWHRDLSRVCDNIPIVLCGNKVDVKNRQVKAKQQPIYWTWVNQP

LFCCDSDANLHFVESPALVPPDVTIDMVAQQQHEAELLIAVAQPLPDDDDDLIE

>FGENESH:

[mRNA]42exon(s)27135-285041293bp,chain-

ATGCGGATCAGGAAAGGGAGTCATGTGGAGGTGTGGACGCAGGACGCGGCGTCGCCGGTG

GGCGCGTGGCGCGTCGGGGAGGTCACCTGGGGCAACGGCCACTCGTACACCATGCGGTGG

CACGACGGCGGCGGCGAGGTCTCCGGCCGCATCTCGAGGAAGTCGGTCCGCCCCCGCCCG

CCGCCCGCCCCCGTGCCGCGGGACCTCGACGCCGGGGACATGGTCGAGGTGTTCGACCAC

GACGACTGCCTCTGGAAGTGCGCCGAGGTCAAGGGCGCCGCCGCCGACGACGACCGCCGC

TTCGTCGTCAAGGTCGTCGGCGCCACCAATGTCCTGACGGTCCCGCCGCAGAGGCTCCGC

ATCCGGCAGGTTCTCAGGGACGACGACGTCTGGGTCGCGCTCCACAAGAGCTCGTTTCCT

GACACCTCGCCGTGGTTCTTTGCTTCTCAGGACAACCAGATCGCCGTCCCTAGCGCGACG

CCGCCGTTCCACGCCTACGGCGGAGGCGCTGGCATGGGCATCGGCAGAACCAAAGGCGGC

CATAAGCCCATGGCGCCAGGCTTCACGCCGCTGCTGCAGAAGAGGAGCCCGCTGCTGCAG

AAGAGAAGCTTCGGTATGCTGGGTTCGAGCACAATAACCCCCAATGGCAAGAGATTCGAC

GACACCGCCAAGAGGATTTGTGCCAAGGAAGAGCCCAGATATGAAGTAGAAGTGGTCGTC

CCAAACGTGCGCCTGAACAAGCAAGACGAGATGAGCGGCGAAGATGTTGACGTGCTTGGG

ACACGCAGTGATTCCGATGATGATCATCATCAGCAGCAGCAGCAGCACGAGGACGAGGAT

GACGATGACGATAGTGATGATTCTGCATCATCATCCTCGGATGATGACAGCAGCAGTGAC

AGCAGTAACAGCGACAGCAGAACCAGGAGCACCGGAGCCGGCAAGAATTGCACGGCAGCT

CTCGCAAGCAGGCCTTGTAACGATCAGAAGGCCGATCAGCTGCAACCCAGCGAGAAAGAA

CATCGTGACGACATATCTGAATCGCATCACGAGACCCTGAACGATGAGAAGGCGGCGGTG

GTGCAGGAACACATCCACCGTCTGGAGCTGGAGGCCTACACTAATCTGATGAAGGCGTTC

CATGCATGTGGCAAAGCGCTGAGCTGGGAGAAGGCCGAACTGCTCACTGACCTCCGCGTG

CATCTCCATATCTCTAACGATGAGCACCTGCGGGTGCTTAACATGATCTTGAACCGCAAG

GGCAGATTTGGAGGATCACATGCAAATTCTTAA

>FGENESH:

42exon(s)27135-28504430aa,chain-

MRIRKGSHVEVWTQDAASPVGAWRVGEVTWGNGHSYTMRWHDGGGEVSGRISRKSVRPRP

PPAPVPRDLDAGDMVEVFDHDDCLWKCAEVKGAAADDDRRFVVKVVGATNVLTVPPQRLR

IRQVLRDDDVWVALHKSSFPDTSPWFFASQDNQIAVPSATPPFHAYGGGAGMGIGRTKGG

HKPMAPGFTPLLQKRSPLLQKRSFGMLGSSTITPNGKRFDDTAKRICAKEEPRYEVEVVV

PNVRLNKQDEMSGEDVDVLGTRSDSDDDHHQQQQQHEDEDDDDDSDDSASSSSDD

展开阅读全文
相关资源
猜你喜欢
相关搜索

当前位置:首页 > PPT模板 > 商务科技

copyright@ 2008-2022 冰豆网网站版权所有

经营许可证编号:鄂ICP备2022015515号-1