Summary of Genetic Polymorphism Knowledge


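The observable physical or physiological traits of an organism (here, a human) that result from the joint action of genotype and environment are called the phenotype.

Restriction fragment length polymorphism (RFLP) is the first-generation genetic marker;

variable number of tandem repeats (VNTR) is the second-generation genetic marker;

VNTRs whose repeat unit is 2-6 nucleotides long are called microsatellites or short tandem repeats (STRs);

those whose repeat unit is 6-12 nucleotides long are called minisatellites.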

Polymorphisms are defined as frequent (occurring in greater than 1% of the population) variations in the human DNA sequence. Most involve a single base pair substitution, known as single nucleotide polymorphisms (1), although more complex variations are also recognised. SNPs are single base pair positions in genomic DNA at which different sequence alternatives (alleles) exist in normal individuals in some population(s), wherein the least frequent allele has an abundance of 1% or greater. In principle, SNPs could be bi-, tri-, or tetra-allelic polymorphisms. However, in humans, tri-allelic and tetra-allelic SNPs are rare almost to the point of non-existence, and so SNPs are sometimes simply referred to as bi-allelic markers.

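Single nucleotide polymorphism (SNP):

The concept was first proposed in 1996 by Lander of the Center for Genome Research at the Massachusetts Institute of Technology. A SNP is a difference in a single base at a specific nucleotide position in the genomic DNA sequence of different individuals, and it is the third-generation genetic marker. Any SNP should occur in the population at a frequency of no less than 1%. In principle a SNP can be a bi-, tri-, or tetra-allelic polymorphism, but tri- and tetra-allelic SNPs are rare to almost non-existent in humans, so SNPs are usually treated simply as bi-allelic markers. Bi-allelic SNP substitutions comprise one transition, C/T (G/A), and three transversions, C/A (G/T), C/G (G/C), and T/A (A/T). Because deamination of 5-methylcytosine is relatively frequent, the four kinds of SNPs occur at different frequencies in the genome: about 2/3 are C/T (G/A) transitions, and most of them lie in non-transcribed sequences.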
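The transition/transversion distinction above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_substitution` is hypothetical, and the rule it encodes is the standard one (purine to purine or pyrimidine to pyrimidine is a transition, anything else is a transversion).

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref == alt or ref not in valid or alt not in valid:
        raise ValueError("expected two different bases from A, C, G, T")
    # Both bases in the same chemical class -> transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# The four bi-allelic substitution types listed in the text
# (each shown with its complementary-strand form).
for pair in [("C", "T"), ("C", "A"), ("C", "G"), ("T", "A")]:
    print(pair, classify_substitution(*pair))
```

Only the C/T (G/A) pair is reported as a transition; the other three pairs are transversions, matching the one-to-three split described above.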

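According to published estimates, the 3 × 10^9 bases of the human genome contain at least 10 million SNP sites, on average about 1 SNP per 1000 bp.

The main difference from other genetic markers (such as restriction fragment length polymorphisms and short tandem repeats) is that detection no longer relies on differences in "length"; sequence variation itself serves as the marker. SNPs therefore offer distinctive advantages: high abundance, high stability, and suitability for automated analysis.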

English description:

SNP markers are preferred over microsatellite markers for association studies, because of their high abundance along the human genome (SNPs with minor allele frequency > 0.1 occur once every 600 kb) (Wang et al. 1998), their low mutation rate, and the accessibility of high-throughput genotyping. The power of association studies based on SNPs depends not only on the sample size and density of the marker map but also on many other factors, such as the age and frequency of the disease mutations and SNPs and the extent of linkage disequilibrium (LD) in the region (2).

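SNP sites can be divided into several broad classes according to their position in the gene sequence.

Most SNPs have no effect on gene function and are called anonymous SNPs;

SNP sites located within genes are called gene-based SNPs, which include single nucleotide polymorphisms in introns, exons, and promoters.

Among these, SNP sites located in protein-coding sequence are called cSNPs or coding SNPs.

A cSNP that does not change the encoded amino acid sequence is called a synonymous SNP;

a SNP that changes the amino acid sequence is called a non-synonymous SNP.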
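The synonymous versus non-synonymous distinction can be checked mechanically against the standard genetic code. The sketch below is an assumption-laden illustration: `classify_coding_snp` is a hypothetical helper, codons are given on the coding strand in DNA letters, and a change to or from a stop codon is simply reported as non-synonymous.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon: str, position: int, alt_base: str) -> str:
    """Classify a single-base change at `position` (0, 1 or 2) of `codon`."""
    codon, alt_base = codon.upper(), alt_base.upper()
    if codon not in CODON_TABLE or position not in (0, 1, 2) or alt_base not in BASES:
        raise ValueError("expected a DNA codon, a position 0-2 and a base")
    mutated = codon[:position] + alt_base + codon[position + 1:]
    if mutated == codon:
        raise ValueError("alternative base equals the reference base")
    same_aa = CODON_TABLE[codon] == CODON_TABLE[mutated]
    return "synonymous" if same_aa else "non-synonymous"


print(classify_coding_snp("GAA", 2, "G"))  # GAA (Glu) -> GAG (Glu)
print(classify_coding_snp("GAG", 1, "T"))  # GAG (Glu) -> GTG (Val)
```

The first call leaves glutamate unchanged and is synonymous; the second replaces glutamate with valine and is non-synonymous.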

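A SNP in the protein-coding region of a gene may cause an amino acid substitution and so alter protein function;

most SNPs, however, occur in non-coding regions. A SNP in the promoter region may affect transcription factor binding and change the rate or level of gene transcription;

SNPs in the 5' upstream or 3' downstream regions may alter the stability of the transcribed mRNA or the activity of enhancers;

the functional effects of SNPs in introns await further study (3).

Many methods are available for detecting SNPs, including direct sequencing, PCR-RFLP, single strand conformation polymorphism analysis (SSCP), heteroduplex analysis (HA), denaturing gradient gel electrophoresis (DGGE), the solid phase chemical cleavage method (spCCM), allele-specific PCR, DNA chip detection, and real-time fluorescent quantitative PCR. All offer high specificity and sensitivity, and each laboratory can choose a suitable method according to its research aims and budget.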

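2. Haplotype

A combination of SNPs that lie in a specific region of a chromosome, are associated with one another, and tend to be passed to offspring as a unit is called a haplotype. Haplotypes have been likened to "molecular fossils" of human evolutionary history.

If a stretch of DNA contains n SNP sites, up to 2^n haplotypes may theoretically exist in the population, but any one individual carries only 2 haplotypes.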
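The 2^n count can be made concrete by enumerating every allele combination for a few bi-allelic SNPs. The SNP alleles below are invented purely for illustration, and `possible_haplotypes` is a hypothetical helper name.

```python
from itertools import product


def possible_haplotypes(snp_alleles):
    """Return every allele combination across the given bi-allelic SNPs."""
    return ["".join(combo) for combo in product(*snp_alleles)]


# Three hypothetical bi-allelic SNPs (n = 3).
snps = [("A", "G"), ("C", "T"), ("G", "T")]
haplotypes = possible_haplotypes(snps)
print(len(haplotypes))  # 8, i.e. 2**3
print(haplotypes)

# A diploid individual carries exactly two of them, one per chromosome copy.
individual = (haplotypes[0], haplotypes[5])
print(individual)  # ('ACG', 'GCT')
```

In real populations far fewer than 2^n haplotypes are usually observed, because linked SNPs are not inherited independently.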

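Methods for constructing haplotypes:

Experimental methods currently include single-molecule dilution, allele-specific PCR (AS-PCR), long-insert cloning, and diploid-to-haploid conversion;

statistical algorithms include Clark's algorithm, maximum likelihood algorithms, and Bayesian algorithms.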

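3. Haplotype block

Based on the linkage disequilibrium between SNPs over large stretches of the genome, the haplotype structure of the human genome can be described by a relatively simple model: contiguous, stable haplotype regions on a chromosome that have hardly been broken up by recombination are called haplotype blocks (or haploblocks).

Several neighboring, tightly linked SNPs are inherited together and form a haplotype block. A haplotype block may be the smallest unit of inheritance; in extreme cases it can be a single SNP or an entire chromosome. Regions in which recombination events are frequent separate adjacent haplotype blocks.

3.1 Definitions of a haplotype block: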

① A haplotype block is a contiguous set of markers in which the average D' (the standardized coefficient of LD (4)) is greater than some predetermined threshold.

② Gabriel et al (5) described how the human genome can be parsed objectively into haplotype blocks: sizable regions over which there is little evidence for historical recombination and within which only a few common haplotypes are observed. The definition is based on linkage disequilibrium (LD), that is, large pairwise |D'| values between the SNP pairs within one haploblock.

③ Patil et al (6) defined a haplotype block as a region with a large proportion (> 80%) of inferred common haplotypes. The definition is based on the concept of "chromosome coverage", with a haplotype block containing a minimum number of SNPs that account for a majority of common haplotypes or a reduced level of haplotype diversity.

④ Wang et al (7) further proposed explicit "no historical recombination" as a definition for haplotype blocks, which can be tested using a four-gamete test.

⑤ Ding K et al (8) chose to define haplotype blocks based on LD when haplotype-block-based tSNP selection methods were employed. The LD-based haplotype-block definition requires that SNP pairs with strong D' (absolute D' ≥ 0.70) account for at least 95% of all pairs of SNPs.
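Several of these definitions rest on the pairwise LD coefficients D' and r². As a minimal sketch, assuming phased haplotype counts for two bi-allelic SNPs with alleles A/a and B/b are already available, they can be computed with the standard formulas D = p(AB) - p(A)p(B), D' = D / D_max, and r² = D² / (p(A)p(a)p(B)p(b)). The function name `ld_stats` and the counts used are illustrative only.

```python
def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int):
    """Return (D', r^2) from the four two-SNP haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes observed")
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both SNPs must be polymorphic in the sample")
    d = p_AB - p_A * p_B
    # D_max depends on the sign of D (largest |D| the allele frequencies allow).
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d_prime, r2


# Only three of the four gametes are present: D' is about 1, r^2 about 0.375.
print(ld_stats(60, 0, 20, 20))
# All four gametes equally frequent: no LD at all.
print(ld_stats(25, 25, 25, 25))
```

The first example shows why the two measures are used differently: D' reaches 1 whenever at most three gametes are present, whereas r² reaches 1 only when the two SNPs carry the same information.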

3.2 Algorithms and partitioning criteria for haplotype blocks:

3.2.1 Based on linkage disequilibrium:

① Gabriel criteria (5) of haplotype block partitioning:

• Exclude SNPs with MAF below 0.05.

• "Strong LD" is defined as a pair for which the one-sided upper 95% confidence bound on D' is > 0.98 (that is, consistent with no historical recombination) and the lower bound is above 0.7.

• "Strong evidence for historical recombination" is defined as a pair for which the upper confidence bound on D' is less than 0.9.

We defined a haplotype block as a region over which a very small proportion (< 5%) of comparisons among informative SNP pairs show strong evidence of historical recombination. Among the informative pairs, that is, those falling in either of these two categories, the fraction in strong LD must be at least 95%. [We allow for 5% because many forces other than recombination (both biological and artifactual) can disrupt haplotype patterns, such as recurrent mutation, gene conversion, or errors of genome assembly or genotyping.]
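The criteria above translate into a simple classification of SNP pairs followed by a ratio test. The sketch below assumes the confidence bounds on D' have already been computed elsewhere (for example by an LD tool) and are supplied as `(lower, upper)` tuples; the function names are hypothetical and this is not Haploview's actual code.

```python
def classify_pair(lower: float, upper: float) -> str:
    """Label one SNP pair from the confidence bounds on its D'."""
    if upper > 0.98 and lower > 0.7:
        return "strong_ld"
    if upper < 0.9:
        return "recombination"
    return "uninformative"


def is_gabriel_block(pair_bounds, min_fraction: float = 0.95) -> bool:
    """True if at least `min_fraction` of informative pairs are in strong LD."""
    labels = [classify_pair(lower, upper) for lower, upper in pair_bounds]
    strong = labels.count("strong_ld")
    recombinant = labels.count("recombination")
    informative = strong + recombinant
    if informative == 0:
        return False  # nothing to base a decision on
    return strong / informative >= min_fraction


# Hypothetical region: 20 strong-LD pairs, 1 recombinant pair, 3 uninformative.
bounds = [(0.85, 0.99)] * 20 + [(0.10, 0.60)] + [(0.40, 0.95)] * 3
print(is_gabriel_block(bounds))  # True: 20 of 21 informative pairs are in strong LD
```

Uninformative pairs are ignored in the ratio, which is why the three pairs with wide confidence intervals do not count against the block.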

② The block definition method based on the D' measure of LD, employed by Gabriel et al, was applied to the SNP genotype data through the HaploView software package (MJ Daly and JC Barrett, Whitehead Institute, MA, USA). Briefly, a block was defined as a region in which less than 5% of SNP pairs had a D' upper confidence bound less than 0.9. In addition, blocks consisting of 2 SNPs could span up to 20 kb and blocks of 3 or 4 SNPs could span up to 30 kb. Blocks were not allowed to overlap (9).

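③ Wang et al (7) further proposed explicit "no historical recombination" as a definition for haplotype blocks, and used the four-gamete test (FGT) to build a block-partitioning algorithm:

First, the four-gamete test is applied to every pair of SNPs (observing all 4 gametes indicates that recombination has occurred), and the pairwise results are stored in a matrix, with 1 where all 4 gametes appear and 0 otherwise;

a haplotype block is defined as an ordered set of SNP markers showing no recombination. Following the FGT results, SNPs are added to a block one after another as long as no more than 3 gametes are observed, until 4 gametes appear at the k-th site; site k then serves as the starting point of a new haplotype block.

One advantage of the FGT algorithm is that no threshold has to be set in advance; with large samples its results are similar to those of the greedy algorithm.

Haploview four gamete rule:

For each marker pair, the population frequencies of the 4 possible two-marker haplotypes are computed. If all 4 are observed with at least frequency 0.01, a recombination is deemed to have taken place. Blocks are formed by consecutive markers where only 3 gametes are observed.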
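The block-growing procedure described above can be sketched as follows. The code assumes phased, bi-allelic haplotypes given as equal-length strings of '0' and '1', counts a gamete only if its sample frequency is at least 0.01 (the Haploview rule quoted above), and closes a block as soon as the next site shows four gametes with any site already in the block. Function names and data are illustrative.

```python
from collections import Counter


def four_gametes(haplotypes, i: int, j: int, min_freq: float = 0.01) -> bool:
    """True if all four two-site gametes at sites i and j are observed."""
    counts = Counter((h[i], h[j]) for h in haplotypes)
    n = len(haplotypes)
    return sum(1 for c in counts.values() if c / n >= min_freq) >= 4


def fgt_blocks(haplotypes, min_freq: float = 0.01):
    """Partition sites into blocks as (first_site, last_site) index pairs."""
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    blocks, start = [], 0
    for k in range(1, n_sites):
        # Site k breaks the block if it shows 4 gametes with any member site.
        if any(four_gametes(haplotypes, i, k, min_freq) for i in range(start, k)):
            blocks.append((start, k - 1))
            start = k
    blocks.append((start, n_sites - 1))
    return blocks


# Sites 0-1 and 2-3 are each perfectly linked; recombination lies between them.
sample = ["0000", "0011", "1100", "1111"]
print(fgt_blocks(sample))  # [(0, 1), (2, 3)]
```

No LD threshold enters the partition itself, which reflects the advantage noted above; only the minimum gamete frequency is a tunable parameter.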

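3.2.2 Based on haplotype diversity:

① Patil et al (6) defined a haplotype block as a region with a large proportion (> 80%) of inferred common haplotypes. They proposed a greedy algorithm that yields an approximate partition into haplotype blocks. All possible blocks formed by consecutive SNPs are considered first; from these, the block is chosen that maximizes the ratio of the number of SNPs in the block to the minimum number of tag SNPs required (the tag SNPs being those needed to distinguish the haplotypes that occur more than once), in other words the block in which the fewest tag SNPs distinguish the most SNPs;

every SNP is assigned to a haplotype block. The size of a block is unrelated to its order along the chromosome, and the blocks have no absolute boundaries.

Two criteria:

(1) in each block, at least 80% of the observed haplotypes are represented more than once; and

(2) the total number of tag SNPs for distinguishing at least 80% of haplotypes is as small as possible.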
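The two criteria can be illustrated on a single candidate block. The sketch below is a simplification, not the published implementation: coverage is the share of sampled chromosomes whose haplotype occurs more than once, and the tag SNPs are found by brute force as the smallest set of sites that tells all of those repeated haplotypes apart. Function names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def common_coverage(block_haplotypes) -> float:
    """Fraction of chromosomes whose haplotype is seen more than once."""
    counts = Counter(block_haplotypes)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(block_haplotypes)


def min_tag_snps(block_haplotypes):
    """Smallest set of site indices that distinguishes the repeated haplotypes."""
    counts = Counter(block_haplotypes)
    common = [h for h, c in counts.items() if c > 1]
    n_sites = len(block_haplotypes[0])
    for size in range(n_sites + 1):
        for sites in combinations(range(n_sites), size):
            signatures = {tuple(h[s] for s in sites) for h in common}
            if len(signatures) == len(common):
                return sites
    return tuple(range(n_sites))


# Seven chromosomes over a three-SNP candidate block.
block = ["000", "000", "011", "011", "110", "110", "101"]
print(common_coverage(block) >= 0.8)  # True (6 of 7 chromosomes)
print(min_tag_snps(block))            # (0, 1)
```

The exhaustive search grows exponentially with block size, so it is only suitable for small illustrative blocks.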

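② Zhang et al (10-11) proposed a dynamic programming algorithm for haplotype block partitioning. Its principle is to minimize the number of tag SNPs needed to represent most of the properties of each block. The algorithm has been implemented in the program HAPBLOCK (http://hto-b.usc.edu/msms/HapBlock/).

Although each of these methods has its merits, Wall et al (12) stated a preference for the first class of methods, for the following reasons:

first, using D' to detect historical recombination directly appears to match the definition of a haplotype block more closely;

second, pairwise methods are easier to apply to diploid genetic data;

finally, pairwise linkage disequilibrium coefficients are easier to visualize.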

3.2.3 Other partitioning criteria

• Haplotype block boundaries were inferred from the phased genotype data (probability threshold for correct phase call at each site: 0.95) by D' confidence limits (upper confidence limit > 0.97, lower confidence limit > 0.70, fraction of informative pairs in strong LD: 0.95) using Haploview (http://www.broad.mit.edu/personal/jcbarret/haploview/)

• The minimum D' over all SNP pairs is > 0.9 (13-14)

• Both r² and D' equal 1 for all SNP pairs (15)

• The minimum r² over all SNP pairs is > 0.8 (16)

• D' is > 0.7 for 95% of SNP pairs (8)

Several neighboring, tightly linked SNPs are inherited together and form a haplotype block, which as a haploblock has a higher discrimination power than the individual SNPs within the block. Candidate haplotype blocks were selected from three major populations (Caucasian, East Asian, and African) using the following parameters: maximum match probability reduction = 0.85, linkage disequilibrium (LD) r² ≥ 0.7, maximum Fst = 0.06 (17), minimum number of SNPs = 3, minimum heterozygosity = 0.2, and minimum number of haplotypes = 3 (18).

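4. Tag SNP (tagger SNP)

A linkage group may contain many SNP sites, yet a small number of SNPs is enough to identify its haplotype patterns specifically. Such SNPs are called tag single nucleotide polymorphisms (tSNPs). They are representative and characteristic SNPs in the genome and form the set of genetic markers needed to construct haplotypes or to carry out association analysis.

Genetic markers, such as a small number of SNPs, that can identify most of the haplotypes within a haplotype block are called haplotype tag SNPs (htSNPs) (19).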

4.1 Difference between tSNPs and htSNPs

The two terms, htSNPs and tSNPs, refer to two different strategies (8) for choosing the optimal minimum subset of SNPs from the entire set of SNPs. htSNPs are selected based on the haplotype-block model of the LD pattern in a region of interest and represent the common haplotypes inferred from the original set of SNPs. On the other hand, tSNPs are selected based on measures of association, such that a tSNP predicts partially or completely the state of other SNPs.

4.2 Classification of methods for selecting tSNPs or htSNPs

Eight methods can also be classified as haplotype-block-based methods: All common haplotypes, Haplotype diversity, R2h (coefficient of determination), and Entropy; and haplotype-block-free methods: TagIT (haplotype r²), LD r² (based on pairwise LD), PCA (principal component analysis), and BEST (based on set theory).
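To give a feel for block-free selection based on pairwise LD, the sketch below applies one common greedy strategy to a precomputed r² matrix: repeatedly pick the SNP that tags the most still-untagged SNPs at or above a threshold. This illustrates the general idea only and is not the specific "LD r²" procedure of the cited work; the 0.8 threshold, the function name, and the matrix values are assumptions.

```python
def greedy_tag_snps(r2, threshold: float = 0.8):
    """Pick tag SNP indices so every SNP has r^2 >= threshold with some tag."""
    untagged = set(range(len(r2)))
    tags = []
    while untagged:
        def covered(i):
            # SNPs that choosing i would tag (i always tags itself).
            return {j for j in untagged if j == i or r2[i][j] >= threshold}

        # Ties are broken in favour of the lowest index.
        best = max(sorted(untagged), key=lambda i: len(covered(i)))
        tags.append(best)
        untagged -= covered(best)
    return tags


# Hypothetical symmetric r^2 matrix for four SNPs.
r2_matrix = [
    [1.00, 0.90, 0.85, 0.10],
    [0.90, 1.00, 0.50, 0.10],
    [0.85, 0.50, 1.00, 0.20],
    [0.10, 0.10, 0.20, 1.00],
]
print(greedy_tag_snps(r2_matrix))  # [0, 3]
```

SNP 0 tags SNPs 1 and 2, while SNP 3 is in weak LD with everything else and must be genotyped itself, so two tag SNPs cover all four sites.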

LD level is based on the following c
