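). Compare the results estimated by this method with those of ABSOLUTE and see how strong the correlation is.
3) What if we also take into account that tumor cells are prone to copy-number changes? How should the model be improved?

Assignment requirements:
1. Write out the analysis process and the computed results in as much detail as possible;
2. Attach the program used for the computation (Python is the most suitable for this);
3. Draw plenty of figures to show your intermediate results;
4. Hand in the program and the results to me before the holiday.

1. Li E, Bestor TH, Jaenisch R: Targeted mutation of the DNA methyltransferase gene results in embryonic lethality. Cell 1992, 69(6):915-926.
2. Li E, Beard C, Jaenisch R: Role for DNA methylation in genomic imprinting. Nature 1993, 366(6453):362-365.
3. Fang F, Hodges E, Molaro A, Dean M, Hannon GJ, Smith AD: Genomic landscape of human allele-specific DNA methylation. Proc Natl Acad Sci U S A 2012, 109(19):7332-7337.
4. Panning B, Jaenisch R: RNA and the epigenetic regulation of X chromosome inactivation. Cell 1998, 93(3):305-308.
5. Feinberg AP, Cui H, Ohlsson R: DNA methylation and genomic imprinting: insights from cancer into epigenetic mechanisms. Semin Cancer Biol 2002, 12(5):389-398.
6. Ehrlich M: DNA methylation in cancer: too much, but also too little. Oncogene 2002, 21(35):5400-5413.
7. Jones PA, Baylin SB: The fundamental role of epigenetic events in cancer. Nat Rev Genet 2002, 3(6):415-428.
8. Das PM, Singal R: DNA methylation and cancer. J Clin Oncol 2004, 22(22):4632-4642.
9. Robertson KD: DNA methylation and human disease. Nat Rev Genet 2005, 6(8):597-610.
10. Beck S, Rakyan VK: The methylome: approaches for global DNA methylation profiling. Trends Genet 2008, 24(5):231-237.
11. Javierre BM, Fernandez AF, Richter J, Al-Shahrour F, Martin-Subero JI, Rodriguez-Ubreva J, Berdasco M, Fraga MF, O'Hanlon TP, Rider LG et al: Changes in the pattern of DNA methylation associate with twin discordance in systemic lupus erythematosus. Genome Res 2010, 20(2):170-179.
12. Hansen KD, Timp W, Bravo HC, Sabunciyan S, Langmead B, McDonald OG, Wen B, Wu H, Liu Y, Diep D et al: Increased methylation variation in epigenetic domains across cancer types. Nat Genet 2011, 43(8):768-775.
13. Zheng X, Zhao Q, Wu H-J, Li W, Wang H, Meyer CA, Qin QA, Xu H, Zang C, Jiang P: MethylPurify: tumor purity deconvolution and differential methylation detection from single tumor DNA methylomes. Genome Biol 2014, 15:419.

Problem 1:

Assume that in a pure cell line the methylation level of a single site is either 0 or 1. For paired cancer and normal samples, use the rank-sum test to select informative CpG sites and build the linear statistical model

m_i = p * c_i + (1 - p) * n_i

where p is the purity of the tumor cells, m_i is the methylation level of site i in the cancer sample, n_i is the methylation level of site i in the normal sample, and c_i is the methylation level of site i in the pure cancer cells; c_i = 1 when m_i > n_i, otherwise c_i = 0. Solving for p gives p = (m_i - n_i) / (1 - n_i) when m_i > n_i and p = (n_i - m_i) / n_i otherwise, which is what the program below computes.

Applying this model yields one purity estimate per site; the mean over all sites is the purity of the whole sample. The method is applied to every sample and the results are compared with ABSOLUTE, giving the figure below:

Figure 1: Correlation coefficient between the tumor purity estimated by the linear statistical model and the purity estimated by ABSOLUTE

Program:

setwd("/mnt/Storage/home/zhengxq/wwzhang/homework/processed/paired_cancer")

# read cancer and make a dataframe
dir <- "/mnt/Storage/home/zhengxq/wwzhang/homework/luaddata/cancerpair/"
file.names <- list.files(dir)
n <- length(file.names)
cpg <- read.delim(paste(dir, file.names[1], sep = ""), header = T)[, 1]
cpg <- cpg[-1]                 # drop the second header row
cpg <- as.character(cpg)       # works whether or not the column was read as a factor
cancer <- c()
for (i in 1:n) {
  dat <- read.delim(paste(dir, file.names[i], sep = ""), header = T)
  a <- dat[, 2]
  a <- a[-1]
  a <- as.numeric(as.character(a))
  cancer <- cbind(cancer, a)
  # the 6th dot-separated field of the file name is used as the sample name
  colnames(cancer)[i] <- strsplit(file.names[i], ".", fixed = TRUE)[[1]][6]
}
rownames(cancer) <- cpg
save(cancer, file = "cancer.RData")

# read normal and make a dataframe
dir1 <- "/mnt/Storage/home/zhengxq/wwzhang/homework/luaddata/normalpair/"
file <- list.files(dir1)
m <- length(file)
normal <- c()
for (i in 1:m) {
  dat <- read.delim(paste(dir1, file[i], sep = ""), header = T)
  a <- dat[, 2]
  a <- a[-1]
  a <- as.numeric(as.character(a))
  normal <- cbind(normal, a)
  colnames(normal)[i] <- strsplit(file[i], ".", fixed = TRUE)[[1]][6]
}
rownames(normal) <- cpg
save(normal, file = "normal.RData")

# select the differential CpG sites with rank-sum test p-value < 10^-6
setwd("/home/users/wwzhang/wwzhang/luadprocessed/purity_wilcox_paired_mean")
load("/home/users/wwzhang/wwzhang/luadprocessed/normal.RData")
load("/home/users/wwzhang/wwzhang/luadprocessed/cancer.RData")
tnormal <- t(na.omit(normal))
tcancer <- t(na.omit(cancer))
index <- intersect(colnames(tcancer), colnames(tnormal))
tcancer1 <- tcancer[, index]
tnormal1 <- tnormal[, index]
n <- ncol(tcancer1)
value <- c()
for (i in 1:n) {
  # paired = T makes this the Wilcoxon signed-rank test; it assumes the
  # cancer and normal samples are listed in the same patient order
  value <- cbind(value, wilcox.test(tcancer1[, i], tnormal1[, i], paired = T)$p.value)
  colnames(value)[i] <- colnames(tcancer1)[i]
}
index <- which(value < 10^-6)
posi <- colnames(value)[index]
mcancer <- cancer[posi, ]
mnormal <- normal[posi, ]

# estimate the purity of each cancer sample with the linear statistical model
purity <- function(x, y, i) {
  s1 <- x[, i]                 # cancer sample i
  d1 <- y[, i]                 # matched normal sample i
  n <- length(s1)
  chundu <- c()                # per-site purity estimates
  for (j in 1:n) {
    if (s1[j] > d1[j]) {
      p <- (s1[j] - d1[j]) / (1 - d1[j])
    } else {
      p <- (d1[j] - s1[j]) / d1[j]
    }
    chundu <- append(chundu, p)
  }
  pingjun <- mean(chundu)      # mean over all sites
  return(pingjun)
}
pur <- c()
for (i in 1:29) {              # 29 paired samples
  a <- purity(mcancer, mnormal, i)
  pur <- cbind(pur, a)
  colnames(pur)[i] <- colnames(mcancer)[i]
}
save(pur, file = "purity_wilcox_paired_mean_0.000001.RData")

# correlation between the linear-model purity and the ABSOLUTE purity
# load absolute purity
absolute <- read.delim("nature.tumor_purity", header = T,
                       sep = "\t")  # separator not recoverable from the source; tab (the read.delim default) assumed
purity <- absolute[, 3]
purity <- as.data.frame(purity)
rownames(purity) <- absolute[, 1]
load("purity_wilcox_paired_mean_0.000001.RData")
pur <- t(pur)
rownames(pur) <- substr(rownames(pur), 1, 12)   # keep the first 12 characters of the sample name
name <- as.character(absolute[, 1])
index <- intersect(rownames(pur), rownames(purity))
purnew <- as.data.frame(pur[index, ])
puritynew <- as.data.frame(purity[index, ])
rownames(puritynew) <- index
r <- cor(purnew, puritynew)
r <- round(as.vector(r), 3)
tag <- paste0("R=", r)
pdf("purity_mean.pdf")
m <- lm(puritynew[, 1] ~ purnew[, 1])
plot(purnew[, 1], puritynew[, 1], xlab = "luadpurity", ylab = "absolute purity")
abline(m)
text(0.2, 0.6, tag)
dev.off()

Problem 2:

Cancer samples without adjacent normal tissue, i.e. cancers without a control, are handled as follows:
(1) Take all cancers without adjacent normal tissue, run the rank-sum test against the available normal samples, and select the sites with p-value < 10^-6.
(2) For each site, take the peak (the mode of the density) of the available normal samples as the methylation level of that site.
(3) Apply the linear statistical model of Problem 1 to every cancer sample together with the single reference sample obtained in (2), which gives the purity of the cancers without adjacent normal tissue; then compare it with the purity estimated by ABSOLUTE. The result is as follows:

Figure 2: Correlation coefficient between the purity of cancers without adjacent normal tissue and the purity estimated by ABSOLUTE

# load normal sample
setwd("/mnt/Storage/home/zhengxq/wwzhang/homework/cancer_unpaired")
load("mnormal_wilcox.test_paired_10-6.RData")

# for each site, take the density peak over all normal samples as its methylation level
n <- nrow(mnormal)
methy <- c()
for (i in 1:n) {
  alpha = density(mnormal[i, ])
  x = alpha$x
  y = alpha$y
  a = x[which.max(y)]
  methy <- cbind(methy, a)
  colnames(methy)[i] <- rownames(mnormal)[i]
}

# read cancer sample
dir <- "/mnt/Storage/home/zhengxq/wwzhang/homework/luaddata/cancer/"
file.names <- list.files(dir)
n <- length(file.names)
cpg <- read.delim(paste(dir, file.names[1], sep = ""), header = T, skip = 1)[, 1]
cpg <- as.character(cpg)

# read 427 unpaired cancer
cancer <- c()
for (i in 1:n) {
  dat <- read.delim(paste(dir, file.names[i], sep = ""), header = T, skip = 1)
  a <- dat[, 2]
  cancer <- cbind(cancer, a)
  # same sample-naming rule as in Problem 1
  colnames(cancer)[i] <- strsplit(file.names[i], ".", fixed = TRUE)[[1]][6]
}
rownames(cancer) <- cpg

# make 427 cancer sample dataframe match normal
cancer <- na.omit(cancer)
index <- intersect(rownames(cancer), rownames(t(methy)))
cancerlast <- cancer[index, ]
methylast <- t(methy)[index, ]

# compute cancer purity
F.purity <- function(x, y, i) {
  s1 <- x[, i]                 # cancer sample i
  d1 <- y                      # the single normal reference from step (2)
  n <- length(s1)
  chundu <- c()
  for (j in 1:n) {
    if (s1[j] > d1[j]) {
      p <- (s1[j] - d1[j]) / (1 - d1[j])
    } else {
      p <- (d1[j] - s1[j]) / d1[j]
    }
    chundu <- append(chundu, p)
  }
  purity <- mean(chundu)
  return(purity)
}
m = ncol(cancerlast)
pur <- c()
for (i in 1:m) {
  a <- F.purity(cancerlast, methylast, i)
  pur <- cbind(pur, a)
  colnames(pur)[i] <- colnames(cancerlast)[i]
}
save(pur, file = "luad-purity_mean.RData")

# luad and absolute correlation
load("luad-purity_mean.RData")
# the correlation with ABSOLUTE is then computed as in Problem 1, over the 427 samples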
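Since the assignment names Python as the preferred language, the per-site estimator used in both problems can be sketched in Python as well. This is only a minimal illustration of the mixture model (observed level = p * pure-tumor level + (1 - p) * normal level, with the pure-tumor level assumed to be 0 or 1), not a port of the R program. The function names `site_purity` and `sample_purity`, the toy data, and the guard against normal levels of exactly 0 or 1 are assumptions made for this sketch.

```python
from statistics import mean


def site_purity(tumor, normal):
    """Purity implied by one CpG site under m = p*c + (1 - p)*n, with c in {0, 1}."""
    if tumor > normal:
        # pure tumor cells assumed fully methylated at this site (c = 1)
        return (tumor - normal) / (1.0 - normal)
    # pure tumor cells assumed unmethylated at this site (c = 0)
    return (normal - tumor) / normal


def sample_purity(tumor_levels, normal_levels):
    """Average the per-site estimates over the informative sites of one sample."""
    estimates = [
        site_purity(t, n)
        for t, n in zip(tumor_levels, normal_levels)
        if 0.0 < n < 1.0  # assumption: skip sites that would divide by zero
    ]
    return mean(estimates)


# Toy data (hypothetical): a tumor sample that is 60% tumor cells.
normal = [0.10, 0.20, 0.80, 0.90]
pure_tumor = [1, 1, 0, 0]
p_true = 0.6
tumor = [p_true * c + (1 - p_true) * n for c, n in zip(pure_tumor, normal)]

print(round(sample_purity(tumor, normal), 3))
```

On this noise-free toy input every site recovers the simulated purity, so the printed mean is 0.6. With real data the per-site estimates scatter, which is why the report averages them over the selected sites.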