Computational Biology Final Assignment: Zhang Weiwei
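...). Compare the estimates from this method with the ABSOLUTE results: how strong is the correlation?
3) What if we also take into account that tumor cells are prone to copy-number changes? How should the model be improved?
Assignment requirements:
1. Write out the analysis process and the computed results in as much detail as possible;
2. Attach the programs used for the computation (Python is the best fit for this);
3. Use plenty of plots to show your intermediate results;
4. Hand in the programs and results to me before the holiday.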
1. Li E, Bestor TH, Jaenisch R: Targeted mutation of the DNA methyltransferase gene results in embryonic lethality. Cell 1992, 69(6):915-926.
2. Li E, Beard C, Jaenisch R: Role for DNA methylation in genomic imprinting. Nature 1993, 366(6453):362-365.
3. Fang F, Hodges E, Molaro A, Dean M, Hannon GJ, Smith AD: Genomic landscape of human allele-specific DNA methylation. Proc Natl Acad Sci USA 2012, 109(19):7332-7337.
4. Panning B, Jaenisch R: RNA and the epigenetic regulation of X chromosome inactivation. Cell 1998, 93(3):305-308.
5. Feinberg AP, Cui H, Ohlsson R: DNA methylation and genomic imprinting: insights from cancer into epigenetic mechanisms. Semin Cancer Biol 2002, 12(5):389-398.
6. Ehrlich M: DNA methylation in cancer: too much, but also too little. Oncogene 2002, 21(35):5400-5413.
7. Jones PA, Baylin SB: The fundamental role of epigenetic events in cancer. Nat Rev Genet 2002, 3(6):415-428.
8. Das PM, Singal R: DNA methylation and cancer. J Clin Oncol 2004, 22(22):4632-4642.
9. Robertson KD: DNA methylation and human disease. Nat Rev Genet 2005, 6(8):597-610.
10. Beck S, Rakyan VK: The methylome: approaches for global DNA methylation profiling. Trends Genet 2008, 24(5):231-237.
11. Javierre BM, Fernandez AF, Richter J, Al-Shahrour F, Martin-Subero JI, Rodriguez-Ubreva J, Berdasco M, Fraga MF, O'Hanlon TP, Rider LG et al: Changes in the pattern of DNA methylation associate with twin discordance in systemic lupus erythematosus. Genome Res 2010, 20(2):170-179.
12. Hansen KD, Timp W, Bravo HC, Sabunciyan S, Langmead B, McDonald OG, Wen B, Wu H, Liu Y, Diep D et al: Increased methylation variation in epigenetic domains across cancer types. Nat Genet 2011, 43(8):768-775.
13. Zheng X, Zhao Q, Wu H-J, Li W, Wang H, Meyer CA, Qin QA, Xu H, Zang C, Jiang P: MethylPurify: tumor purity deconvolution and differential methylation detection from single tumor DNA methylomes. Genome Biol 2014, 15:419.
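Problem 1:
Assume that in a pure cell line the methylation level at a single site is either 0 or 1.
For paired cancer and normal samples, informative CpG sites are selected with a paired Wilcoxon test, and a linear statistical model is set up:
$s_i = p\,c_i + (1 - p)\,d_i$, where
$p$ denotes the purity of the tumor cells,
$s_i$ denotes the methylation level of site $i$ measured in the cancer sample,
$d_i$ denotes the methylation level of site $i$ in the normal sample, and
$c_i$ denotes the methylation level of site $i$ in pure cancer cells.
When $s_i > d_i$, $c_i = 1$ and therefore $p = (s_i - d_i)/(1 - d_i)$; otherwise $c_i = 0$ and $p = (d_i - s_i)/d_i$.
Applying this model to every site gives one purity estimate $p_i$ per site, and the mean of these estimates is taken as the purity of the whole sample. The method is applied to each sample and the results are compared with those of ABSOLUTE, giving the figure below:
Figure 1: Correlation between the tumor purity estimated by the linear statistical model and the purity estimated by ABSOLUTE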
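The per-site calculation can be sketched in a few lines of Python, the language the assignment recommends (the submitted program itself is written in R). This is a minimal sketch, assuming the mixture relation "tumor level = purity × pure-cancer level + (1 − purity) × normal level", with the pure-cancer level set to 1 when the tumor sample is more methylated than the normal sample and to 0 otherwise, and with all levels in [0, 1]. The names `site_purity` and `sample_purity` are illustrative, the toy numbers are made up, and skipping sites where the formula would divide by zero is a choice of this sketch rather than something the text specifies.

```python
def site_purity(tumor, normal):
    """Purity implied by one CpG site; None when the site is uninformative."""
    if tumor > normal:
        # Pure cancer cells assumed fully methylated (level 1) at this site.
        return (tumor - normal) / (1.0 - normal)
    if normal == 0:
        # tumor == normal == 0 would give 0/0, so the site is skipped.
        return None
    # Pure cancer cells assumed unmethylated (level 0) at this site.
    return (normal - tumor) / normal


def sample_purity(tumor_levels, normal_levels):
    """Mean of the per-site purity estimates over all informative sites."""
    estimates = [site_purity(s, d) for s, d in zip(tumor_levels, normal_levels)]
    usable = [p for p in estimates if p is not None]
    if not usable:
        raise ValueError("no informative sites")
    return sum(usable) / len(usable)


# Toy data (made up): three sites mixed at a true purity of 0.6.
normal = [0.1, 0.9, 0.2]
pure_cancer = [1, 0, 1]
tumor = [0.6 * c + 0.4 * d for c, d in zip(pure_cancer, normal)]
estimate = sample_purity(tumor, normal)
print(round(estimate, 3))
```

On noise-free toy data every site returns the same value, so the mean recovers the purity used to build the mixture. On real data the per-site estimates scatter, which is why the report averages them.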
Code:
setwd("/mnt/Storage/home/zhengxq/wwzhang/homework/processed/paired_cancer")

## read cancer and make a data frame
dir <- "/mnt/Storage/home/zhengxq/wwzhang/homework/luaddata/cancerpair/"
file.names <- list.files(dir)
n <- length(file.names)
# CpG IDs are the first column of the first file; [-1] drops its first row
cpg <- read.delim(paste(dir, file.names[1], sep = ""), header = T)[, 1][-1]
cpg <- as.character(cpg)
cancer <- c()
for (i in 1:n) {
  dat <- read.delim(paste(dir, file.names[i], sep = ""), header = T)
  a <- dat[, 2][-1]
  a <- as.numeric(as.character(a))  # factor/character -> numeric
  cancer <- cbind(cancer, a)
  # the sample name is the 6th "."-separated field of the file name
  colnames(cancer)[i] <- strsplit(file.names[i], ".", fixed = TRUE)[[1]][6]
}
rownames(cancer) <- cpg
save(cancer, file = "cancer.RData")

## read normal and make a data frame
dir1 <- "/mnt/Storage/home/zhengxq/wwzhang/homework/luaddata/normalpair/"
file <- list.files(dir1)
m <- length(file)
normal <- c()
for (i in 1:m) {
  dat <- read.delim(paste(dir1, file[i], sep = ""), header = T)
  a <- dat[, 2][-1]
  a <- as.numeric(as.character(a))
  normal <- cbind(normal, a)
  colnames(normal)[i] <- strsplit(file[i], ".", fixed = TRUE)[[1]][6]
}
rownames(normal) <- cpg
save(normal, file = "normal.RData")
#########################################################################
## select the differential CpG sites with Wilcoxon test p-value < 10^-6
setwd("/home/users/wwzhang/wwzhang/luadprocessed/purity_wilcox_paired_mean")
load("/home/users/wwzhang/wwzhang/luadprocessed/normal.RData")
load("/home/users/wwzhang/wwzhang/luadprocessed/cancer.RData")
# drop sites with missing values, then put samples in rows and sites in columns
tnormal <- t(na.omit(normal))
tcancer <- t(na.omit(cancer))
index <- intersect(colnames(tcancer), colnames(tnormal))
tcancer1 <- tcancer[, index]
tnormal1 <- tnormal[, index]
n <- ncol(tcancer1)
value <- c()
for (i in 1:n) {
  # paired test: row k of tcancer1 and row k of tnormal1 must be the same patient
  value <- cbind(value, wilcox.test(tcancer1[, i], tnormal1[, i], paired = T)$p.value)
  colnames(value)[i] <- colnames(tcancer1)[i]
}
index <- which(value < 10^-6)
posi <- colnames(value)[index]
mcancer <- cancer[posi, ]
mnormal <- normal[posi, ]
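The site-selection step can also be written out in Python to show what the paired test computes. This is a minimal sketch, assuming differences with no zeros and no ties, in which case the two-sided p-value can be computed exactly from the null distribution of the signed-rank statistic. R's `wilcox.test` falls back to a normal approximation when zeros or ties are present, which this sketch does not reproduce. The names `signed_rank_p` and `select_sites` and all toy values are illustrative.

```python
def signed_rank_p(tumor, normal):
    """Exact two-sided Wilcoxon signed-rank p-value for paired values."""
    diffs = [t - n for t, n in zip(tumor, normal)]
    magnitudes = [abs(d) for d in diffs]
    if 0 in magnitudes or len(set(magnitudes)) != len(magnitudes):
        raise ValueError("sketch handles only non-zero, untied differences")
    n = len(diffs)
    order = sorted(range(n), key=lambda k: magnitudes[k])
    # V = sum of the ranks (1..n) that belong to positive differences.
    v = sum(rank for rank, k in enumerate(order, start=1) if diffs[k] > 0)
    total = n * (n + 1) // 2
    # counts[s] = number of subsets of {1..n} whose ranks sum to s.
    counts = [1] + [0] * total
    for rank in range(1, n + 1):
        for s in range(total, rank - 1, -1):
            counts[s] += counts[s - rank]
    lower = sum(counts[: v + 1])   # subsets with rank sum <= V
    upper = sum(counts[v:])        # subsets with rank sum >= V
    return min(1.0, 2 * min(lower, upper) / 2 ** n)


def select_sites(tumor_rows, normal_rows, cutoff=1e-6):
    """Indices of sites whose paired p-value falls below the cutoff."""
    return [i for i, (t, n) in enumerate(zip(tumor_rows, normal_rows))
            if signed_rank_p(t, n) < cutoff]


# Toy site (made up): 21 pairs in which the tumor is always more methylated.
normal21 = [0.1] * 21
tumor21 = [0.3 + 0.01 * k for k in range(21)]
print(select_sites([tumor21], [normal21]))
```

With exact p-values the smallest attainable two-sided value for n pairs is 2/2^n, so a cutoff of 10^-6 can only be met with at least 21 pairs; the 29 paired samples used here clear that bound.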
#############################################################
### estimate the cancer purity with the linear statistical model
purity <- function(x, y, i) {
  s1 <- x[, i]  # cancer sample i
  d1 <- y[, i]  # matched normal sample i
  n <- length(s1)
  chundu <- c()  # per-site purity estimates
  for (j in 1:n) {
    if (s1[j] > d1[j]) {
      p <- (s1[j] - d1[j]) / (1 - d1[j])
    } else {
      p <- (d1[j] - s1[j]) / d1[j]
    }
    chundu <- append(chundu, p)
  }
  pingjun <- mean(chundu)  # mean over sites = sample purity
  return(pingjun)
}
pur <- c()
for (i in 1:29) {
  a <- purity(mcancer, mnormal, i)
  pur <- cbind(pur, a)
  colnames(pur)[i] <- colnames(mcancer)[i]
}
save(pur, file = "purity_wilcox_paired_mean_0.000001.RData")
### correlation between the purity estimated by the linear model and the purity estimated by ABSOLUTE
## load absolute purity
absolute <- read.delim("nature.tumor_purity", header = T, sep = "")
purity <- absolute[, 3]
purity <- as.data.frame(purity)
rownames(purity) <- absolute[, 1]
load("purity_wilcox_paired_mean_0.000001.RData")
pur <- t(pur)
# keep the first 12 characters of each sample name so it matches the ABSOLUTE IDs
rownames(pur) <- substr(rownames(pur), 1, 12)
name <- as.character(absolute[, 1])
index <- intersect(rownames(pur), rownames(purity))
purnew <- as.data.frame(pur[index, ])
puritynew <- as.data.frame(purity[index, ])
rownames(puritynew) <- index
cor <- cor(purnew, puritynew)
cor <- round(as.vector(cor), 3)
tag <- paste0("R=", cor)
pdf("purity_mean.pdf")
m <- lm(puritynew[, 1] ~ purnew[, 1])
plot(purnew[, 1], puritynew[, 1], xlab = "luad purity", ylab = "absolute purity")
abline(m)
text(0.2, 0.6, tag)
dev.off()
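The comparison step (shorten each sample name to its first 12 characters, keep the IDs present in both tables, then compute the correlation) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names, the dictionary inputs and the sample IDs are hypothetical, and the correlation is the Pearson coefficient, which is what R's `cor` returns by default.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def match_and_correlate(estimated, reference, id_length=12):
    """Correlate two purity tables on the sample IDs they share.

    estimated: dict mapping full sample name -> purity from the linear model
    reference: dict mapping short sample ID -> purity from the other method
    """
    short = {name[:id_length]: value for name, value in estimated.items()}
    shared = sorted(set(short) & set(reference))
    xs = [short[k] for k in shared]
    ys = [reference[k] for k in shared]
    return shared, pearson(xs, ys)


# Hypothetical sample names and purities, for illustration only.
estimated = {
    "TCGA-05-4244-01A": 0.5,
    "TCGA-05-4249-01A": 0.7,
    "TCGA-05-4250-01A": 0.9,
    "TCGA-XX-0000-01A": 0.3,  # no counterpart in the reference table
}
reference = {"TCGA-05-4244": 0.4, "TCGA-05-4249": 0.6, "TCGA-05-4250": 0.8}
shared, r = match_and_correlate(estimated, reference)
print(len(shared), round(r, 3))
```

Samples that appear in only one table are dropped before the correlation is computed, mirroring the `intersect` call in the R program.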
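Problem 2:
Cancer samples without adjacent normal tissue, that is, cancers with no matched control, are handled as follows:
(1) Take all cancers without adjacent normal tissue, compare them with the available normal samples using the Wilcoxon rank-sum test, and keep the sites with p-value < 10^-6.
(2) For the available normal samples, take the peak of the density of the values at each site as the methylation level of that site.
(3) Apply the linear statistical model from Problem 1 to every cancer sample together with the single reference sample obtained in (2). This gives the purity of the cancers without adjacent normal tissue, which is then compared with the purity estimated by ABSOLUTE. The results are as follows:
Figure 2: Correlation between the purity of cancers without adjacent normal tissue and the purity estimated by ABSOLUTE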
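Step (2) reduces the normal samples to one reference value per site by taking the peak (mode) of a kernel density estimate. The following Python sketch illustrates that idea under stated assumptions: it uses a Gaussian kernel with a rule-of-thumb bandwidth of the same form as the default of R's `density()`, evaluated directly on a 512-point grid. R computes the estimate with a binned, FFT-based approximation, so its peak can differ slightly. The name `density_peak` and the toy values are illustrative.

```python
import math
import statistics


def density_peak(values, grid_points=512):
    """Location of the highest point of a Gaussian kernel density estimate."""
    values = sorted(values)
    n = len(values)
    if n < 2:
        return values[0]
    sd = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spreads = [s for s in (sd, (q3 - q1) / 1.34) if s > 0]
    if not spreads:
        return values[0]  # all values identical: the peak is that value
    # Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n^(-1/5).
    bandwidth = 0.9 * min(spreads) * n ** -0.2
    low = values[0] - 3 * bandwidth
    high = values[-1] + 3 * bandwidth
    step = (high - low) / (grid_points - 1)
    best_x, best_density = low, -1.0
    for k in range(grid_points):
        x = low + k * step
        # Unnormalised density; the constant factor does not move the peak.
        density = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


# Toy site (made up): six normals near 0.8 and one outlier at 0.1.
site_values = [0.78, 0.80, 0.81, 0.79, 0.82, 0.80, 0.10]
peak = density_peak(site_values)
print(round(peak, 2), round(statistics.mean(site_values), 2))
```

Unlike the mean, the density peak is barely moved by a single outlying normal sample, which is a plausible reason for preferring it when building one reference profile from many normals.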
Code:
######## load normal sample
setwd("/mnt/Storage/home/zhengxq/wwzhang/homework/cancer_unpaired")
load("mnormal_wilcox.test_paired_10^-6.RData")
##################################################################
# for each site, take the density peak across all normal samples as the methylation level
n <- nrow(mnormal)
methy <- c()
for (i in 1:n) {
  alpha <- density(mnormal[i, ])
  x <- alpha$x
  y <- alpha$y
  a <- x[which.max(y)]  # location of the highest density
  methy <- cbind(methy, a)
  colnames(methy)[i] <- rownames(mnormal)[i]
}
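# read cancer sample
dir <- "/mnt/Storage/home/zhengxq/wwzhang/homework/luaddata/cancer/"
file.names <- list.files(dir)
n <- length(file.names)
cpg <- read.delim(paste(dir, file.names[1], sep = ""), header = T, skip = 1)[, 1]
cpg <- as.character(cpg)
## read 427 unpaired cancer
cancer <- c()
for (i in 1:n) {
  dat <- read.delim(paste(dir, file.names[i], sep = ""), header = T, skip = 1)
  a <- dat[, 2]
  cancer <- cbind(cancer, a)
  colnames(cancer)[i] <- strsplit(file.names[i], ".", fixed = TRUE)[[1]][6]
}
rownames(cancer) <- cpg
## make 427 cancer sample data frame match normal
cancer <- na.omit(cancer)
index <- intersect(rownames(cancer), rownames(t(methy)))
cancerlast <- cancer[index, ]
methylast <- t(methy)[index, ]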
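###################################################################
# compute cancer purity
F.purity <- function(x, y, i) {
  s1 <- x[, i]  # cancer sample i
  d1 <- y       # reference normal profile (density peaks)
  n <- length(s1)
  chundu <- c()  # per-site purity estimates
  for (j in 1:n) {
    if (s1[j] > d1[j]) {
      p <- (s1[j] - d1[j]) / (1 - d1[j])
    } else {
      p <- (d1[j] - s1[j]) / d1[j]
    }
    chundu <- append(chundu, p)
  }
  purity <- mean(chundu)
  return(purity)
}
m <- ncol(cancerlast)
pur <- c()
for (i in 1:m) {
  a <- F.purity(cancerlast, methylast, i)
  pur <- cbind(pur, a)
  colnames(pur)[i] <- colnames(cancerlast)[i]
}
save(pur, file = "luad-purity_mean.RData")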
###########################################################################
# luad and absolute correlation
load("luad-purity_mean.RData")
for (i in 1:427) {