Shanghai Jiao Tong University School of Medicine: Key Points for the Statistics Computer Labs
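H0: the sex composition (proportions of males and females) is the same in the different groups, π1 = π2; H1: π1 ≠ π2.
…
Statistical conclusion: P > 0.05, so at the α = 0.05 level H0 is not rejected; the difference is not statistically significant, and … can be considered equal.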
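Program notes:
Commonly used statistic keywords in PROC MEANS:
N (sample size), SUM (sum), MEAN (mean), RANGE (range), MIN (minimum), MAX (maximum), STD (standard deviation), CV (coefficient of variation), VAR (variance), STDERR (standard error), LCLM (lower confidence limit for the population mean), UCLM (upper confidence limit), T (t value for testing μ = 0), PRT (two-sided p-value for that t)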
data student;
input sex $ age height weight birth :yymmdd10.; /* ":" reads the date with list input */
index=weight/height**2; /* body mass index, kg/m2 */
cards;
male 18 1.74 71.3 1981-3-21
female 19 . 54.2 1982-12-4
female 18 1.62 58.9 1981-5-6
male 18 1.78 75.2 1980-1-4
female 18 1.62 61.8 1981-7-12
male 19 1.76 72.6 1981-9-23
;
proc print data=student;
var sex age height weight index birth;
format birth mmddyy10.; /* display the SAS date value as mm/dd/yyyy */
run;
proc means data=student;
var age height weight index;
run;
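For reference, several of the PROC MEANS keywords listed above can be reproduced by hand. The Python sketch below (standard library only; the helper name `means_summary` is hypothetical) applies the textbook formulas to the weights and heights in data set `student`. It assumes the SAS defaults of a divisor of n - 1 for STD and VAR and a CV expressed as a percentage; PRT, LCLM and UCLM are left out because they need the t distribution, which the standard library does not provide.

```python
import math
import statistics


def means_summary(values):
    """Hand-computed counterparts of common PROC MEANS statistic keywords."""
    xs = [v for v in values if v is not None]  # missing values are excluded, as in PROC MEANS
    n = len(xs)
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)  # divisor n - 1 (assumed VARDEF=DF default)
    stderr = sd / math.sqrt(n)
    return {
        "N": n,
        "SUM": sum(xs),
        "MEAN": mean,
        "RANGE": max(xs) - min(xs),
        "MIN": min(xs),
        "MAX": max(xs),
        "STD": sd,
        "VAR": statistics.variance(xs),
        "CV": 100 * sd / mean,  # coefficient of variation as a percentage
        "STDERR": stderr,
        "T": mean / stderr,  # t for H0: population mean = 0
    }


weight = [71.3, 54.2, 58.9, 75.2, 61.8, 72.6]  # weight in data set student
height = [1.74, None, 1.62, 1.78, 1.62, 1.76]  # None stands for the missing height
for name, values in (("weight", weight), ("height", height)):
    stats = means_summary(values)
    print(name, {k: round(v, 4) for k, v in stats.items()})
```

Note that the missing height drops N to 5 for that variable, just as the "." in the data lines does in SAS.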
Creating a new SAS data set by reading from an existing SAS data set
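libname course 'd:\data'; /* library reference for a permanent folder */
data course.student; /* permanent copy of student in library course */
set student;
run;
data a;
set course.student;
proc print;
run;
data b;
set a;
run;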
Splitting a data set
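data male; /* keep only the male records */
set student;
if sex='male' then output;
run;
data female; /* delete the male records, leaving the female ones */
set student;
if sex='male' then delete;
run;
data male female; /* one pass, two output data sets */
set student;
if sex='male' then output male;
else output female;
run;
data height; /* KEEP lists the variables to retain */
set student;
keep sex age height index;
run;
proc print; run;
data weight; /* DROP lists the variables to remove */
set student;
drop height birth;
run;
proc print; run;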
Concatenating multiple SAS data sets vertically
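data one;
input name $ pid group age;
cards;
Liming 111 1 54
Wangli 112 2 49
Xiaoli 113 1 34
;
data two;
input name $ pid drug $ sex;
cards;
Yaohong 211 A 1
Zhaohong 212 B 2
Mixue 213 A 2
;
data total; /* SET with several data sets stacks them; variables absent from one data set are set to missing */
set one two;
run;
proc print data=one;
proc print data=two;
proc print data=total;
run;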
Merging multiple SAS data sets side by side
data one;
input pid sex age;
cards;
101 1 54
102 2 45
103 2 42
105 1 34
;
data two;
input pid weight height;
cards;
104 45 162
102 64 171
103 54 165
101 51 160
;
proc sort data=one; /* MERGE with BY requires both data sets sorted by the key */
by pid;
proc sort data=two;
by pid;
data total;
merge one two;
by pid;
proc print data=total;
run;
Computing descriptive statistics with PROC MEANS (STD = standard deviation)
data shg;
input x @@; /* @@ holds the line so several values are read from each data line */
cards;
108.0 97.6 103.4 101.6 104.4 98.5 110.5 103.8 109.7 109.8
104.5 99.5 104.0 103.9 97.2 106.3 106.2 107.6 108.3 97.6
102.7 103.7 107.6 103.2 103.6 103.3 102.8 102.3 102.2 103.3
101.2 107.5 106.3 109.7 99.5 107.4 103.4 106.6 105.7 107.4
103.0 109.6 106.4 107.3 100.6 112.3 100.5 101.9 98.8 99.7
104.3 110.2 105.3 95.2 105.8 105.2 106.1 103.6 106.6 105.1
105.5 113.5 107.7 106.8 106.2 109.8 99.7 107.9 104.8 103.9
106.8 106.4 108.3 106.5 103.3 107.7 106.2 100.4 102.6 102.1
110.6 112.2 110.2 103.7 102.3 112.1 105.4 104.2 105.7 104.4
102.8 107.8 102.5 102.3 105.8 103.7 103.1 101.6 106.5 100.0
103.2 109.3 105.8 106.1 104.9 105.9 105.3 103.7 99.6 106.2
102.5 108.1 106.1 108.3 99.8 108.3 104.0 100.6 112.6 103.7
;
proc means data=shg n mean std cv min max;
var x;
run;
Computing statistics by group (results shown to three decimal places)
data a;
input group VA VB1 @@;
cards;
1 1.8 1.4 2 1.7 1.1 1 2.2 1.5 3 1.9 1.2 2 2.5 1.0 1 2.7 1.6
2 2.3 1.3 2 2.8 0.9 3 3.0 1.1 1 2.6 1.4 1 2.4 1.2 2 1.9 1.3
3 2.9 0.8 1 3.2 1.7 3 3.1 1.5 2 2.6 1.9 3 3.5 1.6 3 3.3 1.5
;
proc sort data=a; by group;
proc means data=a mean std max min maxdec=3; /* MAXDEC=3: three decimal places */
by group;
var VA VB1;
run;
Computing the geometric mean (from a frequency table)
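data a;
input f x @@;
y=log10(x); /* work on the log scale */
cards;
1 4 3 8 8 16 13 32
21 64 9 128 4 256 1 512
;
proc means noprint;
var y;
freq f;
output out=b mean=meany; /* mean of log10(x) saved as meany */
run;
data c;
set b;
meanx=10**(meany); /* back-transform: the geometric mean */
run;
proc print;
run;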
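Program notes:
FREQ <variable>: the values of this variable are the frequencies of the analysis variable.
OUTPUT: names the output data set for the statistics produced by PROC MEANS.
statistic-keyword=<new variable names> ...: specifies which statistics to write to the output data set and the names of the new variables that hold them.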
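The same geometric mean can be checked by hand: average log10(x) with each titer weighted by its frequency f (the role of the FREQ statement), then raise 10 to that power. A minimal Python sketch, with the frequency table typed in from the data lines above; `geometric_mean` is a hypothetical helper name.

```python
import math

# (f, x) pairs typed in from the data lines: frequency f of titer 1:x
table = [(1, 4), (3, 8), (8, 16), (13, 32), (21, 64), (9, 128), (4, 256), (1, 512)]


def geometric_mean(freq_table):
    """Frequency-weighted mean of log10(x), back-transformed with 10**mean."""
    n = sum(f for f, _ in freq_table)
    meany = sum(f * math.log10(x) for f, x in freq_table) / n  # as PROC MEANS with FREQ f
    return 10 ** meany  # as meanx = 10**(meany)


print(f"n = {sum(f for f, _ in table)}, G = {geometric_mean(table):.2f}")
```

Because every titer here is a power of 2, the result equals 2 raised to the frequency-weighted mean of log2(x), which is an easy hand check.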
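PROC UNIVARIATE outputs three plots (stem-and-leaf plot, box plot, normal probability plot), a frequency table (Value = variable value, Count = frequency, Cell = percent, Cum = cumulative percent), and the results of the normality tests.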
data shg;
input x @@;
cards;
108.0 97.6 103.4 101.6 104.4 98.5 110.5 103.8 109.7 109.8
104.5 99.5 104.0 103.9 97.2 106.3 106.2 107.6 108.3 97.6
102.7 103.7 107.6 103.2 103.6 103.3 102.8 102.3 102.2 103.3
101.2 107.5 106.3 109.7 99.5 107.4 103.4 106.6 105.7 107.4
103.0 109.6 106.4 107.3 100.6 112.3 100.5 101.9 98.8 99.7
104.3 110.2 105.3 95.2 105.8 105.2 106.1 103.6 106.6 105.1
105.5 113.5 107.7 106.8 106.2 109.8 99.7 107.9 104.8 103.9
106.8 106.4 108.3 106.5 103.3 107.7 106.2 100.4 102.6 102.1
110.6 112.2 110.2 103.7 102.3 112.1 105.4 104.2 105.7 104.4
102.8 107.8 102.5 102.3 105.8 103.7 103.1 101.6 106.5 100.0
103.2 109.3 105.8 106.1 104.9 105.9 105.3 103.7 99.6 106.2
102.5 108.1 106.1 108.3 99.8 108.3 104.0 100.6 112.6 103.7
;
proc univariate data=shg plot freq normal; /* PLOT: the three plots; FREQ: frequency table; NORMAL: normality tests */
var x;
run;
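Program notes:
1. Tests for Normality gives the normality tests; if P > 0.05, normality is not rejected and the data can be treated as normally distributed.
2. Uncorrected SS is the sum of squares of the raw values (Σx²); Corrected SS is the sum of squared deviations from the mean (Σ(x - x̄)²); Interquartile Range is Q3 - Q1.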
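To make the two sums of squares concrete, the sketch below computes them for the shg data in plain Python (standard library only; variable names are illustrative). Corrected SS equals Uncorrected SS minus n times the squared mean, and dividing it by n - 1 gives the sample variance.

```python
import statistics

SHG = """
108.0 97.6 103.4 101.6 104.4 98.5 110.5 103.8 109.7 109.8
104.5 99.5 104.0 103.9 97.2 106.3 106.2 107.6 108.3 97.6
102.7 103.7 107.6 103.2 103.6 103.3 102.8 102.3 102.2 103.3
101.2 107.5 106.3 109.7 99.5 107.4 103.4 106.6 105.7 107.4
103.0 109.6 106.4 107.3 100.6 112.3 100.5 101.9 98.8 99.7
104.3 110.2 105.3 95.2 105.8 105.2 106.1 103.6 106.6 105.1
105.5 113.5 107.7 106.8 106.2 109.8 99.7 107.9 104.8 103.9
106.8 106.4 108.3 106.5 103.3 107.7 106.2 100.4 102.6 102.1
110.6 112.2 110.2 103.7 102.3 112.1 105.4 104.2 105.7 104.4
102.8 107.8 102.5 102.3 105.8 103.7 103.1 101.6 106.5 100.0
103.2 109.3 105.8 106.1 104.9 105.9 105.3 103.7 99.6 106.2
102.5 108.1 106.1 108.3 99.8 108.3 104.0 100.6 112.6 103.7
"""
x = [float(tok) for tok in SHG.split()]
n = len(x)
mean = statistics.fmean(x)
uss = sum(v * v for v in x)  # Uncorrected SS: sum of x squared
css = sum((v - mean) ** 2 for v in x)  # Corrected SS: sum of squared deviations from the mean
print(f"n = {n}, USS = {uss:.2f}, CSS = {css:.2f}")
print(f"USS - n*mean^2 = {uss - n * mean ** 2:.2f}, variance = CSS/(n-1) = {css / (n - 1):.4f}")
```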
Interval estimation of the population mean (99% confidence interval for the population mean)
data shg;
input x @@;
cards;
108.0 97.6 103.4 101.6 104.4 98.5 110.5 103.8 109.7 109.8
104.5 99.5 104.0 103.9 97.2 106.3 106.2 107.6 108.3 97.6
102.7 103.7 107.6 103.2 103.6 103.3 102.8 102.3 102.2 103.3
101.2 107.5 106.3 109.7 99.5 107.4 103.4 106.6 105.7 107.4
103.0 109.6 106.4 107.3 100.6 112.3 100.5 101.9 98.8 99.7
104.3 110.2 105.3 95.2 105.8 105.2 106.1 103.6 106.6 105.1
105.5 113.5 107.7 106.8 106.2 109.8 99.7 107.9 104.8 103.9
106.8 106.4 108.3 106.5 103.3 107.7 106.2 100.4 102.6 102.1
110.6 112.2 110.2 103.7 102.3 112.1 105.4 104.2 105.7 104.4
102.8 107.8 102.5 102.3 105.8 103.7 103.1 101.6 106.5 100.0
103.2 109.3 105.8 106.1 104.9 105.9 105.3 103.7 99.6 106.2
102.5 108.1 106.1 108.3 99.8 108.3 104.0 100.6 112.6 103.7
;
proc means data=shg n mean std clm alpha=0.01; /* CLM with ALPHA=0.01: 99% confidence limits */
var x;
run;
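t tests:
(1) One-sample t test comparing a sample mean with a population mean (population mean 72; T = t value for testing μ = 0; PRT = two-sided p-value of that t)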
data mb;
input x @@;
d=x-72; /* difference from the population mean 72 */
cards;
74 73 68 75 75 82 80 69
72 74 83 72 71 74 76 79
67 73 81 70 67 70 78 69
70 72 67 74 80 66
;
proc means data=mb mean std stderr t prt;
var x d;
run;
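Read mainly T and PRT for d: testing μ = 0 for d is the same as testing μ = 72 for x, whereas the T for x tests μ = 0, which is not the hypothesis of interest.
PROC UNIVARIATE can also be used: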
proc univariate data=mb normal;
var x d;
run;
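The reason for analyzing d rather than x can be seen numerically: subtracting 72 shifts the mean but leaves the standard deviation unchanged, so the t reported for d equals (x̄ - 72)/(s/√n). A Python sketch (standard library only; `one_sample_t` is a hypothetical helper, and the two-sided P value is left to SAS because it needs the t distribution):

```python
import math
import statistics

x = [74, 73, 68, 75, 75, 82, 80, 69,
     72, 74, 83, 72, 71, 74, 76, 79,
     67, 73, 81, 70, 67, 70, 78, 69,
     70, 72, 67, 74, 80, 66]
d = [v - 72 for v in x]  # same as d = x - 72 in the DATA step


def one_sample_t(values):
    """t for H0: population mean = 0, i.e. what the T keyword of PROC MEANS reports."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return statistics.fmean(values) / se, n - 1


t_d, df = one_sample_t(d)
t_x, _ = one_sample_t(x)
print(f"t for d = {t_d:.4f} (df = {df}); t for x = {t_x:.2f} tests mu = 0, not mu = 72")
```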
(2) t test for a paired design
data ch4_7;
input after before @@;
d=after-before; /* within-pair difference */
cards;
70.55 64.29
88.60 64.07
68.44 45.88
61.64 45.23
64.73 50.40
74.68 61.59
69.15 51.85
60.51 60.13
65.59 64.29
69.04 51.93
;
proc means data=ch4_7 mean std stderr t prt;
var d;
run;
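A paired t test is a one-sample t test on the within-pair differences, which is why only d is analyzed above. A minimal Python check of the mean difference, its standard error and t (standard library only; variable names are illustrative):

```python
import math
import statistics

after = [70.55, 88.60, 68.44, 61.64, 64.73, 74.68, 69.15, 60.51, 65.59, 69.04]
before = [64.29, 64.07, 45.88, 45.23, 50.40, 61.59, 51.85, 60.13, 64.29, 51.93]

d = [a - b for a, b in zip(after, before)]  # within-pair differences, d = after - before
n = len(d)
mean_d = statistics.fmean(d)
se_d = statistics.stdev(d) / math.sqrt(n)  # standard error of the mean difference
t = mean_d / se_d  # compare with T for d in the PROC MEANS output
print(f"mean d = {mean_d:.3f}, SE = {se_d:.3f}, t = {t:.3f}, df = {n - 1}")
```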
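(3) t test comparing two sample means in a two-group (completely randomized) design (PROC TTEST performs the two-sample t test; class group declares group as the grouping variable; x is the increase in hemoglobin)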
(3)-1
data hb;
input group x @@;
cards;
1 26 1 32 1 25 1 22 1 20 1 28 1 24 1 19 1 29 1 17 1 34 1 21 1 20 1 23 1 27
2 21 2 23 2 18 2 24 2 23 2 19 2 16 2 22 2 20 2 25 2 23 2 17 2 15 2 26 2 22
;
proc ttest data=hb;
class group;
var x;
run;
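Explanation of results:
First check the equality-of-variances test (Equality of Variances): P > 0.05, so the variances can be considered equal. Then read the t Value and its P value in the Pooled row (with unequal variances the Satterthwaite row would be read instead): P < 0.05, so at α = 0.05 H0 is rejected. The increase in hemoglobin can be considered to differ between the two groups of anemic children, with a larger mean increase in the new-drug group than in the conventional-drug group.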
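Both statistics read in the output can be reproduced from the raw data: the folded F (larger sample variance over smaller) for equality of variances, and the pooled-variance t, which as we understand it corresponds to the Pooled row. A standard-library Python sketch with hypothetical helper names; it computes the statistics only, not the P values, which require the F and t distributions.

```python
import math
import statistics

g1 = [26, 32, 25, 22, 20, 28, 24, 19, 29, 17, 34, 21, 20, 23, 27]  # group 1
g2 = [21, 23, 18, 24, 23, 19, 16, 22, 20, 25, 23, 17, 15, 26, 22]  # group 2


def folded_f(a, b):
    """Folded F for equality of variances: larger sample variance / smaller one."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)


def pooled_t(a, b):
    """Two-sample t using the pooled variance estimate; returns (t, df)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(sp2 * (1 / na + 1 / nb))
    return (statistics.fmean(a) - statistics.fmean(b)) / se, na + nb - 2


F = folded_f(g1, g2)
t, df = pooled_t(g1, g2)
print(f"folded F = {F:.3f}; pooled t = {t:.3f}, df = {df}")
```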
(3)-2:
Two-group t test after a variable transformation (antibody titers, analyzed on the log scale)
data ktdd;
input group x @@;
y=log10(x); /* analyze the log titer */
cards;
1 50 1 30 1 40 1 60 1 60 1 35 1 70 1 20 1 70 1 35 1 40 1 50 1 25
2 40 2 30 2 25 2 10 2 25 2 30 2 35 2 15 2 20 2 40 2 15 2 30 2 20
;
proc ttest data=ktdd;
class group;
var y;
run;
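One-way ANOVA with three levels
1. Build the SAS data set with DO loops:
(@@ is essential: it holds the data line so that the inner loop reads all n values of a group from the same line)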
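data dat5_1;
do group=1 to 3;
input n; /* number of observations in this group */
do i=1 to n;
input x @@;
output; /* one observation per value */
end;
end;
cards;
15
40 10 35 25 20 15 35 15 -5 30 25 70 65 45 50
15
50 20 45 55 20 15 80 -10 105 75 10 60 45 60 30
10
60 30 100 85 20 55 45 30 77 105
;
run;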
2. Normality test:
proc sort data=dat5_1;
by group;
run;
proc univariate data=dat5_1 normal;
var x;
by group;
run;
3. Homogeneity-of-variance test:
Use Levene's test, requested with the HOVTEST option of the MEANS statement in PROC GLM or PROC ANOVA.
4. Analysis of variance:
proc glm data=dat5_1;
class group;
model x=group;
means group/hovtest;
means group;
run;
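Explanation:
1. PROC GLM performs the analysis of variance.
2. The CLASS statement first specifies the grouping variable, here group.
3. The MODEL statement then specifies the model:
the dependent variable is on the left of the equals sign and the grouping variable on the right.
4. The MEANS statement gives the grouping variable, then a slash, then the options.
HOVTEST requests the homogeneity-of-variance test (P > 0.05: the variances can be considered equal).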
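The F statistic from PROC GLM and the HOVTEST result can be approximated by hand. The Python sketch below (standard library only; helper names are hypothetical) splits the total sum of squares into between-group and within-group parts, and runs Levene's test as a one-way ANOVA on squared deviations from the group means. Squared deviations are assumed because that is, to our understanding, the SAS default for HOVTEST=LEVENE; pass `squared=False` for the absolute-deviation version.

```python
import statistics

# Observations typed in from the data lines of dat5_1, keyed by group
groups = {
    1: [40, 10, 35, 25, 20, 15, 35, 15, -5, 30, 25, 70, 65, 45, 50],
    2: [50, 20, 45, 55, 20, 15, 80, -10, 105, 75, 10, 60, 45, 60, 30],
    3: [60, 30, 100, 85, 20, 55, 45, 30, 77, 105],
}


def one_way_anova(samples):
    """Return (SS_between, SS_within, F) for a one-way layout."""
    all_x = [v for s in samples for v in s]
    grand = statistics.fmean(all_x)
    ss_between = sum(len(s) * (statistics.fmean(s) - grand) ** 2 for s in samples)
    ss_within = sum(sum((v - statistics.fmean(s)) ** 2 for v in s) for s in samples)
    k, n = len(samples), len(all_x)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return ss_between, ss_within, f


def levene_f(samples, squared=True):
    """Levene's test: one-way ANOVA on squared (default) or absolute deviations from group means."""
    deviations = []
    for s in samples:
        m = statistics.fmean(s)
        deviations.append([(v - m) ** 2 if squared else abs(v - m) for v in s])
    return one_way_anova(deviations)[2]


samples = list(groups.values())
ss_b, ss_w, F = one_way_anova(samples)
print(f"SS_between = {ss_b:.2f}, SS_within = {ss_w:.2f}, F = {F:.3f} on (2, 37) df")
print(f"Levene F (squared deviations) = {levene_f(samples):.3f}")
```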
4'. Multiple comparisons among means:
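proc glm data=dat5_1;
class group;
model x=group;
means group/hovtest;
means group/snk bon dunnett('1'); /* SNK, Bonferroni, and Dunnett with group 1 as the control */
means group/snk alpha=0.01;
contrast '1,2 vs 3' group -0.5 -0.5 1; /* mean of groups 1 and 2 vs group 3 */
contrast '1 vs 2' group 1 -1 0;
run;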
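Explanation:
In SAS, pairwise comparisons are requested with options on the MEANS statement of PROC GLM or PROC ANOVA. The program shows several methods: SNK (Student-Newman-Keuls), BON (Bonferroni), DUNNETT('1') (each group against group 1 as the control), ALPHA= (significance level for the comparisons), and CONTRAST statements for planned comparisons.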
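A CONTRAST statement tests a single-degree-of-freedom comparison L = Σ cᵢx̄ᵢ whose coefficients sum to zero. The sketch below computes L, its sum of squares L²/Σ(cᵢ²/nᵢ) and F = SS/MSE in Python for the two contrasts in the program; the helper name `contrast_f` is hypothetical, and the F values are meant to be compared with the CONTRAST lines of the GLM output.

```python
import statistics

samples = [
    [40, 10, 35, 25, 20, 15, 35, 15, -5, 30, 25, 70, 65, 45, 50],  # group 1
    [50, 20, 45, 55, 20, 15, 80, -10, 105, 75, 10, 60, 45, 60, 30],  # group 2
    [60, 30, 100, 85, 20, 55, 45, 30, 77, 105],  # group 3
]


def contrast_f(groups, coef):
    """Single-df contrast: L = sum(c*mean), SS = L**2 / sum(c**2 / n), F = SS / MSE."""
    if abs(sum(coef)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    n_total = sum(len(g) for g in groups)
    ss_within = sum(sum((v - statistics.fmean(g)) ** 2 for v in g) for g in groups)
    mse = ss_within / (n_total - len(groups))  # error mean square of the one-way model
    estimate = sum(c * statistics.fmean(g) for c, g in zip(coef, groups))
    ss = estimate ** 2 / sum(c * c / len(g) for c, g in zip(coef, groups))
    return estimate, ss / mse


for label, coef in (("1,2 vs 3", [-0.5, -0.5, 1]), ("1 vs 2", [1, -1, 0])):
    est, f = contrast_f(samples, coef)
    print(f"{label}: L = {est:.3f}, F = {f:.3f} on (1, 37) df")
```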
ANOVA for a randomized block design:
(4 strains as blocks; 3 treatments)
data dat2;
do block=1 to 4;
do treat=1 to 3;
input x @@;
output;
end;
end;
cards;
76 86 115
12 38 85
40 81 103
12 33 57
;
proc glm data=dat2;
class treat block;
model x=treat block/p;
output out=r r=res; /* residuals saved as RES in data set r */
means treat block/snk;
run;
proc univariate data=r normal; var res; run;
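Explanation:
"/p" prints the predicted values and residuals; OUTPUT OUT=r writes the data plus the residuals to data set r (R=RES names the residual variable RES; add P= to keep the predicted values as well); NORMAL in PROC UNIVARIATE tests the residuals for normality.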
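Under the additive block model the residual for block i and treatment j is x - (block mean) - (treatment mean) + (grand mean); these should be the values stored in RES. A Python sketch with the strain-by-treatment table laid out as rows and columns (standard library only; names are illustrative):

```python
import statistics

# x[i][j]: block i (strains 1..4) by treatment j (1..3), from the data lines of dat2
x = [
    [76, 86, 115],
    [12, 38, 85],
    [40, 81, 103],
    [12, 33, 57],
]
n_block, n_treat = len(x), len(x[0])
grand = statistics.fmean(v for row in x for v in row)
block_mean = [statistics.fmean(row) for row in x]
treat_mean = [statistics.fmean(row[j] for row in x) for j in range(n_treat)]

# Residual under the additive model: x - block mean - treatment mean + grand mean
res = [[x[i][j] - block_mean[i] - treat_mean[j] + grand for j in range(n_treat)]
       for i in range(n_block)]

ss_treat = n_block * sum((m - grand) ** 2 for m in treat_mean)
ss_block = n_treat * sum((m - grand) ** 2 for m in block_mean)
ss_error = sum(r * r for row in res for r in row)
ms_error = ss_error / ((n_block - 1) * (n_treat - 1))
print(f"F(treat) = {ss_treat / (n_treat - 1) / ms_error:.3f}")
print(f"F(block) = {ss_block / (n_block - 1) / ms_error:.3f}")
```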
ANOVA for a Latin square design (rows: person; columns: stress; Latin letters: cloth):
data dat3;
do person=1 to 5;
do stress=1 to 5;
input cloth $ x @@;
output;
end;
end;
cards;
B 103 A 121 C 100 D 92 E 95
C 102 B 129 D 98 E 124 A 115
D 118 C 133 E 103 A 109 B 90
E 99 D 122 A 99 B 84 C 100
A 102 E 139 B 103 C 104 D 95
;
proc anova data=dat3;
class person stress cloth;
model x=person stress cloth;
run;
quit;
Correlation analysis:
Half-life of drug A in blood and in urine:
1. Create the data set
data dat1;
input x1 x2;
cards;
9.9 7.9
11.2 8.9
9.4 8.5
8.4 9.4
14.8 12
12.4 11.5
13.1 14.5
13.4 12.3
11.2 9.2
9.5 11
10.7 8.3
9.2 8.5
;
run;
2. Draw a scatter plot
proc plot data=dat1;
plot x1*x2='*'/haxis=by 3 vaxis=by 3;
run;
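Procedure notes:
PROC PLOT draws the scatter plot.
'*' sets the plotting symbol to *.
HAXIS= and VAXIS= set the tick interval on the horizontal and vertical axes.
In x1*x2, the variable before * is on the vertical axis and the one after * is on the horizontal axis.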
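3. Check whether the two variables follow a bivariate normal distribution
proc reg data=dat1;
model x2=x1/p;
output out=r r=res; /* residuals of the regression of x2 on x1 */
run;
proc univariate data=dat1 normal; var x1; run;
proc univariate data=r normal; var res; run;
Procedure notes:
PROC REG fits the regression equation (regression analysis).
MODEL dependent=independent/P; the P option prints the predicted values and residuals.
Normality tests are run on the residuals and on x1; if both P values are > 0.05, the data are taken as consistent with a bivariate normal distribution.
4. Correlation analysis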
proc corr data=dat1;
var x1;
with x2;
run;
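Procedure notes:
PROC CORR performs the correlation analysis.
var x1; with x2; specify the variables to be correlated.
0.72048 is the Pearson correlation coefficient, and 0.0082 is the P value of the test of H0: ρ = 0.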
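The coefficient and its test statistic can be verified by hand: r = lxy/√(lxx·lyy) and t = r√(n - 2)/√(1 - r²) with n - 2 degrees of freedom. A Python sketch, standard library only (`pearson_r` is a hypothetical helper); for these data it reproduces r ≈ 0.72048.

```python
import math
import statistics

x1 = [9.9, 11.2, 9.4, 8.4, 14.8, 12.4, 13.1, 13.4, 11.2, 9.5, 10.7, 9.2]
x2 = [7.9, 8.9, 8.5, 9.4, 12.0, 11.5, 14.5, 12.3, 9.2, 11.0, 8.3, 8.5]


def pearson_r(a, b):
    """Pearson correlation from the corrected sums of squares and cross-products."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    lab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    laa = sum((u - ma) ** 2 for u in a)
    lbb = sum((v - mb) ** 2 for v in b)
    return lab / math.sqrt(laa * lbb)


r = pearson_r(x1, x2)
n = len(x1)
t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)  # t test of H0: rho = 0, df = n - 2
print(f"r = {r:.5f}, t = {t:.3f}, df = {n - 2}")
```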
Regression analysis:
data dat2;
input x y;
cards;
1 8.03
3 14.97
5 19.23
7 27.83
9 36.23
;
proc plot data=dat2;
plot y*x='*';
run;
proc reg data=dat2;
model y=x/p;
plot y*x;
run;
In the parameter estimates, the Intercept is a (3.94300) and the slope of x is b (3.46300), so ŷ = 3.943 + 3.463x.
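The least-squares estimates follow from b = lxy/lxx and a = ȳ - b·x̄. A Python sketch of that arithmetic (standard library only; variable names are illustrative); for these five points it gives b = 3.463 and a = 3.943, matching the SAS estimates.

```python
import statistics

x = [1, 3, 5, 7, 9]
y = [8.03, 14.97, 19.23, 27.83, 36.23]

mx, my = statistics.fmean(x), statistics.fmean(y)
lxy = sum((u - mx) * (v - my) for u, v in zip(x, y))  # corrected sum of cross-products
lxx = sum((u - mx) ** 2 for u in x)  # corrected sum of squares of x
b = lxy / lxx  # slope
a = my - b * mx  # intercept
print(f"y_hat = {a:.5f} + {b:.5f} x")
```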
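Chi-square test for a fourfold (2×2) table:
Problem:
Western medicine was used to treat 79 patients, 63 of whom responded; traditional Chinese medicine was used to treat 54 patients, 47 of whom responded. Do the response rates of the two drugs differ?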
data dat1;
do r=1 to 2;
do c=1 to 2;
input freq @@;
output;
end;
end;
cards;
63 16
47 7
;
run;
proc freq data=dat1;
tables r*c/chisq;
weight freq;
run;
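Procedure notes:
PROC FREQ produces frequency tables. The TABLES statement defines the table:
row variable*column variable; options follow the slash, and CHISQ requests the chi-square tests.
The WEIGHT statement names the variable that holds the cell frequencies.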
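Explanation of results:
In each cell the four numbers are, in order, the frequency, the percent of the total, the row percent, and the column percent.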
proc freq data=dat1;
tables r*c/chisq nopercent nocol expected;
weight freq;
run;
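Procedure notes:
NOPERCENT and NOCOL suppress the percent of the total and the column percent; EXPECTED prints the expected (theoretical) frequency of each cell.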
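The Pearson chi-square and the expected frequencies printed by EXPECTED can be computed as Σ(A - T)²/T with T = (row total × column total)/n. A standard-library Python sketch (the helper name `chi_square` is hypothetical; no continuity correction and no P value):

```python
# Observed 2x2 table: rows = Western / Chinese medicine, columns = effective / not effective
obs = [[63, 16], [47, 7]]


def chi_square(table):
    """Pearson chi-square (no continuity correction) and the table of expected frequencies."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    expected = [[ri * cj / n for cj in cols] for ri in rows]
    stat = sum((o - e) ** 2 / e
               for orow, erow in zip(table, expected)
               for o, e in zip(orow, erow))
    return stat, expected


stat, expected = chi_square(obs)
print(f"chi-square = {stat:.4f}, df = 1")
print("expected:", [[round(e, 2) for e in row] for row in expected])
```

Here every expected frequency exceeds 5 and n = 133, so by the usual textbook rule the uncorrected Pearson chi-square (not the continuity-adjusted one) is the row to read.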
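Chi-square test for a K×2 table:
Problem:
Add a combined Chinese and Western medicine group of 68 patients, 65 of whom responded. Do the three therapies differ?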
data dat2;
do r=1 to 3;
do c=1 to 2;
inp