Chapter 2  Regression Analysis

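Section 2  Regression Analysis

2.1 The regress command

Syntax:

    b = regress(y,X)
    [b,bint] = regress(y,X)
    [b,bint,r] = regress(y,X)
    [b,bint,r,rint,stats] = regress(y,X)
    [b,bint,r,rint,stats] = regress(y,X,alpha)

b is the vector of estimated regression coefficients.

bint holds the 100(1-alpha)% confidence intervals for b. It is a p-by-2 matrix. If alpha is omitted, it defaults to 0.05.

r holds the residuals, that is, the observed values minus the fitted values. It is an n-by-1 vector.

rint holds the 100(1-alpha)% confidence intervals for r. If the i-th interval in rint does not contain 0, the observation (xi, yi) is an outlier.

Computing bint: row i is the interval $b_i \pm t_{1-\alpha/2}(n-p)\,\sqrt{s^2 c_{ii}}$, where $s^2$ is the estimated error variance and $c_{ii}$ is the i-th diagonal element of $(X'X)^{-1}$.

Computing r: $r = y - Xb$.

Computing rint: see Section 2.2.

stats contains four items:
1. The coefficient of determination R^2.
2. The F statistic.
3. The p-value of the F statistic. Note that this is a right-tailed p-value.
4. The estimated variance of the residuals.

Example:

    Year                                   1   2   3   4   5   6   7   8   9  10
    Sales Y (hundreds of units)           10  10  15  13  14  20  18  24  19  23
    Per-capita income X2 (hundred yuan)    5   7   8   9   9  10  10  12  13  15
    Unit price X3 (yuan)                   2   3   2   5   4   3   4   3   5   4

    z = [10 10 15 13 14 20 18 24 19 23
          5  7  8  9  9 10 10 12 13 15
          2  3  2  5  4  3  4  3  5  4];
    z1 = z';
    y = z1(:,1);
    X = [ones(size(z,2),1) z1(:,[2,3])];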

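    [b,bint,r,rint,stats] = regress(y,X,0.05)

    b =
        4.5875
        1.8685
       -1.7996
    bint =
       -1.3713   10.5463
        1.2309    2.5060
       -3.5327   -0.0664
    r' =
       -0.3307 -2.2681 -0.9361 0.5941 -0.2054 2.1265 1.9261 2.3896 -0.8797 -2.4162
    rint =
       -4.0996    3.4382
       -6.1857    1.6495
       -4.9828    3.1106
       -2.9358    4.1241
       -4.7787    4.3678
       -2.0518    6.3048
       -2.3755    6.2277
       -1.1643    5.9434
       -4.9030    3.1435
       -5.1237    0.2912
    stats =
        0.8793   25.5037    0.0006    3.8685

The following shows how each value is computed.

Computing b:

    b = inv(X'*X)*X'*y

This gives the same b as above (4.5875, 1.8685, -1.7996).

Computing r:

    r = y - X*b

This gives the same r as above.

Computing rint: in this example, take studres from Section 2.2:

    studres' =
       -0.2075 -1.3690 -0.5470 0.3980 -0.1062 1.2035 1.0588 1.5900 -0.5171 -2.1103

Then rint is:

    rint = [r - tinv(0.975,7)*r./studres, r + tinv(0.975,7)*r./studres]

which reproduces the rint matrix above. It can also be computed from the delete-1 variance s2_i and the leverage of Section 2.2, with the same result:

    rint = [r - tinv(0.975,7)*sqrt(s2_i).*sqrt(1-leverage), r + tinv(0.975,7)*sqrt(s2_i).*sqrt(1-leverage)]

Computing the squared standard error (the error variance estimate):

    sum(r.^2)/7
    ans =
        3.8685

Computing bint (lower bound first, then upper bound):

    bint = [b - tinv(0.975,7)*sqrt(diag(3.8685*inv(X'*X))), b + tinv(0.975,7)*sqrt(diag(3.8685*inv(X'*X)))]
    bint =
       -1.3713   10.5463
        1.2309    2.5060
       -3.5327   -0.0664

Computing R^2:

    R2 = 1 - sum(r.^2)/(var(y)*(length(y)-1))
    R2 =
        0.8793

Computing the F statistic:

    F = (R2/2)/((1-R2)/(10-3))
    F =
       25.5037

Computing the p-value of the F statistic:

    P = 1 - fcdf(25.5037,2,7)
    P =
       6.1045e-004

A residual plot can also be drawn:

    rcoplot(r,rint)

The upper and lower ends of each line correspond to rint, and the circle in the middle corresponds to r. If a line does not cross the white center line (the X axis), the corresponding (xi, yi) is an outlier. In this plot every line crosses the X axis.

Prediction: suppose that over the next five years per-capita income grows by 4.5% per year while the unit price falls by 1% per year. The forecasts are stored in a new vector yp so that the data vector y is not overwritten.

    x1(1) = 15; x2(1) = 4;
    yp = zeros(1,6);    % yp(1) is a placeholder and stays 0
    for i = 1:5
        x1(i+1) = 1.045*x1(i);
        x2(i+1) = 0.99*x2(i);
        yp(i+1) = 4.5875 + 1.8685*x1(i+1) - 1.7996*x2(i+1);
    end
    yf = [x1; x2; yp]
    yf =
      Columns 1 through 4
       15.0000   15.6750   16.3804   17.1175
        4.0000    3.9600    3.9204    3.8812
             0   26.7498   28.1391   29.5869
      Columns 5 through 6
       17.8878   18.6927
        3.8424    3.8040
       31.0961   32.6693

The last row holds the forecasts for the next five years (excluding the leading 0).

2.2 regstats: linear regression diagnostics

2.2.1 Command:

    regstats(responses,data,model)

responses: the dependent variable y, an n-by-1 column vector, where n is the number of observations.
data: the independent variables, an n-by-m matrix, where m is the number of independent variables. Note that it does not include the column of ones.
model: one of the following strings:
    'linear': includes the constant and linear terms (default).
    'interaction': includes constant, linear, and cross-product terms. With two independent variables X1 and X2, this means the constant, X1, X2, and X1*X2.
    'quadratic': includes interaction and squared terms. With two independent variables X1 and X2, this means the constant, X1, X2, X1*X2, X1^2, and X2^2.
    'purequadratic': includes constant, linear, and squared terms. With two independent variables X1 and X2, this means the constant, X1, X2, X1^2, and X2^2.

regstats(responses,data,model) opens a user interface that offers the following 20 statistics. For background, see:
    Ke Huixin (柯惠新) and Ding Lihong (丁立宏), 《市场调查与分析》, 中国统计出版社, 2000.3, Chapter 12.
    Mao Shisong (茆诗松), ed., 《统计手册》, 科学出版社, 2003.1, Chapter 10.
    Xue Yi (薛毅) and Chen Liping (陈立萍), 《统计建模与R软件》, 清华大学出版社, 2007.4, Chapter 6.

(1) QR Decomposition (Q) (see 《矩阵论》, Cheng Yunpeng (程云鹏), p. 206)
X = QR, where X includes the column of ones and is an n-by-p matrix.
    [Q,R] = qr(X,0)
Q is an n-by-p matrix satisfying Q'*Q = I.

(2) QR Decomposition (R)
    [Q,R] = qr(X,0)
R is a p-by-p upper triangular matrix.

(3) Regression Coefficients
    beta = R\(Q'*y)
that is, beta = inv(R)*(Q'*y). Substituting X = QR into beta = inv(X'*X)*X'*y gives this expression.

(4) Fitted Values of the Response
    yhat = X*beta = X*inv(X'*X)*X'*y

(5) Residuals
    r = y - yhat

(6) Mean Squared Error
    mse = sum(r.^2)/(n-p)

(7) Covariance Matrix of Estimated Coefficients
    covb = mse*inv(X'*X)

(8) Hat (Projection) Matrix
    hatmat = Q*Q'
    yhat = hatmat*y
hatmat is an n-by-n matrix. Substituting X = QR into yhat = X*beta = X*inv(X'*X)*X'*y gives yhat = Q*Q'*y. hatmat is a projection matrix.

(9) Leverage
    leverage = diag(hatmat) = diag(Q*Q')
It is an n-by-1 column vector whose n values lie in [0,1]. The i-th value measures how much the i-th observation contributes to the model: the larger the value, the larger the contribution. Leverage is one way to look for influential points. An influential point is one that plays an especially large role in the model, meaning that the regression coefficients obtained with and without that point differ greatly. Ideally every observation has the same influence, so that all leverages are close to p/n (the leverages sum to p, so p/n is their average). If the leverage of an observation is greater than or equal to 2p/n, it is regarded as an influential point.

(10) Delete-1 Variance
The estimate of the error variance after removing the i-th data point. It is an n-by-1 column vector.
    s2_i = ((n-p)*mse - r.*r./(1-h))./(n-p-1)
    1) n is the number of observations.
    2) p is the number of unknown coefficients.
    3) mse is the mean squared error.
    4) r is the vector of residuals.
    5) h is the leverage vector.

(11) Delete-1 Coefficients
The matrix of regression coefficients obtained by deleting one observation at a time. It is a p-by-n matrix whose j-th column holds the coefficients obtained after deleting the j-th observation.
    b_i(:,j) = beta - Rinv*(Q(j,:) .* r(j)./(1-h(j)))'
    1) Rinv is the inverse of the R matrix.
    2) r is the vector of residuals.
    3) h is the leverage vector.

(12) Standardized Residuals
    standres = r ./ sqrt(mse*(1-h))
    1) r is the vector of residuals.
    2) mse is the mean squared error.
    3) h is the leverage vector.
standres is an n-by-1 column vector. It can be used to diagnose outliers. An outlier is an observation that lies clearly away from the bulk of the data and shows up as an unusually large standardized (internally studentized) residual. An observation is generally regarded as an outlier if the absolute value of its standardized residual exceeds 2 or 3. When the classical assumptions hold, standres(i), i = 1, 2, ..., n, can be treated as approximately independent and identically distributed N(0,1) random variables. If about 95% of the points fall within ±2 and there is no obvious trend, the basic regression assumptions are satisfied and the model fits the data well.

(13) Studentized Residuals
    studres = r ./ sqrt(s2_i.*(1-h))
    1) r is the vector of residuals.
    2) s2_i is the delete-1 variance.
    3) h is the leverage vector.
When the classical assumptions hold, studres(i), i = 1, 2, ..., n, follows a t distribution with n-p-1 degrees of freedom, because the delete-1 variance is estimated from n-1 observations. At a given significance level $\alpha$, observation i is regarded as an outlier when $|\mathrm{studres}(i)| > t_{1-\alpha/2}(n-p-1)$. When n-p exceeds about 30, an observation is generally regarded as an outlier if the absolute value of its studentized residual exceeds 2 or 3.

(14) Scaled Change in Regression Coefficients
The scaled change in regression coefficients is a p-by-n matrix. Each column contains the scaled change in the estimated coefficients, beta, caused by deleting the corresponding observation.
    d = sqrt(diag(Rinv*Rinv'));
    dfbetas(:,j) = (beta - b_i(:,j)) ./ (sqrt(s2_i(j)).*d)
    1) Rinv is the inverse of the R matrix.
    2) b_i is the matrix of delete-1 coefficients.
    3) s2_i is the vector of delete-1 variances.
This is the standardized change in the regression coefficients when an observation is excluded. Points whose scaled change is larger than the cutoff (the cutoff value is lost in the source text) may be influential points.

(15) Change in Fitted Values
The change in fitted values is an n-by-1 vector. Each element contains the change in a fitted value caused by deleting the corresponding observation.
    dffit = r .* (h./(1-h))
    1) r is the vector of residuals.
    2) h is the leverage vector.
It gives the change in the predicted value after an observation is deleted.

(16) Scaled Change in Fitted Values
The scaled change in fitted values is an n-by-1 vector. Each element contains the change in a fitted value caused by deleting the corresponding observation, scaled by the standard error.
    dffits = studres .* sqrt(h./(1-h))
    1) studres is the vector of studentized residuals.
    2) h is the leverage vector.
This is the standardized change in the predicted value when an observation is excluded. Points whose scaled change is larger in absolute value than the cutoff (the cutoff value is lost in the source text) may be influential points.

(17) Change in Covariance
    covr = 1 ./ ((((n-p-1+studres.*studres)./(n-p)).^p).*(1-h))
It is an n-by-1 column vector.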

    1) n is the number of observations.
    2) p is the number of unknown coefficients.
    3) studres is the vector of studentized residuals.
    4) h is the leverage vector.

(18) Cook's Distance
    cookd = r .* r .* (h./(1-h).^2)./(p*mse)
    1) r is the vector of residuals.
    2) h is the leverage vector.
    3) mse is the mean squared error.
    4) p is the number of unknown coefficients.
cookd is an n-by-1 column vector. If its i-th value is greater than 0.5, the i-th observation may be an influential point.

(19) Student's t statistics
    1) beta - Regression coefficient estimates
    2) se - Standard errors for the regression coefficient estimates
    3) t - t statistics for the regression coefficient estimates, each one for a test that the corresponding coefficient is zero
    4) dfe - Degrees of freedom for error
    5) pval - p-values for each t statistic
These are calculated by the following code:
    beta = R\(Q'*y)
    se = sqrt(diag(covb))
    t = beta ./ se
    dfe = n-p
    pval = 2*(tcdf(-abs(t), dfe))

(20) F statistic
    1) sse - Error sum of squares
    2) ssr - Regression sum of squares
    3) dfe - Error degrees of freedom
    4) dfr - Regression degrees of freedom
    5) f - F statistic value, for a test that all regression coefficients other than the constant term are zero
    6) pval - p-value for the F statistic
These are calculated by the following code:
    sse = norm(r).^2
    ssr = norm(yfit - mean(yfit)).^2
    dfe = n-p
    dfr = p-1
    f = (ssr/dfr) / (sse/dfe)
    pval = 1 - fcdf(f, dfr, dfe)

Example (the same data as in Section 2.1; here X does not include the column of ones):

    z = [10 10 15 13 14 20 18 24 19 23
          5  7  8  9  9 10 10 12 13 15
          2  3  2  5  4  3  4  3  5  4];
    z1 = z';
    y = z1(:,1);
    X = z1(:,[2,3]);
    regstats(y,X,'linear')

Selecting every item in the user interface gives:

    Q =
       -0.3162   -0.5449   -0.1902
       -0.3162   -0.3179    0.0288
       -0.3162   -0.2043   -0.4207
       -0.3162   -0.0908    0.6204
       -0.3162   -0.0908    0.2478
       -0.3162    0.0227   -0.2017
       -0.3162    0.0227    0.1710
       -0.3162    0.2497   -0.3554
       -0.3162    0.3633    0.3131
       -0.3162    0.5903   -0.2132
    R =
       -3.1623  -30.9903  -11.0680
             0    8.8091    1.8163
             0         0    2.6835
    beta =
        4.5875
        1.8685
       -1.7996
    covb =
        6.3503   -0.3247   -0.7947
       -0.3247    0.0727   -0.1108
       -0.7947   -0.1108    0.5372
    yhat' =
       10.3307 12.2681 15.9361 12.4059 14.2054 17.8735 16.0739 21.6104 19.8797 25.4162
    r' =
       -0.3307 -2.2681 -0.9361 0.5941 -0.2054 2.1265 1.9261 2.3896 -0.8797 -2.4162
    mse =
        3.8685
    leverage' =
        0.4331 0.2019 0.3187 0.4932 0.1696 0.1412 0.1297 0.2887 0.3300 0.4939

    sum(leverage)
    ans =
        3.0000

    hatmat =
      Columns 1 through 8
        0.4331    0.2677    0.2913    0.0315    0.1024    0.1260    0.0551    0.0315
        0.2677    0.2019    0.1528    0.1467    0.1360    0.0870    0.0977    0.0104
        0.2913    0.1528    0.3187   -0.1424    0.0143    0.1802    0.0234    0.1985
        0.0315    0.1467   -0.1424    0.4932    0.2620   -0.0272    0.2040   -0.1
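
The diagnostic formulas of Section 2.2 can also be evaluated directly, without opening the regstats user interface. The following is a minimal sketch under stated assumptions, not the implementation used by regstats: it uses only base functions (qr, sum, sqrt, find, disp), builds the design matrix with the column of ones by hand, and computes the leverage as the row sums of Q.^2, which equals diag(Q*Q'). The names highLeverage and bigResidual are illustrative and do not appear in the text.

```matlab
% Example data from the text: sales (y), per-capita income and unit price
z = [10 10 15 13 14 20 18 24 19 23
      5  7  8  9  9 10 10 12 13 15
      2  3  2  5  4  3  4  3  5  4]';
y = z(:,1);
X = [ones(size(z,1),1) z(:,[2 3])];   % design matrix including the ones column
[n,p] = size(X);

[Q,R] = qr(X,0);             % economy-size QR: Q is n-by-p, R is p-by-p
beta  = R \ (Q'*y);          % (3) regression coefficients
r     = y - X*beta;          % (5) residuals
mse   = sum(r.^2)/(n-p);     % (6) mean squared error
h     = sum(Q.^2,2);         % (9) leverage, same as diag(Q*Q')

s2_i     = ((n-p)*mse - r.^2./(1-h))./(n-p-1);   % (10) delete-1 variance
standres = r ./ sqrt(mse*(1-h));                 % (12) standardized residuals
studres  = r ./ sqrt(s2_i.*(1-h));               % (13) studentized residuals
dffits   = studres .* sqrt(h./(1-h));            % (16) scaled change in fit
cookd    = r.^2 .* (h./(1-h).^2) ./ (p*mse);     % (18) Cook's distance

% Rules of thumb quoted in the text (illustrative variable names)
highLeverage = find(h >= 2*p/n);        % leverage at or above 2p/n
bigResidual  = find(abs(studres) > 2);  % |studentized residual| above 2

disp([h standres studres dffits cookd]);
```

The values of beta, mse, and h computed this way should agree with the beta, mse, and leverage listings in the example, up to display rounding. Computing h from the rows of Q avoids forming the full n-by-n hat matrix, which matters only when n is large.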
