用R语言做非参数文档格式.docx

资源描述

用R语言做非参数文档格式.docx

《用R语言做非参数文档格式.docx》由会员分享，可在线阅读，更多相关《用R语言做非参数文档格式.docx（20页珍藏版）》请在冰豆网上搜索。

用R语言做非参数文档格式.docx

——FriedmanSupersmoother

模型：

——非参数密度估计

——非参数回归模型

——时间序列的半参数模型

——Paneldata的半参数模型

——QuantileRegression

三、不同的模型形式

1、线性模型linearmodels

2、Nonlinearinvariables

3、Nonlinearinparameters

四、数据转换Powertransformation（对参数方法）

IntheGLMframework,modelsareequallyprone（倾向于）tosomemisspecification（不规范）fromanincorrectfunctionalform.

Itwouldbeprudent（谨慎的）totestthattheeffectofanyindependentvariableofamodeldoesnothaveanonlineareffect.Ifitdoeshaveanonlineareffect,analystsinthesocialscienceusuallyrelyonPowerTransformationstoaddressnonlinearity.

[ADD:

检验方法见SanfordWeisberg.AppliedLinearRegression（ThirdEdition）.AJohnWiley&

Sons,Inc.,Publication.（本科的应用回归分析课教材）]

----------------------------------------------------------------------------

第二章NonparametricDensityEstimation

非参数密度估计

一、三种方法

1、直方图Hiatogram

2、Kerneldensityestimate

3、Knearest-neighborsestimate

二、Histogram对直方图的一个数值解释

Supposex1,…xN–f（x）,thedensityfunctionf（x）isunknown.

Onecanusethefollowingfunctiontoestimatef（x）

【与x的距离小于h的所有点的个数】

三、Kerneldensityestimate

Bandwidth:

Windowwidth:

2h.

1、Kernelfunction的条件

ThekernelfunctionK（.）isacontinuousfunction,symmetric（对称的）aroundzero,thatintegrates（积分）tounityandsatisfiesadditionalboundedconditions:

（1）K（）issymmetricaround0andiscontinuous;

（2）

;

（3）Either

（a）K（z）=0if|z|>

=z0forz0

（b）|z|K（z）à

0as

（4）

where

isaconstant.

2、主要函数形式

3、置信区间

其中，

4、窗宽的选择

实际应用中，

。

其中，s是样本标准差，iqr是样本分位数级差（interquartilerange）

四、Knearest-neighborsestimate

五、R语言部分

da<

-read.table（"

PSID.txt"

header=TRUE）

lhwage<

-da$lhwage

#***bandwidth相等，核函数不同***

den1<

-density（lhwage,bw=0.45,kernel="

epan"

）

den2<

gauss"

den3<

biwe"

den4<

rect"

plot（den4,lty=4,main="

xlab="

LogHourlyWage"

ylab="

Kerneldensityestimates"

lines（den3,lty=3,col="

red"

lines（den2,lty=2,col="

green"

lines（den1,lty=1,col="

blue"

#***bandwidth不相等，核函数也不同***

den5<

-density（lhwage,bw=0.545,kernel="

den6<

-density（lhwage,bw=0.246,kernel="

den7<

-density（lhwage,bw=0.646,kernel="

den8<

-density（lhwage,bw=0.214,kernel="

plot（den8,lty=4,main="

lines（den7,lty=3,col="

lines（den6,lty=2,col="

lines（den5,lty=1,col="

----------------------------------------------------------------------------

第三章smoothingandlocalregression

一、简单光滑估计法SimpleSmoothing

1、LocalAveraging局部均值

按照x排序，将样本分成若干部分（intervalsor“bins”）；

将每部分x对应的y值的均值作为f（x）的估计。

三种不同方法：

（1）相同的宽度（equalwidthbins）：

uniformlydistributed.

（2）相同的观察值个数（equalno.ofobservationsbins）：

k-nearestneighbor.

（3）移动平均（movingaverage）

K-NN：

等窗宽：

2、kernelsmoothing核光滑

其中，

二、局部多项式估计LocalPolynomialRegression

1、主要结构

局部多项式估计是核光滑的扩展，也是基于局部加权均值构造。

——localconstantregression

——locallinearregression

——lowess（Cleveland,1979）

——loess（Cleveland,1988）

【本部分可参考：

Takezana（2006）.IntroductiontoNonparametricRegression.（P1853.7andP1953.9）

ChambersandHastie（1993）.StatisticalmodelsinS.（P312ch8）】

2、方法思路

（1）对于每个xi，以该点为中心，按照预定宽度构造一个区间；

（2）在每个结点区域内，采用加权最小二乘法（WLS）估计其参数，并用得到的模型估计该结点对应的x值对应y值，作为y|xi的估计值（只要这一个点的估计值）；

（3）估计下一个点xj；

（4）将每个y|xi的估计值连接起来。

【R操作

library（KernSmooth）#函数locpoly（）

library（locpol）#locpol（）;

locCteSmootherC（）

library（locfit）#locfit（）

#weightfunciton:

kernel=”tcub”.And“rect”,“trwt”,“tria”,“epan”,“bisq”,“gauss”

】

3、每个方法对应的估计形式

（1）变量个数p=0,localconstantregression（kernelsmoothing）

min

（2）变量个数p=1,locallinearregression

（3）Lowess（LocalWeightedscatterplotsmoothing）

p=1:

【还有个加权修正的过程，这里略，详见原书或者PPT】

（4）Loess（Localregression）

p=1,2:

（5）Friedmansupersmoother

symmetrick-NN,usinglocallinearfit,

varyingspan,whichisdeterminedbylocalCV,

notrobusttooutliers,fasttocompute

supsmu（）inR

三、模型选择

需要选择的内容：

（1）窗宽thespan；

（2）多项式的度thedegreeofpolynomialforthelocalregressionmodels；

（3）权重函数theweightfunctions。

【其他略】

四、R语言部分

library（foreign）

library（SemiPar）

library（mgcv）

jacob<

jacob.txt"

###############################################################################

#第一部分，简单的光滑估计

#1、KernelDensityEstimation

#IllustrationofKernelConcepts

#DefiningtheWindowWidth

attach（jacob）

x0<

-sort（perotvote）[75]

diffs<

-abs（perotvote-x0）

which.diff<

-sort（diffs）[120]

#ApplyingtheTricubeWeight

#...Tricubefunction

tricube<

-function（z）{

ifelse（abs（z）<

1,（1-（abs（z））^3）^3,0）

}

#...

-seq（0,1,by=.1）

tricube（a）

#Figure2.5

plot（range（perotvote）,c（0,1）,xlab="

PerotVote（%）"

ylab="

TricubeWeight"

type='

bty="

abline（v=c（x0-which.diff,x0+which.diff）,lty=2）

abline（v=x0）

xwts<

-seq（x0-which.diff,x0+which.diff,len=250）

lines（xwts,tricube（（xwts-x0）/which.diff）,lty=1,lwd=1）

points（x.n,tricube（（x.n-x0）/which.diff）,cex=1）

###########################################################################

#2、KernelSmoothing

Figure2.6

par（mfrow=c（3,1））

plot（perotvote,chal.vote,pch="

cex=1.95,

xlab="

Challenger'

sVoteShare（%）"

main="

Bandwidth=4"

lines（ksmooth（perotvote,chal.vote,bandwidth="

））

cex=.65,

Bandwidth=8"

lines（ksmooth（perotvote,chal.vote,kernel="

box"

bandwidth="

）,lty=1）

Bandwidth=12"

12"

#*******Kernelsmoothing中选取box和normal核函数的比较，带宽相等

cex=.65,xlab="

main="

）,lty=1）

normal"

）,lty=2,col="

）

##################################################################################

#第二部分，LPR模型

#DataPrepForLocalAverageRegressionStep-by-Step

cong<

-as.data.frame（jacob[,2:

3]）

-cong[order（cong$perotvote）,1:

-as.matrix（cong$chal.vote）

-as.matrix（cong$perotvote）

-length（y）

-x[75]

-abs（x-x0）

x.n<

-x[diffs<

=which.diff]

y.n<

-y[diffs<

weigh=tricube（（x.n-x0）/which.diff）

mod<

-lm（y.n~x.n,weights=weigh）

#Figure2.7

plot（x,y,type="

abline（v=c（x0-which.diff,x0+which.diff）,lty=2）

points（x[diffs>

which.diff],y[diffs>

which.diff],pch=16,cex=1,col=gray（.80））

points（x[diffs<

=which.diff],y[diffs<

=which.diff],cex=.85）

abline（mod,lwd=2,col=1）

text（27.5,50,expression（paste（"

FittedValueofyat"

x[0]）））#这里expression的用法比较有意思

arrows（25,47,15,37,code=2,length=.10）

#################################################################################

#2、NowPuttingItTogetherForLocalRegressionDemonstration.

#OLSFitforComparison

ols<

-lm（chal.vote~perotvote,data=jacob）

#Theloessfit

model.loess<

-loess（chal.vote~perotvote,data=jacob,span=0.5）

#***默认设置degree=2，family=gauss,tricube加权***

-length（chal.vote）

x.loess<

-seq（min（perotvote）,max（perotvote）,length=n）

y.loess<

-predict（model.loess,data.frame（perotvote=x.loess））#得到预测值便于比较

#Thelowessfit

model.lowess<

-lowess（chal.vote~perotvote,data=jacob,f=0.5）

#***默认设置robustlineartricube加权***

x.lowess<

y.lowess<

-predict（model.lowess,data.frame（perotvote=x.lowess））#得到预测值便于比较

#Figure2.8

ylab="

Challengers'

VoteShare（%）"

xlab="

VoteforPerot（%）"

lines（x.loess,y.loess）

lines（x.lowess,y.lowess）

abline（ols）

legend（15,20,c（"

Loess"

Lowess"

OLS"

）,lty=c（1,2,1）,bty="

cex=.8）

#3、lowess中不同robust的比较

m1.lowess<

-lowess（perotvote,chal.vote,f=0.5,iter=0）

#***没有进行第二步的robust加权估计***

m2.lowess<

-lowess（perotvote,chal.vote,f=0.5）

#***默认iter=3，要进行3次robust加权估计***

m0.loess<

-loess（chal.vote~perotvote,data=jacob,

span=0.5,degree=1,family="

symm"

iter

展开阅读全文