SAS Data Analysis Examples robust regression.docx

Uploaded by: b****6 | Document ID: 7837035 | Uploaded: 2023-01-26 | Format: DOCX | Pages: 9 | Size: 31.75 KB

SAS Data Analysis Examples

Robust Regression

Robust regression is an alternative to least squares regression when the data are contaminated with outliers or influential observations. It can also be used to detect influential observations.

Please note:

The purpose of this page is to show how to use various data analysis commands. It does not cover all aspects of the research process that researchers are expected to carry out. In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics, or potential follow-up analyses.

This page was developed using SAS 9.2.

Introduction

Let's begin our discussion of robust regression with some terms from linear regression.

Residual:

The difference between the predicted value (based on the regression equation) and the actual, observed value.

Outlier:

In linear regression, an outlier is an observation with a large residual. In other words, it is an observation whose dependent-variable value is unusual given its values on the predictor variables. An outlier may indicate a sample peculiarity, or it may indicate a data entry error or other problem.

Leverage:

An observation with an extreme value on a predictor variable is a point with high leverage. Leverage is a measure of how far an independent variable deviates from its mean. High-leverage points can have a great effect on the estimates of the regression coefficients.

Influence:

An observation is said to be influential if removing it substantially changes the estimates of the regression coefficients. Influence can be thought of as the product of leverage and outlierness.

Cook's distance (or Cook's D):

A measure that combines the information on the leverage and the residual of the observation.

Robust regression can be used in any situation in which you would use least squares regression. When fitting a least squares regression, we might find some outliers or high-leverage data points. Suppose we have decided that these data points are not data entry errors, nor are they from a different population than most of our data, so we have no compelling reason to exclude them from the analysis. Robust regression might be a good strategy, since it is a compromise between excluding these points entirely from the analysis and including all the data points and treating them all equally, as OLS regression does. The idea of robust regression is to weight the observations differently based on how well behaved they are. Roughly speaking, it is a form of weighted and reweighted least squares regression.
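The reweighting idea can be sketched in a few lines. The Python below is only an illustration under stated assumptions, not a description of what any SAS procedure does internally: it fits a one-predictor line, uses Huber-type weights with the conventional tuning constant 1.345, and re-estimates the residual scale from the median absolute deviation on every pass. The toy data (a line with slope 3 plus one gross outlier) and all function names are hypothetical.

```python
from statistics import median

def weighted_line_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def mad_scale(residuals):
    """Robust scale: median absolute deviation divided by 0.6745."""
    med = median(residuals)
    return median([abs(r - med) for r in residuals]) / 0.6745

def huber_irls(x, y, c=1.345, max_iter=100, tol=1e-8):
    """Refit repeatedly, down-weighting points with large scaled residuals."""
    w = [1.0] * len(x)
    a, b = weighted_line_fit(x, y, w)          # start from the OLS fit
    for _ in range(max_iter):
        res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        s = mad_scale(res)
        if s == 0:                             # perfect fit, nothing to reweight
            break
        w = [1.0 if abs(r) <= c * s else c * s / abs(r) for r in res]
        a_new, b_new = weighted_line_fit(x, y, w)
        done = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if done:
            break
    return a, b, w

# Hypothetical toy data: y = 2 + 3x plus small noise, last point shifted by +40.
x = [float(i) for i in range(10)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.1, -0.2]
y = [2 + 3 * xi + e for xi, e in zip(x, noise)]
y[-1] += 40

a_ols, b_ols = weighted_line_fit(x, y, [1.0] * len(x))
a_rob, b_rob, weights = huber_irls(x, y)
print("OLS slope:", round(b_ols, 3))
print("Reweighted slope:", round(b_rob, 3))
print("Weight of the shifted point:", round(weights[-1], 3))
```

On these toy data the OLS slope is pulled well away from 3 by the shifted point, while the reweighted fit stays close to 3 and gives that point the smallest weight.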

The SAS procedure proc robustreg implements several versions of robust regression. On this page, we will show M-estimation with Huber and bisquare weighting. These two are very standard and are combined as the default weighting function in Stata's robust regression command. In Huber weighting, observations with small residuals get a weight of 1, and the larger the residual, the smaller the weight. With bisquare weighting, all cases with a non-zero residual get down-weighted at least a little.
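The two weighting schemes described above can be written down directly. This is a minimal Python sketch of the textbook weight functions, not code taken from SAS or Stata; the argument u is assumed to be a residual already divided by a robust scale estimate, and the tuning constants 1.345 and 4.685 are the commonly quoted defaults, used here as assumptions.

```python
def huber_weight(u, c=1.345):
    """Weight 1 for small scaled residuals, then c/|u| beyond the cutoff c."""
    au = abs(u)
    return 1.0 if au <= c else c / au

def bisquare_weight(u, c=4.685):
    """Weight (1 - (u/c)^2)^2 inside the cutoff c, and 0 outside it."""
    if abs(u) >= c:
        return 0.0
    return (1.0 - (u / c) ** 2) ** 2

# Compare the two schemes for a few scaled residuals.
for u in (0.0, 0.5, 1.0, 2.0, 4.0, 6.0):
    print(u, round(huber_weight(u), 4), round(bisquare_weight(u), 4))
```

Under Huber weighting a scaled residual of 0.5 still has weight 1, whereas under bisquare weighting it is already slightly below 1, and anything beyond the bisquare cutoff gets weight 0.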

Description of the example data

For our data analysis below, we will use the crime dataset. This dataset appears in Statistical Methods for Social Sciences, Third Edition by Alan Agresti and Barbara Finlay (Prentice Hall, 1997). The variables are state id (sid), state name (state), violent crimes per 100,000 people (crime), murders per 1,000,000 (murder), the percent of the population living in metropolitan areas (pctmetro), the percent of the population that is white (pctwhite), the percent of the population with a high school education or above (pcths), the percent of the population living under the poverty line (poverty), and the percent of the population that are single parents (single). It has 51 observations. We are going to use poverty and single to predict crime.

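/* Read the crime data; firstobs=2 starts reading at line 2 of the file */
data crime;
  infile "crime.csv" delimiter="," firstobs=2;
  input sid state $ crime murder pctmetro pctwhite pcths poverty single;
run;

/* Descriptive statistics for the outcome and the two predictors */
proc means data=crime;
  var crime poverty single;
run;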

The MEANS Procedure

Variable     N          Mean       Std Dev       Minimum       Maximum
----------------------------------------------------------------------
crime       51   612.8431373   441.1003229    82.0000000       2922.00
poverty     51    14.2588235     4.5842415     8.0000000    26.4000000
single      51    11.3254902     2.1214941     8.4000000    22.1000000
----------------------------------------------------------------------

Using robust regression analysis

In most cases, we begin by running an OLS regression and doing some diagnostics. We then create a graph showing the leverage versus the squared residuals, labeling the points with the state abbreviations. To do so, we output the residuals and leverage from proc reg (along with Cook's D, which we will use later).

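/* OLS fit; save studentized residuals, Cook's D and leverage in dataset t */
proc reg data=crime;
  model crime = poverty single;
  output out=t student=res cookd=cookd h=lev;
run;
quit;

/* Squared residuals for the plot */
data t; set t;
  resid_sq = res*res;
run;

/* Leverage versus squared residual, labeled by state */
proc sgplot data=t;
  scatter y=lev x=resid_sq / datalabel=state;
run;
quit;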

As we can see, DC, Florida and Mississippi have either high leverage or large residuals. We can display the observations that have relatively large values of Cook's D. A conventional cut-off point is 4/n, where n is the number of observations in the dataset. We will use this criterion to select the values to display.

proc print data=t;
  where cookd > 4/51;
  var state crime poverty single cookd;
run;

Obs  state  crime  poverty  single    cookd
  1  ak       761      9.1    14.3  0.12547
  9  fl      1206     17.8    10.6  0.14259
 25  ms       434     24.7    14.7  0.61387
 51  dc      2922     26.4    22.1  2.63625

We probably should drop DC to begin with, since it is not even a state. We include it in the analysis just to show that it has a large Cook's D and to demonstrate how it will be handled by proc robustreg. Now we will look at the residuals. We will generate a new variable called rabs, which is the absolute value of the residuals (because the sign of the residual doesn't matter). We then print the ten observations with the highest absolute residual values.

data t2; set t;
  rabs = abs(res);
run;

proc sort data=t2;
  by descending rabs;
run;

proc print data=t2 (obs=10);
run;

Obs  sid  state  crime  murder  pctmetro  pctwhite  pcths  poverty  single       res    cookd      lev  resid_sq     rabs
  1   25  ms       434    13.5      30.7      63.3   64.3     24.7    14.7  -3.56299  0.61387  0.12669   12.6949  3.56299
  2    9  fl      1206     8.9      93.0      83.5   74.4     17.8    10.6   2.90266  0.14259  0.04832    8.4255  2.90266
  3   51  dc      2922    78.5     100.0      31.8   73.1     26.4    22.1   2.61645  2.63625  0.53602    6.8458  2.61645
  4   46  vt       114     3.6      27.0      98.4   80.8     10.0    11.0  -1.74241  0.04272  0.04050    3.0360  1.74241
  5   26  mt       178     3.0      24.0      92.6   81.0     14.9    10.8  -1.46088  0.01676  0.02301    2.1342  1.46088
  6   21  me       126     1.6      35.7      98.5   78.8     10.7    10.6  -1.42674  0.02233  0.03186    2.0356  1.42674
  7    1  ak       761     9.0      41.8      75.2   86.6      9.1    14.3  -1.39742  0.12547  0.16161    1.9528  1.39742
  8   31  nj       627     5.3     100.0      80.8   76.7     10.9     9.6   1.35415  0.02229  0.03519    1.8337  1.35415
  9   14  il       960    11.4      84.0      81.0   76.2     13.6    11.5   1.33819  0.01266  0.02076    1.7908  1.33819
 10   20  md       998    12.7      92.8      68.9   78.4      9.7    12.0   1.28709  0.03570  0.06072    1.6566  1.28709
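The res, lev and cookd columns in this listing are tied together by a standard identity: Cook's D equals the squared studentized residual divided by the number of regression parameters, times lev/(1 - lev). The Python sketch below checks that identity against a few rows copied from the listing. Taking the parameter count as 3 (intercept, poverty, single) is an assumption about how the identity applies here, and the function name is hypothetical.

```python
def cooks_d(student_resid, leverage, n_params):
    """Cook's D from a studentized residual and a leverage (hat) value."""
    return student_resid ** 2 / n_params * leverage / (1.0 - leverage)

N_PARAMS = 3        # assumed: intercept + poverty + single
CUTOFF = 4 / 51     # the 4/n rule of thumb with n = 51

# (state, res, lev, cookd as printed) copied from the listing above
rows = [
    ("ms", -3.56299, 0.12669, 0.61387),
    ("fl",  2.90266, 0.04832, 0.14259),
    ("dc",  2.61645, 0.53602, 2.63625),
    ("ak", -1.39742, 0.16161, 0.12547),
    ("vt", -1.74241, 0.04050, 0.04272),
]

for state, res, lev, printed in rows:
    d = cooks_d(res, lev, N_PARAMS)
    flag = "above 4/n" if d > CUTOFF else "below 4/n"
    print(state, round(d, 5), printed, flag)
```

Vermont has the fourth-largest absolute residual in the listing but low leverage, so its Cook's D stays below the 4/n cutoff, while Alaska's smaller residual combined with higher leverage puts it above.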

Now let's run our first robust regression. Robust regression is done by iteratively re-weighted least squares (IWLS), and the procedure for running it is proc robustreg. There are several weighting functions that can be used for IWLS. We are going to first use the Huber weights in this example. We can save the final weights created by the IWLS process, which can be very useful. We will use the dataset t2 generated above.

proc robustreg data=t2 method=m (wf=huber);
  model crime = poverty single;
  output out=t3 weight=wgt;
run;

Model Information

Data Set                            WORK.T2
Dependent Variable                  crime
Number of Independent Variables     2
Number of Observations              51
Method                              M Estimation

Number of Observations Read         51
Number of Observations Used         51

Summary Statistics

                                                        Standard
Variable         Q1     Median         Q3       Mean   Deviation        MAD
poverty     10.7000    13.1000    17.4000    14.2588      4.5842     4.2995
single      10.0000    10.9000    12.1000    11.3255      2.1215     1.4826
crime         326.0      515.0      780.0      612.8       441.1      345.4

Parameter Estimates

                          Standard    95% Confidence       Chi-
Parameter  DF   Estimate     Error        Limits         Square  Pr > ChiSq
Intercept   1   -1423.23  167.5099  -1751.54  -1094.91    72.19      <.0001
poverty     1     8.8694    8.0429   -6.8944   24.6331     1.22      0.2701
single      1   169.0012   17.3795  134.9381  203.0644    94.56      <.0001
Scale       1   181.7251

Diagnostics Summary

Observation
Type          Proportion    Cutoff
Outlier           0.0392    3.0000

Goodness-of-Fit

Statistic        Value
R-Square        0.5257
AICR           73.1089
BICR           78.9100
Deviance       2216391
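The confidence limits, chi-square values and p-values in the Parameter Estimates table above are numerically consistent with the usual Wald calculations from the estimate and its standard error. The Python sketch below redoes that arithmetic for the three coefficients. It is a check on the printed numbers under a normal approximation, which is an assumption here rather than a statement about how the procedure computes them, and the function name is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_975 = STD_NORMAL.inv_cdf(0.975)   # about 1.96

def wald_summary(estimate, std_err):
    """Return (lower, upper, chi_square, p_value) for a Wald test of zero."""
    z = estimate / std_err
    lower = estimate - Z_975 * std_err
    upper = estimate + Z_975 * std_err
    p_value = 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))
    return lower, upper, z ** 2, p_value

# (parameter, estimate, standard error) copied from the table above
params = [
    ("Intercept", -1423.23, 167.5099),
    ("poverty",      8.8694,  8.0429),
    ("single",     169.0012, 17.3795),
]

for name, est, se in params:
    lower, upper, chi_sq, p = wald_summary(est, se)
    print(name, round(lower, 4), round(upper, 4), round(chi_sq, 2), round(p, 4))
```

Read this way, the interval for poverty spans zero and its p-value is about 0.27, whereas single is clearly separated from zero.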

proc sort data=t3;
  by wgt;
run;

proc print data=t3 (obs=15);
  var state crime poverty single res cookd lev wgt;
run;

Obs  state  crime  poverty  single       res    cookd      lev      wgt
  1  ms       434     24.7    14.7  -3.56299  0.61387  0.12669  0.28886
  2  fl      1206     17.8    10.6   2.90266  0.14259  0.04832  0.35947
  3  vt       114     10.0    11.0  -1.74241  0.04272  0.04050  0.59545
  4  dc      2922     26.4    22.1   2.61645  2.63625  0.53602  0.64980
  5  mt       178     14.9    10.8  -1.46088  0.01676  0.02301  0.68630
  6  me       126     10.7    10.6  -1.42674  0.02233  0.03186  0.72509
  7  nj       627     10.9     9.6   1.35415  0.02229  0.03519  0.73812
  8  il       960     13.6    11.5   1.33819  0.01266  0.02076  0.76600
  9  ak       761      9.1    14.3  -1.39742  0.12547  0.16161  0.78039
 10  md       998      9.7    12.0   1.28709  0.03570  0.06072  0.79570
 11  ma       805     10.7    10.9   1.19854  0.01640  0.03311  0.83933
 12  la      1062     26.4    14.9  -1.02183  0.06700  0.16143  0.91528
 13  ca      1078     18.2    12.5   1.01521  0.01231  0.03458  1.00000
 14  wy       286     13.3    10.8  -0.96626  0.00667  0.02099  1.00000
 15  sc      1023     18.7    12.3   0.91213  0.01111  0.03853  1.00000
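The wgt column can be reproduced, to the printed precision, from the robust parameter estimates and the Scale value reported by proc robustreg. The Python sketch below computes each state's residual from the robust fit, divides it by the scale, and applies the Huber weight with tuning constant 1.345. That constant is an assumption (it is the conventional value and it matches the printed weights), and the function name is hypothetical.

```python
# Robust estimates and scale copied from the Parameter Estimates table
INTERCEPT, B_POVERTY, B_SINGLE = -1423.23, 8.8694, 169.0012
SCALE = 181.7251
C_HUBER = 1.345   # assumed Huber tuning constant

def huber_weight_for(crime, poverty, single):
    """Huber weight of one observation under the robust fit."""
    fitted = INTERCEPT + B_POVERTY * poverty + B_SINGLE * single
    u = (crime - fitted) / SCALE              # scaled robust residual
    return 1.0 if abs(u) <= C_HUBER else C_HUBER / abs(u)

# (state, crime, poverty, single, wgt as printed) copied from the listing
rows = [
    ("ms",  434, 24.7, 14.7, 0.28886),
    ("fl", 1206, 17.8, 10.6, 0.35947),
    ("vt",  114, 10.0, 11.0, 0.59545),
    ("dc", 2922, 26.4, 22.1, 0.64980),
    ("la", 1062, 26.4, 14.9, 0.91528),
    ("ca", 1078, 18.2, 12.5, 1.00000),
]

for state, crime, poverty, single, printed in rows:
    w = huber_weight_for(crime, poverty, single)
    print(state, round(w, 5), printed)
```

Under the robust fit, DC's residual is only about 2.07 scale units, compared with about 4.66 for Mississippi, so DC keeps a larger weight even though its Cook's D from the OLS fit is by far the largest.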

We can see that, roughly, as the absolute residual goes down, the weight goes up. In other words, cases with large residuals tend to be down-weighted. We can also see that the values of Cook's D don't really correspond to the weights. This output shows us that the obser
