Peking University Summer Course, Linear Regression Analysis, Lecture Notes PKU8

Uploaded by: b****8 · Document ID: 9561548 · Uploaded: 2023-02-05 · Format: DOCX · Pages: 19 · Size: 74.35KB

Class 8: Polynomial Regression and Dummy Variables

I. Polynomial Regression

Polynomial regression is a minor topic because there is little that is new. What is new is that you may want to create a new variable from a variable already in the data set.

This is necessary if you think that the true regression function is not linear but quadratic. In that case you might want to use the quadratic function, that is, both the first-order and the second-order regressors.

For example, we know that earnings increase as a function of age, but the relationship is not linear. Therefore, we regress earnings on age and $\text{age}^2$. One important point is that in a polynomial regression the regression line is no longer linear when you plot the dependent variable against the independent variable.
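As a minimal sketch of "creating a new variable from the same data set", the code below squares age and fits earnings on age and age squared by least squares. The data are hypothetical and generated without noise, and the helper names `solve_linear_system` and `ols` are illustrative, not part of the course materials.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], so row operations act on both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(columns, y):
    """Least squares with an intercept, via the normal equations (X'X) b = X'y."""
    design = [[1.0] * len(y)] + [list(col) for col in columns]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in design] for ci in design]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in design]
    return solve_linear_system(xtx, xty)


# Hypothetical data: earnings rise with age at a decreasing rate (no noise).
age = [20, 25, 30, 35, 40, 45, 50, 55, 60]
earnings = [5.0 + 1.2 * a - 0.01 * a * a for a in age]

# The "new variable" is simply the square of an existing one.
age_squared = [a * a for a in age]

b0, b1, b2 = ols([age, age_squared], earnings)
print(round(b0, 3), round(b1, 3), round(b2, 3))
```

Because the hypothetical earnings are an exact quadratic in age, the fit should recover the generating coefficients up to rounding error.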

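Use a hypothetical example. Suppose we obtain a fitted equation [equation] in which one regressor has a linear effect and another has a quadratic effect. If you are asked to plot the dependent variable against the regressor with the linear effect, the line is linear. If you are asked to plot it against the regressor with the quadratic effect, the line is quadratic. Say the sample mean of [variable] is 0.5.

[table: 0, 1, 2, 3, 5, 8]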

Interpretation of coefficients in quadratic equations

Say $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$.

Important: there is no simple relationship between $x$ and $y$. Sometimes the effect of $x$ on $y$ is positive, and sometimes it is negative. In other words, the effect of $x$ on $y$ depends on the value of $x$.

Suggestion: plot the regression for the data range.

One thing we can tell:

When $\beta_2 > 0$, the effect of $x$ on $y$ increases with $x$;

When $\beta_2 < 0$, the effect of $x$ on $y$ decreases with $x$.

[figure]

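II. Interpretation of Coefficients in Polynomial Regression

The relationship between Y and the "polynomial" independent variable is no longer linear.

Recall a special property of the linear function: the relationship between Y and an X (say $X_k$) is constant for all values of this X and of the other X variables:

(1) $\partial Y / \partial X_k = \beta_k$.

In a polynomial regression, this simple relationship no longer holds true. For a quadratic regression, for example,

(2) $Y = \beta_0 + \beta_1 X_k + \beta_2 X_k^2 + \varepsilon$,

we have

(3) $\partial Y / \partial X_k = \beta_1 + 2\beta_2 X_k$,

which is dependent on the value of $X_k$.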
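A small numerical check may help here. In a quadratic fit the slope with respect to X is the linear coefficient plus twice the quadratic coefficient times X, so it changes with X. The sketch below compares that analytic slope with a finite-difference slope, using hypothetical coefficients chosen only for illustration.

```python
def fitted(beta0, beta1, beta2, x):
    """Fitted value of the quadratic Y = beta0 + beta1*X + beta2*X**2."""
    return beta0 + beta1 * x + beta2 * x * x


def marginal_effect(beta1, beta2, x):
    """Analytic slope dY/dX = beta1 + 2*beta2*X."""
    return beta1 + 2.0 * beta2 * x


# Hypothetical coefficients, not taken from any data set.
beta0, beta1, beta2 = 1.0, 0.8, -0.1

step = 1e-5
slopes = []
for x in (0.0, 2.0, 4.0, 6.0):
    # A central finite difference should agree with the analytic slope.
    numeric = (fitted(beta0, beta1, beta2, x + step)
               - fitted(beta0, beta1, beta2, x - step)) / (2 * step)
    analytic = marginal_effect(beta1, beta2, x)
    slopes.append((x, analytic, numeric))
    print(x, round(analytic, 3), round(numeric, 3))
```

With a negative quadratic coefficient the slope falls as X grows: positive at small X, zero at X = 4 in this hypothetical case, and negative beyond it.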

In general, the situation where a simple linear relationship of equation (1) is not true is called "interaction", a topic to which we will devote a lecture. For now, let us define interaction as the situation where the "effect" of an independent variable depends on the value of another variable.

In polynomial regressions involving an independent variable of an order higher than 1 (i.e., quadratic or higher), we can interpret this as an implicit interaction of a variable with itself.

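Example: earnings as a function of experience. If the quadratic function is true, we can find a value of experience which maximizes earnings (which could be either within a reasonable range experienced by workers or in a range unlikely to be experienced by workers). Use equation (3) to obtain the year that maximizes earnings by setting the derivative to zero, where $\beta_1$ and $\beta_2$ are the coefficients on experience and experience squared:

$\beta_1 + 2\beta_2 X_k = 0 \;\Rightarrow\; X_k^* = -\beta_1 / (2\beta_2)$.

That is why we would want to see $\beta_1$ and $\beta_2$ to be of different signs.

In Xie and Hannum (1996, Table 1, Model 2), $\beta_1 = 0.046$ and $\beta_2 = -0.000693$. The optimal year of experience is:

$X_k^* = 0.046 / (2 \times 0.000693) \approx 33.2$ years, about retirement age. In the U.S., it is 33.8 years. See Xie and Hannum (1996, p. 955).

Note that before this critical value, the "effect" of $X_k$ on Y is always positive, but the rate of the increase declines, up to 33 years.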
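The arithmetic behind the 33.2 years can be reproduced in a few lines. This sketch sets the slope of the quadratic to zero and solves for experience, using the two coefficients quoted above from Xie and Hannum (1996). The function name `turning_point` is illustrative.

```python
def turning_point(beta1, beta2):
    """Value of X at which the slope beta1 + 2*beta2*X equals zero."""
    if beta2 == 0:
        raise ValueError("no turning point when the quadratic coefficient is zero")
    return -beta1 / (2.0 * beta2)


# Coefficients quoted in the text (Xie and Hannum 1996, Table 1, Model 2).
beta_experience = 0.046
beta_experience_sq = -0.000693

peak = turning_point(beta_experience, beta_experience_sq)
print(round(peak, 1))  # 33.2

# Before the peak the slope is positive but shrinking.
slope_at_10 = beta_experience + 2 * beta_experience_sq * 10
slope_at_30 = beta_experience + 2 * beta_experience_sq * 30
print(slope_at_10 > slope_at_30 > 0)  # True
```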

III. Defining Dummy Variables

A dummy variable is sometimes called an "indicator variable."

It refers to the following logical coding scheme for a dichotomous variable:

x = 1 if a particular event is true

x = 0 otherwise.

A. Examples of Dummy Variables

Sex (Male):

x1 = 0 if female

x1 = 1 if male

Employment Status:

x2 = 0 if not employed

x2 = 1 if employed

Poverty status:

x3 = 0 if not in poverty, or household income > threshold.

x3 = 1 if in poverty, or household income ≤ threshold.

B. Interpretation of Dummy Coefficients (Intercept?)

1. When a dummy variable is the only independent variable

Interpretation: the intercept is group-specific.

Example: y = income, $x_1 = 1$ if male.

Regress y on $x_1$:

$y = \beta_0 + \beta_1 x_1$

As we discussed before, regression should be interpreted in terms of conditional means. Remember from your exercise that if we have 1 as the only regressor, the estimated intercept is identical to the sample mean. If we have a dummy variable in a regression, the estimated coefficient represents the mean difference between the two groups.

Income level of females: $\beta_0$

Income level of males: $\beta_0 + \beta_1$

$\beta_1$ is the mean difference in income between males and females.

If we compute the means by sex, we get the same results.

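Proof: let us regroup the sample by sex, dividing it into two subsamples, females ($x_1 = 0$, the first $n_1$ cases) and males ($x_1 = 1$, the last $n_2$ cases). Note $n_1 + n_2 = n$. Write $\sum_{n_1}$ for summation over $i = 1, \ldots, n_1$ and $\sum_{n_2}$ for summation over $i = n_1 + 1, \ldots, n_1 + n_2$. Then

$$X'X = \begin{pmatrix} n & n_2 \\ n_2 & n_2 \end{pmatrix}, \qquad X'y = \begin{pmatrix} \sum y_i \\ \sum_{n_2} y_i \end{pmatrix},$$

$$b = (X'X)^{-1} X'y = \frac{1}{n_1 n_2} \begin{pmatrix} n_2 & -n_2 \\ -n_2 & n \end{pmatrix} \begin{pmatrix} \sum y_i \\ \sum_{n_2} y_i \end{pmatrix} = \begin{pmatrix} \sum_{n_1} y_i / n_1 \\ \sum_{n_2} y_i / n_2 - \sum_{n_1} y_i / n_1 \end{pmatrix} = \begin{pmatrix} \bar{y}_{\text{female}} \\ \bar{y}_{\text{male}} - \bar{y}_{\text{female}} \end{pmatrix}.$$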

How about the standard errors?

They can be different (pooled versus group-specific estimator of $\sigma^2$).
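The claim that a regression on a single dummy reproduces the group means can be checked directly. The sketch below uses a small hypothetical income sample, and `simple_ols` is an illustrative helper for a regression with one regressor.

```python
def simple_ols(x, y):
    """Intercept and slope from regressing y on a single regressor x."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical incomes; male = 1 for males and 0 for females.
male = [0, 0, 0, 0, 1, 1, 1]
income = [18.0, 22.0, 20.0, 24.0, 30.0, 26.0, 28.0]

b0, b1 = simple_ols(male, income)

female_mean = sum(y for d, y in zip(male, income) if d == 0) / male.count(0)
male_mean = sum(y for d, y in zip(male, income) if d == 1) / male.count(1)

print(b0, female_mean)              # intercept vs. mean of the excluded group
print(b1, male_mean - female_mean)  # dummy coefficient vs. mean difference
```

The intercept matches the female mean and the dummy coefficient matches the male-minus-female difference, up to floating-point rounding.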

2. When a dummy variable is used with other continuous independent variables

Two parallel lines with different intercepts. There exists an overall difference along the entire distribution range of the continuous variables. [blackboard] draw lines.

Assumption: there is no interaction.

Example: income on sex and ability.

IV. When a dummy variable is used with another dummy variable

Four parallel lines with different intercepts.

Assumption: no interaction either between the dummy variables and the continuous variables or between the two dummy variables.

V. Important Difference: Dichotomous Variables used as independent variables and as dependent variables

Independent variable: the effect is a shift.

Dependent variable: the linear model cannot be true.

[blackboard] why?

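VI. The Least Squares Estimation with a continuous variable and a dummy variable

The least squares estimation holds up for regressions with dummy variables.

$X = (1, x_1, x_2)$, where $x_1$ is a continuous variable and $x_2$ is a dummy variable.

$$X'X = \begin{pmatrix} n & \sum x_{i1} & \sum x_{i2} \\ \sum x_{i1} & \sum x_{i1}^2 & \sum x_{i1} x_{i2} \\ \sum x_{i2} & \sum x_{i1} x_{i2} & \sum x_{i2}^2 \end{pmatrix} = \begin{pmatrix} n & \sum x_{i1} & n_2 \\ \sum x_{i1} & \sum x_{i1}^2 & \sum_{n_2} x_{i1} \\ n_2 & \sum_{n_2} x_{i1} & n_2 \end{pmatrix}$$

Here $\sum_{n_2}$ denotes summation over the cases with $x_2 = 1$, and $n_2$ is the total number of cases where $x_2 = 1$ is true.

All algorithms for the least squares estimation still hold.

Interpretation of $\beta_1$: the pure effect of $x_1$ net of the overall group difference. Also called the "within-group average effect of $x_1$".

$\beta_1$ can be estimated by a 3-step partial regression method:

(1) Regress y on $x_2$ and obtain the residuals $y^*$ (the deviation of y from its group mean);

(2) Regress $x_1$ on $x_2$ and obtain the residuals $x_1^*$ (the deviation of $x_1$ from its group mean);

(3) Then regress $y^*$ on $x_1^*$ to obtain $\beta_1$, which is the pure, partial effect of $x_1$ on y. Remember to adjust for degrees of freedom (by 1) due to $x_2$.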
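The three steps can be sketched on a small hypothetical data set and compared with the full regression of y on x1 and x2. The helper names are illustrative. Since x2 is a dummy, the residuals in steps (1) and (2) are simply deviations from group means.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def deviations_from_group_mean(values, group):
    """Residuals from regressing `values` on a dummy plus an intercept."""
    totals, counts = {}, {}
    for v, g in zip(values, group):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, group)]


# Hypothetical data: x1 continuous, x2 a dummy.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 4.0, 6.0, 8.0]
x2 = [0, 0, 0, 0, 0, 1, 1, 1, 1]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.0, 8.1, 9.8, 12.2]

# Full regression of y on (1, x1, x2) via the normal equations.
design = [[1.0] * len(y), x1, x2]
xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in design] for ci in design]
xty = [sum(u * v for u, v in zip(ci, y)) for ci in design]
b_full = solve_linear_system(xtx, xty)

# Steps (1) and (2): remove the group means; step (3): regress y* on x1*.
y_star = deviations_from_group_mean(y, x2)
x1_star = deviations_from_group_mean(x1, x2)
beta1_partial = (sum(a * b for a, b in zip(x1_star, y_star))
                 / sum(a * a for a in x1_star))

print(b_full[1], beta1_partial)
```

The two estimates of the coefficient on x1 agree. Only the degrees of freedom for the residual variance differ, which is the adjustment mentioned in step (3).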

 

VI. Nominal Variables

Definition: A nominal variable is a classification system. No information about ordering is assumed or utilized. Numerical values for a nominal variable are arbitrary, used for classification or identification.

For a nominal independent variable with J categories, we use a set of J-1 dummy variables in regression analysis.

Say a variable x has three categories. We need to use 2 dummy variables (in addition to the intercept):

x1 = 1 if x = 2

x1 = 0 otherwise

x2 = 1 if x = 3

x2 = 0 otherwise

For example, for the variable Race:

Race(2) = 1 if Black

Race(3) = 1 if Asian

In this case, White is the excluded category.

Alternatively,

Race(Black) = 1 if Black

Race(Asian) = 1 if Asian

Dummy variables for a nominal variable should appear together in the model (in interactions, for example). They cannot have interactions with each other because they do not overlap.
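A minimal sketch of the J-1 coding rule follows. The function `dummy_code` and the short Race list are hypothetical illustrations. The reference (excluded) category gets no dummy of its own.

```python
def dummy_code(values, reference):
    """Return {category: 0/1 list} for every category except the reference."""
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError("reference category not present in the data")
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }


# Hypothetical observations of a three-category nominal variable.
race = ["White", "Black", "Asian", "White", "Black"]
dummies = dummy_code(race, reference="White")

for name, column in dummies.items():
    print(name, column)

# The dummies never overlap: no case is coded 1 on both at once.
overlap = [a * b for a, b in zip(dummies["Black"], dummies["Asian"])]
print(sum(overlap))  # 0
```

Three categories yield two dummy columns, and their product is zero for every case, which is why an interaction between them carries no information.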

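Regression: y is income

$y = \beta_0 + \beta_1 \text{Race(Black)} + \beta_2 \text{Race(Asian)} + \varepsilon$

Say $b' = (20, -10, -15)$. Then:

Mean of Whites: $\beta_0 = 20$

Mean of Blacks: $\beta_0 + \beta_1 = 10$

Mean of Asians: $\beta_0 + \beta_2 = 5$

If we change the coding so that Black is used as the excluded category:

$y = \beta_0 + \beta_1 \text{Race(White)} + \beta_2 \text{Race(Asian)} + \varepsilon$

$\beta_0 = 10$ = Black

$\beta_1$ = White - Black = 10

$\beta_2$ = Asian - Black = -5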
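The effect of switching the excluded category can be verified with the group means implied by the example (20 for Whites, 10 for Blacks, 5 for Asians). The sketch assumes a model containing only the Race dummies, so the intercept equals the mean of the excluded group. The function name is illustrative.

```python
def coefficients_from_means(group_means, reference):
    """Intercept and dummy coefficients implied by group means for a given reference."""
    intercept = group_means[reference]
    contrasts = {
        group: mean - intercept
        for group, mean in group_means.items()
        if group != reference
    }
    return intercept, contrasts


# Group means implied by b' = (20, -10, -15) with White excluded.
means = {"White": 20.0, "Black": 10.0, "Asian": 5.0}

white_ref = coefficients_from_means(means, "White")
black_ref = coefficients_from_means(means, "Black")

print(white_ref)  # (20.0, {'Black': -10.0, 'Asian': -15.0})
print(black_ref)  # (10.0, {'White': 10.0, 'Asian': -5.0})
```

The fitted group means are identical under either coding. Only the baseline and the contrasts measured against it change.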

Interpretation of coefficients in a complex model:

If we have two sets of dummy variables

$y = \beta_0 + \beta_1 \text{Race(White)} + \beta_2 \text{Race(Asian)} + \beta_3 \text{Sex(male)} + \varepsilon$

What is $\beta_0$?

Mean income level of Black females.

The reason is that the excluded categories are Blacks for Race and females for Sex.

How do we compute averages for other groups:

Asian female?

White male?

If we have two dummy variables and one continuous variable

$y = \beta_0 + \beta_1 \text{Race(White)} + \beta_2 \text{Race(Asian)} + \beta_3 \text{Sex(male)} + \beta_4 \text{Ability} + \varepsilon$

$\beta_0$ is the income level of Black females with an ability score of zero. It is an intercept.

[blackboard] Six parallel lines. Two dummy variables for Race cannot overlap. Race and Sex do overlap. Additivity is assumed here. We will discuss interactions on Thursday.

VII. Testing for Collapsibility of Categories

We can use F-tests for nested models to test the collapsibility of categories in a nominal variable.

Consider a GSS question about region of residence at age 16 (REG16):

Original Code   Region                Recode
1               New England           East
2               Middle Atlantic       East
3               East North Central    Midwest
4               West North Central    Midwest
5               South Atlantic        South
6               East South Central    South
7               West South Central    South
8               Mountain              West
9               Pacific               West

In regression analysis (say with occupational prestige as the dependent variable), we can use a set of 8 dummy variables for the original codes of the variable. We can also use a set of 3 dummy variables after we collapse the codes into a smaller set of 4 broader regions.

The two models are nested. See the example below.

The F-test between the two nested models tells us whether the collapsing is justified.

F(5, 550) = [(127029.555 - 125261.776) / 5] / 227.75

= [1767.779 / 5] / 227.75 = 353.56 / 227.75 = 1.55, not significant at the 5% level.
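The same F statistic can be recomputed in a few lines. This sketch assumes that 127029.555 and 125261.776 are the residual sums of squares of the collapsed (3-dummy) and original (8-dummy) models, and that 227.75 is the full model's residual mean square on 550 degrees of freedom. The function name `nested_f` is illustrative.

```python
def nested_f(rss_restricted, rss_full, df_diff, df_resid_full):
    """F statistic comparing a restricted model with the full model it is nested in."""
    numerator = (rss_restricted - rss_full) / df_diff
    denominator = rss_full / df_resid_full
    return numerator / denominator


rss_collapsed = 127029.555  # assumed: residual SS with 3 region dummies
rss_original = 125261.776   # assumed: residual SS with 8 region dummies

# The residual mean square of the full model should be close to the 227.75 in the text.
mse_full = rss_original / 550
print(round(mse_full, 2))  # 227.75

f_stat = nested_f(rss_collapsed, rss_original, df_diff=5, df_resid_full=550)
print(round(f_stat, 2))  # 1.55
```

The numerator has 5 degrees of freedom because collapsing 9 regions into 4 removes 8 - 3 = 5 dummy variables.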

. recode reg16x (2=1) (4=3) (6=5) (7=5) (9=8)
(reg16x: 302 changes made)

. table reg16, c(mean prestige)

----------+---------------
    reg16 | mean(prestige)
----------+---------------
        1 |       39.59259
        2 |       44.01123
        3 |        42.0087
        4 |        44.5161
