Chinese Academy of Sciences Machine Learning Question Bank
I. Maximum Likelihood
1. ML estimation of exponential model (10)
A Gaussian distribution is often used to model data on the real line, but is sometimes inappropriate when the data are often close to zero but constrained to be nonnegative. In such cases one can fit an exponential distribution, whose probability density function is given by
$p(x) = \frac{1}{b} e^{-x/b}$
Given N observations $x_i$ drawn from such a distribution:
(a) Write down the likelihood as a function of the scale parameter b.
(b) Write down the derivative of the log likelihood.
(c) Give a simple expression for the ML estimate for b.
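Parts (a) to (c) can be checked numerically. The following is a minimal sketch, assuming the scale form of the density, p(x | b) = (1/b) exp(-x/b); the function names, the sample size, and the true scale of 2.0 are illustrative choices, not part of the original problem.

```python
import math
import random

def log_likelihood(b, xs):
    """Log-likelihood of scale b: sum_i [-log b - x_i / b]."""
    return sum(-math.log(b) - x / b for x in xs)

def d_log_likelihood(b, xs):
    """Derivative with respect to b: -N/b + (sum_i x_i) / b**2."""
    return -len(xs) / b + sum(xs) / b ** 2

def ml_scale(xs):
    """Setting the derivative to zero gives b_hat = sample mean."""
    return sum(xs) / len(xs)

random.seed(0)
true_b = 2.0  # assumed scale used to simulate data
# random.expovariate takes the rate, i.e. 1 / scale
xs = [random.expovariate(1.0 / true_b) for _ in range(1000)]
b_hat = ml_scale(xs)
```

The derivative vanishes at `b_hat`, and the log-likelihood is lower at nearby values of b, which is what part (c) asks the reader to show analytically.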
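2. Repeat problem 1 for the Poisson distribution, $p(x \mid \theta) = \frac{\theta^{x} e^{-\theta}}{x!}$, $x = 0, 1, 2, \ldots$
3.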
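II. Bayesian Methods
Suppose that on a multiple-choice exam question a student knows the correct answer with probability p and guesses with probability 1-p. Assume that a student who knows the answer gets the question right with probability 1, and that a student who guesses picks the correct answer with probability $1/m$, where m is the number of choices.
Given that the student answered the question correctly, find the probability that the student knew the correct answer.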
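The question is a direct application of Bayes' rule: P(knows | correct) = p / (p + (1 - p)/m) = mp / (mp + 1 - p). A minimal sketch with exact fractions follows; the function name and the example values p = 1/2, m = 4 are illustrative assumptions.

```python
from fractions import Fraction

def p_knows_given_correct(p, m):
    """Bayes' rule: P(knows | correct) = p * 1 / (p * 1 + (1 - p) * (1/m))."""
    p_correct = p * 1 + (1 - p) * Fraction(1, m)
    return p / p_correct

# Hypothetical numbers: the student knows the answer half the time, 4 choices.
posterior = p_knows_given_correct(Fraction(1, 2), 4)  # Fraction(4, 5)
```

With these numbers a correct answer raises the probability that the student knew the answer from 1/2 to 4/5.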
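1. Conjugate priors
The readings for this week include discussion of conjugate priors. Given a likelihood $p(x \mid \theta)$ for a class of models with parameters θ, a conjugate prior is a distribution $p(\theta \mid \gamma)$ with hyperparameters γ, such that the posterior distribution $p(\theta \mid X, \gamma)$ belongs to the same family of distributions as the prior.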
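(a) Suppose that the likelihood is given by the exponential distribution with rate parameter λ:
$p(x \mid \lambda) = \lambda e^{-\lambda x}$
Show that the gamma distribution
$\mathrm{Gamma}(\lambda \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta \lambda}$
is a conjugate prior for the exponential. Derive the parameter update given observations $x_1, \ldots, x_N$ and the prediction distribution $p(x_{N+1} \mid x_1, \ldots, x_N)$.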
(b) Show that the beta distribution is a conjugate prior for the geometric distribution
$P(x = k \mid \theta) = (1 - \theta)^{k - 1} \theta$
which describes the number of times a coin is tossed until the first heads appears, when the probability of heads on each toss is θ. Derive the parameter update rule and prediction distribution.
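(c) Suppose $p(\theta \mid \gamma)$ is a conjugate prior for the likelihood $p(x \mid \theta)$; show that the mixture prior
$p(\theta \mid \gamma_1, \ldots, \gamma_M) = \sum_{m=1}^{M} w_m \, p(\theta \mid \gamma_m)$
is also conjugate for the same likelihood, assuming the mixture weights $w_m$ sum to 1.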
(d) Repeat part (c) for the case where the prior is a single distribution and the likelihood is a mixture, and the prior is conjugate for each mixture component of the likelihood.
Some priors can be conjugate for several different likelihoods; for example, the beta is conjugate for the Bernoulli and the geometric distributions, and the gamma is conjugate for the exponential and for the gamma with fixed α.
(e) (Extra credit, 20) Explore the case where the likelihood is a mixture with fixed components and unknown weights; i.e., the weights are the parameters to be learned.
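For part (a), the expected result can be sanity-checked numerically. The sketch below assumes the rate parameterization of the gamma prior, Gamma(λ | α, β) ∝ λ^(α-1) exp(-βλ), under which the update is α' = α + N and β' = β + Σ x_i, and the predictive density is α' β'^α' / (β' + x)^(α' + 1). The observations and prior hyperparameters are made-up values.

```python
import math

def gamma_logpdf(lam, alpha, beta):
    """Log density of Gamma(alpha, beta) with rate beta, evaluated at lam."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1) * math.log(lam) - beta * lam)

def exp_loglik(lam, xs):
    """Log-likelihood of rate lam under p(x | lam) = lam * exp(-lam * x)."""
    return len(xs) * math.log(lam) - lam * sum(xs)

def update(alpha, beta, xs):
    """Conjugate update: alpha' = alpha + N, beta' = beta + sum(x)."""
    return alpha + len(xs), beta + sum(xs)

def predictive_pdf(x, alpha, beta):
    """Predictive density alpha * beta**alpha / (beta + x)**(alpha + 1)."""
    return alpha * beta ** alpha / (beta + x) ** (alpha + 1)

xs = [0.5, 1.2, 0.3, 2.0]   # hypothetical observations
alpha0, beta0 = 2.0, 1.0    # hypothetical prior hyperparameters
alpha1, beta1 = update(alpha0, beta0, xs)

# If the posterior really is Gamma(alpha1, beta1), then
# log prior + log likelihood - log posterior is the same for every lam.
gaps = [gamma_logpdf(l, alpha0, beta0) + exp_loglik(l, xs)
        - gamma_logpdf(l, alpha1, beta1) for l in (0.1, 0.7, 1.5, 4.0)]
```

The entries of `gaps` agree up to rounding, which is the numerical counterpart of showing that prior times likelihood is proportional to a gamma density with the updated hyperparameters.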
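III. True/False Questions
(1) Given n data points, if half are used for training and the other half for testing, the difference between the training error and the test error decreases as n increases.
(2) The maximum likelihood estimator is unbiased and has the smallest variance among all unbiased estimators, so the maximum likelihood estimator has the smallest risk.
(3) For regression functions A and B, if A is simpler than B, then A will almost certainly perform better than B on the test set.
(4) Global linear regression uses all of the sample points to predict the output for a new input, whereas local linear regression uses only the samples near the query point. Therefore global linear regression is computationally more expensive than local linear regression.
(5) Boosting and Bagging are both methods that combine multiple classifiers by voting, and both set the weight of each classifier according to its accuracy.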
(6) In the boosting iterations, the training error of each new decision stump and the training error of the combined classifier vary roughly in concert. (F)
While the training error of the combined classifier typically decreases as a function of boosting iterations, the error of the individual decision stumps typically increases since the example weights become concentrated at the most difficult examples.
(7) One advantage of Boosting is that it does not overfit. (F)
(8) Support vector machines are resistant to outliers, i.e., very noisy examples drawn from a different distribution. (F)
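(9) In regression analysis, best subset selection can perform feature selection but is computationally expensive when the number of features is large; ridge regression and the Lasso are computationally cheap, and the Lasso can also perform feature selection.
(10) Overfitting is more likely when the training set is small.
(11) Gradient descent can sometimes get stuck in a local minimum, but the EM algorithm cannot.
(12) In kernel regression, the parameter that most affects the balance between overfitting and underfitting is the width of the kernel function.
(13) In the AdaBoost algorithm, the weights on all the misclassified points will go up by the same multiplicative factor. (T)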
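(14) True/False:
In a least-squares linear regression problem, adding an L2 regularization penalty cannot decrease the L2 error of the solution ŵ on the training data. (T)
The unregularized least-squares solution already minimizes the L2 error on the training data, so no other solution can have a lower training error.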
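(15) True/False:
In a least-squares linear regression problem, adding an L2 regularization penalty always decreases the expected L2 error of the solution ŵ on unseen test data. (F)
(16) Besides the EM algorithm, gradient descent can also be used to estimate the parameters of a Gaussian mixture model. (T)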
(20) Any decision boundary that we get from a generative model with class-conditional Gaussian distributions could in principle be reproduced with an SVM and a polynomial kernel.
True! In fact, since class-conditional Gaussians always yield quadratic decision boundaries, they can be reproduced with an SVM with a kernel of degree less than or equal to two.
(21) AdaBoost will eventually reach zero training error, regardless of the type of weak classifier it uses, provided enough weak classifiers have been combined.
False! If the data is not separable by a linear combination of the weak classifiers, AdaBoost can't achieve zero training error.
(22) The L2 penalty in a ridge regression is equivalent to a Laplace prior on the weights. (F)
(23) The log-likelihood of the data will always increase through successive iterations of the expectation maximization algorithm. (F)
(24) In training a logistic regression model by maximizing the likelihood of the labels given the inputs we have multiple locally optimal solutions. (F)
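Item (14) can be checked on a toy problem. Because the unpenalized least-squares weight minimizes the training squared error, any positive L2 penalty can only leave that error equal or larger. The sketch below assumes a single weight with no intercept, so the ridge solution has the closed form w = Σxy / (Σx² + λ); the data and the penalty values are made up for illustration.

```python
def ridge_fit_1d(xs, ys, lam):
    """Minimize sum (y - w*x)**2 + lam * w**2 for a single weight w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def sse(w, xs, ys):
    """Sum of squared errors of the model y = w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys))

xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical toy inputs
ys = [0.1, 0.9, 2.2, 2.8, 4.1]   # hypothetical toy outputs
lams = [0.0, 0.1, 1.0, 10.0, 100.0]
train_errors = [sse(ridge_fit_1d(xs, ys, lam), xs, ys) for lam in lams]
```

The entries of `train_errors` never decrease as the penalty grows. The sketch says nothing about test error, which may go either way and is why item (15) is false.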
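I. Regression
1. Consider a regularized regression problem.
The figure below shows the log-likelihood (mean log-probability) on the training set and on the test set for different values of the regularization parameter C, with a quadratic regularization penalty.
(10 points)
(1) Is the statement "as C increases, the log-likelihood on the training set in Figure 2 never increases" correct? Explain why.
(2) Explain why the log-likelihood on the test set in Figure 2 decreases when C takes large values.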
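2. Consider the linear regression model … , with the training data shown in the figure below. (10 points)
(1) Estimate the parameters by maximum likelihood and draw the resulting model in Figure (a). (3 points)
(2) Estimate the parameters by regularized maximum likelihood, i.e., add the regularization penalty … to the log-likelihood objective, and draw in Figure (b) the model obtained when the parameter C is very large. (3 points)
(3) After regularization, does the variance $\sigma^2$ of the Gaussian distribution become larger, become smaller, or stay the same? (4 points)
Figure (a)    Figure (b)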
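3. Consider a regression problem over points $x = (x_1, x_2)$ in a two-dimensional input space, where x lies in the unit square.
Training and test samples are uniformly distributed over the unit square, and the outputs are generated by the model … . We use polynomial features of order 1 to 10 and a linear regression model to learn the relationship between x and y (the higher-order feature models include all lower-order features), with squared-error loss.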
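(1) We train models with 1st-, 2nd-, 8th-, and 10th-order features on n = … samples and then test them on a large independent test set. In the three columns below, mark the appropriate model or models (more than one may apply), and explain why the model you chose in the third column has a small test error. (10 points)
Columns: lowest training error | highest training error | lowest test error
Linear model with 1st-order features: *
Linear model with 2nd-order features: *
Linear model with 8th-order features: *
Linear model with 10th-order features: *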
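(2) We train models with 1st-, 2nd-, 8th-, and 10th-order features on n = … samples and then test them on a large independent test set. In the three columns below, mark the appropriate model or models (more than one may apply), and explain why the model you chose in the third column has a small test error. (10 points)
Columns: lowest training error | highest training error | lowest test error
Linear model with 1st-order features: *
Linear model with 2nd-order features:
Linear model with 8th-order features: * *
Linear model with 10th-order features: *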
(3) The approximation error of a polynomial regression model depends on the number of training points. (T)
(4) The structural error of a polynomial regression model depends on the number of training points. (F)
4. We are trying to learn regression parameters for a dataset which we know was generated from a polynomial of a certain degree, but we do not know what this degree is. Assume the data was actually generated from a polynomial of degree 5 with some added Gaussian noise (that is, $y = w_0 + w_1 x + w_2 x^2 + w_3 x^3 + w_4 x^4 + w_5 x^5 + \epsilon$, where $\epsilon$ is zero-mean Gaussian noise).
For training we have 100 {x, y} pairs and for testing we are using an additional set of 100 {x, y} pairs. Since we do not know the degree of the polynomial we learn two models from the data. Model A learns parameters for a polynomial of degree 4 and model B learns parameters for a polynomial of degree 6. Which of these two models is likely to fit the test data better?
Answer:
Degree 6 polynomial. Since the true model is a degree 5 polynomial and we have enough training data, the model we learn for a sixth-degree polynomial will likely fit a very small coefficient for $x^6$. Thus, even though it is a sixth-degree polynomial, it will behave in a very similar way to a fifth-degree polynomial, which is the correct model, leading to a better fit to the data.
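The answer can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the degree-5 coefficients, the input range [-1, 1], and the noise standard deviation of 0.1 are made up, and the fit uses the normal equations with a hand-written Gaussian elimination so that it needs only the standard library.

```python
import random

def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w

def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (X^T X) w = X^T y."""
    X = [[x ** k for k in range(degree + 1)] for x in xs]
    d = degree + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(d)]
    return solve(xtx, xty)

def mse(w, xs, ys):
    """Mean squared error of the polynomial with coefficients w."""
    preds = [sum(wk * x ** k for k, wk in enumerate(w)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def sample(n, rng, coeffs, noise_sd):
    """Draw n inputs uniformly on [-1, 1] and noisy polynomial outputs."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [sum(c * x ** k for k, c in enumerate(coeffs)) + rng.gauss(0.0, noise_sd)
          for x in xs]
    return xs, ys

rng = random.Random(0)
true_coeffs = [1.0, -2.0, 0.5, 3.0, -1.0, 5.0]  # hypothetical degree-5 polynomial
train = sample(100, rng, true_coeffs, noise_sd=0.1)
test = sample(100, rng, true_coeffs, noise_sd=0.1)
test_mse = {d: mse(fit_poly(*train, d), *test) for d in (4, 6)}
```

In this setup the degree-4 model cannot represent the $x^5$ term and carries a bias that no amount of data removes, while the degree-6 model pays only a small variance cost for its one extra coefficient.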
5. Input-dependent noise in regression
Ordinary least-squares regression is equivalent to assuming that each data point is generated according to a linear function of the input plus zero-mean, constant-variance Gaussian noise. In many systems, however, the noise variance is itself a positive linear function of the input (which is assumed to be non-negative, i.e., x ≥ 0).
a) Which of the following families of probability models correctly describes this situation in the univariate case? (Hint: only one of them does.)
(iii) is correct. In a Gaussian distribution over y, the variance is determined by the coefficient of $y^2$; so by replacing $\sigma^2$ by $x\sigma^2$ we get a variance that increases linearly with x. (Note also the change to the normalization "constant.") (i) has quadratic dependence on x; (ii) does not change the variance at all, it just renames $w_1$.
b) Circle the plots in Figure 1 that could plausibly have been generated by some instance of the model family(ies) you chose.
(ii) and (iii). (Note that (iii) works for … .) (i) exhibits a large variance at x = 0, and the variance appears independent of x.
c) True/False:
Regression with input-dependent noise gives the same solution as ordinary regression for an infinite data set generated according to the corresponding model.
True. In both cases the algorithm will recover the true underlying model.
d) For the model you chose in part (a), write down the derivative of the negative log likelihood with respect to $w_1$.
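For part (d), a minimal sketch that checks a candidate derivative against finite differences. It assumes the model implied by the answer to part (a): y is Gaussian with mean $w_0 + w_1 x$ and variance $\sigma^2 x$ for x > 0. Under that assumption the factor $x_i$ from the chain rule cancels the $x_i$ in the variance, giving $-\frac{1}{\sigma^2} \sum_i (y_i - w_0 - w_1 x_i)$. The data and parameter values are made up.

```python
import math

def nll(w0, w1, sigma2, xs, ys):
    """Negative log-likelihood under y ~ N(w0 + w1*x, sigma2 * x), x > 0."""
    total = 0.0
    for x, y in zip(xs, ys):
        var = sigma2 * x
        total += 0.5 * math.log(2 * math.pi * var) + (y - w0 - w1 * x) ** 2 / (2 * var)
    return total

def dnll_dw1(w0, w1, sigma2, xs, ys):
    """Analytic derivative: -(1/sigma2) * sum of residuals."""
    return -sum(y - w0 - w1 * x for x, y in zip(xs, ys)) / sigma2

xs = [0.5, 1.0, 2.0, 3.5]   # hypothetical inputs, all strictly positive
ys = [1.1, 1.4, 2.9, 4.2]   # hypothetical outputs
w0, w1, sigma2 = 0.3, 1.0, 0.5
eps = 1e-6
# Central finite difference in w1 as an independent check.
numeric = (nll(w0, w1 + eps, sigma2, xs, ys)
           - nll(w0, w1 - eps, sigma2, xs, ys)) / (2 * eps)
analytic = dnll_dw1(w0, w1, sigma2, xs, ys)
```

The two values agree closely, which supports the closed form under the stated model assumption.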
II. Classification
1. Generative models vs. discriminative models
(a) [points] Your billionaire friend needs your help. She needs to classify job applications into good/bad categories, and also to detect job applicants who lie in their applications using density estimation to detect outliers. To meet these needs, do you recommend using a discriminative or generative classifier? Why? [final_sol_s07]
Generative model.
Density estimation is required.
(b) [points] Your billionaire friend also wants to classify software applications to detect bug-prone applications using features of the source code. This pilot project only has a few applications to be used as training data, though. To create the most accurate classifier, do you recommend using a discriminative or generative classifier? Why?
Discriminative model.
With few training samples, classifying directly with a discriminative model usually works better.
(d) [points] Finally, your billionaire friend also wants to classify companies to decide which one to acquire. This project has lots of training data based on several decades of research. To create the most accurate classifier, do you recommend using a discriminative or generative classifier? Why?
Generative model.
With a large number of samples, the correct generative model can be learned.
2. Logistic regression
Figure 2: Log-probability of labels as a function of regularization parameter C
Here we use a logistic regression model to solve a classification problem. In Figure 2, we have plotted the mean log-probability of labels in the training and test sets after having trained the classifier with quadratic regularization penalty and different values of the regularization parameter C.
(1) In training a logistic regression model by maximizing the likelihood of the labels gi