Chinese Academy of Sciences Machine Learning Question Bank (中科院机器学习题库newWord格式.docx)


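, where m is the number of answer choices.

Given that the examinee answered the question correctly, find the probability that he knows the correct answer.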
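The opening of this question is missing from the text, so the following is only a sketch under an assumed setup: the examinee knows the answer with some probability `p_know` (a hypothetical name), and otherwise guesses uniformly among the m choices. Under that assumption Bayes' rule gives the requested probability.

```python
from fractions import Fraction


def prob_knows_given_correct(p_know, m):
    """P(knows | correct) by Bayes' rule.

    Assumption (not stated in the surviving text): someone who does not
    know the answer guesses and is right with probability 1/m.
    """
    p_correct = p_know * 1 + (1 - p_know) * Fraction(1, m)
    return p_know / p_correct


# Illustrative values only: p_know = 1/2 and m = 4 choices.
posterior = prob_knows_given_correct(Fraction(1, 2), 4)
print(posterior)  # 4/5
```

With these illustrative numbers a correct answer raises the probability of "knows" from 1/2 to 4/5; a larger m makes a correct guess less likely and pushes the posterior closer to 1.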

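1. Conjugate priors

The readings for this week include discussion of conjugate priors. Given a likelihood $p(x \mid \theta)$ for a class of models with parameters θ, a conjugate prior is a distribution $p(\theta \mid \gamma)$ with hyperparameters γ, such that the posterior distribution $p(\theta \mid x, \gamma)$ is in the same family of distributions as the prior.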

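(a) Suppose that the likelihood is given by the exponential distribution with rate parameter λ:

$p(x \mid \lambda) = \lambda e^{-\lambda x}$

Show that the gamma distribution is a conjugate prior for the exponential. Derive the parameter update given observations $x_1, \dots, x_N$ and the prediction distribution $p(x_{N+1} \mid x_1, \dots, x_N)$.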

(b) Show that the beta distribution is a conjugate prior for the geometric distribution

$p(x = k \mid \theta) = (1 - \theta)^{k-1}\,\theta$

which describes the number of times a coin is tossed until the first heads appears, when the probability of heads on each toss is θ. Derive the parameter update rule and prediction distribution.

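(c) Suppose $p(\theta \mid \gamma)$ is a conjugate prior for the likelihood $p(x \mid \theta)$; show that the mixture prior

$p(\theta \mid \gamma_1, \dots, \gamma_M) = \sum_{m=1}^{M} w_m\, p(\theta \mid \gamma_m)$

is also conjugate for the same likelihood, assuming the mixture weights $w_m$ sum to 1.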

(d) Repeat part (c) for the case where the prior is a single distribution and the likelihood is a mixture, and the prior is conjugate for each mixture component of the likelihood.

Some priors can be conjugate for several different likelihoods; for example, the beta is conjugate for the Bernoulli and the geometric distributions, and the gamma is conjugate for the exponential and for the gamma with fixed α.

(e) (Extra credit, 20) Explore the case where the likelihood is a mixture with fixed components and unknown weights; i.e., the weights are the parameters to be learned.
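The updates asked for in parts (a) and (b) can be checked numerically. The sketch below assumes the shape-rate parameterization Gamma(alpha, beta) for the prior on λ and a Beta(a, b) prior on θ; these hyperparameter names and the sample data are illustrative choices, not taken from the problem statement. It compares the closed-form gamma posterior against a brute-force normalization of prior times likelihood on a grid.

```python
import math


def gamma_exponential_update(alpha, beta, xs):
    """Gamma(alpha, beta) prior (shape, rate) on an exponential rate:
    the posterior is Gamma(alpha + N, beta + sum(xs))."""
    return alpha + len(xs), beta + sum(xs)


def exponential_predictive_pdf(x, alpha, beta):
    """Predictive density of a new x after integrating out the rate
    under Gamma(alpha, beta): alpha * beta**alpha / (beta + x)**(alpha + 1)."""
    return alpha * beta ** alpha / (beta + x) ** (alpha + 1)


def beta_geometric_update(a, b, ks):
    """Beta(a, b) prior on theta; each k counts the tosses up to and
    including the first head. Posterior is Beta(a + N, b + sum(k - 1))."""
    return a + len(ks), b + sum(k - 1 for k in ks)


def gamma_pdf(lam, alpha, beta):
    log_pdf = (alpha * math.log(beta) - math.lgamma(alpha)
               + (alpha - 1) * math.log(lam) - beta * lam)
    return math.exp(log_pdf)


def grid_posterior(alpha, beta, xs, grid):
    """Prior times likelihood on a grid, normalized by a Riemann sum."""
    unnorm = [gamma_pdf(lam, alpha, beta)
              * math.prod(lam * math.exp(-lam * x) for x in xs)
              for lam in grid]
    z = sum(unnorm) * (grid[1] - grid[0])
    return [u / z for u in unnorm]


xs = [0.5, 1.2, 0.3]  # made-up observations
a_post, b_post = gamma_exponential_update(2.0, 1.0, xs)
grid = [0.001 + 0.001 * i for i in range(20000)]  # lambda in (0, 20]
numeric = grid_posterior(2.0, 1.0, xs, grid)
closed = [gamma_pdf(lam, a_post, b_post) for lam in grid]
max_gap = max(abs(n - c) for n, c in zip(numeric, closed))
print(a_post, b_post, max_gap)
print(beta_geometric_update(1, 1, [3, 1, 2]))
```

A small `max_gap` indicates that the grid posterior and the closed-form gamma agree, which is what conjugacy predicts; the check is a sanity test, not a substitute for the derivation the problem asks for.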

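III. True/False Questions

(1) Given n data points, if half are used for training and the other half for testing, the difference between training error and test error decreases as n increases.

(2) The maximum likelihood estimator is unbiased and has the smallest variance among all unbiased estimators, so it has the smallest risk.

(3) For regression functions A and B, if A is simpler than B, then A will almost certainly perform better than B on the test set.

(4) Global linear regression uses all sample points to predict the output for a new input, whereas local linear regression uses only the samples near the query point. Therefore global linear regression is computationally more expensive than local linear regression.

(5) Boosting and Bagging both combine multiple classifiers by voting, and both weight each classifier according to its accuracy.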

(6) In the boosting iterations, the training error of each new decision stump and the training error of the combined classifier vary roughly in concert. (F)

While the training error of the combined classifier typically decreases as a function of boosting iterations, the error of the individual decision stumps typically increases since the example weights become concentrated at the most difficult examples.

(7) One advantage of Boosting is that it does not overfit. (F)

(8) Support vector machines are resistant to outliers, i.e., very noisy examples drawn from a different distribution. (F)

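(9) In regression analysis, best-subset selection can perform feature selection but is computationally expensive when the number of features is large; ridge regression and the Lasso are computationally cheap, and the Lasso can also perform feature selection.

(10) Overfitting is more likely when the training set is small.

(11) Gradient descent can sometimes get stuck in a local minimum, but the EM algorithm cannot.

(12) In kernel regression, the parameter that most affects the trade-off between overfitting and underfitting is the width of the kernel.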

(13) In the AdaBoost algorithm, the weights on all the misclassified points will go up by the same multiplicative factor. (T)
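The claim in item (13) can be seen in a single reweighting step. This is a minimal sketch of the standard binary AdaBoost update, assuming a weak learner with weighted error strictly between 0 and 0.5; the weights and the misclassification pattern are made up for illustration.

```python
import math


def adaboost_reweight(weights, misclassified):
    """One AdaBoost reweighting step for binary labels.

    weights: current normalized example weights.
    misclassified: booleans, True where the weak learner is wrong.
    Returns (alpha, new normalized weights).
    """
    eps = sum(w for w, bad in zip(weights, misclassified) if bad)
    alpha = 0.5 * math.log((1 - eps) / eps)
    raw = [w * math.exp(alpha if bad else -alpha)
           for w, bad in zip(weights, misclassified)]
    z = sum(raw)
    return alpha, [r / z for r in raw]


weights = [0.1, 0.2, 0.3, 0.4]
misclassified = [True, False, True, False]
alpha, new_weights = adaboost_reweight(weights, misclassified)
ratios = [new / old for new, old in zip(new_weights, weights)]
print(ratios)
```

Every misclassified example is scaled by the same factor and every correctly classified one by another common factor, and after normalization the misclassified examples carry half of the total weight.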

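(14) True/False: In a least-squares linear regression problem, adding an L2 regularization penalty cannot decrease the L2 error of the solution ŵ on the training data. (T)

The unregularized least-squares solution already minimizes the L2 error on the training data, so no other solution, regularized or not, can have a lower training error.

(15) True/False: In a least-squares linear regression problem, adding an L2 regularization penalty always decreases the expected L2 error of the solution ŵ on unseen test data. (F)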

(16) Besides the EM algorithm, gradient descent can also be used to estimate the parameters of a Gaussian mixture model. (T)

(20) Any decision boundary that we get from a generative model with class-conditional Gaussian distributions could in principle be reproduced with an SVM and a polynomial kernel.

True! In fact, since class-conditional Gaussians always yield quadratic decision boundaries, they can be reproduced with an SVM with kernel of degree less than or equal to two.

(21) AdaBoost will eventually reach zero training error, regardless of the type of weak classifier it uses, provided enough weak classifiers have been combined.

False! If the data is not separable by a linear combination of the weak classifiers, AdaBoost can't achieve zero training error.

(22) The L2 penalty in a ridge regression is equivalent to a Laplace prior on the weights. (F)

(23) The log-likelihood of the data will always increase through successive iterations of the expectation maximization algorithm. (F)

(24) In training a logistic regression model by maximizing the likelihood of the labels given the inputs we have multiple locally optimal solutions. (F)
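Items (14) and (15) above can be probed with a one-parameter ridge regression. The sketch below fits y = w x with and without an L2 penalty on a tiny synthetic data set (the numbers are made up for illustration) and compares squared errors on the training points and on a separate pair of test points.

```python
def fit_slope(xs, ys, l2_penalty=0.0):
    """Minimize sum((y - w*x)**2) + l2_penalty * w**2 over a single weight w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + l2_penalty)


def squared_error(w, xs, ys):
    """Sum of squared residuals of the model y = w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys))


# Synthetic data, made up for illustration (roughly y = x).
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]
test_x, test_y = [5.0, 6.0], [5.0, 6.0]

penalties = [0.0, 0.1, 1.0, 10.0]
weights = [fit_slope(train_x, train_y, p) for p in penalties]
train_errors = [squared_error(w, train_x, train_y) for w in weights]
test_errors = [squared_error(w, test_x, test_y) for w in weights]

for p, tr, te in zip(penalties, train_errors, test_errors):
    print(f"penalty={p:5.1f}  train={tr:.4f}  test={te:.4f}")
```

The training error is lowest at penalty 0, since the unregularized fit minimizes it by construction. On this particular data the penalty also raises the test error, which is one counterexample to "always decreases"; on noisier data the penalty can lower the test error instead.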

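I. Regression

1. Consider a regularized regression problem.

The figure below shows the log-likelihood (mean log-probability) on the training and test sets for different values of the regularization parameter C, with a quadratic regularization penalty. (10 points)

(1) Is the statement "as C increases, the training-set log-likelihood in Figure 2 never increases" correct? Explain why.

(2) Explain why the test-set log-likelihood in Figure 2 decreases when C takes large values.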

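2. Consider the linear regression model:

, with the training data shown in the figure below.

(1) Estimate the parameters by maximum likelihood and draw the resulting model in Figure (a). (3 points)

(2) Estimate the parameters by regularized maximum likelihood, i.e., add the regularization penalty

to the log-likelihood objective, and draw in Figure (b) the model obtained when the parameter C takes a very large value.

(3) After regularization, does the variance of the Gaussian distribution become larger, become smaller, or stay the same? (4 points)

Figure (a)    Figure (b)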

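3. Consider a regression problem over points in a two-dimensional input space, where the inputs lie in the unit square.

Training and test samples are uniformly distributed over the unit square, and the output model is

; we use polynomial features of order 1 through 10 with a linear regression model to learn the relationship between x and y (a higher-order feature model contains all lower-order features), and the loss function is the squared error.

(1) Now, with

training samples, train the models with 1st-, 2nd-, 8th- and 10th-order features and test them on a large independent test set. In the three columns below, select the appropriate models (there may be more than one per column), and explain why the model you chose in the third column has a small test error.

| Model | Smallest training error | Largest training error | Smallest test error |
|---|---|---|---|
| Linear model with 1st-order features | | | |
| Linear model with 2nd-order features | | | |
| Linear model with 8th-order features | | | |
| Linear model with 10th-order features | | | |

(2) Now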

(3) The approximation error of a polynomial regression model depends on the number of training points. (T)

(4) The structural error of a polynomial regression model depends on the number of training points. (F)

4. We are trying to learn regression parameters for a dataset which we know was generated from a polynomial of a certain degree, but we do not know what this degree is. Assume the data was actually generated from a polynomial of degree 5 with some added Gaussian noise (that is

For training we have 100 {x, y} pairs and for testing we are using an additional set of 100 {x, y} pairs. Since we do not know the degree of the polynomial we learn two models from the data. Model A learns parameters for a polynomial of degree 4 and model B learns parameters for a polynomial of degree 6. Which of these two models is likely to fit the test data better?

Answer: Degree 6 polynomial. Since the model is a degree 5 polynomial and we have enough training data, the model we learn for a six degree polynomial will likely fit a very small coefficient for $x^6$. Thus, even though it is a six degree polynomial it will actually behave in a very similar way to a fifth degree polynomial, which is the correct model, leading to a better fit to the data.
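The argument in this answer can be tried out on simulated data. The sketch below is only an illustration under made-up assumptions: the generating degree-5 polynomial, the noise level, the input range [-1, 1] and the random seed are all hypothetical choices, and the fit uses plain normal equations rather than any particular library routine.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients (lowest power first) via the normal equations."""
    powers = range(degree + 1)
    gram = [[sum(x ** (i + j) for x in xs) for j in powers] for i in powers]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in powers]
    return solve_linear_system(gram, rhs)


def mean_squared_error(coeffs, xs, ys):
    preds = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)


def true_polynomial(x):
    # Hypothetical degree-5 generator, chosen only for this demo.
    return 10 * x ** 5 - 3 * x ** 3 + x


def sample(rng, n, noise_std=0.1):
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_polynomial(x) + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


rng = random.Random(0)
train_x, train_y = sample(rng, 100)
test_x, test_y = sample(rng, 100)
test_mse = {
    degree: mean_squared_error(
        fit_polynomial(train_x, train_y, degree), test_x, test_y)
    for degree in (4, 6)
}
print(test_mse)
```

In this setup the degree-4 model cannot represent the $x^5$ term, so its test error includes a bias that more data does not remove, while the degree-6 model pays only a small variance cost for the extra coefficient.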

5. Input-dependent noise in regression

Ordinary least-squares regression is equivalent to assuming that each data point is generated according to a linear function of the input plus zero-mean, constant-variance Gaussian noise. In many systems, however, the noise variance is itself a positive linear function of the input (which is assumed to be non-negative, i.e., x >= 0).

a) Which of the following families of probability models correctly describes this situation in the univariate case? (Hint: only one of them does.)

(iii) is correct. In a Gaussian distribution over y, the variance is determined by the coefficient of $y^2$; so by replacing

we get a variance that increases linearly with x. (Note also the change to the normalization "constant.") (i) has quadratic dependence on x; (ii) does not change the variance at all, it just renames $w_1$.

b) Circle the plots in Figure 1 that could plausibly have been generated by some instance of the model family(ies) you chose.

(ii) and (iii). (Note that (iii) works for

.) (i) exhibits a large variance at x = 0, and the variance appears independent of x.

c) True/False: Regression with input-dependent noise gives the same solution as ordinary regression for an infinite data set generated according to the corresponding model.

True. In both cases the algorithm will recover the true underlying model.

d) For the model you chose in part (a), write down the derivative of the negative log likelihood with respect to $w_1$.
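A worked sketch for part (d), under a stated assumption: the candidate models are not reproduced in this text, so suppose model (iii) is a Gaussian over y with mean $w_0 + w_1 x$ and variance $x\sigma^2$. The symbols $w_0$ and $\sigma^2$, and the index $i$ over $n$ training pairs with $x_i > 0$, are introduced here for illustration only.

```latex
\begin{aligned}
-\log L(w_0, w_1, \sigma^2)
  &= \sum_{i=1}^{n}\left[\tfrac{1}{2}\log\!\left(2\pi x_i \sigma^2\right)
     + \frac{(y_i - w_0 - w_1 x_i)^2}{2 x_i \sigma^2}\right] \\
\frac{\partial\,(-\log L)}{\partial w_1}
  &= -\sum_{i=1}^{n}\frac{(y_i - w_0 - w_1 x_i)\,x_i}{x_i \sigma^2}
   = -\frac{1}{\sigma^2}\sum_{i=1}^{n}\left(y_i - w_0 - w_1 x_i\right)
\end{aligned}
```

Under this assumed form the factor $x_i$ from the chain rule cancels against the $x_i$ in the variance, so each residual enters the gradient with equal weight.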

II. Classification

1. Generative vs. discriminative models

(a) [points] Your billionaire friend needs your help. She needs to classify job applications into good/bad categories, and also to detect job applicants who lie in their applications using density estimation to detect outliers. To meet these needs, do you recommend using a discriminative or generative classifier? Why?

[final_sol_s07]

A generative model, because density estimation is required.

(b) [points] Your billionaire friend also wants to classify software applications to detect bug-prone applications using features of the source code. This pilot project only has a few applications to be used as training data, though. To create the most accurate classifier, do you recommend using a discriminative or generative classifier?

A discriminative model. With few training samples, classifying directly with a discriminative model usually works better.

(d) [points] Finally, your billionaire friend also wants to classify companies to decide which one to acquire. This project has lots of training data based on several decades of research. To create the most accurate classifier, do you recommend using a discriminative or generative classifier?

With many training samples, the correct generative model can be learned.

2. Logistic regression

Figure 2: Log-probability of labels as a function of regularization parameter C

Here we use a logistic regression model to solve a classification problem. In Figure 2, we have plotted the mean log-probability of labels in the training and test sets after having trained the classifier with quad
