CS 229 Machine Learning (Problems and Answers), Stanford University

Contents
(1) Assignment 1 (Supervised Learning)
(2) Assignment 1 Solutions (Supervised Learning)
(3) Assignment 2 (Kernels, SVMs, and Theory)
(4) Assignment 2 Solutions (Kernels, SVMs, and Theory)
(5) Assignment 3 (Learning Theory and Unsupervised Learning)
(6) Assignment 3 Solutions (Learning Theory and Unsupervised Learning)
(7) Assignment 4 (Unsupervised Learning and Reinforcement Learning)
(8) Assignment 4 Solutions (Unsupervised Learning and Reinforcement Learning)
(9) Problem Set #1: Supervised Learning
(10) Problem Set #1 Answer
(11) Problem Set #2: Naive Bayes, SVMs, and Theory
(12) Problem Set #2 Answer

CS229 Problem Set #1

CS 229, Public Course
Problem Set #1: Supervised Learning

1. Newton's method for computing least squares

In this problem, we will prove that if we use Newton's method to solve the least squares optimization problem, then we only need one iteration to converge to $\theta^*$.

(a) Find the Hessian of the cost function $J(\theta) = \frac{1}{2}\sum_{i=1}^{m} \left(\theta^T x^{(i)} - y^{(i)}\right)^2$.

(b) Show that the first iteration of Newton's method gives us $\theta^* = (X^T X)^{-1} X^T \vec{y}$, the solution to our least squares problem.

2. Locally-weighted logistic regression

In this problem you will implement a locally-weighted version of logistic regression, where we weight different training examples differently according to the query point. The locally-weighted logistic regression problem is to maximize

$$\ell(\theta) = -\frac{\lambda}{2}\theta^T\theta + \sum_{i=1}^{m} w^{(i)} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right].$$

The $-\frac{\lambda}{2}\theta^T\theta$ here is a regularization term, with regularization parameter $\lambda$. Regularization will be discussed in a future lecture, but we include it here because it is needed for Newton's method to perform well on this task. For the entirety of this problem you can use the value $\lambda = 0.0001$.

Using this definition, the gradient of $\ell(\theta)$ is given by

$$\nabla_\theta \ell(\theta) = X^T z - \lambda\theta$$

where $z \in \mathbb{R}^m$ is defined by

$$z_i = w^{(i)}\left(y^{(i)} - h_\theta(x^{(i)})\right)$$

and the Hessian is given by

$$H = X^T D X - \lambda I$$

where $D \in \mathbb{R}^{m \times m}$ is a diagonal matrix with

$$D_{ii} = -w^{(i)} h_\theta(x^{(i)})\left(1 - h_\theta(x^{(i)})\right).$$

For the sake of this problem you can just
use the above formulas, but you should try to derive these results for yourself as well.

Given a query point $x$, we compute the weights

$$w^{(i)} = \exp\left(-\frac{\|x - x^{(i)}\|^2}{2\tau^2}\right).$$

Much like the locally weighted linear regression that was discussed in class, this weighting scheme gives more weight to the "nearby" points when predicting the class of a new example.

(a) Implement the Newton-Raphson algorithm for optimizing $\ell(\theta)$ for a new query point $x$, and use this to predict the class of $x$.

The q2/ directory contains data and code for this problem. You should implement the y = lwlr(X_train, y_train, x, tau) function in the lwlr.m file. This function takes as input the training set (the X_train and y_train matrices, in the form described in the class notes), a new query point x and the weight bandwidth tau. Given this input the function should 1) compute weights $w^{(i)}$ for each training example, using the formula above, 2) maximize $\ell(\theta)$ using Newton's method, and finally 3) output $y = 1\{h_\theta(x) > 0.5\}$ as the prediction.

We provide two additional functions that might help. The [X_train, y_train] = load_data; function will load the matrices from files in the data/ folder. The function plot_lwlr(X_train, y_train, tau, resolution) will plot the resulting
classifier (assuming you have properly implemented lwlr.m). This function evaluates the locally weighted logistic regression classifier over a large grid of points and plots the resulting prediction as blue (predicting y = 0) or red (predicting y = 1). Depending on how fast your lwlr function is, creating the plot might take some time, so we recommend debugging your code with resolution = 50; and later increasing it to at least 200 to get a better idea of the decision boundary.

(b) Evaluate the system with a variety of different bandwidth parameters $\tau$. In particular, try $\tau = 0.01, 0.05, 0.1, 0.5, 1.0, 5.0$. How does the classification boundary change when varying this parameter? Can you predict what the decision boundary of ordinary (unweighted) logistic regression would look like?

3. Multivariate least squares

So far in class, we have only considered cases where our target variable $y$ is a scalar value. Suppose that instead of trying
to predict a single output, we have a training set with multiple outputs for each example:

$$\{(x^{(i)}, y^{(i)}),\ i = 1, \ldots, m\}, \quad x^{(i)} \in \mathbb{R}^n,\ y^{(i)} \in \mathbb{R}^p.$$

Thus for each training example, $y^{(i)}$ is vector-valued, with $p$ entries. We wish to use a linear model to predict the outputs, as in least squares, by specifying the parameter matrix $\Theta$ in

$$y = \Theta^T x,$$

where $\Theta \in \mathbb{R}^{n \times p}$.

(a) The cost function for this case is

$$J(\Theta) = \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{p} \left( (\Theta^T x^{(i)})_j - y^{(i)}_j \right)^2.$$

Write $J(\Theta)$ in matrix-vector notation (i.e., without using any summations). Hint: Start with the $m \times n$ design matrix

$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}$$

and the $m \times p$ target matrix

$$Y = \begin{bmatrix} (y^{(1)})^T \\ (y^{(2)})^T \\ \vdots \\ (y^{(m)})^T \end{bmatrix}$$

and then work out how to express $J(\Theta)$ in terms of these matrices.

(b) Find the closed form solution for $\Theta$ which minimizes $J(\Theta)$. This is the equivalent of the normal equations for the multivariate case.

(c) Suppose instead of considering the multivariate vectors $y^{(i)}$ all at once, we instead compute each variable $y^{(i)}_j$ separately for each $j = 1, \ldots, p$. In this case, we have $p$ individual linear models, of the form

$$y^{(i)}_j = \theta_j^T x^{(i)}, \quad j = 1, \ldots, p.$$

(So here, each $\theta_j \in \mathbb{R}^n$.) How do the parameters from these $p$ independent least squares problems compare to the multiv

