Separable Data

You can use a support vector machine (SVM) when your data has exactly two classes. An SVM classifies data by finding the best hyperplane that separates all data points of one class from those of the other class. The best hyperplane for an SVM means the one with the largest margin between the two classes. Margin means the maximal width of the slab parallel to the hyperplane that has no interior data points.

The support vectors are the data points that are closest to the separating hyperplane; these points are on the boundary of the slab. The following figure illustrates these definitions,
with + indicating data points of type +1, and − indicating data points of type −1.

Mathematical Formulation: Primal.

This discussion follows Hastie, Tibshirani, and Friedman [19] and Christianini and Shawe-Taylor [11]. The data for training is a set of points (vectors) \(x_i\) along with their categories \(y_i\). For some dimension \(d\), the \(x_i \in \mathbf{R}^d\), and the \(y_i = \pm 1\). The equation of a hyperplane is

\[\langle w, x \rangle + b = 0,\]

where \(w \in \mathbf{R}^d\), \(\langle w, x \rangle\) is the inner (dot) product of \(w\) and \(x\), and \(b\) is real.

The following problem defines the best separating hyperplane. Find \(w\) and \(b\) that minimize \(\|w\|\) such that for all data points \((x_i, y_i)\),

\[y_i(\langle w, x_i \rangle + b) \ge 1.\]

The support vectors are the \(x_i\) on the boundary, those for which \(y_i(\langle w, x_i \rangle + b) = 1\).

For mathematical convenience, the problem is usually given as the equivalent problem of minimizing \(\langle w, w \rangle / 2\). This is a quadratic programming problem. The optimal solution \((\hat{w}, \hat{b})\) enables classification of a vector \(z\) as follows:

\[\mathrm{class}(z) = \mathrm{sign}(\langle \hat{w}, z \rangle + \hat{b}).\]

Mathematical Formulation: Dual.

It is computationally simpler to solve the dual quadratic programming problem. To obtain the dual, take positive Lagrange multipliers \(\alpha_i\) multiplied by each constraint, and subtract from the objective function:

\[L_P = \frac{1}{2}\langle w, w \rangle - \sum_i \alpha_i\bigl(y_i(\langle w, x_i \rangle + b) - 1\bigr),\]

where you look for a stationary point of \(L_P\) over \(w\) and \(b\). Setting the
gradient of \(L_P\) to 0, you get

\[w = \sum_i \alpha_i y_i x_i, \qquad 0 = \sum_i \alpha_i y_i. \tag{16-1}\]

Substituting into \(L_P\), you get the dual \(L_D\):

\[L_D = \sum_i \alpha_i - \frac{1}{2}\sum_i \sum_j \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle,\]

which you maximize over \(\alpha_i \ge 0\). In general, many \(\alpha_i\) are 0 at the maximum. The nonzero \(\alpha_i\) in the solution to the dual problem define the hyperplane, as seen in Equation 16-1, which gives \(w\) as the sum of \(\alpha_i y_i x_i\). The data points \(x_i\) corresponding to nonzero \(\alpha_i\) are the support vectors.

The derivative of \(L_D\) with respect to a nonzero \(\alpha_i\) is 0 at an optimum. This gives

\[y_i(\langle w, x_i \rangle + b) - 1 = 0.\]

In particular, this gives the value of \(b\) at the solution, by taking any \(i\) with nonzero \(\alpha_i\).

The dual is a standard quadratic programming problem. For example, the Optimization Toolbox quadprog solver solves this type of problem.
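The tooling in this section is MATLAB, but the dual derivation can be checked numerically with any general-purpose solver. The following sketch (in Python, assuming NumPy and SciPy are available; the toy data set is invented for illustration) maximizes \(L_D\) subject to \(\alpha_i \ge 0\) and \(\sum_i \alpha_i y_i = 0\), then recovers \(w\) and \(b\) from the nonzero \(\alpha_i\) as in Equation 16-1:

```python
# Sketch: solve the separable-case SVM dual for a tiny invented data set,
# then recover w and b from the nonzero alphas (Equation 16-1).
import numpy as np
from scipy.optimize import minimize

# Toy data: two classes separated by the vertical line x1 = 0.
X = np.array([[1.0, 0.0], [2.0, 1.0], [-1.0, 0.0], [-2.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

Q = (y[:, None] * y[None, :]) * (X @ X.T)    # Q_ij = y_i y_j <x_i, x_j>

def neg_dual(a):                             # maximize L_D  <=>  minimize -L_D
    return 0.5 * a @ Q @ a - a.sum()

res = minimize(neg_dual, np.zeros(len(y)), method="SLSQP",
               bounds=[(0, None)] * len(y),                        # alpha_i >= 0
               constraints={"type": "eq", "fun": lambda a: a @ y}) # sum alpha_i y_i = 0
alpha = res.x

w = (alpha * y) @ X                          # w = sum_i alpha_i y_i x_i
sv = np.argmax(alpha)                        # any index with nonzero alpha_i
b = y[sv] - w @ X[sv]                        # from y_sv (<w, x_sv> + b) = 1
labels = np.sign(X @ w + b)                  # class(z) = sign(<w, z> + b)
print(w, b)                                  # hyperplane is approximately x1 = 0
```

Only the two points on the slab boundary receive nonzero multipliers; the other two points can move without changing the hyperplane, which is the geometric meaning of "support vector."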
Nonseparable Data

Your data might not allow for a separating hyperplane. In that case, SVM can use a soft margin, meaning a hyperplane that separates many, but not all, data points.

There are two standard formulations of soft margins. Both involve adding slack variables \(s_i\) and a penalty parameter \(C\). The \(L^1\)-norm problem is:

\[\min_{w,b,s}\left(\frac{1}{2}\langle w, w \rangle + C\sum_i s_i\right)\]

such that

\[y_i(\langle w, x_i \rangle + b) \ge 1 - s_i, \qquad s_i \ge 0.\]

The \(L^1\)-norm refers to using the \(s_i\) as slack variables instead of their squares. The three solver options SMO, ISDA, and L1QP of fitcsvm minimize the \(L^1\)-norm problem.

The \(L^2\)-norm problem is:

\[\min_{w,b,s}\left(\frac{1}{2}\langle w, w \rangle + C\sum_i s_i^2\right)\]

subject to the same constraints.

In these formulations, you can see that increasing \(C\) places more weight on the slack variables \(s_i\), meaning the optimization attempts to make a stricter separation between classes. Equivalently, reducing \(C\) towards 0 makes misclassification less important.

Mathematical Formulation: Dual.

For easier calculations, consider the \(L^1\) dual problem to this soft-margin formulation. Using Lagrange multipliers \(\alpha_i\) and \(\mu_i\), the function to minimize for the \(L^1\)-norm problem is:

\[L_P = \frac{1}{2}\langle w, w \rangle + C\sum_i s_i - \sum_i \alpha_i\bigl(y_i(\langle w, x_i \rangle + b) - (1 - s_i)\bigr) - \sum_i \mu_i s_i,\]

where you look for a stationary point of \(L_P\) over \(w\), \(b\), and positive \(s_i\). Setting the gradient of \(L_P\) to 0, you get

\[w = \sum_i \alpha_i y_i x_i, \qquad \sum_i \alpha_i y_i = 0, \qquad \alpha_i = C - \mu_i, \qquad \alpha_i, \mu_i, s_i \ge 0.\]

These equations lead directly to the dual formulation:

\[\max_{\alpha}\ \sum_i \alpha_i - \frac{1}{2}\sum_i \sum_j \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle\]

subject to the constraints

\[\sum_i y_i \alpha_i = 0, \qquad 0 \le \alpha_i \le C.\]

The final set of inequalities, \(0 \le \alpha_i \le C\), shows why \(C\) is sometimes called a box constraint. \(C\) keeps the allowable values of the Lagrange multipliers \(\alpha_i\) in a box, a bounded region.
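Notice that the soft-margin dual differs from the separable dual only in the upper bound \(C\) on each \(\alpha_i\). A sketch (again Python with NumPy and SciPy, on an invented overlapping data set, not toolbox code) makes this concrete: the same solver call works with the bounds changed to the box \([0, C]\):

```python
# Sketch: the soft-margin dual is the same QP with the box bounds 0 <= alpha_i <= C.
import numpy as np
from scipy.optimize import minimize

# Invented overlapping data: one point of each class sits on the wrong side.
X = np.array([[1.0, 0.0], [-0.5, 0.0], [-1.0, 0.0], [0.5, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
C = 1.0                                      # box-constraint parameter

Q = (y[:, None] * y[None, :]) * (X @ X.T)

res = minimize(lambda a: 0.5 * a @ Q @ a - a.sum(),
               np.zeros(len(y)), method="SLSQP",
               bounds=[(0, C)] * len(y),     # the "box": 0 <= alpha_i <= C
               constraints={"type": "eq", "fun": lambda a: a @ y})
alpha = res.x
print(alpha)                                 # every multiplier stays inside the box
```

Points that violate the margin end up pinned at the upper bound \(\alpha_i = C\), which is how \(C\) limits the influence any single misclassified point can have on \(w\).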
The gradient equation for \(b\) gives the solution \(b\) in terms of the set of nonzero \(\alpha_i\), which correspond to the support vectors.

You can write and solve the dual of the \(L^2\)-norm problem in an analogous manner. For details, see Christianini and Shawe-Taylor [11], Chapter 6.

fitcsvm Implementation.

Both dual soft-margin problems are quadratic programming problems. Internally, fitcsvm has several different algorithms for solving the problems.

For one-class or binary classification, if you do not set a fraction of expected outliers in the data (see OutlierFraction), then the default solver is Sequential Minimal Optimization (SMO). SMO minimizes the one-norm problem by a series of two-point minimizations. During optimization, SMO respects the linear constraint \(\sum_i \alpha_i y_i = 0\), and explicitly includes the bias term in the model. SMO is relatively fast. For more details on SMO, see [13].

For binary classification, if you set a fraction of expected outliers in the data, then the default solver is the Iterative Single Data Algorithm (ISDA). Like SMO, ISDA solves the one-norm problem. Unlike SMO, ISDA minimizes by a series of one-point minimizations, does not respect the linear constraint, and does not explicitly include the bias term in the model. For more details on ISDA, see [22].

For one-class or binary classification, and if you have an Optimization Toolbox license, you can choose to use quadprog to solve the one-norm problem. quadprog uses a good deal of memory, but solves quadratic programs to a high degree of precision. For more details, see Quadratic Programming Definition.
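The "two-point minimizations" at the heart of SMO can be shown in miniature. The sketch below (Python/NumPy; a heavily simplified single update on invented data, not the heuristic-laden solver fitcsvm actually uses) performs one SMO-style step: it solves the dual analytically for one pair \((\alpha_i, \alpha_j)\), clipping the result so that both the box constraint and \(\sum_k \alpha_k y_k = 0\) stay satisfied:

```python
# Sketch of one SMO-style two-point update (simplified; the real solver adds
# heuristics for choosing the pair and for updating the bias term).
import numpy as np

X = np.array([[1.0], [-1.0]])        # invented 1-D toy data
y = np.array([1.0, -1.0])
C, b = 10.0, 0.0
K = X @ X.T                          # linear kernel matrix
alpha = np.zeros(2)

def dual_objective(a):
    return a.sum() - 0.5 * a @ ((y[:, None] * y[None, :]) * K) @ a

i, j = 0, 1
f = (alpha * y) @ K + b              # current decision values f(x_k)
E = f - y                            # prediction errors
eta = K[i, i] + K[j, j] - 2 * K[i, j]

# Clip range keeping sum_k alpha_k y_k = 0 and 0 <= alpha <= C (y_i != y_j case).
L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
aj_new = np.clip(alpha[j] + y[j] * (E[i] - E[j]) / eta, L, H)
ai_new = alpha[i] + y[i] * y[j] * (alpha[j] - aj_new)

before = dual_objective(alpha)
alpha = np.array([ai_new, aj_new])
after = dual_objective(alpha)
print(alpha, before, after)          # alpha -> [0.5, 0.5]; the objective increases
```

For this two-point problem a single update already reaches the dual optimum; in general SMO repeats such pair updates until the KKT conditions hold to within a tolerance.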
Nonlinear Transformation with Kernels

Some binary classification problems do not have a simple hyperplane as a useful separating criterion. For those problems, there is a variant of the mathematical approach that retains nearly all the simplicity of an SVM separating hyperplane.

This approach uses these results from the theory of reproducing kernels: There is a class of functions \(K(x,y)\) with the following property. There is a linear space \(S\) and a function \(\varphi\) mapping \(x\) to \(S\) such that

\[K(x,y) = \langle \varphi(x), \varphi(y) \rangle.\]

The dot product takes place in the space \(S\). This class of functions includes:

o Polynomials: For some positive integer \(d\), \(K(x,y) = (1 + \langle x, y \rangle)^d\).
o Radial basis function (Gaussian): For some positive number \(\sigma\), \(K(x,y) = \exp\bigl(-\|x - y\|^2 / (2\sigma^2)\bigr)\).
o Multilayer perceptron (neural network): For a positive number \(p_1\) and a negative number \(p_2\), \(K(x,y) = \tanh\bigl(p_1 \langle x, y \rangle + p_2\bigr)\). Note: Not every set of \(p_1\) and \(p_2\) gives a valid reproducing kernel. fitcsvm does not support the sigmoid kernel.

The mathematical approach using kernels relies on the computational method of hyperplanes. All the calculations for hyperplane classification use nothing more than dot products. Therefore, nonlinear kernels can use identical calculations and solution algorithms, and obtain classifiers that are nonlinear. The resulting classifiers are hypersurfaces in some space \(S\), but the space \(S\) does not have to be identified or examined.
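To make the reproducing-kernel property concrete, here is a small numerical check (Python/NumPy; the explicit map \(\varphi\) below is a standard textbook construction, not anything the toolbox computes) that the degree-2 polynomial kernel really is an ordinary dot product after mapping 2-D inputs into \(S = \mathbf{R}^6\):

```python
# Check: (1 + <x,y>)^2 equals <phi(x), phi(y)> for an explicit feature map phi.
import numpy as np

def poly_kernel(x, y, d=2):
    return (1.0 + x @ y) ** d

def phi(x):
    # Explicit map into S = R^6 for 2-D inputs and d = 2:
    # (1 + x.y)^2 = 1 + 2*x.y + (x.y)^2, expanded term by term.
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

rng = np.random.default_rng(0)
x, y = rng.normal(size=2), rng.normal(size=2)
print(poly_kernel(x, y), phi(x) @ phi(y))    # the two values agree
```

The SVM algorithms never need \(\varphi\) itself; they only ever evaluate \(K(x,y)\), which is why the space \(S\) can remain implicit even when it is high-dimensional.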
Using Support Vector Machines

As with any supervised learning model, you first train a support vector machine, and then cross validate the classifier. Use the trained machine to classify (predict) new data. In addition, to obtain satisfactory predictive accuracy, you can use various SVM kernel functions, and you must tune the parameters of the kernel functions.

o Training an SVM Classifier
o Classifying New Data with an SVM Classifier
o Tuning an SVM Classifier

Training an SVM Classifier

Train, and optionally cross validate, an SVM classifier using fitcsvm. The most common syntax is:

SVMModel = fitcsvm(X,Y,'KernelFunction','rbf','Standardize',true,'ClassNames',{'negClass','posClass'});

The inputs are:

o X: Matrix of predictor data, where each row is one observation, and each column is one predictor.
o Y: Array of class labels with each row corresponding to the value of the corresponding row in X. Y can be a categorical or character array, logical or numeric vector, or cell array of strings.
o KernelFunction: The default value is 'linear' for two-class learning, which separates the data by a hyperplane. The value 'rbf' is the default for one-class learning, and uses a Gaussian radial basis function. An important step to successfully train an SVM classifier is to choose an appropriate kernel function.
o Standardize: Flag indicating whether the software should standardize the predictors before training the classifier.
o ClassNames: Distinguishes between the negative and positive classes, or specifies which classes to include in the data. The negative class is the first element (or row of a character array), e.g., 'negClass', and the positive class is the second element (or row of a character array), e.g., 'posClass'. ClassNames must be the same data type as Y. It is good practice to specify the class names, especially if you are comparing the performance of different classifiers.

The resulting trained model (SVMModel) contains the optimized parameters from the SVM algorithm, enabling you to classify new data.
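fitcsvm is MATLAB. For readers following along in Python, a roughly equivalent model can be trained with scikit-learn (an assumption of this sketch, not part of the toolbox documented here): StandardScaler plays the role of 'Standardize',true, and kernel="rbf" mirrors 'KernelFunction','rbf'. The data and class names are invented for illustration:

```python
# Rough Python analogue of the fitcsvm call above (scikit-learn, not MATLAB):
# an RBF-kernel SVM trained on standardized predictors.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, size=(20, 2)),    # invented "negClass" cluster
               rng.normal(+2, 1, size=(20, 2))])   # invented "posClass" cluster
Y = np.array(["negClass"] * 20 + ["posClass"] * 20)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X, Y)
print(model.predict([[-2.0, -2.0], [2.0, 2.0]]))
```

As in the MATLAB call, the class labels can be strings; scikit-learn infers the class order from the sorted unique labels rather than from an explicit ClassNames argument.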
For more name-value pairs you can use to control the training, see the fitcsvm reference page.

Classifying New Data with an SVM Classifier

Classify new data using predict. The syntax for classifying new data using a trained SVM classifier (SVMModel) is:

[label,score] = predict(SVMModel,newX);

The resulting vector, label, represents the classification of each row in newX. score is an n-by-2 matrix of scores, with one column for each class.
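The score underlying prediction comes from the decision function built from the dual solution, \(f(z) = \sum_i \alpha_i y_i K(x_i, z) + b\), evaluated only at the support vectors. A self-contained sketch (Python/NumPy; for the two toy points below, the dual solution \(\alpha_1 = \alpha_2 = 1/2\), \(b = 0\) can be worked out by hand and is assumed here):

```python
# Sketch: computing labels and scores from a trained SVM's dual solution.
import numpy as np

Xsv = np.array([[1.0], [-1.0]])      # support vectors (toy 1-D problem)
ysv = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])         # hand-computed dual solution (assumed)
b = 0.0

def score(z):
    # f(z) = sum_i alpha_i y_i <x_i, z> + b   (linear kernel)
    return (alpha * ysv) @ (Xsv @ z) + b

newX = np.array([[3.0], [-0.25]])
scores = np.array([score(z) for z in newX])
labels = np.sign(scores)
print(scores, labels)                # scores [3.0, -0.25] -> labels [1.0, -1.0]
```

The sign of the score gives the predicted class, and its magnitude indicates how far the observation lies from the separating boundary in units of the margin.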