South China University of Technology — Pattern Recognition (Graduate) Review Notes

Bayes Formula
$P(\omega_i|x) = \dfrac{p(x|\omega_i)P(\omega_i)}{p(x)}$, where $p(x) = \sum_j p(x|\omega_j)P(\omega_j)$.

CH1. Pattern Recognition Systems
- Data acquisition & sensing: measurements of physical variables.
- Pre-processing: removal of noise in the data; isolation of the patterns of interest from the background (segmentation).
- Feature extraction: finding a new representation in terms of features.
- Model learning / estimation: learning a mapping between features and pattern groups/categories.
- Classification: using the features and the learned models to assign a pattern to a category.
- Post-processing: evaluation of confidence in decisions; exploitation of context to improve performance; combination of experts.

Learning Strategies
- Supervised learning: a teacher provides a category label or cost for each pattern in the training set.
- Unsupervised learning: the system forms clusters or natural groupings of the input patterns.
- Reinforcement learning: no desired category is given, but the teacher provides feedback to the system, such as whether the decision is right or wrong.
Evaluation Methods
- Independent run: a statistical method, also called bootstrap. Repeat the experiment several times independently and take the mean as the result.
- Cross-validation: the dataset $D$ is randomly divided into $m$ disjoint sets $D_i$ of equal size $n/m$, where $n$ is the number of samples in $D$. The classifier is trained $m$ times, each time with a different set $D_i$ held out as the testing set.
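As a concrete sketch of the cross-validation procedure just described — the fold count, the toy data, and the nearest-class-mean classifier below are assumptions made for the demo, not something the notes prescribe:

```python
import numpy as np

def cross_validate(X, y, m, train_fn, test_fn, seed=0):
    """Split the n samples into m disjoint folds of size ~n/m;
    train m times, each time holding a different fold out for testing."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, m)
    accs = []
    for i in range(m):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(m) if j != i])
        model = train_fn(X[train], y[train])
        accs.append(test_fn(model, X[test], y[test]))
    return np.mean(accs)          # mean accuracy over the m held-out folds

# Toy nearest-class-mean classifier (an assumption for the demo).
def train_fn(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def test_fn(model, X, y):
    classes = list(model)
    means = np.stack([model[c] for c in classes])
    pred = [classes[np.argmin(((x - means) ** 2).sum(axis=1))] for x in X]
    return np.mean(np.array(pred) == y)

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 3])
y = np.array([0] * 50 + [1] * 50)
print(cross_validate(X, y, m=5, train_fn=train_fn, test_fn=test_fn))
```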
Bayes Decision Rule
- Decide $\omega_1$ if $P(\omega_1|x) > P(\omega_2|x)$; decide $\omega_2$ if $P(\omega_2|x) > P(\omega_1|x)$.
- Or equivalently: decide $\omega_1$ if $p(x|\omega_1)P(\omega_1) > p(x|\omega_2)P(\omega_2)$; decide $\omega_2$ if $p(x|\omega_2)P(\omega_2) > p(x|\omega_1)P(\omega_1)$.

Maximum Likelihood (ML) Rule
When $P(\omega_1) = P(\omega_2)$, the decision is based entirely on the likelihoods, since $P(\omega_j|x) \propto p(x|\omega_j)$ when the priors are equal:
- Decide $\omega_1$ if $p(x|\omega_1) > p(x|\omega_2)$; decide $\omega_2$ if $p(x|\omega_2) > p(x|\omega_1)$.

Multivariate Normal Density
In $d$ dimensions:
$$p(x) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$$
where $x = (x_1, x_2, \dots, x_d)^T$, $\mu = (\mu_1, \mu_2, \dots, \mu_d)^T$, $\Sigma = \int (x-\mu)(x-\mu)^T p(x)\,dx$, and $|\Sigma|$ and $\Sigma^{-1}$ are the determinant and inverse of $\Sigma$, respectively.
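The two rules above differ only in whether the priors enter the comparison. The sketch below, a made-up two-class example (the means, covariance, and priors are assumptions for illustration), evaluates the multivariate normal density and applies both rules:

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    """Multivariate normal density in d dimensions."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

# Assumed class-conditional densities and priors (illustrative only).
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 2.0])
cov = np.eye(2)
p1, p2 = 0.7, 0.3

x = np.array([1.2, 0.8])
l1, l2 = gaussian_pdf(x, mu1, cov), gaussian_pdf(x, mu2, cov)

ml_decision = 1 if l1 > l2 else 2            # ML rule: likelihoods only
bayes_decision = 1 if l1 * p1 > l2 * p2 else 2   # Bayes rule: likelihood x prior
print(ml_decision, bayes_decision)
```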
ML Parameter Estimation
$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k, \qquad \hat{\Sigma} = \frac{1}{n}\sum_{k=1}^{n} (x_k - \hat{\mu})(x_k - \hat{\mu})^T$$

Error Analysis
Probability of error for multiclass problems: $p(\text{error}|x) = 1 - \max\big(P(\omega_1|x), \dots, P(\omega_c|x)\big)$.
Error = Bayes error + added error. For two classes with decision regions $R_1$ and $R_2$:
$$P(\text{error}) = \int_{R_2} p(x|\omega_1)P(\omega_1)\,dx + \int_{R_1} p(x|\omega_2)P(\omega_2)\,dx$$
If the decision point is at $x_b$, the added error vanishes (leaving only the Bayes error) exactly when $x_b$ coincides with the Bayes decision point.
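A short NumPy sketch of the ML estimates above; the "true" distribution used to generate data is an assumption for the demo. Note it implements the biased $1/n$ ML estimator, matching the formula, rather than the unbiased $1/(n-1)$ variant:

```python
import numpy as np

# Draw n samples from an assumed true Gaussian, then estimate its parameters.
rng = np.random.default_rng(1)
true_mu = np.array([1.0, -1.0])
true_cov = np.array([[2.0, 0.5], [0.5, 1.0]])
X = rng.multivariate_normal(true_mu, true_cov, size=1000)

n = len(X)
mu_hat = X.mean(axis=0)           # (1/n) sum_k x_k
diff = X - mu_hat
sigma_hat = diff.T @ diff / n     # (1/n) sum_k (x_k - mu)(x_k - mu)^T
print(mu_hat)                     # close to true_mu
print(sigma_hat)                  # close to true_cov
```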
The Discriminant Function
$$g_i(x) = x^T W_i x + w_i^T x + w_{i0}$$
where $W_i = -\dfrac{1}{2}\Sigma_i^{-1}$, $w_i = \Sigma_i^{-1}\mu_i$, and $w_{i0} = -\dfrac{1}{2}\mu_i^T\Sigma_i^{-1}\mu_i - \dfrac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$. Equivalently:
$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T\Sigma_i^{-1}(x-\mu_i) - \frac{d}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$$

Decision Boundary
Setting $g_i(x) = g_j(x)$ gives $w^T(x - x_0) = 0$, where (for the case $\Sigma_i = \sigma^2 I$) $w = \mu_i - \mu_j$ and
$$x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2}\ln\frac{P(\omega_i)}{P(\omega_j)}\,(\mu_i - \mu_j)$$

Loss Function
Conditional risk (expected loss of taking action $\alpha_i$):
$$R(\alpha_i|x) = \sum_{j=1}^{c} \lambda(\alpha_i|\omega_j)\,P(\omega_j|x)$$
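A sketch of evaluating the quadratic discriminant $g_i(x)$ above for Gaussian classes; the per-class parameters are assumptions for the demo. Classification picks the class with the largest $g_i(x)$:

```python
import numpy as np

def g(x, mu, cov, prior):
    """g(x) = -1/2 (x-mu)^T cov^{-1} (x-mu) - d/2 ln(2pi)
              - 1/2 ln|cov| + ln prior."""
    d = len(mu)
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(cov) @ diff
            - 0.5 * d * np.log(2 * np.pi)
            - 0.5 * np.log(np.linalg.det(cov))
            + np.log(prior))

# Assumed two-class parameters (illustrative only).
params = [
    (np.array([0.0, 0.0]), np.eye(2), 0.5),
    (np.array([3.0, 3.0]), np.array([[2.0, 0.3], [0.3, 1.0]]), 0.5),
]
x = np.array([1.0, 2.0])
scores = [g(x, mu, cov, p) for mu, cov, p in params]
print("decide class", int(np.argmax(scores)) + 1)
```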
Overall risk (expected loss):
$$R = \int R(\alpha(x)|x)\,p(x)\,dx$$
The zero-one loss function is used to minimize the error rate.

Minimum Risk Decision Rule
The fundamental rule is to decide $\omega_1$ if $R(\alpha_1|x) < R(\alpha_2|x)$.

Normal Distribution
Univariate case: $p(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right)$
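A minimal sketch of the minimum-risk rule: compute the conditional risks $R(\alpha_i|x)$ from a loss matrix and the posteriors, then take the action with the smallest risk. The loss values and posteriors are assumptions for the demo:

```python
import numpy as np

# lam[i][j] = loss lambda(alpha_i | omega_j) of taking action alpha_i
# when the true class is omega_j. With the zero-one loss, this rule
# reduces to picking the class with the largest posterior.
lam = np.array([[0.0, 2.0],     # alpha_1: decide omega_1
                [1.0, 0.0]])    # alpha_2: decide omega_2
posteriors = np.array([0.4, 0.6])   # P(omega_1|x), P(omega_2|x) (assumed)

risks = lam @ posteriors        # R(alpha_i|x) = sum_j lam_ij P(omega_j|x)
print("risks:", risks, "-> decide omega", int(np.argmin(risks)) + 1)
```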
CH3.

For a linear decision surface $g(x) = w^Tx + w_0 = 0$:
- Normalized distance from the origin to the surface: $\dfrac{w_0}{\|w\|}$
- So the distance of an arbitrary point $x$ to the surface is $r = \dfrac{g(x)}{\|w\|}$

Perceptron Criterion
Using $y_n \in \{+1, -1\}$, all patterns need to satisfy $w^T\phi(x_n)\,y_n > 0$. Over the misclassified samples $\mathcal{M}$, the perceptron criterion tries to minimize:
$$E(w) = -\sum_{n \in \mathcal{M}} w^T\phi(x_n)\,y_n$$

Pseudoinverse Method
Sum-of-squared-error function: $J_s(w) = \|Xw - b\|^2$
Gradient: $\nabla J_s(w) = 2X^T(Xw - b)$
Necessary condition: $X^TXw = X^Tb$, so $w$ can be solved uniquely as $w = (X^TX)^{-1}X^Tb$.
Problems: $X^TX$ is not always nonsingular, and the solution depends on $b$.

Exercise for the Pseudoinverse Method
Given a dataset with 4 samples, we want to train a linear discriminant function $g(x) = w^Tx$. Let $x = (1, x_1, x_2)^T$ and $b = 1$ if class 1, otherwise $b = -1$. The sum-of-squared-error function is selected as the criterion function. Find the value of $w$ by the pseudoinverse method.

index | class | $x_1$ | $x_2$
1 | 1 | 1 | 2
2 | 1 | 2 | 0
3 | 2 | 3 | 1
4 | 2 | 2 | 3

Normalizing the class-2 samples (negating them, which turns every target into $+1$):
$$X = \begin{pmatrix} 1 & 1 & 2 \\ 1 & 2 & 0 \\ -1 & -3 & -1 \\ -1 & -2 & -3 \end{pmatrix}, \qquad b = (1\ 1\ 1\ 1)^T$$
$$X^{\dagger} = (X^TX)^{-1}X^T = \begin{pmatrix} 5/4 & 13/12 & 3/4 & 7/12 \\ -1/2 & -1/6 & -1/2 & -1/6 \\ 0 & -1/3 & 0 & -1/3 \end{pmatrix}$$
$$w = X^{\dagger}b = \left(\tfrac{11}{3},\ -\tfrac{4}{3},\ -\tfrac{2}{3}\right)^T$$
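A quick NumPy check of the exercise: np.linalg.pinv computes the same pseudoinverse, so the result can be compared against the hand calculation above:

```python
import numpy as np
from fractions import Fraction

# Augmented samples, with the class-2 rows negated as in the exercise.
X = np.array([[ 1,  1,  2],
              [ 1,  2,  0],
              [-1, -3, -1],
              [-1, -2, -3]], dtype=float)
b = np.ones(4)

w = np.linalg.pinv(X) @ b        # pseudoinverse solution
print(w)                         # approx [ 3.6667 -1.3333 -0.6667]
print([Fraction(float(v)).limit_denominator(100) for v in w])  # 11/3, -4/3, -2/3
```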
Least-Mean-Squared (LMS, Gradient Descent)
Recall the sum-of-squared-error function:
$$J_s(w) = \sum_{i=1}^{n} (w^Tx_i - b_i)^2$$
Gradient function:
$$\nabla J_s(w) = \sum_{i=1}^{n} 2(w^Tx_i - b_i)\,x_i$$
Update rule:
$$w(k+1) = w(k) + \eta(k)\big(b_i - w(k)^Tx_i\big)\,x_i$$
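A minimal LMS sketch implementing the single-sample update rule above; the learning-rate schedule, epoch count, and reuse of the exercise data are assumptions for the demo. With a decaying rate, the iterate slowly approaches a minimizer of $J_s$, i.e. the pseudoinverse solution:

```python
import numpy as np

def lms(X, b, eta0=0.1, epochs=200):
    """Single-sample LMS: w(k+1) = w(k) + eta(k) (b_i - w^T x_i) x_i."""
    w = np.zeros(X.shape[1])
    k = 0
    for _ in range(epochs):
        for x_i, b_i in zip(X, b):
            k += 1
            eta = eta0 / k                      # e.g. eta(k) = eta0 / k
            w = w + eta * (b_i - w @ x_i) * x_i
    return w

# Same normalized data as the pseudoinverse exercise above.
X = np.array([[1, 1, 2], [1, 2, 0], [-1, -3, -1], [-1, -2, -3]], dtype=float)
b = np.ones(4)
print(lms(X, b))    # drifts toward the pseudoinverse solution
```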
Linear Classifiers for Multiple Classes
- One-versus-the-rest
- One-versus-one

Linearly Separable Problem
A problem whose data of different classes can be separated exactly by a linear decision surface.
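A sketch of the one-versus-the-rest strategy: train one binary discriminant per class ($+1$ for the class, $-1$ for the rest) and pick the class whose discriminant responds most strongly. The least-squares training and toy data are assumptions; any binary linear classifier from this chapter could be plugged in:

```python
import numpy as np

def train_one_vs_rest(X, y, classes):
    """One linear discriminant per class, fit by least squares
    (pseudoinverse) on the bias-augmented data."""
    Xa = np.hstack([np.ones((len(X), 1)), X])
    return {c: np.linalg.pinv(Xa) @ np.where(y == c, 1.0, -1.0)
            for c in classes}

def predict(ws, x):
    xa = np.concatenate([[1.0], x])
    return max(ws, key=lambda c: ws[c] @ xa)   # largest g_c(x) wins

X = np.array([[0, 0], [1, 0], [5, 5], [6, 5], [0, 6], [1, 6]], dtype=float)
y = np.array([0, 0, 1, 1, 2, 2])
ws = train_one_vs_rest(X, y, classes=[0, 1, 2])
print(predict(ws, np.array([5.5, 5.2])))       # expect class 1 (nearest cluster)
```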
CH4.

Perceptron Update Rule
$$w(t+1) = w(t) + \eta(t)\,y(t)\,x(t) \quad \text{if } w(t)^Tx(t)\,y(t) \le 0$$
$$w(t+1) = w(t) \quad \text{otherwise}$$
(reward and punishment scheme)

Exercise for the Perceptron
There are four points in the 2-dimensional space. Points $(1,0)$ and $(0,1)$ belong to class $C_1$, and points $(0,-1)$ and $(-1,0)$ belong to class $C_2$. The goal is to design a linear classifier using the perceptron algorithm in its reward and punishment form. The learning rate is set equal to one, and the initial weight vector is chosen as $w(0) = (0,0,0)^T$. Augment the samples as $x = (x_1, x_2, 1)^T$, with targets $y = +1$ for $C_1$ and $y = -1$ for $C_2$.

Cycling through the samples:
- $x_1 = (1,0,1)^T$: $w(0)^Tx_1\,y_1 = 0 \le 0$, so $w(1) = w(0) + x_1 = (1,0,1)^T$
- $x_2 = (0,1,1)^T$: $w(1)^Tx_2\,y_2 = 1 > 0$, so $w(2) = w(1)$
- $x_3 = (0,-1,1)^T$: $w(2)^Tx_3\,y_3 = -1 \le 0$, so $w(3) = w(2) - x_3 = (1,1,0)^T$
- $x_4 = (-1,0,1)^T$: $w(3)^Tx_4\,y_4 = 1 > 0$, so $w(4) = w(3)$
- $x_1$: $w(4)^Tx_1\,y_1 = 1 > 0$, so $w(5) = w(4)$
- $x_2$: $w(5)^Tx_2\,y_2 = 1 > 0$, so $w(6) = w(5)$
- $x_3$: $w(6)^Tx_3\,y_3 = 1 > 0$, so $w(7) = w(6)$
All four samples are now classified correctly, so the algorithm converges to $w = (1,1,0)^T$, i.e. the decision boundary $x_1 + x_2 = 0$.
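The same exercise can be checked in a few lines of Python; this sketch implements the reward-and-punishment update rule directly:

```python
import numpy as np

# Augmented samples (x1, x2, 1) with y = +1 for C1 and y = -1 for C2.
X = np.array([[1, 0, 1], [0, 1, 1], [0, -1, 1], [-1, 0, 1]], dtype=float)
y = np.array([1, 1, -1, -1], dtype=float)

w = np.zeros(3)
eta = 1.0
changed = True
while changed:                        # stop when a full pass makes no update
    changed = False
    for x_i, y_i in zip(X, y):
        if w @ x_i * y_i <= 0:        # misclassified (or on the boundary)
            w = w + eta * y_i * x_i   # reward-and-punishment update
            changed = True
print(w)                              # -> [1. 1. 0.], boundary x1 + x2 = 0
```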
17、)=似p (-庇;F j,均=l,lr, Vi =备。2(衍以2)=似色=0,卩小2V 0 0/ 7 / 1 1 o o f f f 9 1 o o 1 /(X /l /(0.36780.13530.36780.1353 / 1 0.13530.3678 = | 0.3678 0.36781 I 0.1353 10.3678 0.3678 0.3678(2.284 W = 2.284V-1.692/CH5.Structure of RBF3 layers:Input layer: f(x)=xHidden layer: Gaussian functionOutput layer: linear
Forward pass (exercise result): $g = 0.8385$.

CH6.

Margin
- The margin is defined as the width by which the boundary could be increased before hitting a data point.
- The linear discriminant function (classifier) with the maximum margin is the best.
- The data closest to the hyperplane are the support vectors.
Characteristics of RBF
Advantages:
- An RBF network trains faster than an MLP.
- The hidden layer is easier to interpret than in an MLP.
Disadvantage:
- During testing, the calculation of a neuron in an RBF network is slower than in an MLP.

Maximum Margin Classification
- Maximizing the margin is good according to both intuition and theory.
- It implies that only the support vectors are important; the other training examples are ignorable.
Advantage (compared to LMS and the perceptron): better generalization ability and less over-fitting.

Slack Variables
For non-separable data, slack variables $\xi_i \ge 0$ relax the margin constraints to $y_i(w^Tx_i + b) \ge 1 - \xi_i$.

Kernels
- We may use kernel functions to implicitly map to a new feature space.
- Kernel: $K(x_1, x_2) \in \mathbb{R}$.
- A kernel must be equivalent to an inner product in some feature space: $K(x_1, x_2) = \phi(x_1)^T\phi(x_2)$.
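A small numeric check of the "inner product in some feature space" statement: for the polynomial kernel $K(x,z) = (x^Tz)^2$ in 2-D, the explicit map $\phi(x) = (x_1^2, \sqrt{2}\,x_1x_2, x_2^2)$ gives exactly the same value. The test vectors are made up for the demo:

```python
import numpy as np

def poly_kernel(x, z):
    return (x @ z) ** 2                       # K(x, z) = (x^T z)^2

def phi(x):                                   # explicit feature map for d = 2
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(poly_kernel(x, z), phi(x) @ phi(z))     # identical: kernel = inner product
```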
Solving the SVM
Solving the SVM is a quadratic programming problem.
Target: maximize the margin. For support vectors on the two margin hyperplanes, $w^Tx^+ + b = 1$ and $w^Tx^- + b = -1$, so
$$M = (x^+ - x^-)\cdot\frac{w}{\|w\|} = \frac{2}{\|w\|}$$
Maximizing $2/\|w\|$ is equivalent to minimizing $\frac{1}{2}\|w\|^2$, such that $y_i(w^Tx_i + b) \ge 1$ for all $i$.

Nonlinear SVM
The original feature space can always be mapped to some higher-dimensional feature space where the training set is separable.

Optimization Problem
Minimize $\frac{1}{2}\|w\|^2$ subject to $y_i(w^Tx_i + b) \ge 1$. The dual problem (where $\alpha_i$ is the Lagrange multiplier) is to maximize
$$\sum_i \alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j\, x_i^Tx_j \quad \text{s.t. } \alpha_i \ge 0,\ \sum_i \alpha_i y_i = 0$$
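As an illustration of solving this quadratic program in practice, here is a scikit-learn sketch; the library choice and the toy data are assumptions, since the notes do not prescribe a solver. For a linear kernel the fitted model exposes the primal $w$, the support vectors, and hence the margin $2/\|w\|$:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (assumed for the demo).
X = np.array([[0, 0], [1, 1], [1, 0], [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)   # a very large C approximates the hard margin
clf.fit(X, y)

w = clf.coef_[0]                    # primal weight vector (linear kernel only)
print("w:", w, "b:", clf.intercept_[0])
print("support vectors:", clf.support_vectors_)
print("margin 2/||w||:", 2 / np.linalg.norm(w))
```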