
Coursera Machine Learning Course Notes

Machine Learning

Week 1

I. Introduction

Types of machine learning

II. Linear Regression with One Variable

Notation

Possible h ("hypothesis") functions

Cost function

Learning rate

Gradient descent for two-parameter linear regression (repeat until convergence)

"Batch" gradient descent

III. Linear Algebra Revision (optional)

Week 2

IV. Linear Regression with Multiple Variables

Large number of features

Multi-parameter linear regression in vector notation

Picking features

Cost function for multiple features

Gradient descent for the cost function of multi-parameter linear regression

Feature scaling

Mean normalization

How to pick learning rate α

Polynomial regression

"Normal equation": Using matrix multiplication to solve for θ that gives min J(θ)

Gradient descent vs. solving for θ ("normal equation")

Using matrices to process multiple training cases at once

V. Octave Tutorial

Week 3

VI. Logistic Regression

Classification problems

Two-class (or binary-class) classification

Logistic regression

Decision boundary

Cost function for logistic regression

Gradient descent for logistic regression

Advanced optimization algorithms and concepts

Multi-class classification

VII. Regularization - the problem of overfitting

Underfitting vs. overfitting

Addressing overfitting

Regularization

How to pick λ (lambda)

Linear gradient descent with regularization

Normal equation with regularization

Logistic gradient descent with regularization

Week 4

VIII. Neural Networks: Representation

Neural networks

Neurons modelled as logistic units

Neural network

Notation

Calculating the hypothesis for a sample neural network

Vectorized forward propagation

Week 5

IX. Neural Networks: Learning

Cost function for multi-class classification neural network

Forward propagation

Minimizing cost function for neural networks: back-propagation

Back-propagation for multiple training samples

Back-propagation intuition...

Use of advanced minimum cost optimization algorithms

Numerical gradient checking

Initial values of Θ

Network architecture

Steps in training a neural network

Week 6

X. Advice for Applying Machine Learning

What to do when you get unacceptably large errors after learning

Machine learning diagnostic

Evaluating the hypothesis function

Calculating misclassification error rate

Cross-validation - evaluating alternative hypothesis function models

Distinguish high bias from high variance (underfitting vs. overfitting)

Choosing the regularization parameter λ

Too small training set?

Learning curves

Selecting a model for neural networks

XI. Machine Learning System Design

Improving a machine learning system - what to prioritize

Recommended approach for building a new machine learning system

Error analysis

Error measure for skewed classes

Prediction metrics: precision and recall

Prediction metrics: average and F1 score

Week 7

XII. Support Vector Machines

Week 8

XIII. Clustering

Types of unsupervised learning

Notation

K-means clustering algorithm

K-means cost function (distortion function)

Practical considerations for K-means

XIV. Dimensionality Reduction

Data compression (data dimensionality reduction)

Principal component analysis (PCA)

How to choose k for PCA

Decompressing (reconstructing) PCA-compressed data

More about PCA

Bad use of PCA: to prevent overfitting

Recommendation on applying PCA

Week 9

XV. Anomaly Detection

Examples of anomaly detection

How anomaly detection works

Fraud detection

Gaussian (Normal) distribution

Density estimation

Anomaly detection algorithm

Training the anomaly detection algorithm

Evaluating the anomaly detection algorithm

XVI. Recommender Systems

Week 10

XVII. Large Scale Machine Learning

XVIII. Application Example: Photo OCR

Useful resources

Week 1

I. Introduction

Types of machine learning

● Supervised learning (the "right answer" is provided as input, in the "training set")

○ Regression problem (expected output is a real value)

○ Classification problem (answer is a class, such as yes or no)

● Unsupervised learning

II. Linear Regression with One Variable

Notation

m: number of training examples

x's: "input" variables (features)

y's: "output" (target) variables

(x, y): one training example

(x(i), y(i)): the ith training example

h(): the function (the "hypothesis") found by the learning algorithm

θ: (theta) the "parameters" used in h() together with the features x

hθ(): not just generally the function h(), but specifically parameterized with θ

J(θ): the cost function of hθ()

n: number of features (inputs)

x(i): inputs (features) of the ith training example

xj(i): the value of feature j in the ith training example

λ: (lambda) regularization parameter

:= means assignment in an algorithm, rather than mathematical equality

Possible h ("hypothesis") functions

● Linear regression with one variable (a.k.a. univariate linear regression):

hθ(x) = θ0 + θ1x

Shorthand: h(x). Here θ0 and θ1 are the "parameters".

● Linear regression with multiple variables (a.k.a. multivariate linear regression):

hθ(x) = θ0 + θ1x1 + θ2x2 + … + θnxn (for n features)

● Polynomial regression (e.g. by just making up new features that are the square and cube of an existing feature)

Cost function

The cost function J() evaluates how closely h(x) matches y, given the parameters θ used by h(). For linear regression (here with one feature x), pick θ0 and θ1 so that hθ(x) is close to y for our training examples (x, y), i.e. minimize the "sum of squared errors" cost function:

J(θ0, θ1) = 1/(2m) · Σi=1..m (hθ(x(i)) − y(i))²

By taking the square of the error (i.e. hθ(x) − y), we avoid having too-small results from hθ cancelling out too-large results (since (−1)² = 1²), thus yielding a truer "cost" of the errors.

N.B. the extra 1/2 in the 1/(2m) factor "makes some of the math easier" (see explanation why).

"Squared error" cost function: a reasonable choice of cost function, and the most common one for regression problems.
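A minimal Octave sketch (not from the lecture notes) of this cost function, computed with an explicit loop; the names x, y, theta0 and theta1 are assumed to hold the m training inputs, the targets, and the two parameters:

% Squared-error cost J(theta0, theta1) for one-feature linear regression.
% x, y: m-by-1 vectors of training inputs and targets (assumed names).
function J = computeCost(x, y, theta0, theta1)
  m = length(y);                    % number of training examples
  total = 0;
  for i = 1:m
    h = theta0 + theta1 * x(i);     % hypothesis for example i
    total = total + (h - y(i))^2;   % accumulate the squared error
  end
  J = total / (2 * m);              % the 1/(2m) factor from the notes
end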

Gradient descent

Iterative algorithm for finding a local minimum of the cost function. Works for linear regression with any number of parameters, but also for other kinds of "hypothesis" functions. Scales better for cases with a large number of features than solving directly for the θ that gives min J().

(for j = 0 and j = 1, repeat until convergence)

θj := θj − α · ∂/∂θj J(θ0, θ1)

where

θj is the "parameter" in hθ that is used for feature j

α is the learning rate

∂/∂θj J(θ0, θ1) is the partial derivative (slope) at the current point θj

J(θ0, θ1) is the cost function (in this case with two parameters, for cases with only one feature)

N.B. update θ0 and θ1 simultaneously! (i.e. as one atomic operation)

Learning rate

The size α of each step when iterating towards a solution. N.B. no need to vary α between iterations: gradient descent will naturally take smaller and smaller steps the closer we get to a solution.

Gradient descent for two-parameter linear regression (repeat until convergence)

for j = 0 and j = 1

θj := θj − α · ∂/∂θj J(θ0, θ1)

simplifies to

θ0 := θ0 − α · 1/m · Σi=1..m (hθ(x(i)) − y(i))

θ1 := θ1 − α · 1/m · Σi=1..m (hθ(x(i)) − y(i)) · x(i)

"Batch" gradient descent

Just means that each iteration of the gradient descent is applied to all the training examples (in a "batch").
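A minimal Octave sketch of batch gradient descent for the two-parameter case, assuming x and y are m-by-1 vectors of inputs and targets and that alpha and num_iters have been chosen by hand (all names are illustrative):

m = length(y);
theta0 = 0; theta1 = 0;                               % initial guesses
for iter = 1:num_iters
  h = theta0 + theta1 * x;                            % predictions for all m examples
  temp0 = theta0 - alpha * (1/m) * sum(h - y);        % update for j = 0
  temp1 = theta1 - alpha * (1/m) * sum((h - y) .* x); % update for j = 1
  theta0 = temp0; theta1 = temp1;                     % simultaneous ("atomic") update
end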

III. Linear Algebra Revision (optional)

[...]

Week 2

IV. Linear Regression with Multiple Variables

Large number of features

For problems involving many "features" (i.e. x1, x2, x3 ... xn), linear algebra vector notation is more efficient.

Multi-parameter linear regression in vector notation

For convenience of notation, and to allow the use of vector multiplication, define a 0th feature x0 = 1, thus we can write

hθ(x) = θ0x0 + θ1x1 + … + θnxn

Now, defining an ((n+1) × 1) vector x containing all the features and an ((n+1) × 1) vector θ containing all the parameters of the hypothesis function hθ, we can efficiently multiply the two (yielding a scalar result) if we first transpose (rotate) θ into θT:

hθ(x) = θTx

in Octave: theta' * x
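The same hypothesis can be evaluated for all m training examples at once by stacking the examples as the rows of an m × (n+1) design matrix (with the x0 = 1 column first); a sketch, where X is an assumed name:

% X: m-by-(n+1) design matrix, theta: (n+1)-by-1 parameter vector
predictions = X * theta;   % m-by-1 vector, one h_theta(x) per training example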

Picking features

Use your domain insights and intuitions to pick features. E.g. deriving a combined feature might help. There are also automatic algorithms for picking features.

Cost function for multiple features

For n + 1 features (where x0 = 1), the combined cost function over m training samples will be

J(θ) = 1/(2m) · Σi=1..m (hθ(x(i)) − y(i))²

which really means (note that i starts from 1 and j starts from 0)

J(θ0, ..., θn) = 1/(2m) · Σi=1..m (Σj=0..n θj·xj(i) − y(i))²
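A vectorized Octave sketch of this cost function, assuming X is the m × (n+1) design matrix (with x0 = 1 in the first column) and y the m × 1 target vector:

m = length(y);
errors = X * theta - y;                  % h_theta(x) - y for every training example
J = (1 / (2 * m)) * sum(errors .^ 2);    % squared-error cost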

Gradient descent for the cost function of multi-parameter linear regression

With the number of features n >= 1 and x0 = 1, the update of parameter θj in one iteration of gradient descent is

θj := θj − α · 1/m · Σi=1..m (hθ(x(i)) − y(i)) · xj(i)

in Octave (vectorized over all parameters at once, with X the m × (n+1) design matrix and y the m × 1 target vector):

theta = theta - alpha * (1/m) * (X' * (X * theta - y))

thus (atomically updating θj for j = 0, ..., n)

...
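Wrapping the vectorized update in a loop gives a minimal training routine; the function name gradientDescentMulti and the J_history bookkeeping are illustrative, not from the notes:

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
  % X: m-by-(n+1) design matrix, y: m-by-1 targets, theta: (n+1)-by-1 parameters
  m = length(y);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    theta = theta - alpha * (1/m) * (X' * (X * theta - y));      % atomic update of all theta_j
    J_history(iter) = (1 / (2 * m)) * sum((X * theta - y) .^ 2); % track J per iteration
  end
end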

Practical considerations for gradient descent

Feature scaling

Make sure the values in each group of properties xn are of the same order (i.e. scale some groups if necessary), or gradient descent will take a long time to converge (because the contour plot is too elongated). Typically, make all groups contain values roughly in the range −1 ≤ xi ≤ 1.

Mean normalization

To make sure the values for feature xi range between −1 and +1, replace xi with xi − μi, where μi is the mean of all values in the training set for feature xi (excluding x0, as it is always 1).

So together,

xi := (xi − μi) / si

where μi is the mean of all values for the property in the training set and si is the range of values (i.e. max(x) − min(x)) for the property i.
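A sketch of mean normalization plus scaling in Octave, applied column-wise to a raw feature matrix X_raw (without the x0 column); the variable names are illustrative:

mu = mean(X_raw);               % 1-by-n row of per-feature means
s  = max(X_raw) - min(X_raw);   % 1-by-n row of per-feature ranges
X_norm = (X_raw - mu) ./ s;     % each feature now roughly in the range [-1, 1]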

How to pick learning rate α

The number of iterations before gradient descent converges can vary a lot (anything between 30 and 3 million is normal).

To make sure gradient descent is working, plot the running minimum of the cost function J(θ) and make sure it decreases with each iteration.

Automatic convergence test: e.g. declare convergence if J(θ) changes by less than 10⁻³ in one iteration. However, looking at the plot is usually better.

If J(θ) is increasing rather than decreasing (or oscillating), then the usual reason is that α is too big.

Too small an α will result in too slow a change in J(θ).

To determine α heuristically, try steps of roughly ×3, so 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, ...
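One way to compare candidate values of α is to run a fixed number of iterations for each value and plot J(θ) against the iteration number; a sketch reusing the gradientDescentMulti helper sketched earlier:

alphas = [0.001 0.003 0.01 0.03 0.1 0.3];
for k = 1:length(alphas)
  [theta_k, J_history] = gradientDescentMulti(X, y, zeros(size(X, 2), 1), alphas(k), 100);
  plot(1:100, J_history); hold on;   % J(theta) should decrease on every iteration
end
xlabel('iteration'); ylabel('J(theta)');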

Polynomial regression

If a simple straight line does not fit the training data well, then polynomial regression can be used.

Just define some new features that are the squares or cubes of existing features.
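For example, with a single raw feature x (an m-by-1 vector), squared and cubed copies can be added as extra columns; a sketch (feature scaling then matters even more, because x, x² and x³ have very different ranges):

X_poly = [x, x.^2, x.^3];   % three derived features from one raw feature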
