
Coursera Machine Learning Course Notes

Machine Learning

Week 1

I. Introduction

Types of machine learning

II. Linear Regression with One Variable

Notation

Possible h ("hypothesis") functions

Cost function

Learning rate

Gradient descent for two-parameter linear regression (repeat until convergence)

"Batch" gradient descent

III. Linear Algebra Revision (optional)

Week 2

IV. Linear Regression with Multiple Variables

Large number of features

Multi-parameter linear regression in vector notation

Picking features

Cost function for multiple features

Gradient descent for the cost function of multi-parameter linear regression

Feature scaling

Mean normalization

How to pick learning rate α

Polynomial regression

"Normal equation": Using matrix multiplication to solve for θ that gives min J(θ)

Gradient descent vs. solving for θ ("normal equation")

Using matrices to process multiple training cases at once

V. Octave Tutorial

Week 3

VI. Logistic Regression

Classification problems

Two-class (or binary-class) classification

Logistic regression

Decision boundary

Cost function for logistic regression

Gradient descent for logistic regression

Advanced optimization algorithms and concepts

Multi-class classification

VII. Regularization - the problem of overfitting

Underfitting vs. overfitting

Addressing overfitting

Regularization

How to pick λ (lambda)

Linear gradient descent with regularization

Normal equation with regularization

Logistic gradient descent with regularization

Week 4

VIII. Neural Networks: Representation

Neural networks

Neurons modelled as logistic units

Neural network

Notation

Calculating the hypothesis for a sample neural network

Vectorized forward propagation

Week 5

IX. Neural Networks: Learning

Cost function for multi-class classification neural network

Forward propagation

Minimizing cost function for neural networks: back-propagation

Back-propagation for multiple training samples

Back-propagation intuition...

Use of advanced minimum cost optimization algorithms

Numerical gradient checking

Initial values of Θ

Network architecture

Steps in training a neural network

Week 6

X. Advice for Applying Machine Learning

What to do when you get unacceptably large errors after learning

Machine learning diagnostic

Evaluating the hypothesis function

Calculating misclassification error rate

Cross-validation - evaluating alternative hypothesis function models

Distinguish high bias from high variance (underfitting vs. overfitting)

Choosing the regularization parameter λ

Too small training set?

Learning curves

Selecting a model for neural networks

XI. Machine Learning System Design

Improving a machine learning system - what to prioritize

Recommended approach for building a new machine learning system

Error analysis

Error measure for skewed classes

Prediction metrics: precision and recall

Prediction metrics: average and F1 score

Week 7

XII. Support Vector Machines

Week 8

XIII. Clustering

Types of unsupervised learning

Notation

K-means clustering algorithm

K-means cost function (distortion function)

Practical considerations for K-means

XIV. Dimensionality Reduction

Data compression (data dimensionality reduction)

Principal component analysis (PCA)

How to choose k for PCA

Decompressing (reconstructing) PCA-compressed data

More about PCA

Bad use of PCA: to prevent overfitting

Recommendation on applying PCA

Week 9

XV. Anomaly Detection

Examples of anomaly detection

How anomaly detection works

Fraud detection

Gaussian (Normal) distribution

Density estimation

Anomaly detection algorithm

Training the anomaly detection algorithm

Evaluating the anomaly detection algorithm

XVI. Recommender Systems

Week 10

XVII. Large Scale Machine Learning

XVIII. Application Example: Photo OCR

Useful resources

Week 1

I. Introduction

Types of machine learning

● Supervised learning (the "right answer" is provided as input, in the "training set")

○ Regression problem (expected output is a real value)

○ Classification problem (answer is a class, such as yes or no)

● Unsupervised learning

II. Linear Regression with One Variable

Notation

m: number of training examples

x's: "input" variables (features)

y's: "output" (target) variables

(x, y): one training example

(x(i), y(i)): the ith training example

h(): the function (the "hypothesis") found by the learning algorithm

θ: (theta) the "parameters" used in h() together with the features x

hθ(): not just generally the function h(), but specifically parameterized with θ

J(θ): the cost function of hθ()

n: number of features (inputs)

x(i): inputs (features) of the ith training example

xj(i): the value of feature j in the ith training example

λ: (lambda) regularization parameter

:= means assignment in an algorithm, rather than mathematical equality

Possible h ("hypothesis") functions

● Linear regression with one variable (a.k.a. univariate linear regression):

hθ(x) = θ0 + θ1x

Shorthand: h(x). Here θ0 and θ1 are the "parameters".

● Linear regression with multiple variables (a.k.a. multivariate linear regression):

hθ(x) = θ0 + θ1x1 + θ2x2 + … + θnxn (for n features)

● Polynomial regression (e.g. by just making up new features that are the square and cube of an existing feature)

Cost function

The cost function J() evaluates how closely h(x) matches y, given the parameters θ used by h(). For linear regression (here with one feature x), pick θ0 and θ1 so that hθ(x) is close to y for our training examples (x, y), i.e. minimize the "sum of squared errors" cost function:

J(θ0, θ1) = 1/(2m) · Σi=1..m (hθ(x(i)) − y(i))²

By taking the square of the error (i.e. hθ(x) − y), we avoid having too-small results from hθ cancelling out too-large results (since (−1)² = 1²), thus yielding a truer "cost" of the errors.

N.B. the extra 1/2 in the 1/(2m) factor "makes some of the math easier" (see explanation why).

"Squared error" cost function: a reasonable choice of cost function, and the most common one for regression problems.
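A minimal Octave sketch (not from the lecture notes) of this cost function, computed with an explicit loop; the names x, y, theta0 and theta1 are assumed to hold the m training inputs, the targets, and the two parameters:

% Squared-error cost J(theta0, theta1) for one-feature linear regression.
% x, y: m-by-1 vectors of training inputs and targets (assumed names).
function J = computeCost(x, y, theta0, theta1)
  m = length(y);                    % number of training examples
  total = 0;
  for i = 1:m
    h = theta0 + theta1 * x(i);     % hypothesis for example i
    total = total + (h - y(i))^2;   % accumulate the squared error
  end
  J = total / (2 * m);              % the 1/(2m) factor from the notes
end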

Gradient descent

Iterative algorithm for finding a local minimum of the cost function. Works for linear regression with any number of parameters, but also for other kinds of "hypothesis" functions. Scales better for cases with a large number of features than solving directly for the θ that gives min J().

(for j = 0 and j = 1, repeat until convergence)

θj := θj − α · ∂/∂θj J(θ0, θ1)

where

θj is the "parameter" in hθ that is used for feature j

α is the learning rate

∂/∂θj J(θ0, θ1) is the partial derivative (slope) at the current point θj

J(θ0, θ1) is the cost function (in this case with two parameters, for cases with only one feature)

N.B. update θ0 and θ1 simultaneously! (i.e. as one atomic operation)

Learning rate

The size α of each step when iterating towards a solution. N.B. no need to vary α between iterations: gradient descent will naturally take smaller and smaller steps the closer we get to a solution.

Gradient descent for two-parameter linear regression (repeat until convergence)

for j = 0 and j = 1

θj := θj − α · ∂/∂θj J(θ0, θ1)

simplifies to

θ0 := θ0 − α · 1/m · Σi=1..m (hθ(x(i)) − y(i))

θ1 := θ1 − α · 1/m · Σi=1..m (hθ(x(i)) − y(i)) · x(i)

"Batch" gradient descent

Just means that each iteration of the gradient descent is applied to all the training examples (in a "batch").
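A minimal Octave sketch of batch gradient descent for the two-parameter case, assuming x and y are m-by-1 vectors of inputs and targets and that alpha and num_iters have been chosen by hand (all names are illustrative):

m = length(y);
theta0 = 0; theta1 = 0;                               % initial guesses
for iter = 1:num_iters
  h = theta0 + theta1 * x;                            % predictions for all m examples
  temp0 = theta0 - alpha * (1/m) * sum(h - y);        % update for j = 0
  temp1 = theta1 - alpha * (1/m) * sum((h - y) .* x); % update for j = 1
  theta0 = temp0; theta1 = temp1;                     % simultaneous ("atomic") update
end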

III. Linear Algebra Revision (optional)

[...]

Week 2

IV. Linear Regression with Multiple Variables

Large number of features

For problems involving many "features" (i.e. x1, x2, x3 ... xn), linear algebra vector notation is more efficient.

Multi-parameter linear regression in vector notation

For convenience of notation, and to allow the use of vector multiplication, define a 0th feature x0 = 1, thus we can write

hθ(x) = θ0x0 + θ1x1 + … + θnxn

Now, defining an ((n+1) × 1) vector x containing all the features and an ((n+1) × 1) vector θ containing all the parameters of the hypothesis function hθ, we can efficiently multiply the two (yielding a scalar result) if we first transpose (rotate) θ into θT:

hθ(x) = θTx

in Octave: theta' * x
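The same hypothesis can be evaluated for all m training examples at once by stacking the examples as the rows of an m × (n+1) design matrix (with the x0 = 1 column first); a sketch, where X is an assumed name:

% X: m-by-(n+1) design matrix, theta: (n+1)-by-1 parameter vector
predictions = X * theta;   % m-by-1 vector, one h_theta(x) per training example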

Picking features

Use your domain insights and intuitions to pick features. E.g. deriving a combined feature might help. There are also automatic algorithms for picking features.

Cost function for multiple features

For n + 1 features (where x0 = 1), the combined cost function over m training samples will be

J(θ) = 1/(2m) · Σi=1..m (hθ(x(i)) − y(i))²

which really means (note that i starts from 1 and j starts from 0)

J(θ0, ..., θn) = 1/(2m) · Σi=1..m (Σj=0..n θj·xj(i) − y(i))²
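A vectorized Octave sketch of this cost function, assuming X is the m × (n+1) design matrix (with x0 = 1 in the first column) and y the m × 1 target vector:

m = length(y);
errors = X * theta - y;                  % h_theta(x) - y for every training example
J = (1 / (2 * m)) * sum(errors .^ 2);    % squared-error cost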

Gradient descent for the cost function of multi-parameter linear regression

With the number of features n >= 1 and x0 = 1, the update of parameter θj in one iteration of gradient descent is

θj := θj − α · 1/m · Σi=1..m (hθ(x(i)) − y(i)) · xj(i)

in Octave (vectorized over all parameters at once, with X the m × (n+1) design matrix and y the m × 1 target vector):

theta = theta - alpha * (1/m) * (X' * (X * theta - y))

thus (atomically updating θj for j = 0, ..., n)

...
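Wrapping the vectorized update in a loop gives a minimal training routine; the function name gradientDescentMulti and the J_history bookkeeping are illustrative, not from the notes:

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
  % X: m-by-(n+1) design matrix, y: m-by-1 targets, theta: (n+1)-by-1 parameters
  m = length(y);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    theta = theta - alpha * (1/m) * (X' * (X * theta - y));      % atomic update of all theta_j
    J_history(iter) = (1 / (2 * m)) * sum((X * theta - y) .^ 2); % track J per iteration
  end
end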

Practical considerations for gradient descent

Feature scaling

Make sure the values in each group of properties xn are of the same order (i.e. scale some groups if necessary), or gradient descent will take a long time to converge (because the contour plot is too elongated). Typically, make all groups contain values roughly in the range −1 ≤ xi ≤ 1.

Mean normalization

To make sure the values for feature xi range between −1 and +1, replace xi with xi − μi, where μi is the mean of all values in the training set for feature xi (excluding x0, as it is always 1).

So together,

xi := (xi − μi) / si

where μi is the mean of all values for the property in the training set and si is the range of values (i.e. max(x) − min(x)) for the property i.
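A sketch of mean normalization plus scaling in Octave, applied column-wise to a raw feature matrix X_raw (without the x0 column); the variable names are illustrative:

mu = mean(X_raw);               % 1-by-n row of per-feature means
s  = max(X_raw) - min(X_raw);   % 1-by-n row of per-feature ranges
X_norm = (X_raw - mu) ./ s;     % each feature now roughly in the range [-1, 1]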

How to pick learning rate α

The number of iterations before gradient descent converges can vary a lot (anything between 30 and 3 million is normal).

To make sure gradient descent is working, plot the running minimum of the cost function J(θ) and make sure it decreases with each iteration.

Automatic convergence test: e.g. declare convergence if J(θ) changes by less than 10⁻³ in one iteration. However, looking at the plot is usually better.

If J(θ) is increasing rather than decreasing (or oscillating), then the usual reason is that α is too big.

Too small an α will result in too slow a change in J(θ).

To determine α heuristically, try steps of roughly ×3, so 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, ...
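One way to compare candidate values of α is to run a fixed number of iterations for each value and plot J(θ) against the iteration number; a sketch reusing the gradientDescentMulti helper sketched earlier:

alphas = [0.001 0.003 0.01 0.03 0.1 0.3];
for k = 1:length(alphas)
  [theta_k, J_history] = gradientDescentMulti(X, y, zeros(size(X, 2), 1), alphas(k), 100);
  plot(1:100, J_history); hold on;   % J(theta) should decrease on every iteration
end
xlabel('iteration'); ylabel('J(theta)');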

Polynomial regression

If a simple straight line does not fit the training data well, then polynomial regression can be used.

Just define some new features that are the squares or cubes of existing features.
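For example, with a single raw feature x (an m-by-1 vector), squared and cubed copies can be added as extra columns; a sketch (feature scaling then matters even more, because x, x² and x³ have very different ranges):

X_poly = [x, x.^2, x.^3];   % three derived features from one raw feature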
