
Machine Learning: Understanding Backpropagation Step by Step

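While reading about backpropagation, I came across the blog post A Step by Step Backpropagation Example, in which the author walks through the backpropagation process using a simple worked example. The explanation is very clear, so I have translated it here together with my own understanding, in the hope that it helps readers who want to understand backpropagation.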

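Background

Backpropagation is used very widely in training neural networks, yet few posts online explain how it works using a concrete example with actual numbers.

In this post I will therefore try to explain backpropagation with a concrete example, so that readers can compare their own calculations against it and check whether they have understood the process correctly.

You can find the Python implementation of backpropagation that I wrote on my GitHub.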

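Overview

In this post we use a neural network with 2 input units, 2 hidden neurons and 2 output neurons.

In addition, the hidden and output neurons each include a bias. Here is the basic network structure:

To make the discussion below concrete, we assign the network some initial weights, biases, inputs and target outputs: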
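The initial values can be written down as a small Python sketch. The inputs, the targets, and the values of $w_1$, $w_2$, $w_5$, $w_6$, $b_1$ and $b_2$ all appear in the calculations later in this post. The values of $w_3$, $w_4$, $w_7$ and $w_8$ are assumptions taken from the original blog post (they are consistent with the results quoted later), and the variable names are chosen here purely for illustration.

```python
# Initial parameters of the 2-2-2 example network.
# Names are illustrative; w3, w4, w7 and w8 are assumed from the original post.
inputs = {"i1": 0.05, "i2": 0.10}
targets = {"o1": 0.01, "o2": 0.99}

# Input -> hidden weights (w1, w2 feed h1; w3, w4 feed h2) and the hidden bias.
hidden_weights = {"w1": 0.15, "w2": 0.20, "w3": 0.25, "w4": 0.30}
b1 = 0.35

# Hidden -> output weights (w5, w6 feed o1; w7, w8 feed o2) and the output bias.
output_weights = {"w5": 0.40, "w6": 0.45, "w7": 0.50, "w8": 0.55}
b2 = 0.60

# 8 weights plus 2 shared biases.
n_params = len(hidden_weights) + len(output_weights) + 2
print(n_params)
```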

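The goal of backpropagation is to optimize the weights so that the neural network learns to map arbitrary inputs to the correct outputs.

In this post we work with a single training example: given the inputs 0.05 and 0.10, we want the network to output 0.01 and 0.99 (that is, one sample with input vector (0.05, 0.10) and target vector (0.01, 0.99)).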

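The Forward Pass

First, let's see what the network currently outputs for the inputs 0.05 and 0.10, given the initial weights and biases.

To do this, we feed the inputs forward through the network.

We compute the total net input to each hidden-layer neuron, squash it with an activation function (here the logistic function), and then repeat the process for the output-layer neurons.

The total net input is also simply called the net input in some sources, for example Derivation of Backpropagation.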

Here is how the total net input for $h_1$ is calculated:

$$net_{h1} = w_1 \cdot i_1 + w_2 \cdot i_2 + b_1 \cdot 1$$

$$net_{h1} = 0.15 \cdot 0.05 + 0.2 \cdot 0.1 + 0.35 \cdot 1 = 0.3775$$

(Translator's note: by analogy with a CNN, this step corresponds to the convolution that produces the feature response map.)

We then pass it through the activation function to get the output of $h_1$:

$$out_{h1} = \frac{1}{1 + e^{-net_{h1}}} = \frac{1}{1 + e^{-0.3775}} = 0.593269992$$
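The two steps for $h_1$ can be reproduced with a few lines of Python. This is a minimal sketch using only the values quoted above; the function name `sigmoid` and the variable names are chosen here for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))


# Inputs, weights into h1, and the hidden-layer bias from the example.
i1, i2 = 0.05, 0.10
w1, w2, b1 = 0.15, 0.20, 0.35

net_h1 = w1 * i1 + w2 * i2 + b1 * 1   # total net input of h1
out_h1 = sigmoid(net_h1)              # output of h1 after the activation
print(round(net_h1, 4), round(out_h1, 9))
```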

(Translator's note: by analogy with a CNN, this step corresponds to applying the activation function to the feature response map.)

Carrying out the same process for $h_2$ gives:

$$out_{h2} = 0.596884378$$

For the output-layer neurons, we repeat the same process, using the hidden-layer outputs as inputs. (Translator's note: in a CNN, the activations usually also pass through pooling before serving as the input to the next layer; why pooling is needed is not explained here.) For $o_1$ this gives:

$$net_{o1} = w_5 \cdot out_{h1} + w_6 \cdot out_{h2} + b_2 \cdot 1$$

$$net_{o1} = 0.4 \cdot 0.593269992 + 0.45 \cdot 0.596884378 + 0.6 \cdot 1 = 1.105905967$$

$$out_{o1} = \frac{1}{1 + e^{-net_{o1}}} = \frac{1}{1 + e^{-1.105905967}} = 0.75136507$$

Likewise, repeating the same process for $o_2$ gives:

$$out_{o2} = 0.772928465$$

Calculating the Total Error

We can now calculate the error of each output neuron using the squared error function and sum these errors to get the total error:

$$E_{total} = \sum \frac{1}{2}(target - output)^2$$

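Here output is the network's prediction and target is the ground truth. The factor $\frac{1}{2}$ is included so that the exponent 2 cancels when we take the partial derivative. It only rescales the gradient by a constant, which can be absorbed into the learning rate, so it does not change where the minimum lies.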

For the first output neuron $o_1$, the target value is 0.01 but the network outputs 0.75136507, so its error is:

$$E_{o1} = \frac{1}{2}(target_{o1} - out_{o1})^2 = \frac{1}{2}(0.01 - 0.75136507)^2 = 0.274811083$$

Repeating this process for the second output neuron $o_2$ gives its error:

$$E_{o2} = 0.023560026$$

The total error of the network is the sum of these two errors:

$$E_{total} = E_{o1} + E_{o2} = 0.274811083 + 0.023560026 = 0.298371109$$
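The whole forward pass and the total error can be checked with a short Python sketch. It assumes the initial weights $w_3 = 0.25$, $w_4 = 0.30$, $w_7 = 0.50$ and $w_8 = 0.55$ from the original blog post, and that $net_{h2}$ and $net_{o2}$ follow the same pattern as $net_{h1}$ and $net_{o1}$. The helper names `forward` and `total_error` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(i1, i2, w, b1, b2):
    """Return (out_h1, out_h2, out_o1, out_o2) for weights w = [w1, ..., w8]."""
    w1, w2, w3, w4, w5, w6, w7, w8 = w
    out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
    out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
    out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
    out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)
    return out_h1, out_h2, out_o1, out_o2


def total_error(outputs, targets):
    """Sum of the per-neuron squared errors, each scaled by 1/2."""
    return sum(0.5 * (t - o) ** 2 for o, t in zip(outputs, targets))


# w3, w4, w7, w8 are assumed values (see the note above).
weights = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]
out_h1, out_h2, out_o1, out_o2 = forward(0.05, 0.10, weights, b1=0.35, b2=0.60)
e_total = total_error((out_o1, out_o2), (0.01, 0.99))
print(round(out_o1, 8), round(out_o2, 9), round(e_total, 9))
```

With these assumed weights the sketch reproduces the outputs and the total error quoted above to the precision shown in the text.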

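The Backwards Pass

The goal of backpropagation is to update every weight in the network so that the actual outputs move closer to the ground truth, thereby minimizing the error of each output neuron and of the network as a whole.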

Output Layer

Consider $w_5$ first. We want to know how much a change in $w_5$ affects the total error, that is, $\frac{\partial E_{total}}{\partial w_5}$.

Applying the chain rule gives:

$$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial w_5}$$

To make the chain rule above more intuitive, it can be visualized:

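We now work out each factor in this chain-rule expression in turn.

First, how much does the total error change with respect to the output of $o_1$?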

$$E_{total} = \sum \frac{1}{2}(target - output)^2 = \frac{1}{2}(target_{o1} - out_{o1})^2 + \frac{1}{2}(target_{o2} - out_{o2})^2$$

$$\frac{\partial E_{total}}{\partial out_{o1}} = 2 \cdot \frac{1}{2}(target_{o1} - out_{o1})^{2-1} \cdot (-1) + 0$$

$$\frac{\partial E_{total}}{\partial out_{o1}} = -(target_{o1} - out_{o1}) = -(0.01 - 0.75136507) = 0.74136507$$

The partial derivative of the logistic function is the output multiplied by one minus the output, that is:

$$out_{o1} = \frac{1}{1 + e^{-net_{o1}}}$$

$$\frac{\partial out_{o1}}{\partial net_{o1}} = out_{o1}(1 - out_{o1}) = 0.75136507 \cdot (1 - 0.75136507) = 0.186815602$$
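The identity for the logistic derivative can be sanity-checked numerically. The sketch below compares the closed form with a central finite difference; the step size `h` and the variable names are arbitrary choices made for this illustration.

```python
import math


def sigmoid(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))


net_o1 = 1.105905967            # total net input of o1 from the forward pass
out_o1 = sigmoid(net_o1)

# Closed form: derivative of the logistic function at net_o1.
analytic = out_o1 * (1.0 - out_o1)

# Central finite difference as an independent check of the identity.
h = 1e-6
numeric = (sigmoid(net_o1 + h) - sigmoid(net_o1 - h)) / (2 * h)
print(round(analytic, 9))
```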

Finally, how much does the total net input of $o_1$ change with respect to $w_5$?

$$net_{o1} = w_5 \cdot out_{h1} + w_6 \cdot out_{h2} + b_2 \cdot 1$$

$$\frac{\partial net_{o1}}{\partial w_5} = out_{h1} = 0.593269992$$

Putting the three factors together:

$$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial w_5}$$

$$\frac{\partial E_{total}}{\partial w_5} = 0.74136507 \cdot 0.186815602 \cdot 0.593269992 = 0.082167041$$
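The chain-rule result for $w_5$ can be verified by comparing it with a numerical derivative of the total error. This is a sketch under stated assumptions: $w_7 = 0.50$ and $w_8 = 0.55$ are taken from the original blog post, $net_{o2}$ is assumed to mirror $net_{o1}$, and the helper `e_total` is a hypothetical name.

```python
import math


def sigmoid(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))


# Hidden-layer outputs from the forward pass; w7 and w8 are assumed values.
out_h1, out_h2 = 0.593269992, 0.596884378
w6, w7, w8, b2 = 0.45, 0.50, 0.55, 0.60
target_o1, target_o2 = 0.01, 0.99


def e_total(w5):
    """Total error as a function of w5 alone, all other values held fixed."""
    out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
    out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)
    return 0.5 * (target_o1 - out_o1) ** 2 + 0.5 * (target_o2 - out_o2) ** 2


w5 = 0.40
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)

# The three chain-rule factors.
dE_dout = -(target_o1 - out_o1)
dout_dnet = out_o1 * (1.0 - out_o1)
dnet_dw5 = out_h1
grad_w5 = dE_dout * dout_dnet * dnet_dw5

# Central finite difference of the total error with respect to w5.
h = 1e-6
grad_w5_numeric = (e_total(w5 + h) - e_total(w5 - h)) / (2 * h)
print(round(grad_w5, 9))
```

A check of this kind is a common way to catch sign or indexing mistakes in a hand-derived gradient.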

You will also see this written in the form of the delta rule:

$$\frac{\partial E_{total}}{\partial w_5} = -(target_{o1} - out_{o1}) \cdot out_{o1}(1 - out_{o1}) \cdot out_{h1}$$

Alternatively, we can combine $\frac{\partial E_{total}}{\partial out_{o1}}$ and $\frac{\partial out_{o1}}{\partial net_{o1}}$ into $\frac{\partial E_{total}}{\partial net_{o1}}$ and denote it by $\delta_{o1}$, which lets us rewrite the expression above as:

$$\delta_{o1} = \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} = \frac{\partial E_{total}}{\partial net_{o1}}$$

$$\delta_{o1} = -(target_{o1} - out_{o1}) \cdot out_{o1}(1 - out_{o1})$$

Therefore:

$$\frac{\partial E_{total}}{\partial w_5} = \delta_{o1} \, out_{h1}$$

Some papers pull the negative sign out of $\delta$, in which case the same quantity is written as:

$$\frac{\partial E_{total}}{\partial w_5} = -\delta_{o1} \, out_{h1}$$
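As a worked check, plugging in the numbers computed earlier (with the first sign convention, where the negative sign stays inside $\delta_{o1}$) gives the same gradient as the chain-rule calculation:

$$\delta_{o1} = 0.74136507 \cdot 0.186815602 = 0.138498562$$

$$\frac{\partial E_{total}}{\partial w_5} = \delta_{o1} \cdot out_{h1} = 0.138498562 \cdot 0.593269992 = 0.082167041$$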

To decrease the error, we subtract this gradient from the current value of $w_5$, usually after multiplying it by a learning rate $\eta$ (0.5 in this example):

$$w_5^{+} = w_5 - \eta \cdot \frac{\partial E_{total}}{\partial w_5} = 0.4 - 0.5 \cdot 0.082167041 = 0.35891648$$

The notation for the learning rate varies between sources: some use $\alpha$, some use $\eta$, and some use $\epsilon$.

Repeating this process gives the updated values of $w_6$, $w_7$ and $w_8$:

$$w_6^{+} = 0.408666186$$

$$w_7^{+} = 0.511301270$$

$$w_8^{+} = 0.561370121$$

Note that as we continue the backwards pass below, wherever $w_5$, $w_6$, $w_7$ and $w_8$ appear we still use their original values, not the updated ones.
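The updates for all four output-layer weights can be sketched in a few lines of Python using the node deltas. Two things here are assumptions rather than values stated in the text: the learning rate $\eta = 0.5$ and the initial values $w_7 = 0.50$ and $w_8 = 0.55$, both taken from the original blog post and consistent with the updated weights listed above. The dictionary layout is illustrative only.

```python
# Forward-pass values quoted in the text.
out_h = [0.593269992, 0.596884378]     # out_h1, out_h2
out_o = [0.75136507, 0.772928465]      # out_o1, out_o2
target = [0.01, 0.99]

# Initial weights; w7 and w8 are assumed values, as is eta.
weights = {"w5": 0.40, "w6": 0.45, "w7": 0.50, "w8": 0.55}
eta = 0.5

# Node deltas: dE_total/dnet for each output neuron.
delta = [-(t - o) * o * (1.0 - o) for o, t in zip(out_o, target)]

# w5 and w6 feed o1; w7 and w8 feed o2. Each gradient is delta_o * out_h.
grads = {
    "w5": delta[0] * out_h[0],
    "w6": delta[0] * out_h[1],
    "w7": delta[1] * out_h[0],
    "w8": delta[1] * out_h[1],
}

# Gradient-descent step; the original weights dict is left untouched.
updated = {name: w - eta * grads[name] for name, w in weights.items()}
for name, value in updated.items():
    print(name, round(value, 9))
```

Keeping `weights` and `updated` as separate dictionaries mirrors the note above: the rest of the backwards pass keeps reading the original values.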

Hidden Layer

Applying the chain rule in the same way to $w_1$ gives:

$$\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial out_{h1}} \cdot \frac{\partial out_{h1}}{\partial net_{h1}} \cdot \frac{\partial net_{h1}}{\partial w_1}$$

Visualizing the chain rule above:

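We update this (hidden) layer in much the same way as the output layer, with one difference: the output of each hidden neuron contributes to the output, and therefore the error, of every output neuron.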

We know that $out_{h1}$ affects both $out_{o1}$ and $out_{o2}$, so $\frac{\partial E_{total}}{\partial out_{h1}}$ must take its effect on both output neurons into account:

$$\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}}$$

Starting with the first term:

$$\frac{\partial E_{o1}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial out_{h1}}$$
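The sum over both output neurons can be sketched numerically as follows. This relies on two assumptions that are not spelled out in the text at this point: $net_{o2}$ has the same form as $net_{o1}$ with $w_7$ multiplying $out_{h1}$, and $w_7 = 0.50$ as in the original blog post. Since $net_{o1} = w_5 \cdot out_{h1} + w_6 \cdot out_{h2} + b_2$, the derivative of each net input with respect to $out_{h1}$ is just the connecting weight.

```python
# Output values and targets quoted earlier in the post.
out_o = [0.75136507, 0.772928465]      # out_o1, out_o2
target = [0.01, 0.99]

# Original (not updated) weights from h1 into o1 and o2; w7 is an assumed value.
w5, w7 = 0.40, 0.50

# dE_o/dnet_o for each output neuron.
dE_dnet = [-(t - o) * o * (1.0 - o) for o, t in zip(out_o, target)]

# dnet_o1/dout_h1 = w5 and dnet_o2/dout_h1 = w7 (assumed form of net_o2).
dE_o1_dout_h1 = dE_dnet[0] * w5
dE_o2_dout_h1 = dE_dnet[1] * w7

# Both output neurons contribute to the gradient at out_h1.
dE_total_dout_h1 = dE_o1_dout_h1 + dE_o2_dout_h1
print(round(dE_total_dout_h1, 9))
```

The second term is negative because $out_{o2}$ is below its target, so the two output neurons pull $out_{h1}$ in opposite directions.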

We can calculate $\frac{\partial E_{o1}}{\partial net_{o1}}$ from values computed earlier:

$$\frac{\partial E_{o1}}{\partial net_{o1}} = \frac{\partial E_{o1}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} = 0.74136507 \cdot 0.186815602 = 0.138498562$$

And since $\frac{\partial net_{o1}}{\partial out_{h1}}$…
