Gradient Descent with Activation Functions, Linear Regression Algorithms, and MATLAB Code 0522.docx

Shared by a member of 冰豆网; the full document is 20 pages long.

Gradient Descent with Activation Functions, Linear Regression, and MATLAB Code

1. Single-variable linear regression with normalized input and output data

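Normalizing the input data lets a neural network train stably with a much larger learning rate, which in turn speeds up convergence.

With the same learning rate, training on the same data set without normalization can diverge, so the network never converges.

An activation function on the output node limits what the network can output: ReLU outputs are never negative and sigmoid outputs lie strictly between 0 and 1, so neither can produce a negative value.

Because the activation output is confined to that range, the expected (target) outputs of the network must lie in the same range.

If the training targets contain negative values, add the absolute value of the most negative target to every target so that all expected outputs become non-negative.

If the targets span a wide range, they can be normalized as well.

If a neuron includes such an activation function, its output is automatically non-negative.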
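The shift-and-scale step described above can be sketched as follows. This is a minimal illustration only: the target values and the variable names y_shift, y_pos, y_scaled and y_back are assumptions, not part of the original code.

```matlab
% Hypothetical target values that include negatives (not from the original text).
y = [-4 -1 3 10];

% Shift by the magnitude of the most negative target so every value is >= 0.
y_shift = abs(min([y, 0]));
y_pos   = y + y_shift;

% Optionally rescale to [0, 1] so the targets fit the output range of a sigmoid.
y_scaled = y_pos / max(y_pos);

% After training, undo both steps to return to the original units.
y_back = y_scaled * max(y_pos) - y_shift;
```

A sigmoid only approaches 0 and 1 asymptotically, so targets sitting exactly on those end points can never be matched exactly; scaling into a slightly narrower interval is one way around that.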

(1). Code without input data normalization

clear all
clc
% training sample data
p0 = 3;
p1 = 7;
x = 1:3;   % NOTE: the end of this range is missing in the source; 3 is assumed because the loop indexes exactly three samples
y = p0 + p1*x;
num_sample = size(y,2);
% gradient descent process
% initial values of parameters
theta0 = 1;
theta1 = 3;
% learning rate
alpha = 0.33;
% if alpha is too large, the final error will be very large
% if alpha is too small, convergence will be slow
epoch = 100;
for k = 1:epoch
    v_k = k
    h_theta_x = theta0 + theta1*x;   % hypothesis function
    Jcost(k) = ((h_theta_x(1)-y(1))^2 + (h_theta_x(2)-y(2))^2 + (h_theta_x(3)-y(3))^2)/num_sample
    theta0 = theta0 - alpha*((h_theta_x(1)-y(1)) + (h_theta_x(2)-y(2)) + (h_theta_x(3)-y(3)))/num_sample;
    theta1 = theta1 - alpha*((h_theta_x(1)-y(1))*x(1) + (h_theta_x(2)-y(2))*x(2) + (h_theta_x(3)-y(3))*x(3))/num_sample;
end
plot(Jcost)
yn = theta0 + theta1*x

Without input normalization, the code above tolerates a learning rate of at most 0.35, and the output error drops to 0.0000 by iteration 60.

(2). Code with input data normalization

clear all
clc
% training sample data
p0 = 3;
p1 = 7;
x = 1:3;   % NOTE: the end of this range is missing in the source; 3 is assumed because the loop indexes exactly three samples
x_mean = mean(x)
x_max = max(x)
x_min = min(x)
xn = (x - x_mean)/(x_max - x_min)   % mean-centred input scaled by its range
x = xn;
y = p0 + p1*x
y = y + 0.5;   % shift the targets so that none of them is negative
num_sample = size(y,2);
% gradient descent process
% initial values of parameters
theta0 = 1;
theta1 = 3;
% learning rate
alpha = 0.9;
% if alpha is too large, the final error will be very large
% if alpha is too small, convergence will be slow
epoch = 100;
for k = 1:epoch
    v_k = k
    h_theta_x = theta0 + theta1*x;   % hypothesis function
    Jcost(k) = ((h_theta_x(1)-y(1))^2 + (h_theta_x(2)-y(2))^2 + (h_theta_x(3)-y(3))^2)/num_sample
    theta0 = theta0 - alpha*((h_theta_x(1)-y(1)) + (h_theta_x(2)-y(2)) + (h_theta_x(3)-y(3)))/num_sample;
    theta1 = theta1 - alpha*((h_theta_x(1)-y(1))*x(1) + (h_theta_x(2)-y(2))*x(2) + (h_theta_x(3)-y(3))*x(3))/num_sample;
end
yn = theta0 + theta1*x;
plot(Jcost)

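With input normalization, the code above tolerates a learning rate of up to 0.96 according to the author's runs, and the output error drops to 0.0000 by iteration 33.

To keep the outputs free of negative values, the code adds 0.5 to every output value.

This 0.5 is absorbed by theta0: the learned theta0 is 0.5 larger than p0.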
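The learning-rate limits quoted for the two examples are the author's observations. As a rough cross-check, the sketch below computes the textbook stability bound alpha < 2/lambda_max for this kind of update, where lambda_max is the largest eigenvalue of X'*X/m. It assumes the inputs are x = 1:3 and uses illustrative names (X_raw, X_norm, alpha_max_raw, alpha_max_norm) that do not appear in the original code.

```matlab
% Rough stability check for the two single-variable examples above.
% Assumes x = 1:3 and the update theta <- theta - alpha*(1/m)*X'*(X*theta - y).
x_raw  = 1:3;
x_norm = (x_raw - mean(x_raw)) / (max(x_raw) - min(x_raw));

m = numel(x_raw);
X_raw  = [ones(m,1), x_raw(:)];    % design matrix with a bias column
X_norm = [ones(m,1), x_norm(:)];

% The error shrinks along every eigen-direction only if alpha < 2/lambda_max.
lambda_raw  = max(eig(X_raw'  * X_raw  / m));
lambda_norm = max(eig(X_norm' * X_norm / m));

alpha_max_raw  = 2 / lambda_raw
alpha_max_norm = 2 / lambda_norm
```

Under these assumptions the bound for the raw input is about 0.36, close to the 0.35 reported above. For the normalized input it is 2, above the reported 0.96, which suggests that figure is an observed working value and not a hard limit. Either way, normalization raises the ceiling considerably.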

(3). MATLAB code for linear regression with a sigmoid activation on the output

clear all
clc
% training sample data
p0 = 2;
p1 = 9;
x = 1:3;   % NOTE: the end of this range is missing in the source; 3 is assumed because the loop indexes exactly three samples
x_mean = mean(x)
x_max = max(x)
x_min = min(x)
xn = (x - x_mean)/(x_max - x_min)
x = xn;
y_temp = p0 + p1*x;
y = 1./(1 + exp(-y_temp));   % targets pass through the sigmoid
num_sample = size(y,2);
% gradient descent process
% initial values of parameters
theta0 = 1;
theta1 = 3;
% learning rate
alpha = 69;
% if alpha is too large, the final error will be very large
% if alpha is too small, convergence will be slow
epoch = 800;
for k = 1:epoch
    v_k = k
    zc = theta0 + theta1*x;
    % h_theta_x = theta0 + theta1*x;   % hypothesis function without activation
    h_theta_x = 1./(1 + exp(-zc));
    fz = h_theta_x.*(1 - h_theta_x);   % derivative of the sigmoid at zc
    Jcost(k) = ((h_theta_x(1)-y(1))^2 + (h_theta_x(2)-y(2))^2 + (h_theta_x(3)-y(3))^2)/num_sample;
    theta0 = theta0 - alpha*((h_theta_x(1)-y(1))*fz(1) + (h_theta_x(2)-y(2))*fz(2) + (h_theta_x(3)-y(3))*fz(3))/num_sample;
    theta1 = theta1 - alpha*((h_theta_x(1)-y(1))*x(1)*fz(1) + (h_theta_x(2)-y(2))*x(2)*fz(2) + (h_theta_x(3)-y(3))*x(3)*fz(3))/num_sample;
end
ynt = theta0 + theta1*x;
yn = 1./(1 + exp(-ynt))
plot(Jcost)

The error during training for the MATLAB code above is shown in the figure below:

Figure 1. Error during training

(4). MATLAB code for two-input linear regression with a sigmoid activation on the output

% double variable input with activation function
% normalization of input data
clear all
clc
% training sample data
p0 = 2;
p1 = 9;
p2 = 3;
% NOTE: the separators between elements were lost in the source; the digits are kept verbatim.
% Re-enter x1 and x2 as row vectors of equal length before running.
x1 = [16128731112];
x2 = [3791284292];
x1_mean = mean(x1)
x1_max = max(x1)
x1_min = min(x1)
x1n = (x1 - x1_mean)/(x1_max - x1_min)
x1 = x1n;
x2_mean = mean(x2)
x2_max = max(x2)
x2_min = min(x2)
x2n = (x2 - x2_mean)/(x2_max - x2_min)
x2 = x2n;
y_temp = p0 + p1*x1 + p2*x2;
y = 1./(1 + exp(-y_temp));
num_sample = size(y,2);
% gradient descent process
% initial values of parameters
theta0 = 1;
theta1 = 3;
theta2 = 8;
% learning rate
alpha = 39;
% if alpha is too large, the final error will be very large
% if alpha is too small, convergence will be slow
% lamda = 0.0001;
lamda = 0.000001;   % small L2 weight-decay coefficient
epoch = 29600;
for k = 1:epoch
    v_k = k
    zc = theta0 + theta1*x1 + theta2*x2;
    % h_theta_x = theta0 + theta1*x;   % hypothesis function without activation
    h_theta_x = 1./(1 + exp(-zc));
    fz = h_theta_x.*(1 - h_theta_x);   % derivative of the sigmoid at zc
    % Jcost(k) = ((h_theta_x(1)-y(1))^2 + (h_theta_x(2)-y(2))^2 + (h_theta_x(3)-y(3))^2)/num_sample;
    Jcost(k) = sum((h_theta_x - y).^2)/num_sample;
    % theta0 = theta0 - alpha*((h_theta_x(1)-y(1))*fz(1) + (h_theta_x(2)-y(2))*fz(2) + (h_theta_x(3)-y(3))*fz(3))/num_sample;
    r0 = sum((h_theta_x - y).*fz);
    theta0 = theta0 - alpha*r0/num_sample;
    % theta1 = theta1 - alpha*((h_theta_x(1)-y(1))*x(1)*fz(1) + (h_theta_x(2)-y(2))*x(2)*fz(2) + (h_theta_x(3)-y(3))*x(3)*fz(3))/num_sample - lamda*theta1;
    r1 = sum(((h_theta_x - y).*x1).*fz);
    theta1 = theta1 - alpha*r1/num_sample - lamda*theta1;   % weight-decay term is subtracted so it shrinks theta1
    r2 = sum(((h_theta_x - y).*x2).*fz);
    theta2 = theta2 - alpha*r2/num_sample - lamda*theta2;
end
ynt = theta0 + theta1*x1 + theta2*x2;
yn = 1./(1 + exp(-ynt))
plot(Jcost)

(5). MATLAB code for three-input, two-output linear regression with a sigmoid activation on the outputs

% triple variable inputs with activation function
% double outputs
% normalization of input data
clear all
clc
% training sample data
pa0 = 2; pa1 = 9; pa2 = 3; pa3 = 11;
pb0 = 3; pb1 = 1; pb2 = 2; pb3 = 6;
% NOTE: the separators between elements were lost in the source; the digits are kept verbatim.
% Re-enter x1, x2 and x3 as row vectors of equal length before running.
x1 = [1612873111318273117];
x2 = [379128492837912];
x3 = [91792268412239226841];
x1_mean = mean(x1)
x1_max = max(x1)
x1_min = min(x1)
x1n = (x1 - x1_mean)/(x1_max - x1_min)
x1 = x1n;
x2_mean = mean(x2)
x2_max = max(x2)
x2_min = min(x2)
x2n = (x2 - x2_mean)/(x2_max - x2_min)
x2 = x2n;
x3_mean = mean(x3)
x3_max = max(x3)
x3_min = min(x3)
x3n = (x3 - x3_mean)/(x3_max - x3_min)
x3 = x3n;
ya_temp = pa0 + pa1*x1 + pa2*x2 + pa3*x3;
ya = 1./(1 + exp(-ya_temp));
yb_temp = pb0 + pb1*x1 + pb2*x2 + pb3*x3;
yb = 1./(1 + exp(-yb_temp));
num_sample = size(yb,2);
% gradient descent process
% initial values of parameters
thetaa0 = 1; thetaa1 = 3; thetaa2 = 8; thetaa3 = 2;
thetab0 = 2; thetab1 = 5; thetab2 = 9; thetab3 = 6;
% learning rate
alpha = 6;
% if alpha is too large, the final error will be very large
% if alpha is too small, convergence will be slow
% lamda = 0.0001;
lamda = 0.00001;   % small L2 weight-decay coefficient
epoch = 39600;
for k = 1:epoch
    v_k = k
    zac = thetaa0 + thetaa1*x1 + thetaa2*x2 + thetaa3*x3;
    zbc = thetab0 + thetab1*x1 + thetab2*x2 + thetab3*x3;
    % h_theta_x = thetaa0 + thetaa1*x;   % hypothesis function without activation
    ha_theta_x = 1./(1 + exp(-zac));
    hb_theta_x = 1./(1 + exp(-zbc));
    faz = ha_theta_x.*(1 - ha_theta_x);   % sigmoid derivative for output a
    fbz = hb_theta_x.*(1 - hb_theta_x);   % sigmoid derivative for output b
    % Jcost(k) = ((h_theta_x(1)-y(1))^2 + (h_theta_x(2)-y(2))^2 + (h_theta_x(3)-y(3))^2)/num_sample;
    Jcosta(k) = single(sum((ha_theta_x - ya).^2)/num_sample);
    Jcostb(k) = single(sum((hb_theta_x - yb).^2)/num_sample);
    % thetaa0 = thetaa0 - alpha*((h_theta_x(1)-y(1))*fz(1) + (h_theta_x(2)-y(2))*fz(2) + (h_theta_x(3)-y(3))*fz(3))/num_sample;
    ra0 = sum((ha_theta_x - ya).*faz);
    thetaa0 = thetaa0 - alpha*ra0/num_sample;
    % thetaa1 = thetaa1 - alpha*((h_theta_x(1)-y(1))*x(1)*fz(1) + (h_theta_x(2)-y(2))*x(2)*fz(2) + (h_theta_x(3)-y(3))*x(3)*fz(3))/num_sample - lamda*thetaa1;
    ra1 = sum(((ha_theta_x - ya).*x1).*faz);
    thetaa1 = thetaa1 - alpha*ra1/num_sample - lamda*thetaa1;   % weight-decay term is subtracted so it shrinks the weight
    ra2 = sum(((ha_theta_x - ya).*x2).*faz);
    thetaa2 = thetaa2 - alpha*ra2/num_sample - lamda*thetaa2;
    ra3 = sum(((ha_theta_x - ya).*x3).*faz);
    thetaa3 = thetaa3 - alpha*ra3/num_sample - lamda*thetaa3;
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    rb0 = sum((hb_theta_x - yb).*fbz);
    thetab0 = thetab0 - alpha*rb0/num_sample;
    rb1 = sum(((hb_theta_x - yb).*x1).*fbz);
    thetab1 = thetab1 - alpha*rb1/num_sample - lamda*thetab1;
    rb2 = sum(((hb_theta_x - yb).*x2).*fbz);
    thetab2 = thetab2 - alpha*rb2/num_sample - lamda*thetab2;
    rb3 = sum(((hb_theta_x - yb).*x3).*fbz);
    thetab3 = thetab3 - alpha*rb3/num_sample - lamda*thetab3;
end
yant = thetaa0 + thetaa1*x1 + thetaa2*x2 + thetaa3*x3;
ya = ya
yan = 1./(1 + exp(-yant))
ybnt = thetab0 + thetab1*x1 + thetab2*x2 + thetab3*x3;
yb = yb
ybn = 1./(1 + exp(-ybnt))
plot(Jcostb)

2. Deriving single-variable linear regression with an activation function

(6). Single-variable linear regression

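The single-variable linear regression model is:

Here f(x) denotes the activation function:

x is usually called the feature and h(x) the hypothesis; the parameters θ0 and θ1 of the model appear in the code as theta0 and theta1.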
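The formulas themselves are not present in this copy. Judging from the MATLAB code in section 1 (the lines computing zc and h_theta_x), the model and the sigmoid activation are presumably the following; treat this as a reconstruction under that assumption, not as the original figure.

```latex
h_\theta(x) = f(\theta_0 + \theta_1 x), \qquad f(z) = \frac{1}{1 + e^{-z}}
```

Without an activation function, f is the identity and the model reduces to the straight line used in examples (1) and (2).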

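This raises an obvious question: how can we tell whether a given linear function fits well?

We need a cost function. The smaller the cost, the better the regression fits the training set; the smallest possible value is 0, which means a perfect fit.

The cost function is constructed as follows:

where:

x(i) is the i-th element of the vector x;

y(i) is the i-th element of the vector y;

h(x(i)) is the value of the given hypothesis function at x(i);

m is the number of samples in the training set.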
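The cost formula is missing from this copy. A squared-error cost consistent with the definitions above and with the update steps in the MATLAB code would be the following; the factor 1/2 is an assumption chosen so that the gradient matches the updates in the code.

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)^2
```

The variable Jcost in the code omits the factor 1/2, so it reports twice this value. That changes neither the location of the minimum nor the shape of the error curve.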

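Given a function, the cost function tells us how well it fits, but there are infinitely many candidate functions and we cannot try them one by one.

This is where gradient descent comes in:

it finds a minimum of the cost function.

The principle of gradient descent:

Think of the function as a mountain. Standing somewhere on the slope, we look around and ask which direction takes us downhill fastest in one small step. Gradient descent is only one of several ways to solve the problem; another is the normal equation.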

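Method:

(1) Choose the size of each downhill step, called the learning rate (alpha);

(2) Pick arbitrary initial values for θ0 and θ1 (theta0 and theta1 in the code);

(3) Determine a downhill direction, take a step of the chosen size, and update θ0 and θ1;

(4) Stop when the decrease in height falls below a chosen threshold.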
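The four steps above can be sketched as a loop with an explicit stopping test. The listings in section 1 run a fixed number of epochs instead, so the tolerance tol, the cap max_iter and the sample data here are illustrative assumptions.

```matlab
% Sketch of steps (1)-(4) for plain single-variable regression (no activation).
% The data, tolerance and iteration cap are illustrative assumptions.
x = [-0.5 0 0.5];          % normalized inputs
y = 3 + 7*x;               % targets generated from known parameters
m = numel(y);

alpha  = 0.9;              % (1) step size (learning rate)
theta0 = 1;  theta1 = 3;   % (2) arbitrary starting point
tol      = 1e-10;          % (4) stop when the cost drops by less than this
max_iter = 10000;          % safety cap so the loop always terminates

J_prev = inf;
for k = 1:max_iter
    err = theta0 + theta1*x - y;
    J   = sum(err.^2) / (2*m);
    if J_prev - J < tol
        break;             % (4) the descent has flattened out
    end
    J_prev = J;
    % (3) step along the negative gradient; err is computed once,
    % so both parameters are updated from the same point
    theta0 = theta0 - alpha * sum(err) / m;
    theta1 = theta1 - alpha * sum(err .* x) / m;
end
```

Stopping on the change in cost avoids guessing an epoch count in advance, at the price of one extra comparison per iteration.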

Algorithm:
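The algorithm figure is missing from this copy. The usual statement of the update rule, which matches the loop bodies in section 1, is given here as an assumed reconstruction:

```latex
\text{repeat until convergence:} \quad
\theta_j := \theta_j - \alpha \, \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1),
\qquad j = 0, 1 \quad (\text{update } \theta_0 \text{ and } \theta_1 \text{ simultaneously})
```

Simultaneous means that both partial derivatives are evaluated at the old parameter values. The code achieves this by computing h_theta_x once at the top of each iteration.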

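Characteristics:

(1) Different starting points can lead to different minima, so in general gradient descent finds only a local minimum (for the convex squared-error cost of plain linear regression, that local minimum is also the global one);

(2) The closer it gets to the minimum, the more slowly it descends, because the gradient itself shrinks.

Gradient descent finds a minimum of a function;

linear regression needs the cost function to be as small as possible;

so we can apply gradient descent to the cost function, combining the two, as shown in the figure below: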

The derivative of the cost function in the expression above is derived as follows:
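The derivation is missing from this copy. Under the assumption that the cost is the squared error with a factor 1/(2m), the chain rule gives the following; the sigmoid derivative in the last line corresponds to the variable fz in the code.

```latex
\begin{aligned}
J(\theta_0,\theta_1) &= \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2,
\qquad h_\theta(x^{(i)}) = f(z^{(i)}),\quad z^{(i)} = \theta_0 + \theta_1 x^{(i)} \\
\frac{\partial J}{\partial \theta_0} &= \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, f'(z^{(i)}) \cdot 1 \\
\frac{\partial J}{\partial \theta_1} &= \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, f'(z^{(i)})\, x^{(i)} \\
f'(z) &= f(z)\bigl(1 - f(z)\bigr) \quad \text{for the sigmoid } f(z) = \frac{1}{1+e^{-z}}
\end{aligned}
```

With no activation function, f'(z) = 1 and the two derivatives reduce to the updates used in examples (1) and (2).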

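The derivation shows that, for the update to follow the gradient, the error term (h(x(i)) - y(i)) must be multiplied by the corresponding input signal x(i); for θ0 that input is the constant 1.

Once an activation function is added, the error term (h(x(i)) - y(i)) must also be multiplied by the derivative of the activation function, f'(x).

In addition, the error term is still multiplied by the value of the input signal.

Otherwise the cost function will not converge to the desired range.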

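Gradient descent works by repeated iteration, and the number of iterations matters because it determines how fast the method runs. Feature scaling is introduced to reduce that number.

If the f'(x) term or the x(i) term is left out of the code in part 1.3, the update is no longer the gradient of the cost function and the cost may fail to converge.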
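One way to see why both factors matter is to compare the analytic gradient with a finite-difference estimate. The sketch below does this for a sigmoid model; it assumes the cost J = sum((h - y).^2)/(2*m) and three normalized inputs, and every name other than theta0, theta1, x, y and fz is illustrative.

```matlab
% Finite-difference check of the gradient used in the sigmoid examples.
% Assumed cost: J = sum((h - y).^2) / (2*m), with h = sigmoid(theta0 + theta1*x).
sigmoid = @(z) 1 ./ (1 + exp(-z));
x = [-0.5 0 0.5];
y = sigmoid(2 + 9*x);
m = numel(y);
J = @(t0, t1) sum((sigmoid(t0 + t1*x) - y).^2) / (2*m);

theta0 = 1;  theta1 = 3;
h  = sigmoid(theta0 + theta1*x);
fz = h .* (1 - h);                        % f'(z) for the sigmoid

g0_full  = sum((h - y) .* fz) / m;        % analytic dJ/dtheta0
g1_full  = sum((h - y) .* fz .* x) / m;   % analytic dJ/dtheta1
g1_no_fz = sum((h - y) .* x) / m;         % same expression with f'(z) dropped

d = 1e-6;                                 % central-difference step
g0_num = (J(theta0 + d, theta1) - J(theta0 - d, theta1)) / (2*d);
g1_num = (J(theta0, theta1 + d) - J(theta0, theta1 - d)) / (2*d);

disp([g0_full g0_num; g1_full g1_num]);   % each row should agree closely
disp(g1_no_fz);                           % differs from g1_num
```

The full expressions agree with the numerical estimates, while the version without f'(z) does not, so it is no longer the gradient of this cost.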

(7). Multivariable linear regression

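The multivariable linear regression model is:

Here f(x) denotes the activation function:

x is usually called the feature and h(x) the hypothesis; the parameters θ0, θ1 and θ2 of the model appear in the code as theta0, theta1 and theta2.

As before, we need a way to tell whether the function fits well.

That is the job of the cost function. The smaller the cost, the better the regression fits the training set; the smallest possible value is 0, which means a perfect fit.

The cost function is constructed as follows:

where:

x(i) is the i-th element of the vector x = [x1, x2];

y(i) is the i-th element of the vector y, where y holds the labels;

h(x(i)) is the value of the given hypothesis function at x(i);

m is the number of samples in the training set.

The derivative of the cost function is derived as follows: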
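The derivation is missing from this copy. Assuming the same squared-error cost with a factor 1/(2m) and two inputs, as in code example (4), the chain rule gives:

```latex
\begin{aligned}
h_\theta(x^{(i)}) &= f(z^{(i)}), \qquad z^{(i)} = \theta_0 + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} \\
J(\theta) &= \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2 \\
\frac{\partial J}{\partial \theta_j} &= \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, f'(z^{(i)})\, x_j^{(i)},
\qquad j = 0, 1, 2, \quad x_0^{(i)} = 1
\end{aligned}
```

The sums r0, r1 and r2 in example (4) are these three partial derivatives before division by the number of samples.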

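The derivation shows that, for the update to follow the gradient, the error term (h(x(i)) - y(i)) must be multiplied by the corresponding input signal; for θ0 that input is the constant 1.

Once an activation function is added, the error term (h(x(i)) - y(i)) must also be multiplied by the derivative of the activation function, f'(x).

In addition, the error term is still multiplied by the value of the input signal.

Otherwise the cost function will not converge to the desired range.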
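With more inputs, writing one update line per parameter becomes repetitive. The sketch below expresses the same update in matrix form. The data, the learning rate and the names X, p, theta, grad and J_hist are assumptions for illustration; they are not the vectors or settings from the original listings.

```matlab
% Matrix form of the multi-input sigmoid regression update.
% The data below are made-up placeholders, not the vectors from the original code.
sigmoid = @(z) 1 ./ (1 + exp(-z));
x1 = [-0.5 -0.2 0.1 0.3 0.5];
x2 = [ 0.4 -0.5 0.0 0.5 -0.1];
m  = numel(x1);
X  = [ones(m,1), x1(:), x2(:)];       % one row per sample, first column is the bias
p  = [2; 9; 3];                       % parameters used to generate the targets
y  = sigmoid(X * p);

theta = [1; 3; 8];                    % initial guess
alpha = 5;                            % assumed learning rate for this toy data
J_hist = zeros(1, 2000);
for k = 1:numel(J_hist)
    h  = sigmoid(X * theta);
    fz = h .* (1 - h);                % sigmoid derivative at each sample
    J_hist(k) = sum((h - y).^2) / m;
    grad  = X' * ((h - y) .* fz) / m; % all partial derivatives at once
    theta = theta - alpha * grad;
end
```

Each entry of grad multiplies the error by f'(z) and by the matching input column, which is exactly the rule stated above. Adding a further input only adds a column to X.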

3. Implementing regression with multiple inputs and a single output variable, with an activation function

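% triple variable inputs with activation function
% double outputs
% normalization of input data
clear all
clc
% training sample data
pa0 = 2; pa1 = 9; pa2 = 3; pa3 = 11;
pb0 = 3; pb1 = 1; pb2 = 2; pb3 = 6;
% NOTE: the separators between elements were lost in the source; the digits are kept verbatim.
% Re-enter x1, x2 and x3 as row vectors of equal length before running.
x1 = [1612873111318273117];
x2 = [379128492837912];
x3 = [91792268412239226841];
x1_mean = mean(x1)
x1_max = max(x1)
x1_min = min(x1)
x1n = (x1 - x1_mean)/(x1_max - x1_min)
x1 = x1n;
x2_mean = mean(x2)
x2_max = max(x2)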

x2_min
