Rectified linear unit: difference between revisions


Latest revision as of 16:19, 13 July 2022 (Wednesday)

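The rectified linear unit (ReLU), also called the rectifier, is an activation function commonly used in artificial neural networks. The term usually refers to the family of nonlinear functions represented by the ramp function and its variants.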

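The most commonly used rectifiers are the ramp function $f(x)=\max(0,x)$ and the leaky rectifier (Leaky ReLU), where $x$ is the input to a neuron. Rectification is considered to have some biological motivation^[1], and because in practice it usually performs better than other common activation functions (such as the logistic function), it is widely used in today's deep neural networks for computer vision tasks such as image recognition^[1].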

Definition

In the usual sense, the rectified linear function refers to the ramp function in mathematics, i.e.

f(x)=\max(0,x)

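In a neural network, the rectifier serves as the activation function of a neuron and defines the neuron's nonlinear output after the linear transformation $\mathbf {w} ^{T}\mathbf {x} +b$. In other words, given an input vector $\mathbf {x}$ arriving from the previous layer, a neuron that uses the rectifier activation outputs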

\max(0,\mathbf {w} ^{T}\mathbf {x} +b)

to the next layer of neurons, or as the output of the entire network (depending on where the neuron sits in the network structure).
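The definition above can be sketched in a few lines of plain Python. This is only an illustration: the function names `relu` and `neuron_output`, and the weights, inputs, and bias used in the example calls, are hypothetical and do not come from the article.

```python
from typing import Sequence


def relu(z: float) -> float:
    """Ramp function: returns z for positive z, otherwise 0."""
    return max(0.0, z)


def neuron_output(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """Output of one rectified linear neuron: max(0, w^T x + b)."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    # Linear transformation w^T x + b, followed by the rectifier.
    pre_activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return relu(pre_activation)


# Hypothetical weights, inputs, and bias chosen only for illustration.
print(neuron_output([0.5, -1.0], [2.0, 1.0], 0.25))  # pre-activation 0.25, prints 0.25
print(neuron_output([0.5, -1.0], [1.0, 2.0], 0.25))  # pre-activation -1.25, prints 0.0
```

The second call shows the rectifying behaviour: a negative pre-activation is clipped to zero, so the neuron passes nothing on to the next layer.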

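Variants

Building on the ramp function, the rectifier has other variants that are also widely used in deep learning, such as the leaky rectifier (Leaky ReLU)^[2], the randomized leaky rectifier (Randomized Leaky ReLU)^[3], and the noisy rectifier (Noisy ReLU)^[4].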

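Leaky rectifier

When the input $x$ is negative, the leaky rectifier (Leaky ReLU) has a constant gradient $\lambda \in (0,1)$ rather than 0. When the input is positive, it coincides with the ordinary ramp function. In other words,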

f(x)={\begin{cases}x&{\mbox{if }}x>0\\\lambda x&{\mbox{if }}x\leq 0\end{cases}}

In deep learning, if $\lambda$ is treated as a parameter that is learned through backpropagation, the leaky rectifier is called the parametric rectifier (Parametric ReLU)^[5].
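A minimal sketch of the piecewise definition and its derivatives follows. The function names and the default slope `lam=0.01` are assumptions made for illustration, not values stated in the article. The derivative with respect to `lam` is what backpropagation would use if the slope were learned, as in the parametric variant.

```python
def leaky_relu(x: float, lam: float = 0.01) -> float:
    """Leaky ReLU: x for x > 0, lam * x otherwise."""
    return x if x > 0 else lam * x


def leaky_relu_grad(x: float, lam: float = 0.01) -> float:
    """Derivative with respect to x: 1 for x > 0, lam otherwise."""
    return 1.0 if x > 0 else lam


def leaky_relu_grad_lam(x: float) -> float:
    """Derivative with respect to lam: 0 for x > 0, x otherwise.

    Only needed when lam is a learned parameter (Parametric ReLU).
    """
    return 0.0 if x > 0 else x


# Hypothetical inputs: the negative side keeps a small, non-zero slope.
print(leaky_relu(2.0, lam=0.5))        # prints 2.0
print(leaky_relu(-2.0, lam=0.5))       # prints -1.0
print(leaky_relu_grad(-2.0, lam=0.5))  # prints 0.5
```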

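Randomized leaky rectifier

The randomized leaky rectifier (Randomized Leaky ReLU, RReLU) was first proposed and used in the Kaggle National Data Science Bowl (NDSB) competition. Compared with the ordinary leaky rectifier, its slope $\lambda$ on the negative side is a random variable drawn from the continuous uniform distribution $U(l,u)$, i.e.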

f(x)={\begin{cases}x&{\mbox{if }}x>0\\\lambda x&{\mbox{if }}x\leq 0\end{cases}}

where $\lambda \sim U(l,u)$, $l<u$, and $l,u\in [0,1)$.
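The sampling step can be illustrated with Python's standard `random` module. The function name `rrelu`, the bounds `0.125` and `0.5`, and the fixed seed are hypothetical choices for the example. The sketch covers only the randomized forward pass described above.

```python
import random


def rrelu(x: float, lower: float, upper: float, rng: random.Random) -> float:
    """Randomized leaky ReLU with slope lam drawn from U(lower, upper)."""
    if not (0.0 <= lower < upper < 1.0):
        raise ValueError("expected 0 <= lower < upper < 1")
    if x > 0:
        return x
    # A fresh slope is sampled for every negative input.
    lam = rng.uniform(lower, upper)
    return lam * x


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
for _ in range(3):
    # Each value lies between -4 * 0.5 = -2.0 and -4 * 0.125 = -0.5.
    print(rrelu(-4.0, 0.125, 0.5, rng))
```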

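Noisy rectifier

The noisy rectifier (Noisy ReLU) is a variant of the rectified linear unit that takes Gaussian noise into account. For a neuron input $x$, it adds normally distributed uncertainty, i.e.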

f(x)=\max(0,x+Y)

where the random variable $Y\sim {\mathcal {N}}(0,\sigma (x))$. Noisy rectified linear units have achieved fairly good results when used in restricted Boltzmann machines for computer vision tasks^[4].
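A rough sketch of sampling from the noisy rectifier is given below. It assumes that $\sigma(x)$ denotes the logistic sigmoid and that it is used as the variance of the Gaussian noise. The text does not spell this out, so treat that reading, along with the function names, as an assumption.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def noisy_relu(x: float, rng: random.Random) -> float:
    """Sample max(0, x + Y) with Y ~ N(0, sigmoid(x)).

    Assumption: sigmoid(x) is the noise variance, so the standard
    deviation passed to gauss() is its square root.
    """
    variance = sigmoid(x)
    noise = rng.gauss(0.0, math.sqrt(variance))
    return max(0.0, x + noise)


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
samples = [noisy_relu(-1.0, rng) for _ in range(5)]
print(samples)  # every sample is non-negative; many are exactly 0.0
```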

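Advantages

Compared with traditional neural network activation functions, such as the logistic function (logistic sigmoid) and hyperbolic functions such as tanh, the rectifier has the following advantages: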

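Biological plausibility: brain research indicates that the information encoding of biological neurons is usually distributed and sparse^[6]. Typically only about 1% to 4% of the neurons in the brain are active at the same time. Rectification combined with regularization can be used to tune the activity (that is, the fraction of positive outputs) of neurons in an artificial network. By contrast, the logistic function already reaches ${\frac {1}{2}}$ at input 0, a half-saturated steady state, which does not match well what biology suggests for a simulated neural network^[1]. It should be noted, however, that in a network using rectified linear units typically about 50% of the neurons are active^[1].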

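More efficient gradient descent and backpropagation: the derivative is exactly 1 for positive inputs and does not saturate, which alleviates the vanishing gradient problem. The activation alone does not rule out exploding gradients.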

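Simpler computation: there are no exponentials or other expensive operations of the kind found in more complex activation functions. In addition, the sparsity of the activations lowers the overall computational cost of the network.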
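The gradient argument in the list above can be made concrete with a toy calculation. The sketch below multiplies activation derivatives along a chain of layers while ignoring the weight matrices. That is a deliberate simplification, and the depth and pre-activation values are hypothetical. Because the logistic derivative never exceeds 0.25, the product shrinks geometrically with depth, whereas the rectifier contributes a factor of exactly 1 for every active unit.

```python
import math
from typing import Callable, Sequence


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_grad(x: float) -> float:
    """Derivative of the logistic sigmoid; its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x: float) -> float:
    """Derivative of the ramp function (taken as 0 at x = 0)."""
    return 1.0 if x > 0 else 0.0


def chained_grad(grad_fn: Callable[[float], float], zs: Sequence[float]) -> float:
    """Product of activation derivatives along a chain of layers."""
    product = 1.0
    for z in zs:
        product *= grad_fn(z)
    return product


depth = 10            # hypothetical number of layers
zs = [0.5] * depth    # hypothetical positive pre-activations
print(chained_grad(sigmoid_grad, zs))  # below 0.25 ** 10, i.e. under 1e-6
print(chained_grad(relu_grad, zs))     # prints 1.0
```

The flip side is visible in `relu_grad`: a unit whose pre-activation is negative contributes a factor of 0, which is the sparsity discussed above.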

References

[1] Xavier Glorot, Antoine Bordes and Yoshua Bengio. Deep sparse rectifier neural networks (PDF). AISTATS. 2011 [retrieved 2016-09-28]. (Archived from the original (PDF) on 2016-12-13.)
[2] Andrew L. Maas, Awni Y. Hannum and Andrew Y. Ng. Rectified Nonlinearities Improve Neural Network Acoustic Models (PDF). ICML. 2013 [retrieved 2019-07-29]. (Archived from the original (PDF) on 2021-01-10.)
[3] Xu, Bing; Wang, Naiyan; Chen, Tianqi; Li, Mu. Empirical Evaluation of Rectified Activations in Convolution Network. 2015. arXiv:1505.00853v2.
[4] Vinod Nair and Geoffrey Hinton. Rectified linear units improve restricted Boltzmann machines (PDF). ICML. 2010. (Archived from the original (PDF) on 2014-03-24.)
[5] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. 2015. arXiv:1502.01852v1.
[6] David Attwell and Simon B. Laughlin. An energy budget for signaling in the grey matter of the brain. JCBFM. 2001 [retrieved 2016-09-28]. (Archived from the original on 2016-09-08.)

External links

Quora: What is special about rectifier neural units used in NN learning?


@@ Line 1 / Line 1 @@
+{{NoteTA
+| G1 = IT
+}}
 {{ request translation }}
 [[Image:Ramp function.svg|整流線性單位函数|thumb|325px|right]]
 {{机器学习导航栏}}
-'''整流線性單位函数'''（Rectified Linear Unit, '''ReLU'''）,又称'''修正线性单元''', 是一种[[人工神经网络]]中常用的激勵函数（activation function），通常指代以[[斜坡函数]]及其变种为代表的非线性函数。
+'''整流線性單位函数'''（Rectified Linear Unit, '''ReLU'''），又称'''修正线性单元'''，是一种[[人工神经网络]]中常用的激勵函数（activation function），通常指代以[[斜坡函数]]及其变种为代表的非线性函数。
-比较常用的线性整流函数有[[斜坡函数]] <math>f(x) = \max(0, x)</math>，以及带泄露整流函数 (Leaky ReLU)，其中 <math>x</math> 为神经元(Neuron)的输入。线性整流被认为有一定的生物学原理<ref name="glorot2011">{{cite conference |authors=Xavier Glorot, Antoine Bordes and [[Yoshua Bengio]] |year=2011 |title=Deep sparse rectifier neural networks |conference=AISTATS |url=http://jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.pdf}}</ref>，并且由于在实践中通常有着比其他常用激勵函数（譬如[[逻辑函数]]）更好的效果，而被如今的[[深度学习|深度神经网络]]广泛使用于诸如图像识别等[[计算机视觉]]<ref name="glorot2011"/>人工智能领域。
+比较常用的线性整流函数有[[斜坡函数]] <math>f(x) = \max(0, x)</math>，以及带泄露整流函数（Leaky ReLU），其中 <math>x</math> 为神经元（Neuron）的输入。线性整流被认为有一定的生物学原理<ref name="glorot2011">{{cite conference |authors=Xavier Glorot, Antoine Bordes and [[Yoshua Bengio]] |year=2011 |title=Deep sparse rectifier neural networks |conference=AISTATS |url=http://jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.pdf |access-date=2016-09-28 |archive-date=2016-12-13 |archive-url=https://web.archive.org/web/20161213022121/http://www.jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.pdf }}</ref>，并且由于在实践中通常有着比其他常用[[激勵函数]]（譬如[[逻辑函数]]）更好的效果，而被如今的[[深度学习|深度神经网络]]广泛使用于诸如图像识别等[[计算机视觉]]人工智能领域<ref name="glorot2011"/>。
 == 定义 ==
@@ Line 19 / Line 22 @@
 == 变种 ==
-线性整流函数在基于[[斜坡函数]]的基础上有其他同样被广泛应用于深度学习的变种，譬如带泄露线性整流(Leaky ReLU)<ref name="leakyrelu">{{cite conference|authors=Andrew L. Maas, Awni Y. Hannum and [[Andrew Ng | Andrew Y. Ng]]|year=2013|title=Rectified Nonlinearities Improve Neural Network Acoustic Models|conference=[[International Conference on Machine Learning|ICML]]|url=https://ai.stanford.edu/~amaas/papers/relu_hybrid_icml2013_final.pdf}}</ref>， 带泄露随机线性整流(Randomized Leaky ReLU)<ref name="randomizedleakyrelu">{{cite arXiv |last1=Xu |first1=Bing |last2=Wang |first2=Naiyan |last3=Chen |first3=Tianqi |last4=Li |first4=Mu |date=2015 |title=Empirical Evaluation of Rectified Activations in Convolution Network |eprint=1505.00853v2 |url=https://arxiv.org/pdf/1505.00853.pdf}}</ref>，以及噪声线性整流(Noisy ReLU)<ref name="nair2010">{{cite conference |authors=Vinod Nair and [[Geoffrey Hinton]] |year=2010 |title=Rectified linear units improve restricted Boltzmann machines |conference=[[International Conference on Machine Learning|ICML]] |url=http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_NairH10.pdf |deadurl=yes |archiveurl=https://web.archive.org/web/20140324020659/http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_NairH10.pdf |archivedate=2014-03-24 }}</ref>.
+线性整流函数在基于[[斜坡函数]]的基础上有其他同样被广泛应用于深度学习的变种，譬如带泄露线性整流（Leaky ReLU）<ref name="leakyrelu">{{cite conference|authors=Andrew L. Maas, Awni Y. Hannum and [[Andrew Ng | Andrew Y. Ng]]|year=2013|title=Rectified Nonlinearities Improve Neural Network Acoustic Models|conference=[[International Conference on Machine Learning|ICML]]|url=https://ai.stanford.edu/~amaas/papers/relu_hybrid_icml2013_final.pdf|access-date=2019-07-29|archive-date=2021-01-10|archive-url=https://web.archive.org/web/20210110222425/http://ai.stanford.edu/~amaas/papers/relu_hybrid_icml2013_final.pdf}}</ref>， 带泄露随机线性整流（Randomized Leaky ReLU）<ref name="randomizedleakyrelu">{{cite arXiv |last1=Xu |first1=Bing |last2=Wang |first2=Naiyan |last3=Chen |first3=Tianqi |last4=Li |first4=Mu |date=2015 |title=Empirical Evaluation of Rectified Activations in Convolution Network |eprint=1505.00853v2 |url=https://arxiv.org/pdf/1505.00853.pdf}}</ref>，以及噪声线性整流（Noisy ReLU）<ref name="nair2010">{{cite conference |authors=Vinod Nair and [[Geoffrey Hinton]] |year=2010 |title=Rectified linear units improve restricted Boltzmann machines |conference=[[International Conference on Machine Learning|ICML]] |url=http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_NairH10.pdf |deadurl=yes |archiveurl=https://web.archive.org/web/20140324020659/http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_NairH10.pdf |archivedate=2014-03-24 }}</ref>.
 === 带泄露线性整流 ===
@@ Line 50 / Line 53 @@
 == 优势 ==
 相比于传统的神经网络激活函数，诸如[[逻辑函数]]（Logistic sigmoid）和tanh等[[双曲函数]]，线性整流函数有着以下几方面的优势：
-*仿生物学原理：相关大脑方面的研究表明生物神經元的訊息编码通常是比较分散及稀疏的<ref name="brainresearch">{{cite conference |authors=David Attwell and Simon B. Laughlin |year=2001 |title=An energy budget for signaling in the grey matter of the brain |conference=JCBFM |url=http://jcb.sagepub.com/content/21/10/1133.long }}{{Dead link|date=2018年6月 |bot=InternetArchiveBot |fix-attempted=no }}</ref>。通常情况下，大脑中在同一时间大概只有1%-4%的神经元处于活跃状态。使用線性修正以及正規化（regularization）可以对机器神经网络中神经元的活跃度（即输出为正值）进行调试；相比之下，逻辑函数在输入为0时達到 <math>\frac{1}{2}</math>，即已经是半饱和的稳定状态，不够符合实际生物学对模拟神经网络的期望<ref name="glorot2011"/>。不过需要指出的是，一般情况下，在一个使用修正线性单元（即线性整流）的神经网络中大概有50%的神经元处于激活态<ref name="glorot2011"/>。
+*仿生物学原理：相关大脑方面的研究表明生物神經元的訊息编码通常是比较分散及稀疏的<ref name="brainresearch">{{cite conference |authors=David Attwell and Simon B. Laughlin |year=2001 |title=An energy budget for signaling in the grey matter of the brain |conference=JCBFM |url=http://jcb.sagepub.com/content/21/10/1133.long |access-date=2016-09-28 |archive-date=2016-09-08 |archive-url=https://web.archive.org/web/20160908012756/http://jcb.sagepub.com/content/21/10/1133.long }}</ref>。通常情况下，大脑中在同一时间大概只有1%-4%的神经元处于活跃状态。使用線性修正以及正規化（regularization）可以对机器神经网络中神经元的活跃度（即输出为正值）进行调试；相比之下，逻辑函数在输入为0时達到 <math>\frac{1}{2}</math>，即已经是半饱和的稳定状态，不够符合实际生物学对模拟神经网络的期望<ref name="glorot2011"/>。不过需要指出的是，一般情况下，在一个使用修正线性单元（即线性整流）的神经网络中大概有50%的神经元处于激活态<ref name="glorot2011"/>。
 *更加有效率的[[梯度下降法|梯度下降]]以及反向传播：避免了梯度爆炸和[[梯度消失问题|梯度消失]]问题