-
In deep learning frameworks, the basic structure of a convolutional neural network (CNN) consists of an input layer, convolutional layers, (sub)sampling layers, fully connected layers, and an output layer; convolutional and pooling layers are usually computed alternately, and the intermediate layers may be called hidden layers (a CNN does not distinguish the fully connected layer from the output layer) [9-11]. Four key mechanisms underlie CNNs: weight sharing, pooling, local connectivity, and the use of multiple network layers. The convolutional layers reduce noise in the input data and enhance the original information. Pooling compresses the convolution results and selects features, i.e., it is the downsampling process. The fully connected layer is a nonlinear feature-mapping stage that yields the network's activation values. The CNN's filters (feature maps) extract features from the input data; each feature map connects many neurons to the next layer, but not in a fully connected fashion, which reduces the number of parameters and reflects sparsity, a major characteristic of convolutional neural networks [12-15]. The computation of a convolutional layer is given by:
$ x_j^l = f\left( {\sum\limits_{i \in {M_j}} {x_i^{l - 1}} * k_{ij}^l + b_j^l} \right) $    (1)
where M_j is the set of input maps, the input x_i^{l-1} is convolved with the kernel weights k_{ij}^l, * denotes the convolution operation, b_j^l is the bias, i and j index the units, and the superscripts l and l-1 denote the layer number. Computing the forward pass layer by layer through the activation function f yields the difference between the network's activation value and the target value [16]. The forward propagation process is shown in Fig. 1.
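As an illustration only, the following is a minimal NumPy sketch of the convolutional layer of Eq. (1); the function name, the list-of-2-D-arrays data layout, and the "valid" convolution mode are assumptions made for the example, not details taken from this paper.

```python
import numpy as np
from scipy.signal import convolve2d

def conv_layer(prev_maps, kernels, biases, f=lambda z: np.maximum(0.0, z)):
    # Eq. (1): x_j^l = f( sum_{i in M_j} x_i^{l-1} * k_ij^l + b_j^l )
    # prev_maps: list of 2-D input maps x_i^{l-1} (hypothetical layout)
    # kernels[j][i]: kernel k_ij^l linking input map i to output map j
    out = []
    for j, b_j in enumerate(biases):
        acc = sum(convolve2d(x_i, kernels[j][i], mode="valid")
                  for i, x_i in enumerate(prev_maps))
        out.append(f(acc + b_j))  # add bias b_j^l, apply activation f
    return out
```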
In Fig. 1, x denotes the input-layer data, the superscript denotes the layer number, h_{W,b}(x) is the data after processing by the three network layers, the subscript W is the weight and b the bias, and a_i^{(j)} denotes the activation of the i-th unit in the j-th layer, as described by:
$ \left\{ \begin{array}{l} a_1^{\left( 2 \right)} = f\left( {W_{11}^{\left( 1 \right)}{x_1} + W_{12}^{\left( 1 \right)}{x_2} + W_{13}^{\left( 1 \right)}{x_3} + b_1^{\left( 1 \right)}} \right)\\ a_2^{\left( 2 \right)} = f\left( {W_{21}^{\left( 1 \right)}{x_1} + W_{22}^{\left( 1 \right)}{x_2} + W_{23}^{\left( 1 \right)}{x_3} + b_2^{\left( 1 \right)}} \right)\\ a_3^{\left( 2 \right)} = f\left( {W_{31}^{\left( 1 \right)}{x_1} + W_{32}^{\left( 1 \right)}{x_2} + W_{33}^{\left( 1 \right)}{x_3} + b_3^{\left( 1 \right)}} \right) \end{array} \right. $    (2)
$ {h_{W,b}}\left( x \right) = a_1^{\left( 3 \right)} = f\left( {W_{11}^{\left( 2 \right)}a_1^{\left( 2 \right)} + W_{12}^{\left( 2 \right)}a_2^{\left( 2 \right)} + W_{13}^{\left( 2 \right)}a_3^{\left( 2 \right)} + b_1^{\left( 2 \right)}} \right) $    (3)
where b_i^{(j)} and W_{ij}^{(j)} denote the bias and weight mappings from the previous layer to the current one, the superscript j indicates the layer number, and h_{W,b}(x) is a logistic-regression output. Forward propagation can therefore be summarized as:
$ {z^{\left( {l + 1} \right)}} = {W^{\left( {l + 1} \right)}}{a^{\left( l \right)}} + {b^{\left( {l + 1} \right)}} $    (4)
$ {a^{\left( {l + 1} \right)}} = f\left( {{z^{\left( {l + 1} \right)}}} \right) $    (5)
The activation function is the ReLU, f(x) = max(0, x). This paper does not use the traditional sigmoid function, because the sigmoid saturates at both ends and easily loses information during propagation, whereas the ReLU is easier to learn and optimize. This nonlinear mapping gives the deep neural network its layered nonlinear-mapping learning capability and makes the hidden layers meaningful [17-20]. The weight matrices are then adjusted by backpropagation to minimize the error, which begins the backward pass. The function J(W, b) serves as the cost function of backpropagation; it is a formula for the error between a sample's true value and its measured value. The single-sample cost function is:
$ J\left( {W,b;x,y} \right) = \frac{1}{2}{\left\| {{h_{W,b}}\left( x \right) - y} \right\|^2} $    (6)
The overall cost function is:
$ J\left( {W,b} \right) = \left[ {\frac{1}{m}\sum\limits_{i = 1}^m {J\left( {W,b;{x^{\left( i \right)}},{y^{\left( i \right)}}} \right)} } \right] + \frac{\lambda }{2}\sum\limits_{l = 1}^{{n_l} - 1} {\sum\limits_{i = 1}^{{s_l}} {\sum\limits_{j = 1}^{{s_{l + 1}}} {{{\left( {W_{ij}^{\left( l \right)}} \right)}^2}} } } $    (7)
The weights are updated by gradient descent; each iteration updates the parameters W and b in the backward direction:
$ W_{ij}^{\left( l \right)} = W_{ij}^{\left( l \right)} - \alpha '\frac{\partial }{{\partial W_{ij}^{\left( l \right)}}}J\left( {W,b} \right) $    (8)
$ b_i^{\left( l \right)} = b_i^{\left( l \right)} - \alpha '\frac{\partial }{{\partial b_i^{\left( l \right)}}}J\left( {W,b} \right) $    (9)
where α′ is the learning rate. Iterations continue until the optimal weights W and biases b that minimize the cost function are found, which is the purpose of the weight update. The partial derivatives of the cost function are:
$ \frac{\partial }{{\partial W_{ij}^{\left( l \right)}}}J\left( {W,b} \right) = \left[ {\frac{1}{m}\sum\limits_{i = 1}^m {\frac{\partial }{{\partial W_{ij}^{\left( l \right)}}}J\left( {W,b;{x^{\left( i \right)}},{y^{\left( i \right)}}} \right)} } \right] + \lambda W_{ij}^{\left( l \right)} $    (10)
$ \frac{\partial }{{\partial b_i^{\left( l \right)}}}J\left( {W,b} \right) = \frac{1}{m}\sum\limits_{i = 1}^m {\frac{\partial }{{\partial b_i^{\left( l \right)}}}J\left( {W,b;{x^{\left( i \right)}},{y^{\left( i \right)}}} \right)} $    (11)
where (x, y) is a single sample, W is the network connection weight, b is the bias, W_ij is an individual weight value, and λ is the weight-decay coefficient.
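To make Eqs. (4)-(11) concrete, here is a minimal NumPy sketch of gradient-descent training for a small fully connected network with ReLU activations; the layer sizes, the random data, and the hyperparameter values are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(0.0, z)
relu_grad = lambda z: (z > 0).astype(float)

# Illustrative shapes only: 3 inputs, 3 hidden units, 1 output, m samples.
m, lam, alpha = 100, 1e-3, 0.1
X = rng.normal(size=(3, m))               # columns are samples x^(i)
Y = rng.normal(size=(1, m))
W1, b1 = 0.1 * rng.normal(size=(3, 3)), np.zeros((3, 1))
W2, b2 = 0.1 * rng.normal(size=(1, 3)), np.zeros((1, 1))

for _ in range(200):
    # forward pass, Eqs. (4)-(5): z^(l+1) = W^(l+1) a^(l) + b^(l+1), a = f(z)
    z2 = W1 @ X + b1
    a2 = relu(z2)
    z3 = W2 @ a2 + b2
    h = relu(z3)                          # h_{W,b}(x), as in Eq. (3)

    # backward pass: gradients of the cost J(W, b) of Eq. (7)
    d3 = (h - Y) * relu_grad(z3)          # from the squared error of Eq. (6)
    d2 = (W2.T @ d3) * relu_grad(z2)
    gW2 = d3 @ a2.T / m + lam * W2        # Eq. (10): data term + lambda * W
    gb2 = d3.mean(axis=1, keepdims=True)  # Eq. (11): (1/m) * sum of dJ/db
    gW1 = d2 @ X.T / m + lam * W1
    gb1 = d2.mean(axis=1, keepdims=True)

    # gradient-descent updates, Eqs. (8)-(9), with learning rate alpha
    W2 -= alpha * gW2; b2 -= alpha * gb2
    W1 -= alpha * gW1; b1 -= alpha * gb1
```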
-
PCA expresses the original data through its most significant information: it removes redundancy and noise, reduces complexity, and preserves the characteristics of the original data to the greatest extent; the notions of eigenvalue and energy are central here. The specific steps are as follows:
(1) Compute the mean of the column vectors of the data matrix and subtract it from the original training data matrix X:
$ \mathit{\boldsymbol{\bar X}} = \frac{1}{m}\sum\limits_{i = 1}^m {{\mathit{\boldsymbol{X}}_i}} $    (12)
$ {{\mathit{\boldsymbol{X'}}}_i} = {\mathit{\boldsymbol{X}}_i} - \mathit{\boldsymbol{\bar X}} $    (13)
(2) Compute the covariance matrix C_X:
$ {\mathit{\boldsymbol{C}}_\mathit{\boldsymbol{X}}} = \frac{1}{{m - 1}}\sum\limits_{i = 1}^m {{{\mathit{\boldsymbol{X'}}}_i}{{\left( {{{\mathit{\boldsymbol{X'}}}_i}} \right)}^{\rm{T}}}} $    (14)
(3) Compute the eigenvalues and eigenvectors of the covariance matrix.
(4) Sort the eigenvalues in descending order and select the eigenvectors corresponding to the first k eigenvalues (chosen so that the retained energy exceeds 97%) as the projection feature matrix, denoted b′.
(5) Project the training samples X onto these eigenvectors to obtain the final PCA dimension-reduction matrix Y (a sketch of steps (1)-(5) follows Eq. (15)):
$ \mathit{\boldsymbol{Y}} = \mathit{\boldsymbol{b'}} \times \mathit{\boldsymbol{X}} $    (15)
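A compact NumPy sketch of steps (1)-(5) might look as follows; the column-per-sample layout and the use of eigh on the covariance matrix are assumptions made for illustration.

```python
import numpy as np

def pca_reduce(X, energy=0.97):
    """Sketch of PCA steps (1)-(5); X has one sample per column (assumption)."""
    X_mean = X.mean(axis=1, keepdims=True)   # Eq. (12): column-vector mean
    Xc = X - X_mean                          # Eq. (13): center the data
    C = (Xc @ Xc.T) / (X.shape[1] - 1)       # Eq. (14): covariance matrix
    vals, vecs = np.linalg.eigh(C)           # step (3): eigen-decomposition
    order = np.argsort(vals)[::-1]           # step (4): sort descending
    vals, vecs = vals[order], vecs[:, order]
    k = int(np.searchsorted(np.cumsum(vals) / vals.sum(), energy)) + 1
    B = vecs[:, :k].T                        # projection feature matrix b'
    return B @ Xc, B                         # Eq. (15): Y = b' X (centered)
```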
Random projection maps the original high-dimensional data into a low-dimensional subspace with a random matrix of matching column length, thereby simplifying the original data. It is a computationally efficient, high-fidelity, high-accuracy dimension-reduction method whose results are sparse [16]. First, the projection matrix R is generated as follows:
(1) Construct a random matrix with independent, identically distributed, zero-mean, normally distributed entries.
(2) Orthonormalize the row vectors of the random matrix produced in step (1) to obtain the projection matrix R.
(3) Normalize the row vectors of the projection matrix R.
Next, the training sample data are projected to obtain the RP dimension-reduction matrix (a sketch follows Eq. (18)). The steps are:
(1) Compute the mean d̄ of the sample vectors y_i of the training sample image matrix:
$ \mathit{\boldsymbol{\bar d}} = \frac{1}{m}\sum\limits_{i = 1}^m {{\mathit{\boldsymbol{y}}_i}} $    (16)
(2) Subtract the mean from the training sample image matrix of step (1):
$ {{\mathit{\boldsymbol{y'}}}_i} = {\mathit{\boldsymbol{y}}_i} - \mathit{\boldsymbol{\bar d}} $    (17)
(3) Multiply the matrix from step (2) by the projection matrix R to obtain the RP dimension-reduction matrix S:
$ \mathit{\boldsymbol{S}} = \mathit{\boldsymbol{R}} \cdot {{\mathit{\boldsymbol{y'}}}_i} $    (18)
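The RP steps above could be sketched in NumPy as follows; drawing a Gaussian matrix and orthonormalizing it via QR is one plausible reading of steps (1)-(3), and the column-per-sample layout is an assumption.

```python
import numpy as np

def rp_reduce(Y, k, seed=0):
    """Sketch of the RP steps; Y holds one sample y_i per column (assumption)."""
    d = Y.shape[0]
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((k, d))          # step (1): i.i.d. N(0, 1) matrix
    Q, _ = np.linalg.qr(G.T)                 # step (2): orthonormalize rows
    R = Q[:, :k].T                           # rows of R are orthonormal
    R /= np.linalg.norm(R, axis=1, keepdims=True)  # step (3): unit-norm rows
    d_mean = Y.mean(axis=1, keepdims=True)   # Eq. (16): mean vector d-bar
    Yc = Y - d_mean                          # Eq. (17): center the samples
    return R @ Yc                            # Eq. (18): S = R . y'_i
```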
In this paper, PCA and RP are used for weighted joint dimension reduction of images: the two dimension-reduction matrices are concatenated in series, which also requires choosing the weights. The detection performance of the feature-fusion method is then tested. Fig. 2 shows the flow of the fused feature extraction algorithm.
In Fig. 2, x denotes the input data and y the output data; the fused feature of joint image dimension reduction is:
$ \left\{ \begin{array}{l} {\mathit{\boldsymbol{X}}_j} = \left[ {{\mathit{\boldsymbol{X}}_{{\rm{PCA}}}},{\mathit{\boldsymbol{X}}_{{\rm{RP}}}}} \right] = \left[ {m \cdot \mathit{\boldsymbol{P}}\left( \mathit{\boldsymbol{x}} \right),n \cdot \mathit{\boldsymbol{R}}\left( \mathit{\boldsymbol{x}} \right)} \right]\\ m + n = 1 \end{array} \right. $    (19)
where P(x) and R(x) are the dimension-reduced spaces corresponding to PCA and RP respectively, m is the weighting coefficient of the PCA dimension-reduction matrix, n is the weighting coefficient of RP, X_PCA and X_RP are the weighted PCA- and RP-reduced samples, and X_j is the image feature vector after weighted joint dimension reduction.
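A sketch of the weighted serial fusion of Eq. (19), assuming the two reduced feature matrices share the same sample axis:

```python
import numpy as np

def fuse_features(X_pca, X_rp, m=0.6, n=0.4):
    """Eq. (19): serial (concatenated) fusion of the two reduced features,
    weighted by m and n with m + n = 1 (0.6:0.4 is the paper's best ratio)."""
    assert abs(m + n - 1.0) < 1e-9
    return np.concatenate([m * X_pca, n * X_rp], axis=0)
```

With the paper's best ratio, fuse_features(X_pca, X_rp, 0.6, 0.4) stacks the weighted PCA block above the weighted RP block along the feature axis.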
-
Experimental conditions: a computer with a 64-bit operating system, an Intel Core i5-4200U CPU, and 4GB of RAM, running MATLAB R2014a. The databases were the MIT standard face database (2000 images in total), the BioID database, and a self-built database. The MIT face database contains 10 subjects with different expressions and head poses, 200 images per subject. Sample images are shown in Fig. 4.
-
In deep-learning neural network research, the selection and determination of parameters, such as the number of network layers, is important; during preprocessing, even very small differences can have non-negligible effects.
The parameter-determination experiments were conducted on three databases: the MIT standard face database, the BioID face database, and the self-built database (mainly natural-scene images). Image sizes were normalized to 115×115 pixels, and the network structure was 224-54-27-13-13-13-6.
-
To obtain a better projection basis for dimension reduction, PCA projects the high-dimensional matrix into a low-dimensional space, turning a complex problem into a simpler one. The key is the computation of the covariance matrix, from which the eigenvalues and eigenvectors are analyzed; the eigenvectors corresponding to the energy-concentrated eigenvalues form the reduced image features, and the eigenvalues whose energy exceeds 97% of the total energy are selected to determine the recognition features.
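The 97% energy criterion can be written as a small selection rule; this sketch assumes the eigenvalues are supplied as a 1-D array.

```python
import numpy as np

def select_k(eigvals, energy=0.97):
    """Pick the smallest k whose leading eigenvalues carry at least 97% of
    the total energy (a sketch of the selection rule used in this section)."""
    vals = np.sort(eigvals)[::-1]            # descending eigenvalues
    ratio = np.cumsum(vals) / vals.sum()     # cumulative energy fraction
    return int(np.searchsorted(ratio, energy)) + 1
```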
In this paper, the eigenvectors of the eigenvalues carrying more than 97% of the total energy are selected to represent the image; the corresponding number of eigenvalues is 30, and the data dimension after joint PCA-RP reduction is 30×115. Table 1 shows the relationship between the number of eigenvalues and the recognition accuracy, with the serial fusion weight ratio of the joint PCA-RP reduction fixed at 0.6:0.4 and a 17-layer deep network; the accuracy reaches up to 96.4%. The curve of eigenvalue dimension versus recognition rate has two clear inflection points: the recognition rate is relatively high at dimensions 24 and 30, and peaks at dimension 30, as shown in Fig. 5.
Table 1. Determination of the dimension of eigenvalues
dimension       15     19     20     21     22     23     24     25     28     30
euclidean/%     89.9   88.9   89.9   94.4   94.4   94.4   92.9   94.4   94.3   93.9
cosine/%        90.4   90.4   90.4   94.4   94.9   94.4   93.9   93.9   94.5   96.4
correlation/%   86.9   90.4   88.9   92.4   94.6   92.4   92.9   92.9   94.3   94.9
mean/%          89.1   89.9   89.7   93.8   94.7   93.8   93.3   93.3   94.2   94.9

-
The depth of the convolutional neural network structure determines the depth of the learned features. With the joint feature weight ratio fixed at 0.6:0.4 and the number of eigenvalues at 30, the effect of the deep network structure on accuracy was studied. The results are shown in Table 2; each entry is the mean of 100 trials. Table 2 shows that the recognition rate is highest with a 17-layer network, where the average over the three distance metrics is 95.14%.
Table 2. Structure determination of deep learning
network structure (layers)   14      15      16      17      18      19      20      21      22
euclidean/%                  87.94   88.44   87.94   93.97   92.46   89.45   88.94   85.43   74.87
cosine/%                     87.44   88.94   88.94   96.48   92.96   89.45   89.95   85.93   75.88
correlation/%                88.44   88.44   88.94   94.97   93.47   90.45   89.95   85.93   75.88
mean/%                       87.94   88.61   88.61   95.14   92.80   89.95   89.61   85.76   75.54

As Fig. 6 shows, the detection rate peaks at 17 network layers and falls off on either side of 17, indicating that the features obtained with 17 layers have strong representational power.
-
Following the idea of joint dimension reduction, the PCA-processed matrix and the RP-projected matrix are fused by weighted serial concatenation. Direct serial concatenation of the two reduced matrices gives unsatisfactory recognition, as shown in Fig. 7; feature fusion with a 0.6:0.4 weighted concatenation clearly outperforms direct concatenation.
The results are shown in Table 3, which gives the effect of the joint feature weights on accuracy with 30 eigenvalues and a 17-layer deep network. As Fig. 8 shows, the recognition rate rises as the proportion of the PCA dimension-reduction matrix increases, until the ratio of the PCA matrix to the RP matrix reaches 6:4, where the recognition rate peaks at over 96%.
Table 3. Determination of feature fusion coefficient of PCA and RP
weight ratio (PCA:RP)   0.5:0.5   0.1:0.9   0.2:0.8   0.3:0.7   0.4:0.6   0.6:0.4   0.7:0.3   0.8:0.2   0.9:0.1
euclidean/%             85.43     87.44     93.79     94.47     94.47     93.97     94.47     92.96     92.46
cosine/%                86.43     86.93     92.96     94.47     94.47     96.48     95.48     94.97     94.47
correlation/%           84.42     86.43     92.96     94.47     94.47     94.97     94.97     94.47     93.97
mean/%                  85.43     86.93     93.29     94.47     94.47     95.14     94.97     94.13     93.63

-
This paper designs a deep recognition algorithm with weighted joint dimension-reduction feature fusion to extract deep image features. To verify its effectiveness, comparison experiments against three feature extraction algorithms were designed and the recognition rates analyzed. Method 1 extracts features by PCA projection and classifies by Euclidean distance; method 2 extracts features by random projection (RP) and classifies by Euclidean distance; method 3 uses deep-network features with Euclidean-distance classification; method 4 is the weighted fused-feature deep recognition method designed in this paper, also with Euclidean-distance classification. To ensure accuracy, the same experimental conditions and databases were used throughout.
The results are shown in Table 4; each entry is the mean of 50 trials. Table 4 shows that the accuracy of the proposed feature extraction algorithm is clearly higher than that of the other three methods. In summary, methods 1 and 2 remain at shallow feature mining and belong to traditional hand-crafted feature extraction; method 3 extracts features with a deep network but performs poorly on large face images; the weighted feature-fusion algorithm designed here handles large images effectively and, through its multiple hidden layers, mines deep, abstract features of the data, achieving the highest recognition rate.
Table 4. Comparison results of feature extraction method
feature extraction method   method 1   method 2   method 3   method 4
recognition rate            86%        89%        93%        97%

-
After feature extraction with the weighted feature-fusion algorithm, classifiers based on three nearest-neighbor distance measures were compared to analyze the classifier's effect on the recognition rate: Euclidean distance, cosine distance, and correlation distance (a sketch of the three distances follows).
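For reference, the three distances can be sketched as follows; SciPy's scipy.spatial.distance module provides equivalent euclidean, cosine, and correlation functions, and the arguments here are assumed to be 1-D NumPy arrays.

```python
import numpy as np

def distances(u, v):
    """The three nearest-neighbor distances compared in this section."""
    euclidean = np.linalg.norm(u - v)
    cosine = 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    uc, vc = u - u.mean(), v - v.mean()   # correlation distance = cosine
    correlation = 1.0 - (uc @ vc) / (np.linalg.norm(uc) * np.linalg.norm(vc))
    return euclidean, cosine, correlation  # ...of the mean-centered vectors
```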
The results are shown in Table 5; each entry is the mean of 50 trials. Table 5 shows that the cosine-distance classifier gives more stable classification and a relatively higher recognition rate.
Table 5. Comparison results of distance method
classification method   euclidean distance   cosine distance   correlation distance
recognition rate        91.5%                93%               92%

-
The robustness of the algorithm was tested on two databases, the BioID database and the self-built database. The BioID set contains 400 images in 10 classes; the self-built set contains 650 images in 23 classes (mainly object images). The databases are shown in Fig. 8 and Fig. 9.
The results are shown in Table 6; each entry is the mean of 50 trials. Table 6 shows that the recognition rate of the proposed algorithm exceeds 90% on both databases and reaches 93% on the self-built database, indicating that the algorithm performs well with complex backgrounds and many sample classes and thus has good adaptability.
Table 6. Contrast results of three kinds of galleries
library name       MIT library   BioID library   self-built library
recognition rate   97%           90%             93%
Deep feature extraction and classification recognition algorithm based on weighting and dimension reduction
-
Abstract: In order to reduce the computational complexity of convolutional neural networks, alleviate over-fitting during feature extraction, and solve the problem that classic network models cannot effectively handle large images, a deep feature extraction and classification recognition algorithm based on weighted joint dimension reduction was adopted. Based on the recognition contribution rates of the two features, the dimension-reduction results of principal component analysis (PCA) and random projection (RP) were fused with weighted averaging; the fused results were then provided to a convolutional neural network to extract the high-level features for image classification, and a Euclidean-distance classifier was used to classify the recognition objects. Theoretical analysis and experimental verification show that, with the data preprocessed by weighted joint dimension reduction and the weight ratio of the PCA matrix to the RP matrix at 6:4, the recognition rate exceeds 96%. The algorithm effectively improves accuracy, enables large images to be recognized well in deep learning networks, and improves the adaptability of the network.
-
[1] SUN Y, CHEN Y H, WANG X G, et al. Deep learning face representation by joint identification-verification[J]. Advances in Neural Information Processing Systems, 2014, 27(12): 30-60.
[2] LLORCA D F, ARROYO R, SOTELO M A. Vehicle logo recognition in traffic images using HOG features and SVM[C]//2013 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013). New York, USA: IEEE, 2014: 2229-2234.
[3] QIAN F. Face recognition based on PCA[D]. Nanjing: Southeast University, 2003: 49-51 (in Chinese).
[4] XU P, FU H. Facial expression recognition based on convolutional neural network[J]. Artificial Intelligence, 2015, 34(12): 45-47 (in Chinese).
[5] XU F J, WU W, GONG Y, et al. Tracking using convolutional neural networks[J]. IEEE Transactions on Neural Networks, 2010, 21(10): 1610-1623. doi: 10.1109/TNN.2010.2066286
[6] BELHUMEUR P N, HESPANHA J P, KRIEGMAN D J. Recognition using class specific linear projection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19(7): 711-720. doi: 10.1109/34.598228
[7] ZHANG C, ZHANG Z. Improving multiview face detection with multi-task deep convolutional neural networks[C]//2014 IEEE Winter Conference on Applications of Computer Vision (WACV). New York, USA: IEEE, 2014: 1036-1041.
[8] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]//International Conference on Neural Information Processing Systems. North Miami Beach, Florida, USA: Curran Associates Inc., 2012: 1097-1105.
[9] BENGIO Y. Learning deep architectures for AI[J]. Foundations and Trends in Machine Learning, 2009, 2(1): 1-127.
[10] LIN Y M. Face recognition based on deep learning[D]. Dalian: Dalian University of Technology, 2013: 14-26 (in Chinese).
[11] SUN Y, WANG X G, TANG X. Deep learning face representation from predicting 10,000 classes[C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. New York, USA: IEEE, 2014: 1891-1898.
[12] ZHENG Y, CHEN Q Q. Deep learning and its new progress in object and behavior recognition[J]. Journal of Image and Graphics, 2014, 19(2): 175-184.
[13] CHEN Ch. Research and implementation of face detection algorithm based on deep learning[D]. Chengdu: University of Electronic Science and Technology of China, 2017: 49-80 (in Chinese).
[14] LANGKVIST M, KARLSSON L, LOUTFI A. A review of unsupervised feature learning and deep learning for time-series modeling[J]. Pattern Recognition Letters, 2014, 42(5): 11-24.
[15] XIONG Y, ZUO X Q, HUANG L, et al. Classification of color remote sensing images based on multi-feature combination[J]. Laser Technology, 2014, 38(2): 165-171 (in Chinese).
[16] LIU B. Infrared face recognition method based on random projection and sparse representation[D]. Xi'an: Xidian University, 2009: 2-9 (in Chinese).
[17] SUN J G, MENG F Y. A weighted fusion face recognition algorithm[J]. Journal of Intelligent Systems, 2015, 12(7): 4-7 (in Chinese).
[18] ZHANG B, LIU J F, TANG X L. Multi-scale video text detection based on corner and stroke width verification[C]//Visual Communications and Image Processing (VCIP), 2013. New York, USA: IEEE, 2014: 1-6.
[19] ZOU G F, FU G X. Multi-pose face recognition based on weighted mean face[J]. Computer Application Research, 2017, 11(7): 1-7 (in Chinese).
[20] CHAN T H, MA Y. A simple deep learning baseline for image classification[J]. IEEE Transactions on Image Processing, 2015, 11(4): 10-17.