-
ERVQ [19], proposed in our earlier work, is a multi-stage quantization model built from multiple layers of codebooks. The quantization output vectors of all layers are summed to form the reconstruction of the input feature vector. The first layer quantizes the input feature vector itself; each subsequent layer quantizes the quantization error left by the layer before it (the last layer's error is not passed on), so that applying multiple layers of quantization reduces the approximation error of the feature vector as far as possible. ERVQ consists of two parts: codebook training and feature-vector quantization.
-
To make the trained multi-layer codebooks approximate the input feature vectors more accurately, ERVQ first obtains an initial codebook for each layer with the k-means algorithm and then iteratively optimizes these initial codebooks under an objective function based on the overall quantization error. Specifically, RVQ [17] is used to train the initial codebooks required for the iterative optimization; on this basis, the codebooks are optimized layer by layer in order and updated continuously.
One iteration comprises the sequential optimization of all layers' codebooks. For a given layer, the idea is to fix the codebooks of all other layers, retrain this layer's codebook as if it were the last layer, replace the old codebook with the new one, and update the quantization outputs of the training set [22]. The iterative optimization stops once the objective function has sufficiently converged.
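The two-stage training just described — RVQ-style initialization with k-means, followed by sweeps in which each layer is retrained as if it were the last layer given the others — can be sketched as below. This is a minimal toy illustration under assumed shapes ((n, d) data, (k, d) codebooks) with helper names of our own, not the authors' implementation:

```python
import numpy as np

def kmeans(X, k, iters=25, seed=0):
    """Plain k-means, standing in for the per-layer codebook training."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        a = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            m = a == j
            if m.any():
                C[j] = X[m].mean(axis=0)
    return C

def encode(X, codebooks, skip=None):
    """Sequentially quantize X, optionally skipping one layer.
    Returns the reconstruction accumulated over the used layers."""
    rec, res = np.zeros_like(X), X.copy()
    for l, C in enumerate(codebooks):
        if l == skip:
            continue
        a = np.argmin(((res[:, None] - C[None]) ** 2).sum(-1), axis=1)
        rec += C[a]
        res -= C[a]
    return rec

def train_ervq(X, L=2, k=8, sweeps=2):
    # Stage 1: RVQ initialization -- each layer's codebook is trained
    # on the residuals left by the previous layers.
    codebooks, res = [], X.copy()
    for _ in range(L):
        C = kmeans(res, k)
        a = np.argmin(((res[:, None] - C[None]) ** 2).sum(-1), axis=1)
        res = res - C[a]
        codebooks.append(C)
    # Stage 2: ERVQ-style refinement -- treat each layer in turn as the
    # "last" layer and retrain it on the residual of all other layers.
    for _ in range(sweeps):
        for l in range(L):
            residual = X - encode(X, codebooks, skip=l)
            codebooks[l] = kmeans(residual, k)
    return codebooks
```

Recomputing the residuals against the other layers before each retraining plays the role of updating the quantization outputs of the training set after a codebook is replaced.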
-
Given a feature vector v, Fig. 1 illustrates the layer-by-layer quantization process of a two-layer ERVQ.
(1) According to the following equation, the centroid of the first-layer (i=1) codebook C1 with the smallest Euclidean distance d(v, ci, j) to v is taken as the first-layer quantization output ${\mathit{\boldsymbol{\hat v}}_1}$:
$ {{\mathit{\boldsymbol{\hat v}}}_1} = \arg \mathop {\min }\limits_{{\mathit{\boldsymbol{c}}_{i, j}}} d(\mathit{\boldsymbol{v, }}{\mathit{\boldsymbol{c}}_{i, j}}), (j = 1, 2, \ldots , k;{\mathit{\boldsymbol{c}}_{i, j}} \in {\mathit{\boldsymbol{C}}_i}) $
(1) where i denotes the layer index of the current codebook; C1 is the set of the k centroids of the first-layer codebook, with k the number of centroids in each layer's codebook; and ci, j is the j-th centroid of the i-th layer's codebook.
(2) According to the following equation, compute the error vector e1 corresponding to the first-layer quantization error of v:
$ {\boldsymbol{e}_1} = \mathit{\boldsymbol{v}} - {{\mathit{\boldsymbol{\hat v}}}_1} $
(2) (3) According to Eq. (1), e1 is fed to the second layer (i=2) to obtain the second-layer quantization output ${\mathit{\boldsymbol{\hat v}}_2}$ of v.
For an ERVQ with more than two layers, the quantization error of the second layer is further quantized by the third layer, and so on until the last layer.
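The layer-by-layer process of Eqs. (1) and (2) can be sketched for a single vector as follows; a minimal illustration with names of our own, not the authors' code:

```python
import numpy as np

def quantize_layers(v, codebooks):
    """Quantize one vector v layer by layer, following Eqs. (1)-(2):
    each layer emits the centroid nearest to the current residual,
    and the residual e_i becomes the next layer's input."""
    outputs, e = [], v.copy()
    for C in codebooks:                             # C: (k, d) centroids of one layer
        j = np.argmin(((e - C) ** 2).sum(axis=1))   # Eq. (1): nearest centroid
        outputs.append(C[j])
        e = e - C[j]                                # Eq. (2): residual for next layer
    return outputs, e                               # sum(outputs) approximates v
```

Summing the per-layer outputs reconstructs an approximation of v, with e the remaining quantization error after the last layer.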
-
To evaluate how accurately the projection-based enhanced residual quantization approximates feature vectors, an approximate Euclidean distance computation method, similar to that of Ref. [19], is designed and used for exhaustive search.
Given a feature vector v, its approximation $\mathit{\boldsymbol{\hat v}}$ can be reconstructed from the quantization outputs of the projection-based enhanced residual quantization by the following equation:
$ \mathit{\boldsymbol{\hat v}} = \mathit{\boldsymbol{\mu }} + \sum\limits_{l = 1}^L {(\mathit{\boldsymbol{M}}_l^T·{{\mathit{\boldsymbol{\hat v}}}_l})} = \mathit{\boldsymbol{\mu }} + \sum\limits_{l = 1}^L {(\mathit{\boldsymbol{M}}_l^T·{\mathit{\boldsymbol{c}}_{v, l}})} = \mathit{\boldsymbol{\mu }} + \mathit{\boldsymbol{\tilde v}} $
(3) where μ is the mean vector used for centering before PCA is applied; cv, l∈Cl, and cv, l (a D×1 vector) is the quantization output of v (a d×1 vector) in the l-th layer codebook Cl; D is the vector dimension after the projection reduces the dimensionality; d is the original vector dimension; L is the total number of codebook layers of the projection-based enhanced residual quantization; and MlT (a d×D matrix) is the inverse projection matrix of the l-th layer.
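As a sketch (with hypothetical helper names and assumed shapes), Eq. (3) amounts to back-projecting each layer's low-dimensional output and adding the centering mean:

```python
import numpy as np

def reconstruct(mu, Ms, codes):
    """Eq. (3): v_hat = mu + sum_l M_l^T · c_{v,l}.
    mu:    (d,) mean used to center the data before PCA;
    Ms:    list of L projection matrices M_l, each (D, d), so M_l^T is (d, D);
    codes: list of L low-dimensional quantization outputs c_{v,l}, each (D,)."""
    v_hat = mu.astype(float).copy()
    for M, c in zip(Ms, codes):
        v_hat += M.T @ c  # back-project one layer's output to d dimensions
    return v_hat
```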
Therefore, given a query feature vector q, the Euclidean distance between q and a database feature vector v is converted into the Euclidean distance between q and the approximate representation $\mathit{\boldsymbol{\hat v}}$ of v, computed as follows:
$ \begin{array}{c} d{(\mathit{\boldsymbol{q, v}})^2} \approx {\left\| {\mathit{\boldsymbol{q}} - \mathit{\boldsymbol{\hat v}}} \right\|^2} = {\left\| {\mathit{\boldsymbol{\tilde q}} - \mathit{\boldsymbol{\tilde v}}} \right\|^2} = \\ {\left\| {\mathit{\boldsymbol{\tilde q}}} \right\|^2} + {\left\| {\mathit{\boldsymbol{\tilde v}}} \right\|^2} - 2\left\langle {\mathit{\boldsymbol{\tilde q}}, \mathit{\boldsymbol{\tilde v}}} \right\rangle = \\ {\left\| {\mathit{\boldsymbol{\tilde q}}} \right\|^2} + {\left\| {\mathit{\boldsymbol{\tilde v}}} \right\|^2} - 2\left\langle {\mathit{\boldsymbol{\tilde q}}, \sum\limits_{l = 1}^L {(\mathit{\boldsymbol{M}}_l^T{\mathit{\boldsymbol{c}}_{v, l}})} } \right\rangle = \\ {\left\| {\mathit{\boldsymbol{\tilde q}}} \right\|^2} + {\left\| {\mathit{\boldsymbol{\tilde v}}} \right\|^2} - 2\sum\limits_{l = 1}^L {\left\langle {\mathit{\boldsymbol{\tilde q}}, \mathit{\boldsymbol{M}}_l^T{\mathit{\boldsymbol{c}}_{v, l}}} \right\rangle } = \\ {\left\| {\mathit{\boldsymbol{\tilde q}}} \right\|^2} + {\left\| {\mathit{\boldsymbol{\tilde v}}} \right\|^2} - 2\sum\limits_{l = 1}^L {\left\langle {{\mathit{\boldsymbol{M}}_l}\mathit{\boldsymbol{\tilde q}}, {\mathit{\boldsymbol{c}}_{v, l}}} \right\rangle } = \\ {\left\| {\mathit{\boldsymbol{\tilde q}}} \right\|^2} + {\left\| {\mathit{\boldsymbol{\tilde v}}} \right\|^2} - 2\sum\limits_{l = 1}^L ( {\mathit{\boldsymbol{M}}_l}\mathit{\boldsymbol{\tilde q}}{)^{\rm{T}}}{\mathit{\boldsymbol{c}}_{v, l}} \end{array} $
(4) where 〈·〉 denotes the dot product of vectors, $\mathit{\boldsymbol{\tilde q}} = \mathit{\boldsymbol{q}} - \mathit{\boldsymbol{\mu }}, \mathit{\boldsymbol{\tilde v}} = \mathit{\boldsymbol{\hat v}} - \mathit{\boldsymbol{\mu }} $, with μ the mean vector used for centering before PCA. The first term ${\left\| {\mathit{\boldsymbol{\tilde q}}} \right\|^2} $ is the same for every database vector and does not affect the ranking of database vectors against the query, so it can be ignored in the distance computation. The second term ${\left\| {\mathit{\boldsymbol{\tilde v}}} \right\|^2} $ can be computed offline during the quantization encoding of the database vectors and stored in advance. The third term $ {({\mathit{\boldsymbol{M}}_l}·\mathit{\boldsymbol{\tilde q}})^{\rm{T}}}{\mathit{\boldsymbol{c}}_{v, l}}$ can be computed as soon as the query vector q is submitted and stored in a lookup table.
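The lookup-table scheme of Eq. (4) can be sketched as follows; a minimal illustration under the paper's notation, with helper names of our own, not the authors' code:

```python
import numpy as np

def build_tables(q, mu, Ms, codebooks):
    """Per-query lookup tables for Eq. (4): tables[l][j] = (M_l · q~)^T c_{l,j},
    where q~ = q - mu. Each inner product is taken in the low-dimensional
    space, which is where the method saves time over high-dimensional variants."""
    q_tilde = q - mu
    return [C @ (M @ q_tilde) for M, C in zip(Ms, codebooks)]  # each entry: (k,)

def approx_sq_dist(codes, v_norm_sq, tables):
    """Score one database vector from its per-layer centroid indices `codes`
    and its precomputed ||v~||^2. The query-only term ||q~||^2 is dropped,
    since it shifts every score equally and leaves the ranking unchanged."""
    return v_norm_sq - 2.0 * sum(t[j] for t, j in zip(tables, codes))
```

Scoring a database vector then costs only L table lookups plus one stored norm, independent of the original dimension d.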
-
This paper evaluates the codebook-training and exhaustive-search performance of the projection-based enhanced residual quantization on two public datasets, GIST and VLAD [23]. As shown in Table 1, each of the two datasets contains three subsets: the training set is used to train the codebooks; the database set is the collection from which feature vectors approximating the query points are retrieved; and the query set is used to evaluate the performance of the retrieval algorithm.
Table 1. Information of experimental datasets
datasets              GIST      VLAD
dimension of vectors  960       4096
size of database set  1000000   1295000
size of training set  500000    200000
size of query set     1000      849
All experiments were carried out in MATLAB 2011 on a PC with an Intel Core i5 2.8 GHz CPU and 32 GB of RAM.
-
Figures 4 and 5 show the effect of the projection dimension D∈{2^3, 2^4, 2^5, 2^6, 2^7, 2^8, 2^9} on the overall quantization error on the GIST and VLAD datasets. In the experimental settings, the number of centroids k in each layer's codebook of the projection-based enhanced residual quantization is fixed at 256, and the number of codebook layers is L∈{4, 8, 16}.
As shown in Figs. 4 and 5, when the number of codebook layers is fixed (e.g., L=8), the overall error first decreases and then increases as the PCA projection dimension drops from 512 to 8. The overall quantization error on both GIST and VLAD is smallest at a projection dimension of D=128.
-
Figures 6 and 7 show the effect of the projection dimension D∈{2^3, 2^4, 2^5, 2^6, 2^7, 2^8, 2^9} on the retrieval accuracy (measured by the recall Rrecall@n with n=100, where n is the number of returned results) on the GIST and VLAD datasets. In the experimental settings, the number of centroids k in each layer's codebook of the projection-based enhanced residual quantization is fixed at 256, and the number of codebook layers is L∈{4, 8, 16}.
As shown in Figs. 6 and 7, starting from a projection dimension of 512, the retrieval accuracy first improves as the dimension decreases, up to a projection dimension of 128, and then degrades as the dimension decreases further. Comparing Figs. 6 and 7, varying D affects retrieval accuracy more strongly on VLAD than on GIST.
-
For the nearest-neighbor-search comparison, the projected enhanced residual vector quantization (PERVQ) is compared with four other methods (see Table 2) in terms of retrieval accuracy and retrieval efficiency (search time). In the literature for the four methods of Table 2 [4, 17-19], based on the reported experimental data and balancing retrieval speed against retrieval accuracy, all four use 8 sub-codebooks, each consisting of 256 centroids.
Table 2. Description of compared methods
methods             description
PQ[4]               composed of 8 sub-quantizers, each codebook consisting of 256 centers
RVQ[17]             composed of L (L=8) levels of sub-quantizers, each codebook consisting of 256 centers
ERVQ[19]            composed of L (L=8) levels of sub-quantizers, each codebook consisting of 256 centers
orthogonal RVQ[18]  composed of L (L=8) levels of sub-quantizers, each codebook consisting of 256 centers
All methods use the same code length, and retrieval accuracy is evaluated with the recall metric Rrecall@n.
For fairness, PERVQ uses the same codebook parameters as PQ, RVQ, ERVQ, and orthogonal RVQ: an 8-stage sub-quantizer built from L=8 layers of codebooks, with the codebook size of each layer set to 256.
-
PERVQ builds on ERVQ and addresses the problem that training codebooks in the original high-dimensional vector space makes codebook-training time depend on the feature-vector dimension. It aims to improve codebook-training efficiency while optimizing the codebooks in the low-dimensional vector space under the combined projection and quantization error.
Figure 8 shows the time spent training 8 layers of codebooks on the GIST dataset for RVQ, for ERVQ, and for projections onto each dimension D∈{2^3, 2^4, 2^5, 2^6, 2^7, 2^8, 2^9}. Compared with RVQ, both ERVQ and PERVQ add a codebook-optimization step to improve codebook accuracy and therefore incur extra time in the optimization stage. As Fig. 8 shows, by combining PCA with ERVQ, PERVQ trains its codebooks in noticeably less time than ERVQ; moreover, when D∈{2^3, 2^4, 2^5}, PERVQ trains its codebooks faster than RVQ.
-
As observed from Fig. 6, on the GIST dataset with L=8 and 256 centroids per layer, PERVQ achieves its best exhaustive approximate-nearest-neighbor retrieval accuracy at D=128. Figure 9 shows the retrieval accuracy of the different methods on GIST; PERVQ uses 8 layers of codebooks with 256 initial centroids per layer. As Fig. 9 shows, the retrieval performance of PQ is clearly inferior to the other methods; by accounting for the quantization error, RVQ and ERVQ improve accuracy considerably at the same code length, outperforming product quantization. PERVQ accounts for both the overall quantization error and the projection error, and its retrieval accuracy exceeds that of RVQ, ERVQ, and PQ. PERVQ achieves accuracy comparable to orthogonal RVQ, but orthogonal RVQ trains and optimizes its multi-layer codebooks in the original high-dimensional vector space.
-
The search times of the different methods were compared on the GIST dataset with the same parameter settings as above; the results are listed in Table 3.
Table 3. Comparison of search time on GIST
methods          search time/ms   Rrecall@100
PQ               33.6             0.32
RVQ              35.0             0.68
ERVQ             34.3             0.70
PERVQ D=8        31.1             0.70
PERVQ D=16       32.2             0.70
PERVQ D=32       33.8             0.71
PERVQ D=128      44.7             0.72
orthogonal RVQ   34.2             0.73
In exhaustive search, PERVQ, ERVQ, RVQ, and orthogonal RVQ all compute the approximate Euclidean distance between the query vector and database vectors from the vectors' approximate representations. The difference is that ERVQ, RVQ, and orthogonal RVQ construct their lookup tables in the high-dimensional space, whereas PERVQ constructs its lookup table in the low-dimensional space; PERVQ therefore generates lookup tables faster than ERVQ and RVQ, but, unlike RVQ and ERVQ, it must spend extra time projecting the query vector into the low-dimensional space. Consequently, when the efficiency gained in lookup-table construction outweighs the time cost of the vector projection, PERVQ searches faster than ERVQ, RVQ, and orthogonal RVQ.
As Table 3 shows, the exhaustive approximate-nearest-neighbor search of PERVQ takes less time than PQ, RVQ, and ERVQ when the projection dimension D∈{2^3, 2^4, 2^5}. Combined with Figs. 6 and 9, for D∈{2^3, 2^4, 2^5} PERVQ still achieves retrieval accuracy comparable to ERVQ and orthogonal RVQ.
Projection-based enhanced residual quantization for approximate nearest neighbor search
-
Abstract: To reduce the approximation error of image-feature vector quantization and the codebook-training time cost brought by high-dimensional vectors, a projection-based enhanced residual quantization method is proposed. Building on our earlier work on enhanced residual quantization (ERVQ), principal component analysis (PCA) is combined with ERVQ so that both codebook training and feature quantization are carried out in a low-dimensional vector space to improve efficiency. During codebook training in the low-dimensional space, a joint optimization method is proposed that accounts for the overall error produced by both projection and quantization, improving codebook accuracy. For this quantization method, a fast method for computing the approximate Euclidean distance between feature vectors is designed for exhaustive approximate-nearest-neighbor search. The experimental results show that, at the same retrieval accuracy, the projection-based enhanced residual quantization needs only about 1/3 of the training time of ERVQ; compared with other methods of its kind, the proposed method offers better overall performance in codebook-training time efficiency, retrieval speed, and retrieval accuracy. This study provides a reference for effectively combining PCA with other quantization models.
-
Key words:
- image processing /
- vector quantization /
- approximate nearest neighbor /
- image retrieval
-
[1] AI L, YU J, HE Y, et al. High-dimensional indexing technologies for large scale content-based image retrieval: A review [J]. Journal of Zhejiang University SCIENCE C, 2013, 14(7): 505-520.
[2] LIU D Y, WANG G J, WU J, et al. Light field image compression method based on correlation of rendered views [J]. Laser Technology, 2019, 43(4): 551-556 (in Chinese).
[3] WU Z, YU J. Vector quantization: A review [J]. Frontiers of Information Technology & Electronic Engineering, 2019, 20(4): 507-524.
[4] HERVE J, MATTHIJS D, CORDELIA S. Product quantization for nearest neighbor search [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(1): 117-128. doi: 10.1109/TPAMI.2010.57
[5] MATSUI Y, UCHIDA Y, HERVE J. A survey of product quantization [J]. ITE Transactions on Media Technology and Applications, 2018, 6(1): 2-10. doi: 10.3169/mta.6.2
[6] GE T, HE K, KE Q, et al. Optimized product quantization [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(4): 744-755. doi: 10.1109/TPAMI.2013.240
[7] KALANTIDIS Y, AVRITHIS Y. Locally optimized product quantization for approximate nearest neighbor search [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York, USA: IEEE, 2014: 1213-1220.
[8] HEO J, LIN Z, YOON S. Distance encoded product quantization [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York, USA: IEEE, 2014: 1241-1248.
[9] NOROUZI M, FLEET D. Cartesian k-means [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York, USA: IEEE, 2013: 3017-3024.
[10] BABENKO A, LEMPITSKY V. Improving bilayer product quantization for billion-scale approximate nearest neighbors in high dimensions [EB/OL]. [2019-10-10]. https://arxiv.org/abs/1404.1831.
[11] KALANTIDIS Y, AVRITHIS Y. Locally optimized product quantization for approximate nearest neighbor search [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York, USA: IEEE, 2014: 2329-2336.
[12] JOHNSON J, DOUZE M, HERVE J. Billion-scale similarity search with GPUs [J/OL]. [2019-11-20]. https://ieeexplore.ieee.org/document/8733051.
[13] MATSUI Y, OGAKI K, YAMASAKI T, et al. PQ k-means: Billion-scale clustering for product-quantized codes [C]//ACM Conference on Multimedia (MM). California, USA: ACM, 2017: 1-9.
[14] JONATHAN B. Transform coding for fast approximate nearest neighbor search in high dimensions [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York, USA: IEEE, 2010: 1815-1822.
[15] OZAN E, KIRANYAZ A, GABBOUJ M. K-subspaces quantization for approximate nearest neighbor search [J]. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(7): 1722-1733.
[16] BABENKO A, LEMPITSKY V. Tree quantization for large-scale similarity search and classification [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York, USA: IEEE, 2015: 4240-4248.
[17] CHEN Y, GUAN T, WANG C. Approximate nearest neighbor search by residual vector quantization [J]. Sensors, 2010, 10(12): 11259-11273. doi: 10.3390/s101211259
[18] WANG J, ZHANG T. Composite quantization [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(6): 1308-1322.
[19] AI L, YU J, WU Z, et al. Optimized residual vector quantization for efficient approximate nearest neighbor search [J]. Multimedia Systems, 2017, 23(2): 169-181.
[20] LIU S, SHAO J, LU H. Generalized residual vector quantization and aggregating tree for large scale search [J]. IEEE Transactions on Multimedia, 2017, 19(8): 1785-1797. doi: 10.1109/TMM.2017.2692181
[21] WEI B, GUAN T, YU J. Projected residual vector quantization for ANN search [J]. IEEE Multimedia, 2014, 21(3): 41-51. doi: 10.1109/MMUL.2013.65
[22] AI L F, LIU K, WU J. Enhanced residual vector quantization-based non-exhaustive retrieval for image visual features [J]. Journal of Hefei University, 2016, 26(1): 46-51 (in Chinese).
[23] GUAN T, HE Y, GAO J, et al. On-device mobile visual location recognition by integrating vision and inertial sensors [J]. IEEE Transactions on Multimedia, 2013, 15(7): 1688-1699. doi: 10.1109/TMM.2013.2265674