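The RANSAC algorithm was first proposed by FISCHLER and BOLLES in 1981[9] as an iterative method for estimating the parameters of a mathematical model from a set of observed data that contains outliers. Its principle is as follows: given a set of experimental data, the algorithm repeatedly selects a random subset of the data as hypothetical inliers, excludes noise points (outliers), and returns the model that fits the inliers with maximum probability. The algorithm flow is shown in Fig. 1[10-11].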
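The parameters that must be chosen in RANSAC are the threshold t for classifying a point as an inlier and the number of inliers d required for a model to be considered reasonable. The number of times k the procedure is repeated (the number of iterations) can then be derived theoretically. Let p be the probability that, at least once during the iterations, the points drawn at random from the data set are all inliers. Let w be the probability of drawing an inlier in a single draw from the data set: w = number of inliers / number of points in the data set.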
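Suppose n points are needed to estimate the model. Then $w^n$ is the probability that all n points are inliers, and $1-w^n$ is the probability that at least one of them is an outlier, in which case a poor model is estimated from the data set. $(1-w^n)^k$ is the probability that, in k iterations, the algorithm never selects n points that are all inliers. This equals $1-p$, i.e. $1-p=(1-w^n)^k$. Taking the logarithm of both sides gives the number of iterations: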
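$ k = \frac{\lg(1 - p)}{\lg(1 - w^{n})} $  (1)
The choice of the threshold t is important because it directly decides whether a point is classified as an inlier or an outlier[12]. If t is too small, valid points that should be accepted are discarded; if t is too large, anomalous or erroneous points may be misjudged as valid. To address this, we use the median absolute deviation (MAD), $D_{\rm MAD}$, to estimate the spread of the data. For a selected data subset $y_i$, it is defined as: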
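$ D_{\rm MAD} = {\rm median}_i\left(\left|y_i - {\rm median}_j(y_j)\right|\right) $  (2)
Here median returns the median of an array, |·| denotes the absolute value, and i and j index positions in the data subset. The threshold t is set to the MAD of the experimental data. The model is then tested against the remaining experimental data: a point whose distance to the fitted line is smaller than t is regarded as an inlier, otherwise as an outlier.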
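Equations (1) and (2) can be evaluated with a few lines of Python, the language used for the simulations in this paper. The sketch below assumes NumPy. The function names are ours (hypothetical), and rounding k up to a whole number is our choice, because Eq. (1) generally gives a non-integer.

```python
import math
import numpy as np

def ransac_iterations(p: float, w: float, n: int) -> int:
    """Eq. (1): k = lg(1 - p) / lg(1 - w^n), rounded up to a whole number of iterations."""
    if not (0.0 < p < 1.0 and 0.0 < w < 1.0):
        raise ValueError("p and w must lie strictly between 0 and 1")
    return math.ceil(math.log10(1.0 - p) / math.log10(1.0 - w ** n))

def mad(y) -> float:
    """Eq. (2): median_i(|y_i - median_j(y_j)|)."""
    y = np.asarray(y, dtype=float)
    return float(np.median(np.abs(y - np.median(y))))

# A straight line is fixed by n = 2 points. Assume half of the data are inliers
# (w = 0.5) and require p = 0.99 confidence of drawing at least one all-inlier sample.
print(ransac_iterations(0.99, 0.5, 2))   # 17
print(mad([1.0, 2.0, 3.0, 4.0, 100.0]))  # 1.0, unaffected by the outlier 100
```

The MAD example shows why the median-based spread suits a threshold: the single gross value 100 does not inflate it, whereas a standard deviation would be dominated by that point.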
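To verify the reliability and robustness of the two algorithms, both were applied to the same simulated data set containing identical errors and outliers. Fig. 2 shows the fitting results of the least squares method (LSM) and RANSAC in two cases: no outliers added to the data (inliers), and 50 outliers added.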
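The results show that the least squares method is sensitive to outliers. Once outliers appear in the data, the line fitted by LSM deviates strongly from the original line. RANSAC, by contrast, effectively excludes their influence and yields a fit very close to the original model, i.e. it is robust. Unlike LSM, RANSAC has no fixed upper bound on the number of iterations needed to compute the parameters. Its advantage is that it estimates model parameters robustly, with high accuracy even for data sets containing a significant number of outliers, which is why it is widely used in image processing. In practice, optimizing the RANSAC model parameters makes it possible to find the largest inlier set[13-15]. This reduces the probability of error and improves the accuracy of data processing.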
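A minimal sketch of the comparison described above, assuming NumPy. The data are synthetic (the line y = 2x + 1 with small Gaussian noise, 50 of 200 points replaced by gross outliers), not the data behind Fig. 2. Several choices are our assumptions rather than the paper's stated procedure. The threshold t is taken as the MAD, Eq. (2), of the residuals of a preliminary LSM fit. Distance to a candidate line is measured vertically. The final consensus set is refitted by least squares. The names `ransac_line` and `mad` are hypothetical.

```python
import math
import numpy as np

def mad(v) -> float:
    """Median absolute deviation, Eq. (2)."""
    v = np.asarray(v, dtype=float)
    return float(np.median(np.abs(v - np.median(v))))

def ransac_line(x, y, t, p=0.99, w=0.5, d=0, seed=0):
    """Fit y = a*x + b by RANSAC; returns (a, b, inlier_mask)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = 2                                       # a line is fixed by two points
    k = math.ceil(math.log10(1 - p) / math.log10(1 - w ** n))  # Eq. (1)
    best = np.zeros(x.size, dtype=bool)
    for _ in range(k):
        i, j = rng.choice(x.size, size=n, replace=False)
        if x[i] == x[j]:
            continue                            # degenerate sample, skip it
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        inliers = np.abs(y - (a * x + b)) < t   # vertical residual below t
        if inliers.sum() > best.sum():          # keep the largest consensus set
            best = inliers
    if best.sum() < max(d, n):
        raise ValueError("no model reached the required number of inliers d")
    a, b = np.polyfit(x[best], y[best], 1)      # least-squares refit of the consensus set
    return float(a), float(b), best

# Synthetic data: y = 2x + 1 plus small noise, 50 of 200 points replaced by outliers
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, x.size)
bad = rng.choice(x.size, size=50, replace=False)
y[bad] = rng.uniform(-20.0, 0.0, size=50)

lsm_slope, lsm_intercept = np.polyfit(x, y, 1)
t = mad(y - (lsm_slope * x + lsm_intercept))    # assumed: Eq. (2) on preliminary LSM residuals
a, b, inliers = ransac_line(x, y, t)
print(f"LSM slope {lsm_slope:.3f}, RANSAC slope {a:.3f}, {inliers.sum()} inliers")
```

Each candidate line is fixed by two points and judged only by how many points lie within t of it. A model drawn from outliers therefore loses the vote instead of biasing the result, and the final least-squares refit sees only the consensus set.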
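We first studied wavelength-modulation second harmonic (2f) signals by simulation, using our own Python program. The noise-free 2f signal of formaldehyde at a volume fraction of $10^{-7}$ is taken as the reference signal X (amplitudes are relative values). The 2f signal at a volume fraction of $2\times10^{-7}$ is the signal to be analyzed, Y. Noise of different amplitudes A is added to Y (some of the resulting signals are shown in Fig. 3a~Fig. 3c), and the two linear fitting models are evaluated on these data. In Fig. 3a~Fig. 3c the abscissa is the sample index (dimensionless). Each X value of these simulated 2f signals is paired with the corresponding Y value and plotted as abscissa and ordinate, respectively. These plots and the corresponding linear fits are shown in Fig. 3d~Fig. 3f, and the final fitting results are summarized in Table 1. Table 1 shows that for harmonic signals with a high signal-to-noise ratio (SNR), $R_{\rm SNR}$, the two algorithms agree closely. As the noise increases and the SNR of the 2f signal drops, RANSAC becomes clearly superior to LSM. Its linear correlation $R^2$ is markedly higher, and its fitted ratio (the slope of the linear fit, i.e. the ratio of Y to X) is closer to the true value of 2.0.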
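The slope and $R^2$ in Table 1 come from fitting Y against X point by point. The sketch below shows that step for LSM with NumPy's `np.polyfit`. Several details are our assumptions, so the numbers will not reproduce Table 1:

- the 2f line shape (the negative second derivative of a Lorentzian);
- the detuning grid;
- the scale factor 100;
- reading A as the standard deviation of the Gaussian noise.

$R^2$ is computed here as the coefficient of determination, $1-SS_{\rm res}/SS_{\rm tot}$.

```python
import numpy as np

def lsm_fit(x, y):
    """Least-squares line y = a*x + b; returns (a, b, R^2)."""
    a, b = np.polyfit(x, y, 1)
    residuals = y - (a * x + b)
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - np.mean(y)) ** 2)
    return float(a), float(b), float(r2)

# Hypothetical stand-in for a 2f line shape: negative second derivative of a Lorentzian
u = np.linspace(-5.0, 5.0, 500)                       # normalized detuning, 500 samples
X = 100.0 * (2.0 - 6.0 * u**2) / (1.0 + u**2) ** 3    # reference signal, relative units
rng = np.random.default_rng(0)

results = {}
for A in (0, 5, 10, 20, 50):
    Y = 2.0 * X + rng.normal(0.0, A, X.size)  # true ratio Y/X = 2.0; noise added to Y only
    results[A] = lsm_fit(X, Y)
    slope, _, r2 = results[A]
    print(f"A={A:>2}: slope={slope:.4f}, R^2={r2:.4f}")
```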
Figure 3. a~c—the simulated second harmonic signals of formaldehyde with noise amplitudes A=10, 20 and 50, respectively d~f—the corresponding fitting results obtained with the LSM and RANSAC algorithms
Table 1. Fitting results of the two algorithms for simulated second harmonic signals of formaldehyde at different SNRs (Gaussian noise of amplitude A added to Y, X kept unchanged)

| noise amplitude A ($R_{\rm SNR}$) | $R^2$ (LSM) | $R^2$ (RANSAC) | slope (LSM) | slope (RANSAC) |
|---|---|---|---|---|
| A=0 ($R_{\rm SNR}$=∞) | 1.0 | 1.0 | 2.0 | 2.0 |
| A=5 ($R_{\rm SNR}$=8.87) | 0.9833 | 0.9935 | 1.9845 | 2.0059 |
| A=10 ($R_{\rm SNR}$=6.00) | 0.9371 | 0.9831 | 2.0183 | 1.9997 |
| A=20 ($R_{\rm SNR}$=3.54) | 0.8045 | 0.9412 | 2.0252 | 2.0003 |
| A=50 ($R_{\rm SNR}$=0.64) | 0.2654 | 0.7482 | 1.6848 | 1.9083 |
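To evaluate the two fitting algorithms further, both were applied to atmospheric formaldehyde 2f signals recorded in experiments; the measurement system is described in Ref. [16]. Because the formaldehyde content of the atmosphere is extremely low, the measured spectral signals are of poor quality. The experimentally obtained 2f signal $I_{\rm 2f}$ and the gas concentration C satisfy the following relation[17]: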
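$ I_{\rm 2f} \propto I_0 \alpha C L $  (3)
Here $I_0$ is the initial laser intensity, α is the molecular absorption coefficient and L is the effective absorption path length. Comparing the signal of a sample of unknown concentration with that of a reference sample of known concentration therefore eliminates the influence of the initial intensity and yields the concentration of the unknown sample. In this paper, formaldehyde signals of known concentration are used for a preliminary evaluation of the reliability of the algorithms.

Fig. 4a shows two 2f signals at different formaldehyde volume fractions (signal_1: $42\times10^{-9}$; signal_2: $35\times10^{-9}$); their baselines are severely disturbed by noise from the acquisition system. As in Fig. 3, the data points of signal_2 are taken as the abscissa and those of signal_1 as the ordinate. This gives the dependence shown by the "·" symbols in Fig. 4b (inliers and outliers together); the "-" lines are the LSM and RANSAC linear fits, respectively. The figure shows that LSM operates on the whole data set (inliers + outliers). RANSAC instead excludes the noise-corrupted points (outliers) and fits only the valid data (inliers), which markedly improves the reliability of the fit.

Fig. 5a shows two 2f signals (signal_2 and signal_3) selected at different times during long-term continuous measurement at a constant formaldehyde volume fraction of $35\times10^{-9}$. Because of system instability and background noise, the peak-to-peak value fluctuates noticeably. Likewise, Fig. 5b shows the dependence between signal_2 and signal_3 ("·") and the corresponding LSM and RANSAC linear fits ("-"). The fitting parameters are summarized in Table 2.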
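The following worked example assumes that both signals share the same $I_0$, α and L, i.e. they are recorded on the same instrument, absorption line and path. Under that assumption, Eq. (3) reduces to a ratio, and the slope of the fit of signal_1 against signal_2 equals their concentration ratio. Using the volume fractions quoted for Fig. 4a:

$ \frac{I_{\rm 2f,1}}{I_{\rm 2f,2}} = \frac{I_0\alpha C_1 L}{I_0\alpha C_2 L} = \frac{C_1}{C_2} = {\rm slope} $
$ C_1 = {\rm slope}\times C_2 = 1.20\times 35\times10^{-9} = 42\times10^{-9} $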
Figure 4. a—the experimentally measured second harmonic signal of formaldehyde with different concentrations b—the fitting results by using LSM and RANSAC algorithms, respectively
Figure 5. a—the experimentally measured second harmonic signals of formaldehyde at the same concentration b—the corresponding fitting results obtained with the LSM and RANSAC algorithms, respectively
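Table 2. Linear fitting results of formaldehyde spectra under different experimental conditions

| formaldehyde | sample 1, LSM | sample 1, RANSAC | sample 2, LSM | sample 2, RANSAC |
|---|---|---|---|---|
| actual ratio | 1.20 | 1.20 | 1.0 | 1.0 |
| fitted value | 1.1818 | 1.1818 | 0.5342 | 1.0272 |
| correlation coefficient $R^2$ | 0.8923 | 0.9853 | 0.2111 | 0.9743 |
| error/% | 1.517 | -0.18 | 46.58 | -2.72 |

The fitting results show that at low SNR, LSM is highly susceptible to anomalous data. The fitted model deviates markedly, the linear correlation is low, and the error in the retrieved volume fraction reaches 47%. RANSAC, by contrast, separates inliers from outliers with a threshold. It therefore effectively suppresses instrument noise (optical interference noise and electronic noise), raising the linear correlation and keeping the error of the retrieved gas volume fraction small.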
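The error row of Table 2 can be checked with a short calculation for sample 2. The sign convention, (actual − fitted)/actual, is inferred from the tabulated values. Treating signal_2 ($35\times10^{-9}$) as the reference for converting the fitted ratio into a volume fraction is our assumption, and the function name is hypothetical.

```python
def relative_error_percent(actual: float, fitted: float) -> float:
    """Relative error in percent, (actual - fitted) / actual * 100 (convention inferred from Table 2)."""
    return (actual - fitted) / actual * 100.0

C_REF = 35e-9  # volume fraction of signal_2, assumed here to be the reference
for method, fitted in (("LSM", 0.5342), ("RANSAC", 1.0272)):
    err = relative_error_percent(1.0, fitted)
    print(f"sample 2, {method}: error {err:.2f} %, "
          f"retrieved volume fraction {fitted * C_REF * 1e9:.2f}x10^-9")
```

This prints an LSM error of 46.58 % (a retrieved $18.70\times10^{-9}$ instead of $35\times10^{-9}$) against -2.72 % for RANSAC ($35.95\times10^{-9}$).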
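The analysis of both the simulated signals and the experimental data shows a consistent picture. When the SNR of the spectral data is high, the two models give consistent results. When the SNR is poor, and especially when the spectral signal is severely disturbed by acquisition-system noise, RANSAC estimates the model parameters more robustly than LSM. It improves the linear correlation and reduces the error of the retrieved gas concentration.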
Applications of random sample consistency algorithm on laser spectroscopy
Abstract: Signal-processing algorithms used to detect atmospheric trace-gas concentrations by wavelength modulation laser spectroscopy have shortcomings. To address them, a new gas concentration inversion method based on the random sample consistency (RANSAC) algorithm was proposed. Taking simulated and actually measured signals of atmospheric formaldehyde as examples, theoretical analysis and experimental study were carried out, and the results were compared with those of the traditional least squares method. The results show that the proposed algorithm is more immune to noise and outliers. Under low signal-to-noise ratio (SNR) conditions in particular, the measurement accuracy can be improved by one order of magnitude, showing better reliability and superiority.
[1] PHILIPPE L C, HANSON R K. Laser diode wavelength-modulation spectroscopy for simultaneous measurement of temperature, pressure, and velocity in shock-heated oxygen flows[J]. Applied Optics, 1993, 32(30):6090-6103. doi: 10.1364/AO.32.006090
[2] LI J S, YU B L, ZHAO W X, et al. A review of signal enhancement and noise reduction techniques for tunable diode laser absorption spectroscopy[J]. Applied Spectroscopy Reviews, 2014, 49(8):666-691. doi: 10.1080/05704928.2014.903376
[3] REID J, LABRIE D. Second-Harmonic detection with tunable diode lasers-comparison of experiment and theory[J]. Applied Physics, 1981, B26(3):203-210.
[4] LI J, REIFFS A, PARCHATKA U, et al. In situ measurements of atmospheric CO and its correlation with NOx and O3 at a rural mountain site[J]. Metrology and Measurement Systems, 2015, 22(1):25-38. doi: 10.1515/mms-2015-0001
[5] CAI Y, WU Sh Q, WU A, et al. Study on calculation method of detection limit based on wavelength modulation spectroscopy[J]. Laser Technology, 2012, 36(3):390-393(in Chinese).
[6] XU Y Z, GUO J Q, GAO X R, et al. Effect of temperature on absorption spectral lines of carbon monoxide[J]. Laser Technology, 2010, 34(6):778-780(in Chinese).
[7] PLACKETT R L. The discovery of the method of least squares[J]. Biometrika, 1972, 59(2):239-251.
[8] JIA X Y, XU C S, BAI X. The invention and way of thinking on least squares[J]. Journal of Northwest University, 2006, 36(3):507-511(in Chinese).
[9] FISCHLER M A, BOLLES R C. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography[J]. Communications of the ACM, 1981, 24(6):381-395. doi: 10.1145/358669.358692
[10] ZHOU C L, ZHU H H, LI X J. Research and application of robust plane fitting algorithm with RANSAC[J]. Computer Engineering and Applications, 2011, 47(7):177-179(in Chinese).
[11] CAO Y, FENG Y, YANG Y T, et al. Application of estimation algorithm based on RANSAC in road points cloud optimization[J]. Infrared and Laser Engineering, 2012, 41(11):3108-3112(in Chinese).
[12] WEI Y Z, LIU X L. Robust plane fitting of clouds based on RANSAC[J]. Journal of Beijing University of Technology, 2014, 40(3):400-403(in Chinese).
[13] ZHEN Y, LIU X J, WANG M Zh. An improved RANSAC of fundamental matrix estimation method[J]. Bulletin of Surveying and Mapping, 2014(4):39-43(in Chinese).
[14] ZHANG H M, ZHENG Z. An improvement of the adjacent probability random sampling consistency algorithm[J]. Laser Journal, 2013, 34(5):29-30(in Chinese).
[15] HAST A, NYSJÖ J, MARCHETTI A. Optimal RANSAC-towards a repeatable algorithm for finding the optimal set[J]. Journal of WSCG, 2013, 21(1):21-30.
[16] LI J S, PARCHATKA U, FISCHER H. A formaldehyde trace gas sensor based on a thermoelectrically cooled CW-DFB quantum cascade laser[J]. Analytical Methods, 2014, 6(15):5483-5488. doi: 10.1039/C3AY41964A
[17] LI J, PARCHATKA U, FISCHER H. Development of field-deployable real time QCL spectrometer for simultaneous detection of ambient N2O and CO[J]. Sensors and Actuators, 2013, B182(3):659-667.