Hyperspectral image detection of wheat seed purity based on SMOTE-UVE-SVM

ZHU Panyu; HUANG Min; ZHAO Xin

doi:10.7510/jgjs.issn.1001-3806.2024.02.021

Volume 48 Issue 2

Mar. 2024

LASER TECHNOLOGY, 2024, 48(2): 281-287

School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China

Corresponding author: HUANG Min, huangmzqb@163.com
Received Date: 2023-03-02
Accepted Date: 2023-04-07

Abstract

To address the decline in the performance of wheat seed purity detection models caused by sample imbalance and redundant band information in hyperspectral imaging, a hyperspectral seed purity detection model was proposed that combines the synthetic minority oversampling technique (SMOTE), uninformative variable elimination (UVE), and a support vector machine (SVM). In this model, SMOTE was used to expand the minority-class (impurity) samples of wheat seeds and so reduce the sample imbalance. UVE was then used to select features from the high-dimensional hyperspectral data, and an SVM model was built on the selected features, further reducing the risk of overfitting caused by feature redundancy. The results show that the average accuracy, precision, and negative sample detection rate over the five wheat seed varieties are 95.98%, 94.94%, and 89.32%, respectively, which are 3.89, 7.18, and 12.42 percentage points higher than those of the traditional methods. The proposed method has good application prospects for wheat seed purity detection based on hyperspectral imaging.
Key words: spectroscopy; hyperspectral imaging technology; synthetic minority oversampling technique; uninformative variable elimination; seed purity

References

[1]	BAO Y, MI C, WU N, et al. Rapid classification of wheat grain varieties using hyperspectral imaging and chemometrics[J]. Applied Sciences, 2019, 9(19): 4119. doi: 10.3390/app9194119
[2]	FENG L, ZHU S, LIU F, et al. Hyperspectral imaging for seed quality and safety inspection: A review[J]. Plant Methods, 2019, 15(1): 1-25. doi: 10.1186/s13007-018-0385-5
[3]	QIU Z, CHEN J, ZHAO Y, et al. Variety identification of single rice seed using hyperspectral imaging combined with convolutional neural network[J]. Applied Sciences, 2018, 8(2): 212. doi: 10.3390/app8020212
[4]	YANG X, HONG H, YOU Z, et al. Spectral and image integrated analysis of hyperspectral data for waxy corn seed variety classification[J]. Sensors, 2015, 15(7): 15578-15594. doi: 10.3390/s150715578
[5]	黄敏, 夏超, 朱启兵, 等. 融合高光谱图像技术与MS-3DCNN的小麦种子品种识别模型[J]. 农业工程学报, 2021, 37(18): 153-160. HUANG M, XIA Ch, ZHU Q B, et al. Recognizing wheat seed varieties using hyperspectral imaging technology combined with multi-scale 3D convolution neural network[J]. Transactions of the Chinese Society of Agricultural Engineering, 2021, 37(18): 153-160 (in Chinese). doi: 10.11975/j.issn.1002-6819.2021.18.018
[6]	SINGH P, NAYYAR A, SINGH S, et al. Classification of wheat seeds using image processing and fuzzy clustered random forest[J]. International Journal of Agricultural Resources, Governance and Ecology, 2020, 16(2): 123-156. doi: 10.1504/IJARGE.2020.109048
[7]	王和勇, 樊泓坤, 姚正安, 等. 不平衡数据集的分类方法研究[J]. 计算机应用研究, 2008, 25(5): 1301-1304. WANG H Y, FAN H K, YAO Zh A, et al. Research of imbalanced data classification[J]. Application Research of Computers, 2008, 25(5): 1301-1304(in Chinese).
[8]	闫红梅, 何明一. 基于聚类和联合偏度与峰度指数的高光谱数据波段选择算法[J]. 信号处理, 2023, 39(1): 1-10. YAN H M, HE M Y. Hyperspectral data band selection based on clustering joint skewness-kurtosis index[J]. Journal of Signal Processing, 2023, 39(1): 1-10(in Chinese).
[9]	路燕, 任月, 崔宾阁. 噪声鲁棒的高光谱图像波段选择方法[J]. 遥感学报, 2022, 26(11): 2382-2398. LU Y, REN Y, CUI B G. Noise robust band selection method for hyperspectral images[J]. National Remote Sensing Bulletin, 2022, 26(11): 2382-2398(in Chinese).
[10]	YANG S, ZHU Q B, HUANG M. Application of joint skewness algorithm to select optimal wavelengths of hyperspectral image for maize seed classification[J]. Spectroscopy and Spectral Analysis, 2017, 37(3): 990-996.
[11]	刘璐, 邵慧, 孙龙, 等. 利用高光谱激光雷达检测木材的霉变与含水量[J]. 激光技术, 2023, 47(5): 620-626. LIU L, SHAO H, SUN L, et al. Detection of mildew and moisture content in timber by hyperspectral LiDAR[J]. Laser Technology, 2023, 47(5): 620-626(in Chinese).
[12]	HUANG M, HE C, ZHU Q, et al. Maize seed variety classification using the integration of spectral and image features combined with feature transformation based on hyperspectral imaging[J]. Applied Sciences, 2016, 6(6): 183. doi: 10.3390/app6060183
[13]	BRUNING B, LIU H, BRIEN C, et al. The development of hyperspectral distribution maps to predict the content and distribution of nitrogen and water in wheat (Triticum aestivum)[J]. Frontiers in Plant Science, 2019, 10: 1380. doi: 10.3389/fpls.2019.01380
[14]	SINGH C B, JAYAS D S, PALIWAL J, et al. Identification of insect-damaged wheat kernels using short-wave near-infrared hyperspectral and digital colour imaging[J]. Computers and Electronics in Agriculture, 2010, 73(2): 118-125. doi: 10.1016/j.compag.2010.06.001
[15]	童莹萍, 冯伟, 宋怡佳, 等. 面向不平衡高光谱遥感分类的SMOTE和旋转森林动态集成算法[J]. 遥感学报, 2022, 26(11): 2369-2381. TONG Y P, FENG W, SONG Y J, et al. Dynamic ensemble algorithm of SMOTE and rotation forest for imbalanced hyperspectral remote sensing classification[J]. National Remote Sensing Bulletin, 2022, 26(11): 2369-2381 (in Chinese).
[16]	DOU Z, GAO K, ZHANG X, et al. Band selection of hyperspectral images using attention-based autoencoders[J]. IEEE Geoscience and Remote Sensing Letters, 2020, 18(1): 147-151.
[17]	BAJCSY P, GROVES P. Methodology for hyperspectral band selection[J]. Photogrammetric Engineering & Remote Sensing, 2004, 70(7): 793-802.
[18]	CENTNER V, MASSART D L, DE NOORD O E, et al. Elimination of uninformative variables for multivariate calibration[J]. Analytical Chemistry, 1996, 68(21): 3851-3858. doi: 10.1021/ac960321m
[19]	CORTES C, VAPNIK V. Support-vector networks[J]. Machine Learning, 1995, 20(3): 273-297.
[20]	张政, 李世强. 基于AdaBoost改进随机森林和SVM的极化SAR地物分类[J]. 中国科学院大学学报, 2022, 39(6): 776-782. ZHANG Zh, LI Sh Q. Polarimetric SAR image classification based on AdaBoost improved random forest and SVM[J]. Journal of University of Chinese Academy of Sciences, 2022, 39(6): 776-782 (in Chinese).
[21]	黄江, 李雨涵, 吴盛斌, 等. 基于多元特征参数与改进SVM算法的驾驶风格识别研究[J]. 重庆理工大学学报(自然科学版), 2022, 36(11): 8-19. HUANG J, LI Y H, WU Sh B, et al. Research on driving style recognition based on multivariate feature parameters and an improved SVM algorithm[J]. Journal of Chongqing University of Technology (Natural Science Edition), 2022, 36(11): 8-19 (in Chinese).
[22]	杨丽, 高美婷. 基于LS-SVM测量生物组织光学参量的实验研究[J]. 激光技术, 2015, 39(3): 300-303. YANG L, GAO M T. Experimental study about measurement of optical parameters of biological tissue based on least square support vector machine[J]. Laser Technology, 2015, 39(3): 300-303(in Chinese).
[23]	TAX D M J, DUIN R P W. Support vector data description[J]. Machine Learning, 2004, 54(1): 45-66. doi: 10.1023/B:MACH.0000008084.60811.49
[24]	康颖, 赵治华, 吴灏, 等. 基于Deep SVDD的通信信号异常检测方法[J]. 系统工程与电子技术, 2022, 44(7): 2319-2328. KANG Y, ZHAO Zh H, WU H, et al. Deep SVDD-based anomaly detection method for communication signals[J]. Systems Engineering and Electronics, 2022, 44(7): 2319-2328 (in Chinese).
[25]	ZHAO Y, ZHANG X, SHANG Z, et al. A novel hybrid method for KPI anomaly detection based on VAE and SVDD[J]. Symmetry, 2021, 13(11): 2104. doi: 10.3390/sym13112104
[26]	蒋卫恒, 段耀星, 李明玉, 等. 一种基于维度加权盲K近邻算法的数字预失真技术[J]. 电子与信息学报, 2023, 45(2): 446-454. JIANG W H, DUAN Y X, LI M Y, et al. A digital predistortion technique based on the dimension weighted blind K-nearest neighbor algorithm[J]. Journal of Electronics & Information Technology, 2023, 45(2): 446-454 (in Chinese).
[27]	SYALIMAN K U. Enhance the accuracy of K-nearest neighbor (KNN) for unbalanced class data using synthetic minority oversampling technique (smote) and gain ratio (GR)[J]. INFOKUM, 2021, 10(1): 188-195.
[28]	杜娟, 刘志刚, 衣治安. 一种适用于不均衡数据集分类的KNN算法[J]. 科学技术与工程, 2011, 11(12): 2680-2685. DU J, LIU Zh G, YI Zh A. A KNN algorithm for unbalanced data set[J]. Science Technology and Engineering, 2011, 11(12): 2680-2685 (in Chinese).
[29]	张楠楠, 张晓, 王城坤, 等. 基于高光谱和连续投影算法的棉花叶面积指数估测[J]. 农业机械学报, 2022, 53(S1): 257-262. ZHANG N N, ZHANG X, WANG Ch K, et al. Cotton LAI estimation based on hyperspectral and successive projection algorithm[J]. Transactions of the Chinese Society for Agricultural Machinery, 2022, 53(S1): 257-262 (in Chinese).

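0. Introduction

Wheat varieties differ in pest and disease resistance, quality, and yield. High-quality wheat seed plays a vital role in raising wheat yield and quality, and mixing of seed varieties causes great economic losses in breeding, cultivation, and commodity quality. With the large-scale circulation of seed in the modern seed industry, seeds of different wheat varieties can easily be mixed by accident during transport, storage, and production, which inevitably lowers wheat quality and yield^[1]. For several consecutive years, the Central No. 1 Document has raised seed industry development to the level of national food strategy. While high-quality wheat seed is being bred, identifying seed purity has become an urgent task for researchers, and seed purity has therefore become a mandatory item in the Rules for Agricultural Seed Testing (《农作物种子检验规程》).

Over the past 10 years, with the development of rapid, non-destructive hyperspectral imaging technology, a growing number of researchers have applied it to crop seed research with good results^[2-4]. However, many current studies that use hyperspectral imaging to identify the purity of wheat seeds, or to classify them, assume equal numbers of positive and negative samples^[1, 5-6]. In this paper, positive samples are seeds of the desired variety to be detected, and negative samples are seeds of other varieties mixed in as impurities. In practical inspection, impurity (negative) samples are fewer than positive samples, so conventional algorithms produce classifications biased toward the majority class and perform worse on the minority class^[7]. In addition, compared with multispectral images, hyperspectral images contain more information: adjacent bands are highly correlated and therefore redundant, and they may also carry noise that does not help discrimination, which can likewise degrade the purity detection performance of a model to some extent^[8-9].

To reduce the loss of purity detection accuracy caused by sample imbalance, the authors use the synthetic minority oversampling technique (SMOTE) to expand the impurity (negative) samples of wheat seeds so that the numbers of positive and negative samples are equal. To remove redundant information between bands and further improve detection accuracy, uninformative variable elimination (UVE) is then used for band selection, and finally a support vector machine (SVM) serves as the classifier. To compare model performance, the successive projections algorithm (SPA) is used as the comparison band selection algorithm, and support vector data description (SVDD) and the k-nearest neighbor (KNN) algorithm are used as comparison classifiers. The results show that the proposed method performs best among all methods compared.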

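2. Model construction based on the SMOTE-UVE-SVM method

2.1. Synthetic minority oversampling technique

Because simple oversampling merely duplicates samples mechanically and introduces no new useful information, it often fails to improve the classification performance of a model and may increase the risk of overfitting^[15]. To improve classification performance and reduce the overfitting risk of oversampling, the authors propose applying SMOTE to wheat seed purity detection with imbalanced samples, so that the numbers of positive and negative samples are equal. SMOTE synthesizes samples as follows: for each sample m in the negative class, P samples m_i (i=1, 2, …, P) are randomly selected from among its nearest neighbors, and a new sample m_new is then synthesized between m and each m_i according to the following formula:

where rand(0, 1) denotes a random number in the range 0 to 1.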
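The synthesis rule described above can be illustrated with a short sketch. It assumes the standard SMOTE interpolation, which matches the symbols defined here: $m_{\mathrm{new}} = m + \mathrm{rand}(0, 1) \times (m_i - m)$. The function name `smote`, the neighbor count `k`, and the choice to draw one neighbor per synthetic sample (rather than P neighbors per original sample) are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority samples.

    minority: list of equal-length feature vectors (lists of floats).
    k: number of nearest neighbors considered per sample (assumed default).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        m = rng.choice(minority)
        # k nearest neighbors of m by Euclidean distance, excluding m itself
        others = [s for s in minority if s is not m]
        others.sort(key=lambda s: math.dist(m, s))
        m_i = rng.choice(others[:k])
        r = rng.random()  # plays the role of rand(0, 1)
        # m_new = m + rand(0, 1) * (m_i - m), applied band by band
        synthetic.append([a + r * (b - a) for a, b in zip(m, m_i)])
    return synthetic
```

To balance the classes as described, `n_new` would be set to the number of positive samples minus the number of negative samples.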

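2.2. Uninformative variable elimination algorithm

Compared with multispectral images, hyperspectral image data have more bands, and adjacent bands are highly correlated, so they may carry redundant information that does not help discrimination^[16-17]. To remove the redundancy between bands and improve the classification accuracy of the model as far as possible, spectral bands are selected after the impurity (negative) samples have been expanded. The basic idea of UVE is to build a partial least squares (PLS) model to remove wavelengths that are irrelevant to discrimination and retain the characteristic wavelengths; the removed wavelengths are called uninformative variables^[18]. Let the original sample matrix be X_M×N and the sample labels be y_M×1, where M is the number of samples and N is the number of bands. The algorithm proceeds as follows.

(a) Determine the optimal model complexity, using the lowest root-mean-square error of prediction (RMSEP), R_RMSEP, as the criterion^[18], as given by:

where y_i is the sample label; $\hat{y}_{i}$ is the predicted value; i=1, 2, 3, …, M.

(b) Generate an M×N noise matrix R_M×N and combine it with the original matrix to form a new matrix S_M×2N, with S_M×2N=[X_M×N, R_M×N].

(c) Determine the regression coefficient matrix A_M×2N by leave-one-out cross-validation, and denote its elements by a_ij, where i=1, 2, 3, …, M; j=1, 2, 3, …, 2N.

(d) For each column vector a_j, determine its mean $\bar{a}_{j}$, its standard deviation s(a_j), and the ratio of the mean regression coefficient to the standard deviation, denoted c_j, j=1, 2, 3, …, 2N.

(e) Eliminate from X_M×N the experimental variables whose abs(c_j) is below a given threshold, j=1, 2, 3, …, 2N, to obtain a new matrix X_new; build the final leave-one-out cross-validation model and predict the sample matrix to obtain a new R_RMSEP, denoted R_RMSEPN.

(f) Examine R_RMSEPN: if R_RMSEPN > R_RMSEP, reset the noise parameters and return to step (b); otherwise, output the spectral feature variables.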
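Steps (a), (d), and (e) above can be sketched in a few lines. The sketch assumes the usual definitions $R_{\mathrm{RMSEP}} = \sqrt{\frac{1}{M}\sum_{i=1}^{M}(y_i - \hat{y}_i)^2}$ and $c_j = \bar{a}_j / s(a_j)$, and it takes the regression coefficient matrix as given rather than fitting the PLS models. The cutoff used here, the largest abs(c_j) among the appended noise columns, is a common choice in UVE and is an assumption; the text only says "a given threshold". The names `rmsep` and `uve_select` are hypothetical.

```python
import math
import statistics


def rmsep(y_true, y_pred):
    """Root-mean-square error of prediction over M samples."""
    m = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / m)


def uve_select(coef_rows, n_bands):
    """Return indices of real bands whose |c_j| exceeds the noise-derived cutoff.

    coef_rows: one row of 2N regression coefficients per leave-one-out model;
               columns 0..N-1 are real bands, columns N..2N-1 are appended noise.
    Assumes at least two rows and no constant column (standard deviation > 0).
    """
    columns = list(zip(*coef_rows))
    # Step (d): c_j = mean(a_j) / s(a_j) for every column
    c = [statistics.mean(col) / statistics.stdev(col) for col in columns]
    # Assumed cutoff: the largest |c_j| observed among the noise columns
    cutoff = max(abs(v) for v in c[n_bands:])
    # Step (e): keep only real bands that are more stable than any noise column
    return [j for j in range(n_bands) if abs(c[j]) > cutoff]
```

In a full implementation, the rows of `coef_rows` would come from PLS models fitted on S_M×2N with one sample left out each time, and `rmsep` would be evaluated before and after selection to decide whether to repeat from step (b).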

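2.3. Support vector machine

In this paper, an SVM is used as the classifier. The basic principle of the linear SVM is to find the separating hyperplane with the largest geometric margin, so that positive and negative samples fall on opposite sides of it. However, a problem such as wheat seed purity detection, which is not linearly separable in the low-dimensional input space, cannot be solved with a linear SVM. CORTES et al.^[19] introduced the kernel function into the SVM, mapping the low-dimensional data into a high-dimensional space through a nonlinear transformation so that they become linearly separable. Since it was proposed, the SVM has been widely applied in many fields with good results^[20-22]. According to the SVM principle, the hyperplane to be constructed is:

where w is the normal vector of the hyperplane and b is the offset. For the training data set {x|x_i ∈ R^N, i=1, 2, …, M} (where x_i is a 1×N vector and R^N is the N-dimensional real space) with labels y_i∈{+1, -1} to be classified correctly, the following should hold:

where sign(·) is the sign function and Ω(x) is the nonlinear mapping. The final SVM optimization function and its constraints are:

where J(w, e_i) is the objective function; ‖w‖ is the F-norm of w; φ(x_i) is the nonlinear function; e_i is the error variable, i=1, 2, …, M; and γ is the penalty factor. The solution is converted into the Lagrangian dual problem, which finally gives:

where α_i are the Lagrange multipliers and Q(x, x_i) is the kernel function, for which the radial basis function kernel is usually used:

where σ² is the variance.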
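The equations of this subsection do not appear in this copy. The block below gives the standard textbook soft-margin forms, written with the symbols defined above, in the order in which the text introduces them: the hyperplane, the classification rule, the optimization problem, the dual decision function, and the radial basis function kernel. This is a reconstruction under that assumption; the authors' equations may differ in detail, for example in the form of the penalty term or in the scaling of σ² in the kernel.

```latex
\begin{aligned}
& w^{\mathrm{T}} x + b = 0 \\
& y_i = \operatorname{sign}\left(w^{\mathrm{T}} \Omega(x_i) + b\right) \\
& \min_{w,\,b,\,e} J(w, e_i) = \frac{1}{2}\lVert w \rVert^{2} + \gamma \sum_{i=1}^{M} e_i
  \quad \text{s.t.} \quad y_i\left[w^{\mathrm{T}} \varphi(x_i) + b\right] \ge 1 - e_i,\; e_i \ge 0 \\
& f(x) = \operatorname{sign}\left(\sum_{i=1}^{M} \alpha_i y_i\, Q(x, x_i) + b\right) \\
& Q(x, x_i) = \exp\left(-\frac{\lVert x - x_i \rVert^{2}}{2\sigma^{2}}\right)
\end{aligned}
```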

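2.4. Seed purity evaluation metrics

Accuracy, precision, and the negative sample detection rate are used as the evaluation metrics of the classification model. Samples are counted in five categories: true positive (TP) t_TP, false positive (FP) t_FP, false negative (FN) t_FN, true negative (TN) t_TN, and negative (NEG) t_NEG. Accuracy, precision, and the negative sample detection rate are calculated as follows.

Accuracy:

Precision:

Negative detection (ND) rate t_ND: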
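The three metrics can be sketched as follows, assuming the usual definitions: accuracy = (t_TP + t_TN) / (t_TP + t_FP + t_FN + t_TN), precision = t_TP / (t_TP + t_FP), and t_ND = t_TN / t_NEG. The sketch also assumes that t_NEG is the total number of actual negative samples, that is t_TN + t_FP. The function name `purity_metrics` is hypothetical.

```python
def purity_metrics(t_tp, t_fp, t_fn, t_tn):
    """Return accuracy, precision, and negative detection rate as fractions."""
    total = t_tp + t_fp + t_fn + t_tn
    # Assumption: t_NEG counts every actual negative (impurity) sample,
    # whether it was detected (TN) or mistaken for a positive (FP).
    t_neg = t_tn + t_fp
    return {
        "accuracy": (t_tp + t_tn) / total,
        "precision": t_tp / (t_tp + t_fp),
        "negative_detection": t_tn / t_neg,
    }
```

For example, with 90 true positives, 5 false positives, 10 false negatives, and 15 true negatives, the accuracy is 105/120 = 0.875 and the negative detection rate is 15/20 = 0.75.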

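4. Conclusions

This paper addresses the problem that a shortage of impurity (negative) samples of wheat seeds lowers model performance in purity detection. Building on hyperspectral imaging, a purity detection model based on the SMOTE-UVE-SVM method is proposed. The method first uses the SMOTE algorithm to expand the data set of impurity (negative) samples so that their number equals that of the positive samples. It then uses UVE for band selection to remove the redundancy and noise that may exist between hyperspectral bands and further improve purity detection accuracy, and finally uses an SVM for classification. To compare model performance, SVDD, KNN, and SVM were compared as classification algorithms on balanced and imbalanced samples, and SPA and UVE were compared as band selection algorithms. The results show that the SMOTE-UVE-SVM model performs best among all methods compared. Because only four wheat varieties were included among the negative samples in this study, future work should, where conditions permit, add as many negative-sample varieties as possible, so that the model generalizes better and can identify seeds of varieties that did not take part in model training.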

variety	place of origin	parentage	gluten strength
JM22	Shandong	935024/935106	medium gluten
XM26	Henan	Xinmai9408/Jinan17	strong gluten
JM44	Shandong	954072/Jinan17	strong gluten
BN4199	Henan	Bainonggaoguang3709F2/BainongAK58	medium gluten
ZM33	Henan	Zhengmai366/BainongAK58	strong gluten

absorption peak band range/nm	dominant factors
400~420	protein^[12]
460~500	protein^[12]
610~640	fat, protein^[1]
900~920	water content^[13]
950~1000	O-H stretching, protein^[14]

positive samples	SVDD A/%	SVDD P/%	SVDD t_ND/%	KNN A/%	KNN P/%	KNN t_ND/%	SVM A/%	SVM P/%	SVM t_ND/%
JM22	64.00	79.11	67.00	84.03	81.35	52.40	93.07	90.68	79.20
XM26	64.33	83.45	77.00	79.20	76.74	37.70	89.80	87.01	70.20
JM44	56.33	74.13	63.00	81.40	78.83	45.10	92.23	89.65	76.90
BN4199	62.33	87.18	85.00	82.70	80.01	49.80	95.50	84.09	87.50
ZM33	60.00	76.32	64.00	82.33	80.13	51.60	89.87	87.38	70.70
average	61.40	80.04	71.20	81.93	79.41	47.32	92.09	87.76	76.90

positive samples	SMOTE-KNN A/%	SMOTE-KNN P/%	SMOTE-KNN t_ND/%	SMOTE-SVM A/%	SMOTE-SVM P/%	SMOTE-SVM t_ND/%
JM22	88.87	89.26	77.20	94.80	93.17	85.30
XM26	82.63	85.28	69.10	94.80	93.78	86.70
JM44	94.33	92.56	84.00	94.40	92.57	83.69
BN4199	88.57	89.74	78.60	96.53	95.49	90.60
ZM33	84.67	86.86	72.60	95.47	94.60	88.60
average	87.81	88.74	76.30	95.20	93.92	86.98

model	A/%	P/%	t_ND/%	number of selected bands
SMOTE-UVE-KNN	91.12	90.09	78.50	71
SMOTE-SPA-KNN	90.50	89.62	77.61	74
SMOTE-UVE-SVM	95.98	94.94	89.32	71
SMOTE-SPA-SVM	95.30	94.12	87.52	74
