Early detection of Fusarium wilt in Pepper using multispectral images based on UAV

Gang-In Je; Chan-Seok Ryu; Jong-Chan Jeong; Chang-Hyeok Park; Ye-Seong Kang

doi:10.22765/pastj.20240017


Research Article

Precision Agriculture Science and Technology. 31 December 2024. 238-249
https://doi.org/10.22765/pastj.20240017


Gang-In Je¹

Chan-Seok Ryu¹*

Jong-Chan Jeong¹

Chang-Hyeok Park¹

Ye-Seong Kang²

¹Department of Bio-system Engineering, GyeongSang National University (Institute of Agriculture & Life Science), Jinju 52828, Republic of Korea

²Department of Smart Agro-industry, GyeongSang National University (Institute of Agriculture & Life Science), Jinju 52725, Republic of Korea

*Corresponding author

License (open-access, https://creativecommons.org/licenses/by-nc/4.0/):

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

Pepper (Capsicum annuum L.) is an essential seasoning vegetable in Korean food. However, pepper cultivation is constrained by various diseases. In particular, Fusarium wilt is an economic problem threatening pepper production in many countries. This study aims to develop models for early detection of pepper Fusarium wilt by calculating vegetation indices (VIs) from reflectance values extracted from UAV-based multispectral images and applying them to machine learning classification algorithms. The peppers were transplanted on May 2, and multispectral images were taken on June 28, July 27, and August 26. Of the 30 sampling points used to monitor pepper growth, Fusarium wilt infection was confirmed in 15 samples on July 27 and in a further 11 samples on August 25. Therefore, the feasibility of detecting the infections confirmed on July 27 and August 25 was examined using the multispectral images taken on June 28 and July 27, respectively. Models for detecting infected peppers were built using machine learning (KNN, K-Nearest Neighbors; SVM, Support Vector Machine; LR, Logistic Regression), applying backward elimination to the 9 VIs ranked by correlation analysis, with calibration-to-validation ratios of 8:2, 7:3, and 6:4. For disease detection on July 27 using the image of June 28, the KNN model with 8 VIs and a 7:3 ratio was selected as the best model. For disease detection on July 27 and August 25 using the images of June 28 and July 27, the SVM model with NDRE, GNDVI, DVI, NDVI, and OSAVI and an 8:2 ratio was selected as the best model. The model that excluded the samples not infected as of August 25 performed best: the KNN model with DVI, TCARI, and RVI achieved accuracy, precision, recall, and F1 score of 0.783, 0.733, 0.917, and 0.815 in calibration and 0.909, 0.833, 1.000, and 0.909 in validation. Moreover, the confusion matrix contained no case in which an infected pepper was classified as non-infected. The model developed in this study is expected to contribute to improving the productivity of peppers by preventing the spread of disease through the early detection of pepper Fusarium wilt.

Keywords

Pepper

Fusarium wilt

Multispectral

UAV

Machine learning


Introduction

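Pepper (Capsicum annuum L.) is an indispensable seasoning vegetable in Korean cuisine and accounts for the largest cultivation area and harvest of all vegetables (RDA, 2020). However, pepper cultivation is constrained by diseases such as powdery mildew, leaf blight, bacterial spot, and Fusarium wilt (Gabrekiristos and Demiyo, 2020). In particular, Fusarium wilt poses a persistent threat to pepper production worldwide (Engalycheva et al., 2024). Because a delayed initial response to an outbreak can severely reduce yield and quality, research on detecting the disease early and managing it effectively is essential.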

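Recently, non-destructive crop research using unmanned aerial vehicles (UAVs) has been actively conducted in agriculture (Kang et al., 2021; Sun et al., 2023). UAVs can carry a variety of sensors; among them, multispectral sensors can detect disease-induced changes in plant nutritional status, metabolism, or biology through specific spectral wavelengths beyond the visible (RGB) range (Peng et al., 2022).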

Multispectral sensors have been used to detect fungal disease in grapevines (Kerkech et al., 2020), Fusarium wilt in banana (Zhang et al., 2022), and pine wilt disease at an early stage (Yu et al., 2021). Machine learning classification algorithms have also been used to detect diseases of rice leaves (Ahmed et al., 2019) and maize leaves (Panigrahi et al., 2020).

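Many studies on crop disease detection have relied on RGB image datasets collected under laboratory conditions (Ahmed et al., 2019; Jung et al., 2023; Ramesh et al., 2018), whereas few have collected data in actual fields and used the reflectance of crops that show no symptoms yet to detect disease early.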

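Therefore, this study aims to calculate vegetation indices from reflectance values extracted from UAV-based multispectral images acquired on a working farm, apply them to machine learning classification algorithms, and develop a model for the early detection of pepper Fusarium wilt, thereby contributing to improved pepper productivity.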

Materials and Methods

Study site and growth survey

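This study was conducted on pepper (Capsicum annuum L., Colormura) grown in a farm field in Yechon-ri, Yongji-myeon, Gimje-si, Jeollabuk-do (35°50′34.07″N 127°57′43.2″E, Fig. 1). The peppers were transplanted on May 2, 2023. As shown in Fig. 1(a), 30 plants grown in a single row were surveyed on June 27, July 27, and August 25 for plant height, stem diameter, number of nodes, and number of fruits set in order to monitor growth.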

https://cdn.apub.kr/journalsite/sites/kspa/2024-006-04/N0570060401/images/kspa_2024_064_238_F1.jpg

Fig. 1.

(a) Sampling location, (b) Location of the infected samples.

Multispectral image acquisition

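Multispectral images were acquired with an Altum-PT multispectral sensor (MicaSense Inc, USA; Table 1) mounted on a Matrice 300 RTK rotary-wing UAV (DJI Technology Inc, China). Flights were made at noon on June 28, July 27, and August 26 at an altitude of 25 m and a speed of 4 m/s, with 75% forward and side overlap.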

Table 1.

Multispectral Sensor.

Category	Multispectral Sensor
Model	Altum-PT
Manufacturer	MicaSense
Center wavelength and bandwidth	475 nm ± 32 nm (Blue); 560 nm ± 27 nm (Green); 668 nm ± 14 nm (Red); 717 nm ± 12 nm (RedEdge); 842 nm ± 57 nm (NIR); 10.5 μm ± 6 μm (Thermal); 634.5 nm ± 463 nm (Panchromatic)
GSD @25 m	1.08 cm/pixel
Panchromatic GSD @25 m	0.52 cm/pixel

Collection of Fusarium wilt infection data

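During the growth survey, Fusarium wilt infection was confirmed in 15 samples on July 27, and on August 25 additional infection was confirmed in 11 of the remaining 15 samples. In Fig. 1(b), the 30 plants grown in a single row are drawn in three rows; plants confirmed as infected on July 27 are shown in red, plants newly confirmed as infected on August 25 in blue, and plants not infected as of August 25 in black.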

Multispectral image processing

The individual images were mosaicked in Pix4D mapper (Pix4D S.A., Switzerland), pansharpened, and then co-registered in QGIS (Quantum GIS, USA). In ENVI 5.3 (Exelis Visual Information Solution Inc., USA), vegetation was separated from the background using GNDVI-NDVI (Fig. 2), and reflectance values were extracted for each plant.

(1)

$$GNDVI - NDVI = \frac{NIR - Green}{NIR + Green} - \frac{NIR - Red}{NIR + Red}$$

https://cdn.apub.kr/journalsite/sites/kspa/2024-006-04/N0570060401/images/kspa_2024_064_238_F2.jpg

Fig. 2.

Vegetation separation.
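
The separation step based on Equation (1) can be sketched per pixel as below. This is a minimal illustration, not the ENVI workflow used in the study: the threshold value, the assumption that vegetation pixels have a more negative GNDVI-NDVI than background pixels, and the reflectance values are all hypothetical and would need to be checked against the actual imagery.

```python
def gndvi_minus_ndvi(nir, green, red):
    """Equation (1): GNDVI - NDVI for one pixel's band reflectances."""
    gndvi = (nir - green) / (nir + green)
    ndvi = (nir - red) / (nir + red)
    return gndvi - ndvi


def vegetation_mask(pixels, threshold=-0.05):
    """Return True for pixels treated as vegetation.

    `pixels` is a list of (nir, green, red) reflectance tuples.
    `threshold` is a hypothetical cut-off, not a value from the study.
    """
    return [gndvi_minus_ndvi(n, g, r) < threshold for n, g, r in pixels]


# Made-up reflectances: a dense canopy pixel and a bare-soil pixel.
canopy = (0.595, 0.064, 0.026)
soil = (0.30, 0.15, 0.20)
mask = vegetation_mask([canopy, soil])
print(mask)
```

In practice the cut-off would be chosen by inspecting the histogram of the difference image rather than fixed in advance.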

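Because band reflectance is sensitive to environmental changes, a total of nine vegetation indices (Table 2) were calculated and used: the normalized indices NDVI, GNDVI, NDRE, and PRI; the simple ratio indices RVI and GRVI; the simple difference index DVI; and the modified indices OSAVI and TCARI.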

Table 2.

Vegetation Indices (VIs).

Name	Calculation	Reference
NDVI	$\frac{NIR - Red}{NIR + Red}$	(Huang et al., 2021)
GNDVI	$\frac{NIR - Green}{NIR + Green}$	(Hunt et al., 2008)
GRVI	$\frac{NIR}{Green}$	(Avola et al., 2019)
RVI	$\frac{NIR}{Red}$	(Basso et al., 2004)
DVI	$NIR - Red$	(Basso et al., 2004)
NDRE	$\frac{NIR - RedEdge}{NIR + RedEdge}$	(Boiarskii and Hasegawa, 2019)
PRI	$\frac{Blue - Green}{Blue + Green}$	(Lee et al., 2021)
OSAVI	$\frac{NIR - Red}{NIR + Red + 0.16}$	(Zhang et al., 2019)
TCARI	$3 \left[ (RedEdge - Red) - 0.2 (RedEdge - Green) \left( \frac{RedEdge}{Red} \right) \right]$	(Zhang et al., 2019)
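
As a rough illustration of Table 2, the nine indices can be computed from the five band reflectances of one plant as follows. The function name and the reflectance values are hypothetical. TCARI is written in its commonly used form, 3[(RedEdge - Red) - 0.2(RedEdge - Green)(RedEdge/Red)], and OSAVI follows the table as printed (some formulations multiply the ratio by 1.16).

```python
def vegetation_indices(blue, green, red, red_edge, nir):
    """Compute the nine indices of Table 2 from band reflectances (0-1)."""
    return {
        "NDVI": (nir - red) / (nir + red),
        "GNDVI": (nir - green) / (nir + green),
        "GRVI": nir / green,
        "RVI": nir / red,
        "DVI": nir - red,
        "NDRE": (nir - red_edge) / (nir + red_edge),
        "PRI": (blue - green) / (blue + green),
        "OSAVI": (nir - red) / (nir + red + 0.16),
        "TCARI": 3 * ((red_edge - red)
                      - 0.2 * (red_edge - green) * (red_edge / red)),
    }


# Made-up reflectances for a healthy canopy, for illustration only.
vi = vegetation_indices(blue=0.0215, green=0.0643, red=0.0258,
                        red_edge=0.1993, nir=0.5948)
for name, value in vi.items():
    print(f"{name:6s} {value: .3f}")
```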

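The original plan was to extract reflectance from peppers infected with Fusarium wilt and analyze their infection status, but many plants had already died, so reflectance could not be extracted. Therefore, instead of the July and August images in which the disease was observed, vegetation indices calculated from the reflectance at each plant location in the June and July images were used to examine the feasibility of early detection of Fusarium wilt.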

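As shown in Table 3, Case 1 classifies peppers that were non-infected (15 samples) and infected (15 samples) on July 27 using the vegetation indices from the June 28 image. Case 2 adds to Case 1 the peppers that were non-infected (4 samples) and infected (11 samples) on August 25, described by the vegetation indices from the July 27 image. Case 3 is Case 2 without the 4 samples that were not infected as of August 25.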

Table 3.

Sampling numbers of each case.

Case	Captured date	Survey date	Non-infected (ea)	Infected (ea)	Total (ea)
1	June 28	July 27	15	15	30
2	June 28	July 27	15	15	30
	July 27	August 25	4	11	15
	Total		19	26	45
3	June 28	July 27	15	15	30
	July 27	August 25	-	11	11
	Total		15	26	41
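
The sample totals in Table 3 follow from how the two survey groups are combined, which the short sketch below reproduces. The variable and function names are illustrative only; the counts are those of Table 3 before outlier removal.

```python
def case_totals(groups):
    """Sum (non_infected, infected) counts over the groups in a case."""
    non_infected = sum(g[0] for g in groups)
    infected = sum(g[1] for g in groups)
    return non_infected, infected, non_infected + infected


june_july = (15, 15)    # June 28 image, July 27 survey
july_august = (4, 11)   # July 27 image, August 25 survey

cases = {
    "Case 1": [june_july],
    "Case 2": [june_july, july_august],
    # Case 3 drops the 4 plants still healthy on August 25.
    "Case 3": [june_july, (0, july_august[1])],
}
for name, groups in cases.items():
    print(name, case_totals(groups))
```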

Statistical analysis

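Statistical analysis was performed in Jupyter Notebook 6.3.0 (Python 3.8.8, Project Jupyter, USA), and outliers were removed using the IQR method (Yang et al., 2019). Three machine learning classification algorithms were used: K-Nearest Neighbor (KNN; Laaksonen and Oja, 1996), which classifies a sample by referring to the classes of its nearest neighbors; Support Vector Machine (SVM; Brereton and Lloyd, 2010), which finds the hyperplane that separates the data with the largest margin; and Logistic Regression (LR; Khurshid and Khan, 2014), which predicts the probability of belonging to a given class from a linear relationship between the input variables and the outcome. Models were built while removing variables by backward elimination from the list ranked by correlation analysis (Schober et al., 2018). The calibration-to-validation data ratio was set to 8:2, 7:3, and 6:4, and accuracy, precision, recall, and F1-score were used as evaluation metrics (Haque et al., 2022). To inspect model performance visually, the results were presented as a confusion matrix, a precision-recall curve (P-R curve), and an ROC curve. The P-R curve evaluates, as an area, the proportion of true positives among positive predictions (Saito and Rehmsmeier, 2015). In contrast, the ROC curve, which evaluates overall model performance (Saito and Rehmsmeier, 2015), is influenced more by the negative class than the P-R curve is. In this study, considering validation and generalization performance, the model with the highest validation accuracy was judged to be the best. When validation accuracy was equal, the model with the higher validation recall was preferred, because recall is computed from false negatives (FN), that is, infected (positive) plants predicted as non-infected (negative), which matter most in actual disease control.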
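
A minimal sketch of IQR-based outlier removal for a single vegetation index is given below. The 1.5 multiplier and the "inclusive" quartile definition are assumptions, since the study does not state them, and the NDVI values are made up for illustration.

```python
import statistics


def iqr_filter(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Made-up NDVI values with one obviously low reading.
ndvi = [0.915, 0.917, 0.918, 0.916, 0.919, 0.917, 0.880]
filtered = iqr_filter(ndvi)
print(filtered)
```

Applying the filter to each index separately would leave a different number of samples per index, which is one plausible reading of the per-index sample counts reported in the result tables.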

(2)

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$

(3)

$$Precision = \frac{TP}{TP + FP}$$

(4)

$$Recall = \frac{TP}{TP + FN}$$

(5)

$$F1\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall}$$
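
Equations (2) to (5) translate directly into code. In the sketch below, the function name is illustrative and the confusion-matrix counts are hypothetical: they were chosen so that the rounded metrics match the Case 3 validation values reported in Table 9, and were not read from the published confusion matrix.

```python
def classification_metrics(tp, tn, fp, fn):
    """Equations (2)-(5) computed from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for an 11-sample validation set.
m = classification_metrics(tp=5, tn=5, fp=1, fn=0)
print({name: round(value, 3) for name, value in m.items()})
```

With no false negatives, recall is exactly 1, which is the property the study prioritizes for disease control.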

Results and Discussion

Classification model for Case 1

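Table 4 shows the mean, standard deviation, and number of samples for the non-infected and infected plants in Case 1, which classifies peppers that were non-infected (15 samples) and infected (15 samples) on July 27 using the vegetation indices from the June 28 image. The t-test for Case 1 showed a significant difference only for NDVI, with non-infected samples significantly higher than infected samples; no significant difference was found for the other vegetation indices.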

Table 4.

Basic statistics of Case 1 after outlier elimination.

VIs	Mean ± S.D
VIs	Non-infected	Infected
NDVI	0.917 ± 0.004a¹ (15)	0.912 ± 0.008b (13)
GNDVI	0.805 ± 0.006 (14)	0.801 ± 0.010 (12)
GRVI	9.278 ± 0.334 (14)	9.098 ± 0.527 (12)
RVI	23.24 ± 1.176 (15)	21.86 ± 2.141 (13)
DVI	0.569 ± 0.023 (15)	0.549 ± 0.022 (13)
NDRE	0.498 ± 0.009 (14)	0.490 ± 0.015 (13)
PRI	-0.498 ± 0.010 (14)	-0.488 ± 0.012 (13)
OSAVI	0.846 ± 0.009 (15)	0.836 ± 0.013 (13)
TCARI	-0.107 ± 0.024 (13)	-0.089 ± 0.038 (12)

a, b: significantly different (p ＜ 0.05).

¹: significantly greater than the other (p ＜ 0.05).

Table 5 shows the results of the best-performing KNN, SVM, and LR models for Case 1.

Table 5.

Results of each classification model in Case 1.

Number of samples	Non-infected = 13ea, Infected = 12ea
Model	KNN	SVM	LR
Data set ratio	7:3	7:3	7:3
Independent variables	PRI, OSAVI, DVI, NDVI, RVI, TCARI, NDRE, GNDVI	PRI, OSAVI, DVI, NDVI, RVI	PRI, OSAVI, DVI, NDVI, RVI, TCARI, NDRE, GNDVI, GRVI
Accuracy_C	0.765	0.824	0.706
Precision_C	0.778	0.667	0.778
Recall_C	0.778	1.000	0.700
F1-score_C	0.778	0.800	0.737
Accuracy_V	0.750	0.625	0.500
Precision_V	1.000	0.333	0.333
Recall_V	0.600	0.500	0.333
F1-score_V	0.750	0.400	0.333
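
The variable sets in Table 5 come from backward elimination over correlation-ranked indices. One possible reading of that procedure, using KNN as the classifier, is sketched below in plain Python. Everything here is an assumption made for illustration: the ranking by absolute Pearson correlation with the infection label, k = 3, unscaled Euclidean distance, a single seeded random split, and the synthetic data. It is not the authors' code.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def knn_predict(train_X, train_y, row, k=3):
    """Majority vote (labels 0/1) among the k nearest training rows."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], row))[:k]
    return 1 if 2 * sum(label for _, label in nearest) > k else 0


def knn_accuracy(train_X, train_y, test_X, test_y, cols, k=3):
    """Accuracy on the held-out rows using only the columns in `cols`."""
    def select(row):
        return [row[c] for c in cols]
    train_sel = [select(r) for r in train_X]
    hits = sum(knn_predict(train_sel, train_y, select(r), k) == label
               for r, label in zip(test_X, test_y))
    return hits / len(test_y)


def backward_elimination(X, y, names, train_frac=0.7, seed=0):
    """Return (best held-out accuracy, names of the indices that gave it)."""
    # Rank indices by |r| with the infection label, strongest first.
    ranked = sorted(range(len(names)),
                    key=lambda c: -abs(pearson([r[c] for r in X], y)))
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    cut = round(train_frac * len(order))
    train, test = order[:cut], order[cut:]
    train_X, train_y = [X[i] for i in train], [y[i] for i in train]
    test_X, test_y = [X[i] for i in test], [y[i] for i in test]
    best_acc, best_cols = -1.0, []
    cols = list(ranked)
    while cols:
        acc = knn_accuracy(train_X, train_y, test_X, test_y, cols)
        if acc > best_acc:
            best_acc, best_cols = acc, list(cols)
        cols.pop()  # drop the weakest-ranked index that remains
    return best_acc, [names[c] for c in best_cols]


# Synthetic data: VI_a separates the classes, VI_b is pure noise.
rng = random.Random(1)
y = [0] * 15 + [1] * 15
X = [[0.92 - 0.04 * label + rng.gauss(0, 0.005), rng.gauss(0.5, 0.05)]
     for label in y]
best_acc, best_vars = backward_elimination(X, y, ["VI_a", "VI_b"])
print(best_acc, best_vars)
```

Changing `train_frac` to 0.8 or 0.6 gives the other two split ratios used in the study.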

Because KNN gave the highest validation accuracy and recall, the KNN model with a 7:3 ratio using PRI, OSAVI, DVI, NDVI, RVI, TCARI, NDRE, and GNDVI as independent variables was judged to be the best model, with accuracy = 0.765, precision = 0.778, recall = 0.778, and F1-score = 0.778 in calibration and accuracy = 0.750, precision = 1.000, recall = 0.600, and F1-score = 0.750 in validation. The validation confusion matrix, P-R curve, and ROC curve of the KNN model are shown in Fig. 3.

https://cdn.apub.kr/journalsite/sites/kspa/2024-006-04/N0570060401/images/kspa_2024_064_238_F3.jpg

Fig. 3.

(a) validation confusion matrix, (b) P-R curve, (c) ROC curve of the best model in Case 1

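The validation confusion matrix contained two false negatives (FN), that is, plants that were actually diseased but judged to be healthy, which is the key indicator for disease control; this resulted in a low recall of 0.600. In validation, the area under the curve (AUC) of the P-R curve for the Case 1 model was 0.600, indicating low validation performance in detecting infected (positive) plants, whereas the AUC of the ROC curve was 0.833 in calibration and 0.867 in validation, indicating relatively high detection performance for the non-infected (negative) class.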
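
For readers unfamiliar with the two AUC values, the sketch below computes them from predicted scores. The ROC AUC uses the pairwise-ranking definition and the P-R area uses the step-wise average-precision definition; the study may have integrated its curves differently (for example with the trapezoidal rule), and the labels and scores are made up. Scores are assumed to be distinct in `average_precision`.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative
    (ties count one half); this equals the area under the ROC curve."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    total_pos = sum(labels)
    tp, area = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            area += tp / rank  # precision at this recall step
    return area / total_pos


# Made-up validation labels (1 = infected) and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
print(round(roc_auc(labels, scores), 3),
      round(average_precision(labels, scores), 3))
```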

Classification model for Case 2

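Table 6 shows the mean, standard deviation, and number of samples after outlier removal for the non-infected and infected plants in Case 2, which adds to Case 1 the peppers that were non-infected (4 samples) and newly infected (11 samples) on August 25, described by the vegetation indices from the July 27 image. The t-test for Case 2 showed significant differences for all vegetation indices: non-infected plants were significantly higher than infected plants for NDVI, GNDVI, GRVI, RVI, DVI, NDRE, and OSAVI, and infected plants were significantly higher than non-infected plants for PRI and TCARI.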

Table 6.

Basic statistics of Case 2 after outlier elimination.

VIs	Mean ± S.D
VIs	Non-infected	Infected
NDVI	0.900 ± 0.033a¹ (19)	0.871 ± 0.047b (24)
GNDVI	0.791 ± 0.028a¹ (18)	0.768 ± 0.041b (21)
GRVI	8.713 ± 1.106a¹ (18)	7.890 ± 1.482b (21)
RVI	20.73 ± 4.986a¹ (19)	16.61 ± 5.989b (24)
DVI	0.526 ± 0.086a¹ (19)	0.457 ± 0.103b (24)
NDRE	0.490 ± 0.019a¹ (18)	0.471 ± 0.027b (23)
PRI	-0.460 ± 0.071b (18)	-0.422 ± 0.081a¹ (22)
OSAVI	0.817 ± 0.056a¹ (19)	0.769 ± 0.074b (24)
TCARI	-0.054 ± 0.097b (17)	0.000 ± 0.108a¹ (22)

a, b: significantly different (p ＜ 0.05).

¹: significantly greater than the other (p ＜ 0.05).

Table 7 shows the results of the best-performing KNN, SVM, and LR models for Case 2.

Table 7.

Results of each classification model in Case 2.

Number of samples	Non-infected = 17ea, Infected = 21ea
Model	KNN	SVM	LR
Data set ratio	8:2	8:2	8:2
Independent variables	NDRE, GNDVI	NDRE, GNDVI, DVI, NDVI, OSAVI	NDRE
Accuracy_C	0.724	0.655	0.655
Precision_C	0.700	0.688	0.714
Recall_C	0.875	0.688	0.625
F1-score_C	0.778	0.688	0.667
Accuracy_V	0.778	0.778	0.667
Precision_V	0.800	0.714	0.750
Recall_V	0.800	1.000	0.600
F1-score_V	0.800	0.833	0.667

KNN and SVM had the same validation accuracy, but SVM had the higher validation recall. The SVM model with an 8:2 ratio using NDRE, GNDVI, DVI, NDVI, and OSAVI as independent variables was therefore judged to be the best model, with accuracy = 0.655, precision = 0.688, recall = 0.688, and F1-score = 0.688 in calibration and accuracy = 0.778, precision = 0.714, recall = 1.000, and F1-score = 0.833 in validation. The validation confusion matrix, P-R curve, and ROC curve of the SVM model are shown in Fig. 4.
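
The selection rule (highest validation accuracy, ties broken by validation recall) can be expressed compactly. The snippet below applies it to the Case 2 validation values from Table 7; only the dictionary layout is an illustrative choice.

```python
# (validation accuracy, validation recall) for Case 2, from Table 7.
candidates = {
    "KNN": (0.778, 0.800),
    "SVM": (0.778, 1.000),
    "LR": (0.667, 0.600),
}

# Tuples compare element by element, so accuracy decides first and
# recall only breaks ties.
best = max(candidates, key=candidates.get)
print(best)  # prints: SVM
```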

https://cdn.apub.kr/journalsite/sites/kspa/2024-006-04/N0570060401/images/kspa_2024_064_238_F4.jpg

Fig. 4.

(a) validation confusion matrix, (b) P-R curve, (c) ROC curve of the best model in Case 2.

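The validation confusion matrix contained no FN (plants that were actually diseased but judged to be healthy), giving a high recall of 1.000. However, the AUC of the P-R curve was 0.486 in calibration and 0.478 in validation, and that of the ROC curve was only 0.322 and 0.250, respectively. This is attributed to an imbalance in classification performance between the classes: every positive (infected) sample was predicted correctly, whereas only half of the negative (non-infected) samples were.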

Classification model for Case 3

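The Case 3 model was built by excluding from Case 2 the data of plants not infected as of August 25. Table 8 shows the mean, standard deviation, and number of samples after outlier removal for the non-infected and infected plants. As in Case 2, the t-test for Case 3 showed significant differences for all vegetation indices: non-infected plants were significantly higher than infected plants for NDVI, GNDVI, GRVI, RVI, DVI, NDRE, and OSAVI, and infected plants were significantly higher than non-infected plants for PRI and TCARI.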

Table 8.

Basic statistics of Case 3 after outlier elimination.

VIs	Mean ± S.D
VIs	Non-infected	Infected
NDVI	0.917 ± 0.004a¹ (15)	0.871 ± 0.047b (24)
GNDVI	0.805 ± 0.006a¹ (14)	0.768 ± 0.041b (21)
GRVI	9.278 ± 0.334a¹ (14)	7.890 ± 1.482b (21)
RVI	23.24 ± 1.176a¹ (15)	16.61 ± 5.989b (24)
DVI	0.569 ± 0.023a¹ (15)	0.457 ± 0.103b (24)
NDRE	0.498 ± 0.009a¹ (14)	0.471 ± 0.027b (23)
PRI	-0.498 ± 0.010b (14)	-0.422 ± 0.081a¹ (22)
OSAVI	0.846 ± 0.009a¹ (15)	0.769 ± 0.074b (24)
TCARI	-0.107 ± 0.024b (13)	0.000 ± 0.108a¹ (22)

a, b: significantly different (p ＜ 0.05).

¹: significantly greater than the other (p ＜ 0.05).

Table 9 shows the results of the best-performing KNN, SVM, and LR models for Case 3.

Table 9.

Results of each classification model in Case 3.

Number of samples	Non-infected = 13ea, Infected = 21ea
Model	KNN	SVM	LR
Data set ratio	7:3	7:3	7:3
Independent variables	DVI, TCARI, RVI	DVI, TCARI	DVI, TCARI, RVI, PRI
Accuracy_C	0.783	0.870	0.800
Precision_C	0.733	0.800	0.846
Recall_C	0.917	1.000	0.846
F1-score_C	0.815	0.889	0.846
Accuracy_V	0.909	0.727	0.643
Precision_V	0.833	0.667	0.625
Recall_V	1.000	0.800	0.714
F1-score_V	0.909	0.727	0.667

Because KNN gave the highest validation accuracy and recall, the KNN model with a 7:3 ratio using DVI, TCARI, and RVI as independent variables was judged to be the best model, with accuracy = 0.783, precision = 0.733, recall = 0.917, and F1-score = 0.815 in calibration and accuracy = 0.909, precision = 0.833, recall = 1.000, and F1-score = 0.909 in validation. The validation confusion matrix, P-R curve, and ROC curve of the KNN model are shown in Fig. 5.

https://cdn.apub.kr/journalsite/sites/kspa/2024-006-04/N0570060401/images/kspa_2024_064_238_F5.jpg

Fig. 5.

(a) validation confusion matrix, (b) P-R curve, (c) ROC curve of the best model in Case 3.

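The validation confusion matrix contained no FN (plants that were actually diseased but judged to be healthy), giving a high recall of 1.000, so this model is considered the most suitable for actual disease control. The AUC of the P-R curve was 0.846 in calibration and 0.924 in validation, and that of the ROC curve was 0.883 and 0.917, respectively, indicating high detection performance for both the positive (infected) and negative (non-infected) classes.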

Discussion of the pepper Fusarium wilt classification models

When the t-test p-values of the vegetation indices used in the best models of Case 2 and Case 3 were compared, the indices used in Case 3 (DVI, p = 0.0004; TCARI, p = 0.002; RVI, p = 0.002) had lower p-values overall than those used in Case 2 (NDRE, p = 0.043; GNDVI, p = 0.031; DVI, p = 0.006; NDVI, p = 0.007; OSAVI, p = 0.019). This indicates that the difference in vegetation indices by infection status is statistically more significant in Case 3, which excludes the healthy plants not infected as of August 25, than in Case 2, which adds all the data from the July 27 image. This difference in the data is considered to be the reason the Case 3 model (Accuracy_C = 0.783, Precision_C = 0.733, Recall_C = 0.917, F1-score_C = 0.815, Accuracy_V = 0.909, Precision_V = 0.833, Recall_V = 1.000, F1-score_V = 0.909) outperformed the Case 2 model (Accuracy_C = 0.655, Precision_C = 0.688, Recall_C = 0.688, F1-score_C = 0.688, Accuracy_V = 0.778, Precision_V = 0.714, Recall_V = 1.000, F1-score_V = 0.833).

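Among the three machine learning classification algorithms, KNN performed best, presumably because fewer than 40 samples were available and KNN, which classifies simply by the distance between neighboring samples without having to learn a complex data structure, is better suited to small datasets. A systematic study of the feasibility of early detection therefore requires applying a wider range of machine learning techniques to a larger dataset and collecting time-series and multi-year imagery. In addition, because the number of samples was limited and a simple split into training and validation data was used, overfitting is possible. In follow-up work, we plan to add many more samples, including not only infection labels determined by field survey but also labels determined from changes in growth indices in time-series imagery, to develop and improve the model, and then to verify its generalization performance with cross-validation methods such as k-fold and leave-one-out.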
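
Leave-one-out cross-validation, mentioned above as future work, is straightforward for a dataset of this size. The sketch below is illustrative only and was not part of the study: the KNN settings (k = 3, Euclidean distance), the function names, and the two-index samples are all made up.

```python
import math


def knn_predict(train_X, train_y, row, k=3):
    """Majority vote (labels 0/1) among the k nearest training rows."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], row))[:k]
    return 1 if 2 * sum(label for _, label in nearest) > k else 0


def leave_one_out_accuracy(X, y, k=3):
    """Hold out each sample once, fit on the rest, and average the hits."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, X[i], k) == y[i]
    return hits / len(X)


# Made-up two-index samples: label 1 = infected, 0 = non-infected.
X = [[0.57, -0.11], [0.56, -0.10], [0.58, -0.12], [0.55, -0.09],
     [0.46, 0.00], [0.45, 0.02], [0.47, -0.01], [0.44, 0.01]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
loo = leave_one_out_accuracy(X, y)
print(loo)
```

Because every sample serves once as the validation point, the estimate does not depend on a single random split, which is the concern raised about the present results.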

Conclusion

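This study calculated vegetation indices from reflectance values extracted from multispectral images, applied them to machine learning classification algorithms, and built, compared, and analyzed models for the early detection of Fusarium wilt.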

In Case 1, which classifies peppers that were non-infected (15 samples) and infected (15 samples) on July 27 using the vegetation indices from the June 28 image, the KNN model with a 7:3 ratio using PRI, OSAVI, DVI, NDVI, RVI, TCARI, NDRE, and GNDVI as independent variables was selected as the best model, with accuracy = 0.765, precision = 0.778, recall = 0.778, and F1-score = 0.778 in calibration and accuracy = 0.750, precision = 1.000, recall = 0.600, and F1-score = 0.750 in validation. The validation confusion matrix contained two false negatives (FN), that is, plants that were actually diseased but judged to be healthy, which is the key indicator for disease control. The AUC of the P-R curve was 0.723 in calibration and 0.600 in validation, and that of the ROC curve was 0.833 and 0.867, respectively.

In Case 2, which adds to Case 1 the peppers that were non-infected (4 samples) and newly infected (11 samples) on August 25, described by the vegetation indices from the July 27 image, the SVM model with an 8:2 ratio using NDRE, GNDVI, DVI, NDVI, and OSAVI as independent variables was selected as the best model, with accuracy = 0.655, precision = 0.688, recall = 0.688, and F1-score = 0.688 in calibration and accuracy = 0.778, precision = 0.714, recall = 1.000, and F1-score = 0.833 in validation. The validation confusion matrix contained no FN, giving a recall of 1.000. The AUC of the P-R curve was 0.486 in calibration and 0.478 in validation, and that of the ROC curve was 0.322 and 0.250, respectively.

In Case 3, which excludes from Case 2 the data of plants not infected as of August 25, the KNN model with a 7:3 ratio using DVI, TCARI, and RVI as independent variables was selected as the best model, with accuracy = 0.783, precision = 0.733, recall = 0.917, and F1-score = 0.815 in calibration and accuracy = 0.909, precision = 0.833, recall = 1.000, and F1-score = 0.909 in validation. The validation confusion matrix contained no FN, giving a recall of 1.000. The AUC of the P-R curve was 0.846 in calibration and 0.924 in validation, and that of the ROC curve was 0.883 and 0.917, respectively.

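Of the three cases, the KNN model of Case 3 showed the highest validation accuracy and recall, had no FN in the validation confusion matrix, and had the largest AUC for both the P-R curve and the ROC curve, so it is considered the most useful for actual Fusarium wilt detection and control.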

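The model developed in this study is expected to help detect pepper Fusarium wilt early on working farms, effectively suppress the spread of the disease, and minimize damage, thereby contributing to higher pepper yield and farm income. By reducing unnecessary pesticide use, it may also support environmentally friendly agriculture and lower production costs. In addition, because model performance improved when data were added, collecting more pepper Fusarium wilt data is expected to improve the model further.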

Conflict of Interests

All authors declare there is no conflict of interest.

Acknowledgements

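This research was supported in 2024 by the Ministry of Trade, Industry and Energy and the Korea Planning & Evaluation Institute of Industrial Technology (KEIT) ('20018635').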

References

Ahmed, K., Shahidi, T.R., Alam, S.M.I., Momen, S. 2019. Rice leaf disease detection using machine learning techniques. Paper presented at the 2019 International Conference on Sustainable Technologies for Industry 4.0 (STI), 1-5. https://doi.org/10.1109/STI47673.2019.9068096


Avola, G., Di Gennaro, S.F., Cantini, C., Riggi, E., Muratore, F., Tornambè, C., Matese, A. 2019. Remotely sensed vegetation indices to discriminate field-grown olive cultivars. Remote Sensing 11(10): 1242. https://doi.org/10.3390/rs11101242


Basso, B., Cammarano, D., De Vita, P. 2004. Remotely sensed vegetation indices: Theory and applications for crop management. Rivista Italiana Di Agrometeorologia 1(5): 36-53.

Boiarskii, B., Hasegawa, H. 2019. Comparison of NDVI and NDRE indices to detect differences in vegetation and chlorophyll content. Journal of Mechanics of Continua and Mathematical Sciences 4: 20-29. https://doi.org/10.26782/jmcms.spl.4/2019.11.00003


Brereton, R.G., Lloyd, G.R. 2010. Support vector machines for classification and regression. Analyst 135(2): 230-267. https://doi.org/10.1039/B918972F


Engalycheva, I., Kozar, E., Frolova, S., Vetrova, S., Tikhonova, T., Dzhos, E., Engaluchev, M., Chizhik, V., Martynov, V., Shingaliev, A., Dudnikova, K., Dudnikova, M., Kostanchuk, Y. 2024. Fusarium species causing pepper wilt in Russia: Molecular identification and pathogenicity. Microorganisms 12(2): 343. https://doi.org/10.3390/microorganisms12020343


Gabrekiristos, E., Demiyo, T. 2020. Hot pepper Fusarium wilt (Fusarium oxysporum f. sp. capsici): Epidemics, characteristic features and management options. Journal of Agricultural Science 12(10): 347-360. https://doi.org/10.5539/jas.v12n10p347


Haque, I., Islam, M.A., Roy, K., Rahaman, M.M., Shohan, A.A., Islam, M.S. 2022. Classifying pepper disease based on transfer learning: A deep learning approach. Paper presented at the 2022 International Conference on Applied Artificial Intelligence and Computing (ICAAIC), 620-629. https://doi.org/10.1109/ICAAIC53929.2022.9793178


Huang, S., Tang, L., Hupy, J.P., Wang, Y., Shao, G. 2021. A commentary review on the use of normalized difference vegetation index (NDVI) in the era of popular remote sensing. Journal of Forestry Research 32(1): 1-6. https://doi.org/10.1007/s11676-020-01155-1


Hunt, E.R., Hively, W.D., Daughtry, C.S., McCarty, G.W., Fujikawa, S.J., Ng, T.L., Tranchitella, M., Linden, D.S., Yoel, D.W. 2008. Remote sensing of crop leaf area index using unmanned airborne vehicles. Paper presented at the Proceedings of the Pecora 17: 18-20.

Jung, M., Song, J., Shin, A., Choi, B., Go, S., Kwon, S., Park, J., Park, S., Kim, Y. 2023. Construction of deep learning-based disease detection model in plants. Scientific Reports 13(1): 7331. https://doi.org/10.1038/s41598-023-34549-2


Kang, Y., Nam, J., Kim, Y., Lee, S., Seong, D., Jang, S., Ryu, C. 2021. Assessment of regression models for predicting rice yield and protein content using unmanned aerial vehicle-based multispectral imagery. Remote Sensing 13(8): 1508. https://doi.org/10.3390/rs13081508


Kerkech, M., Hafiane, A., Canals, R. 2020. Vine disease detection in UAV multispectral images using optimized image registration and deep learning segmentation approach. Computers and Electronics in Agriculture 174: 105446. https://doi.org/10.1016/j.compag.2020.105446


Khurshid, H., Khan, M. F. 2014. Segmentation and classification using logistic regression in remote sensing imagery. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 8(1): 224-232. https://doi.org/10.1109/JSTARS.2014.2362769


Laaksonen, J., Oja, E. 1996. Classification with learning k-nearest neighbors. Paper presented at the Proceedings of International Conference on Neural Networks (ICNN'96) 3: 1480-1483. https://doi.org/10.1109/ICNN.1996.549118


Lee, G., Hwang, J., Cho, S. 2021. A novel index to detect vegetation in urban areas using UAV-based multispectral images. Applied Sciences 11(8): 3472. https://doi.org/10.3390/app11083472


Panigrahi, K.P., Das, H., Sahoo, A.K., Moharana, S.C. 2020. Maize leaf disease detection and classification using machine learning algorithms. Paper presented at the Progress in Computing, Analytics and Networking: Proceedings of ICCAN 2019, 659-669. https://doi.org/10.1007/978-981-15-2414-1_66


Peng, Y., Dallas, M.M., Ascencio-Ibáñez, J.T., Hoyer, J.S., Legg, J., Hanley-Bowdoin, L., Grieve, B., Yin, H. 2022. Early detection of plant virus infection using multispectral imaging and spatial-spectral machine learning. Scientific Reports 12(1): 3113. https://doi.org/10.1038/s41598-022-06372-8


Ramesh, S., Hebbar, R., Niveditha, M., Pooja, R., Shashank, N., Vinod, P.V. 2018. Plant disease detection using machine learning. Paper presented at the 2018 International Conference on Design Innovations for 3Cs Compute Communicate Control (ICDI3C), 41-45. https://doi.org/10.1109/ICDI3C.2018.00017


RDA (Rural Development Administration). 2020. Chili Peppers - Agricultural Technology Guide 115 (Revised Edition). [in Korean]

Saito, T., Rehmsmeier, M. 2015. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PloS One 10(3): e0118432. https://doi.org/10.1371/journal.pone.0118432


Schober, P., Boer, C., Schwarte, L.A. 2018. Correlation coefficients: Appropriate use and interpretation. Anesthesia & Analgesia 126(5): 1763-1768. https://doi.org/10.1213/ane.0000000000002864


Sun, X., Yang, Z., Su, P., Wei, K., Wang, Z., Yang, C., Wang, C., Qin, M., Xiao, L., Yang, W., Zhang, M., Song, X., Feng, M. 2023. Non-destructive monitoring of maize LAI by fusing UAV spectral and textural features. Frontiers in Plant Science 14: 1158837. https://doi.org/10.3389/fpls.2023.1158837


Yang, J., Rahardja, S., Fränti, P. 2019. Outlier detection: How to threshold outlier scores? Paper presented at the Proceedings of the International Conference on Artificial Intelligence, Information Processing and Cloud Computing, 1-6. https://doi.org/10.1145/3371425.3371427


Yu, R., Luo, Y., Zhou, Q., Zhang, X., Wu, D., Ren, L. 2021. Early detection of pine wilt disease using deep learning algorithms and UAV-based multispectral imagery. Forest Ecology and Management 497: 119493. https://doi.org/10.1016/j.foreco.2021.119493


Zhang, L., Zhang, H., Niu, Y., Han, W. 2019. Mapping maize water stress based on UAV multispectral remote sensing. Remote Sensing 11(6): 605. https://doi.org/10.3390/rs11060605


Zhang, S., Li, X., Ba, Y., Lyu, X., Zhang, M., Li, M. 2022. Banana fusarium wilt disease detection by supervised and unsupervised methods from UAV-based multispectral imagery. Remote Sensing 14(5): 1231. https://doi.org/10.3390/rs14051231


Precision Agriculture Science and Technology ISSN:2672-0086(Print) 2713-5632(Online) 정밀농업과학기술
