Open Plant Species Recognition using Vision Transformer Network and Top-K Logit Disparity Score

Authors

Gusti Ahmad Fanshuri Alfarisy, Kassim Kalinaki, Owais Ahmed Malik, Rizal Kusuma Putra, Aninditya Anggari Nuryono

DOI:

https://doi.org/10.35718/iiair.v1i1.1227

Keywords:

plant species identification, open-set recognition, out-of-distribution detection, deep learning, machine learning

Abstract

Reliable plant species identification is essential for biodiversity conservation, agriculture, and ecological research. However, current plant species recognition systems often fail to reject unknown classes, which limits their applicability in real-world scenarios. Unknown classes are typically rejected using the maximum probability score, which relies solely on the highest output and ignores the information carried by the remaining outputs, potentially restricting the model's performance. In this research, we propose a novel scoring function named the Top-K Logit Disparity Score (TKLDS) for open-set plant species recognition using a Vision Transformer (ViT) network. We conducted extensive experiments on the VNPLANT200 dataset of 200 plant species, where the ViT-L/16 model achieved both the highest closed-set recognition accuracy and the highest Area Under the Receiver Operating Characteristic curve (AUROC) for separating known from unknown classes, outperforming other state-of-the-art models such as ResNet, ConvNeXt, Swin Transformer, and MaxViT. Our results indicate that tuning the parameter k in TKLDS consistently improved the arithmetic mean of closed-set accuracy and AUROC across all pre-trained models. Notably, larger values of k generally led to better performance, with the ViT-L/16 model yielding an arithmetic mean score of 0.975 ± 0.005 at k = 4, averaged over five known/unknown class combinations. These findings demonstrate the potential of TKLDS as a robust scoring function for open-set recognition, highlighting its effectiveness at improving performance in plant species identification.
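
To make the idea concrete, the minimal Python sketch below shows how a top-k logit-disparity score could be computed alongside the maximum-softmax-probability baseline and evaluated with AUROC on known vs. unknown inputs. The abstract does not give the exact TKLDS formula, so the `tklds` function here is an assumed illustration (the gap between the top logit and the mean of the next k logits); the function names and toy data are hypothetical, not the paper's implementation.

```python
# Hedged sketch of a Top-K Logit Disparity Score (TKLDS); the exact formula
# is not specified in this abstract. Assumption: the score is the gap between
# the top logit and the mean of the next k logits, so peaked (known-class)
# outputs score high and flat (unknown-class) outputs score low.
import torch
from sklearn.metrics import roc_auc_score

def tklds(logits: torch.Tensor, k: int = 4) -> torch.Tensor:
    """Score each row of logits (batch, num_classes); higher = more 'known'."""
    top = logits.topk(k + 1, dim=1).values      # top-1 logit plus the next k
    return top[:, 0] - top[:, 1:].mean(dim=1)   # disparity between them

def msp(logits: torch.Tensor) -> torch.Tensor:
    """Maximum softmax probability baseline (uses only the highest output)."""
    return logits.softmax(dim=1).max(dim=1).values

# Toy data: known-class logits are peaked on one class, unknowns are flat.
torch.manual_seed(0)
num_classes, n = 200, 100
known = torch.randn(n, num_classes)
known[torch.arange(n), torch.randint(0, num_classes, (n,))] += 5.0
unknown = torch.randn(n, num_classes)

labels = [1] * n + [0] * n                      # 1 = known, 0 = unknown
for name, score_fn in [("TKLDS", tklds), ("MSP", msp)]:
    scores = torch.cat([score_fn(known), score_fn(unknown)]).numpy()
    print(name, "AUROC:", round(roc_auc_score(labels, scores), 3))
```

Under this reading, using k runner-up logits rather than the single maximum lets the score capture how decisively one class dominates the rest, which is consistent with the abstract's observation that larger k values generally improved the joint accuracy/AUROC metric.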

References

A. K. Verma, P. R. Rout, E. Lee, P. Bhunia, J. Bae, R. Y. Surampalli, T. C. Zhang, R. D. Tyagi, P. Lin, and Y. Chen, “Biodiversity and Sustainability,” in Sustainability, 1st ed., R. Surampalli, T. Zhang, M. K. Goyal, S. Brar, and R. Tyagi, Eds. Wiley, May 2020, pp. 255–275, doi: 10.1002/9781119434016.ch12.

SIDA, “Urban Development: Biodiversity and Ecosystems,” Swedish International Development Cooperation Agency (SIDA), 2016. [Online]. Available: https://cdn.sida.se/publications/files/sida62003en-urban-development-biodiversity-and-ecosystems.pdf

A. Opoku, “Biodiversity and the built environment: Implications for the sustainable development goals (SDGs),” Resources, Conservation and Recycling, vol. 141, pp. 1–7, 2019, doi: 10.1016/j.resconrec.2018.10.011.

M. R. Marselle, S. J. Lindley, P. A. Cook, and A. Bonn, “Biodiversity and Health in the Urban Environment,” Current Environmental Health Reports, vol. 8, no. 2, pp. 146–156, 2021, doi: 10.1007/s40572-021-00313-9.

G. A. W. Rook, “Regulation of the immune system by biodiversity from the natural environment: An ecosystem service essential to health,” Proceedings of the National Academy of Sciences, vol. 110, pp. 18360–18367, 2013, doi: 10.1073/pnas.1313731110.

J. Methorst, A. Bonn, M. Marselle, K. Böhning-Gaese, and K. Rehdanz, “Species richness is positively related to mental health – a study for Germany,” Landscape and Urban Planning, vol. 211, p. 104084, 2021, doi: 10.1016/j.landurbplan.2021.104084.

J. Wäldchen and P. Mäder, “Plant Species Identification Using Computer Vision Techniques: A Systematic Literature Review,” Archives of Computational Methods in Engineering, vol. 25, no. 2, pp. 507–543, Apr. 2018, doi: 10.1007/s11831-016-9206-z.

K. J. Gaston and M. A. O’Neill, “Automated species identification: why not?” Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, vol. 359, no. 1444, pp. 655–667, 2004, doi: 10.1098/rstb.2003.1442.

E. Dönmez, “Hybrid convolutional neural network and multilayer perceptron vision transformer model for wheat species classification task: E-ResMLP+,” European Food Research and Technology, vol. 250, no. 5, pp. 1379–1388, May 2024, doi: 10.1007/s00217-024-04469-0.

A. P. Sundara Sobitha Raj and S. K. Vajravelu, “DDLA: dual deep learning architecture for classification of plant species,” IET Image Processing, vol. 13, no. 12, pp. 2176–2182, Oct. 2019, doi: 10.1049/iet-ipr.2019.0346.

A. Kaya, A. S. Keceli, C. Catal, H. Y. Yalic, H. Temucin, and B. Tekinerdogan, “Analysis of transfer learning for deep neural network based plant classification models,” Computers and Electronics in Agriculture, vol. 158, pp. 20–29, Mar. 2019, doi: 10.1016/j.compag.2019.01.041.

M. J. M. Christenhusz and J. W. Byng, “The number of known plants species in the world and its annual increase,” Phytotaxa, vol. 261, no. 3, pp. 201–217, May 2016, doi: 10.11646/phytotaxa.261.3.1.

G. Chen, L. Qiao, Y. Shi, P. Peng, J. Li, T. Huang, S. Pu, and Y. Tian, “Learning open set network with discriminative reciprocal points,” in European Conference on Computer Vision (ECCV), 2020, doi: 10.1007/978-3-030-58580-8_30.

H.-M. Yang, X.-Y. Zhang, F. Yin, and C.-L. Liu, “Robust classification with convolutional prototype learning,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 3474–3482, doi: 10.1109/CVPR.2018.00366.

G. A. F. Alfarisy, O. A. Malik, and O. W. Hong, “Quad-Channel Contrastive Prototype Networks for Open-Set Recognition in Domain-Specific Tasks,” IEEE Access, vol. 11, pp. 48578–48592, 2023, doi: 10.1109/ACCESS.2023.3275743.

H. Pan, L. Xie, and Z. Wang, “Plant and animal species recognition based on dynamic vision transformer architecture,” Remote Sensing, vol. 14, no. 20, 2022, doi: 10.3390/rs14205242.

N. V. Hieu, N. L. H. Hien, L. V. Huy, N. H. Tuong, and P. T. K. Thoa, “PlantKViT: A combination model of Vision Transformer and KNN for forest plants classification,” JUCS - Journal of Universal Computer Science, vol. 29, no. 9, pp. 1069–1089, 2023, doi: 10.3897/jucs.94657.

C. P. Lee, K. M. Lim, Y. X. Song, and A. Alqahtani, “Plant-CNN-ViT: Plant classification with ensemble of convolutional neural networks and vision transformer,” Plants, vol. 12, no. 14, 2023, doi: 10.3390/plants12142642.

M. Gustineli, A. Miyaguchi, and I. Stalter, “Multi-label plant species classification with self-supervised vision transformers,” 2024. [Online]. Available: https://arxiv.org/abs/2407.06298

D. T. N. Nhut, T. D. Tan, T. N. Quoc, and V. T. Hoang, “Medicinal plant recognition based on vision transformer and BEiT,” Procedia Computer Science, vol. 234, pp. 188–195, 2024, doi: 10.1016/j.procs.2024.02.165.

Open-set Plant Identification Using an Ensemble of Deep Convolutional Neural Networks, 2016. [Online]. Available: https://research.sabanciuniv.edu/id/eprint/29408/

T. Fang, Z. Li, J. Zhang, D. Qi, and L. Zhang, “Open-Set Recognition of Wood Species Based on Deep Learning Feature Extraction Using Leaves,” Journal of Imaging, vol. 9, no. 8, p. 154, Aug. 2023, doi: 10.3390/jimaging9080154.

Y. Meng, M. Xu, H. Kim, S. Yoon, Y. Jeong, and D. S. Park, “Known and unknown class recognition on plant species and diseases,” Computers and Electronics in Agriculture, vol. 215, p. 108408, Dec. 2023, doi: 10.1016/j.compag.2023.108408.

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc., 2017. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” 2021. [Online]. Available: https://arxiv.org/abs/2010.11929

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016, doi: 10.1109/CVPR.2016.90.

Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, “A convnet for the 2020s,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022, pp. 11976–11986, doi: 10.1109/CVPR52688.2022.01167.

Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2021, pp. 10012–10022, doi: 10.1109/ICCV48922.2021.00986.

Z. Liu, H. Hu, Y. Lin, Z. Yao, Z. Xie, Y. Wei, J. Ning, Y. Cao, Z. Zhang, L. Dong, F. Wei, and B. Guo, “Swin transformer v2: Scaling up capacity and resolution,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022, pp. 12009–12019, doi: 10.1109/CVPR52688.2022.01170.

Z. Tu, H. Talebi, H. Zhang, F. Yang, P. Milanfar, A. C. Bovik, and Y. Li, “MaxViT: Multi-axis vision transformer,” in European Conference on Computer Vision (ECCV), 2022, doi: 10.1007/978-3-031-20053-3_27.

J. Jang and C. O. Kim, “Teacher-Explorer-Student Learning: A Novel Learning Method for Open Set Recognition,” IEEE Transactions on Neural Networks and Learning Systems, 2023, doi: 10.1109/TNNLS.2023.3336799.

Y. Shu, Y. Shi, Y. Wang, T. Huang, and Y. Tian, “P-ODN: Prototype-based Open Deep Network for Open Set Recognition,” Scientific Reports, vol. 10, no. 1, p. 7146, Dec. 2020, doi: 10.1038/s41598-020-63649-6.

Figure: Top-K Logit Disparity Score (TKLDS) with different k values compared with the disparity of probability, using ResNet-101, ConvNeXt-B, ConvNeXt-L, ViT-B/16, and ViT-L/16.

Published

30-04-2025

How to Cite

Gusti Ahmad Fanshuri Alfarisy, Kassim Kalinaki, Owais Ahmed Malik, Rizal Kusuma Putra, & Aninditya Anggari Nuryono. (2025). Open Plant Species Recognition using Vision Transformer Network and Top-K Logit Disparity Score. Innovative Informatics and Artificial Intelligence Research, 1(1), 12–18. https://doi.org/10.35718/iiair.v1i1.1227