Journal of Zhejiang University

Journal of Zhejiang University SCIENCE B 2026 Vol.27 No.5 P.466-481

Embedding of ripening topology into one-stage detection for tomato cluster phenotyping

Author(s): Bingquan CHU, Ruiyuan WU, Haijun ZHANG, Haochuan QIN, Zishun PENG, Fengle ZHU, Yong HE
Affiliation(s): 1. School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China
Corresponding email(s): zhufl@zjut.edu.cn, yhe@zju.edu.cn
Key Words: Tomato ripeness, Phenotype, Object detection, Topology, You Only Look Once (YOLO), Spatial sequence


Bingquan CHU, Ruiyuan WU, Haijun ZHANG, Haochuan QIN, Zishun PENG, Fengle ZHU, Yong HE. Embedding of ripening topology into one-stage detection for tomato cluster phenotyping[J]. Journal of Zhejiang University Science B, 2026, 27(5): 466-481.

@article{title="Embedding of ripening topology into one-stage detection for tomato cluster phenotyping",
author="Bingquan CHU, Ruiyuan WU, Haijun ZHANG, Haochuan QIN, Zishun PENG, Fengle ZHU, Yong HE",
journal="Journal of Zhejiang University Science B",
volume="27",
number="5",
pages="466-481",
year="2026",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.B2500647"
}

%0 Journal Article
%T Embedding of ripening topology into one-stage detection for tomato cluster phenotyping
%A Bingquan CHU
%A Ruiyuan WU
%A Haijun ZHANG
%A Haochuan QIN
%A Zishun PENG
%A Fengle ZHU
%A Yong HE
%J Journal of Zhejiang University SCIENCE B
%V 27
%N 5
%P 466-481
%@ 1673-1581
%D 2026
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.B2500647

TY - JOUR
T1 - Embedding of ripening topology into one-stage detection for tomato cluster phenotyping
A1 - Bingquan CHU
A1 - Ruiyuan WU
A1 - Haijun ZHANG
A1 - Haochuan QIN
A1 - Zishun PENG
A1 - Fengle ZHU
A1 - Yong HE
JO - Journal of Zhejiang University Science B
VL - 27
IS - 5
SP - 466
EP - 481
SN - 1673-1581
Y1 - 2026
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.B2500647
ER -


Abstract: The automated assessment of tomato ripeness is vital for modern greenhouse operations, yet it remains challenging under variable environmental conditions. To address this, we propose rank-aware You Only Look Once (YOLO), a novel detection framework that incorporates the biological prior of top-to-bottom ripening within fruit clusters. This is achieved through two key innovations: an efficient position-aware head that regresses the relative height of fruits, and a dynamic margin-aware ranking loss (DM-RankLoss) that enforces the correct spatial sequence. Evaluated on a 3500-image dataset from a solar greenhouse, our plug-and-play module boosts the mean average precision (mAP) at an intersection over union (IoU) threshold of 0.50 (mAP₅₀) of multiple YOLO architectures by up to 5.66 percentage points. The model effectively learns the cluster topology, achieving a height mean absolute error (H-MAE) of 0.107 (normalized) and a pairwise ranking accuracy (PRA) of 84.59%, while reducing the parameter count by over 10% compared with the baseline for efficient deployment. Visualizations confirm that the model leverages spatial context to resolve color ambiguities. Our work offers a sensor-free, accurate, and efficient solution for in situ phenotyping in agricultural robotics.
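The abstract names two ranking-related metrics (H-MAE and PRA) and a margin-based ranking loss but does not give their formulas. The sketch below uses common textbook definitions as an assumption: H-MAE as the mean absolute error of normalized heights, PRA as the fraction of fruit pairs ordered correctly, and a pairwise hinge loss whose margin grows with the reference height gap. The function names and the `base_margin` and `scale` parameters are hypothetical and are not taken from the paper; the actual DM-RankLoss may differ.

```python
from itertools import combinations


def height_mae(pred, true):
    """Mean absolute error between predicted and reference normalized heights."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def pairwise_ranking_accuracy(pred, true):
    """Fraction of fruit pairs whose predicted height order matches the reference."""
    correct = total = 0
    for i, j in combinations(range(len(true)), 2):
        if true[i] == true[j]:
            continue  # ties carry no ordering information
        total += 1
        if (pred[i] - pred[j]) * (true[i] - true[j]) > 0:
            correct += 1
    return correct / total if total else 0.0


def margin_ranking_loss(pred, true, base_margin=0.05, scale=0.5):
    """Pairwise hinge loss with a margin that widens with the reference gap.

    Assumed form for illustration only, not the paper's DM-RankLoss.
    """
    losses = []
    for i, j in combinations(range(len(true)), 2):
        gap = true[i] - true[j]
        if gap == 0:
            continue
        sign = 1.0 if gap > 0 else -1.0
        margin = base_margin + scale * abs(gap)
        # Penalize pairs whose predicted separation falls short of the margin.
        losses.append(max(0.0, margin - sign * (pred[i] - pred[j])))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    true_h = [0.9, 0.5, 0.1]  # reference normalized heights in one cluster
    pred_h = [0.7, 0.8, 0.2]  # predictions with the top two fruits swapped
    print(height_mae(pred_h, true_h))
    print(pairwise_ranking_accuracy(pred_h, true_h))
    print(margin_ranking_loss(pred_h, true_h))
```

In this toy cluster only the swapped pair contributes to the loss, which illustrates why a ranking term can correct ordering errors that a per-fruit regression loss alone would treat as small.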

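Embedding of ripening topology sequence into one-stage detection for tomato cluster phenotyping

Bingquan CHU¹, Ruiyuan WU¹, Haijun ZHANG¹, Haochuan QIN¹, Zishun PENG¹, Fengle ZHU², Yong HE³
¹School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China
²College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310023, China
³College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China
Summary: Automated assessment of tomato ripeness is vital for modern greenhouse operations, but variable environmental conditions pose a persistent challenge to accurate assessment. To address this, this paper proposes rank-aware YOLO, a novel detection framework that incorporates the biological prior of top-to-bottom ripening within fruit clusters. The framework rests on two key innovations: (1) an efficient position-aware head for regressing the relative height of fruits; (2) a dynamic margin-aware ranking loss (DM-RankLoss) that corrects the spatial sequence. Evaluation on a dataset of 3500 greenhouse images shows that the module is plug-and-play and raises the mean average precision at an intersection over union (IoU) threshold of 0.50 (mAP₅₀) of multiple YOLO architectures by up to 5.66 percentage points. The model effectively learns the topology of the fruit cluster, reaching a normalized height mean absolute error (H-MAE) of 0.107 and a pairwise ranking accuracy of 84.59%, while reducing the parameter count by more than 10% relative to the baseline, which indicates potential for efficient deployment. Visualization analysis further confirms that the model uses spatial context to mitigate misjudgments caused by color ambiguity. In summary, this study provides an accurate and efficient solution, requiring no additional sensors, for in situ phenotyping in agricultural robotics.

Key words: Tomato ripeness; Phenotype; Object detection; Topology; YOLO; Spatial sequence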

