
Bingquan CHU, Ruiyuan WU, Haijun ZHANG, Haochuan QIN, Zishun PENG, Fengle ZHU, Yong HE. Embedding of ripening topology into one-stage detection for tomato cluster phenotyping[J]. Journal of Zhejiang University Science B, 2026, 27(5): 466-481.
@article{title="Embedding of ripening topology into one-stage detection for tomato cluster phenotyping",
author="Bingquan CHU, Ruiyuan WU, Haijun ZHANG, Haochuan QIN, Zishun PENG, Fengle ZHU, Yong HE",
journal="Journal of Zhejiang University Science B",
volume="27",
number="5",
pages="466-481",
year="2026",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.B2500647"
}
%0 Journal Article
%T Embedding of ripening topology into one-stage detection for tomato cluster phenotyping
%A Bingquan CHU
%A Ruiyuan WU
%A Haijun ZHANG
%A Haochuan QIN
%A Zishun PENG
%A Fengle ZHU
%A Yong HE
%J Journal of Zhejiang University SCIENCE B
%V 27
%N 5
%P 466-481
%@ 1673-1581
%D 2026
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.B2500647
TY - JOUR
T1 - Embedding of ripening topology into one-stage detection for tomato cluster phenotyping
A1 - Bingquan CHU
A1 - Ruiyuan WU
A1 - Haijun ZHANG
A1 - Haochuan QIN
A1 - Zishun PENG
A1 - Fengle ZHU
A1 - Yong HE
JO - Journal of Zhejiang University Science B
VL - 27
IS - 5
SP - 466
EP - 481
SN - 1673-1581
Y1 - 2026
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.B2500647
ER -
Abstract: The automated assessment of tomato ripeness is vital for modern greenhouse operations, yet challenges remain due to variable environmental conditions. To address this, we propose rank-aware You Only Look Once (YOLO), a novel detection framework that incorporates the biological prior of top-to-bottom ripening within fruit clusters. This is achieved through two key innovations: an efficient position-aware head that regresses the relative height of each fruit, and a dynamic margin-aware ranking loss (DM-RankLoss) that enforces the correct spatial sequence. Evaluated on a 3500-image dataset from a solar greenhouse, our plug-and-play module boosts the mean average precision (mAP) at an intersection over union (IoU) threshold of 0.50 (mAP50) of multiple YOLO architectures by up to 5.66 percentage points. The model effectively learns the cluster topology, achieving a height mean absolute error (H-MAE) of 0.107 (normalized) and a pairwise ranking accuracy (PRA) of 84.59%, while reducing the parameter count by over 10% compared to the baseline for efficient deployment. Visualizations confirm that the model leverages spatial context to resolve color ambiguities. Our work offers a sensor-free, accurate, and efficient solution for in situ phenotyping in agricultural robotics.
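The abstract names two topology metrics (H-MAE and PRA) and a margin-based ranking loss without giving their formulas. The following is a minimal sketch of how such quantities are commonly computed for one fruit cluster, not the authors' implementation. The function names, the handling of ties, and the linear margin schedule (`base_margin`, `scale`) are assumptions made for illustration only.

```python
from itertools import combinations


def height_mae(pred, true):
    """Mean absolute error between predicted and reference normalized heights."""
    if len(pred) != len(true) or not pred:
        raise ValueError("pred and true must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


def pairwise_ranking_accuracy(pred, true):
    """Fraction of fruit pairs whose predicted height order matches the reference.

    Pairs with identical reference heights are skipped (an assumption).
    """
    correct = total = 0
    for i, j in combinations(range(len(true)), 2):
        if true[i] == true[j]:
            continue
        total += 1
        # Same sign of the two differences means the pair is ordered correctly.
        if (pred[i] - pred[j]) * (true[i] - true[j]) > 0:
            correct += 1
    return correct / total if total else float("nan")


def margin_ranking_loss(pred, true, base_margin=0.05, scale=0.5):
    """Pairwise hinge loss whose margin grows with the reference height gap.

    This linear margin is a hypothetical stand-in for a "dynamic margin";
    the schedule used by DM-RankLoss is not stated in the abstract.
    """
    losses = []
    for i, j in combinations(range(len(true)), 2):
        if true[i] == true[j]:
            continue
        # Orient the pair so that `hi` is the fruit that sits higher.
        hi, lo = (i, j) if true[i] > true[j] else (j, i)
        margin = base_margin + scale * (true[hi] - true[lo])
        losses.append(max(0.0, margin - (pred[hi] - pred[lo])))
    return sum(losses) / len(losses) if losses else 0.0


# Illustrative values only (not data from the paper).
reference = [1.0, 0.6, 0.2, 0.3]
predicted = [0.9, 0.5, 0.4, 0.1]
print(height_mae(predicted, reference))
print(pairwise_ranking_accuracy(predicted, reference))
print(margin_ranking_loss(predicted, reference))
```

Under these assumptions, a pair contributes to the loss only when the predicted height gap falls short of the margin, so widely separated fruits are pushed apart more strongly than neighbouring ones.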
On-line Access: 2026-05-15
Received: 2025-10-14
Revision Accepted: 2026-03-12
Crosschecked: 2026-05-15
https://orcid.org/0009-0009-2319-4454