JZUS - Journal of Zhejiang University SCIENCE

Journal of Zhejiang University SCIENCE A

ISSN 1673-565X(Print), 1862-1775(Online), Monthly

2025 Vol.26 No.10 P.895-916

Virtual sample diffusion generation method guided by large language model-generated knowledge for enhancing information completeness and zero-shot fault diagnosis in building thermal systems

Zhe SUN, Qiwei YAO, Ling SHI, Huaqiang JIN, Yingjie XU, Peng YANG, Han XIAO, Dongyu CHEN, Panpan ZHAO, Xi SHEN

College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310023, China; College of Information Science and Engineering, Jiaxing University, Jiaxing 314001, China; College of Education, Zhejiang University of Technology, Hangzhou 310023, China; College of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; Hefei General Machinery Research Institute Company Limited, Hefei 230031, China

sx@zjut.edu.cn, Jhq@zjut.edu.cn

Abstract: In the era of big data, industry increasingly relies on data-driven technologies for autonomous learning and intelligent decision-making. However, the problem of “small samples in big data” arises when datasets lack the comprehensive information needed to handle complex scenarios, which hampers adaptability. Enhancing data completeness is therefore essential. Knowledge-guided virtual sample generation transforms domain knowledge into extensive virtual datasets, thereby reducing dependence on limited real samples and enabling zero-shot fault diagnosis. This study used building air conditioning systems as a case study. We used a large language model (LLM) to acquire the domain knowledge for sample generation, which significantly lowered knowledge acquisition costs and established a generalized framework for knowledge acquisition in engineering applications. The acquired knowledge guided the design of the diffusion boundaries in mega-trend diffusion (MTD), and the Monte Carlo method was used to sample within the diffusion function to create information-rich virtual samples. Additionally, a noise-adding technique was introduced to increase the information entropy of these samples, thereby improving the robustness of neural networks trained with them. Experimental results showed that a diagnostic model trained exclusively on virtual samples achieved an accuracy of 72.80%, significantly surpassing traditional small-sample supervised learning in terms of generalization. This result supports the quality and completeness of the generated virtual samples.

Key words: Information completeness; Large language models (LLMs); Virtual sample generation; Knowledge-guided; Building air conditioning systems
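The sampling step described in the abstract (knowledge-defined diffusion boundaries, Monte Carlo sampling within the diffusion function, then added noise) can be illustrated with a minimal sketch. It assumes a triangular diffusion function, as commonly used with MTD, acceptance-rejection as the Monte Carlo scheme, and Gaussian noise; the abstract does not specify these details, so the paper's actual formulation may differ. The function names and the numeric bounds are hypothetical and are not taken from the paper.

```python
import random


def triangular_membership(x, lower, center, upper):
    """Assumed triangular diffusion function: 1 at center, 0 at the bounds."""
    if x <= lower or x >= upper:
        return 0.0
    if x <= center:
        return (x - lower) / (center - lower)
    return (upper - x) / (upper - center)


def sample_virtual(lower, center, upper, n, noise_std=0.0, rng=None):
    """Draw n virtual values for one feature by acceptance-rejection.

    lower, center, upper stand in for knowledge-derived diffusion
    boundaries (hypothetical here). A candidate drawn uniformly from
    [lower, upper] is kept with probability equal to its membership
    value. Gaussian noise is added after acceptance, so a noisy value
    may fall slightly outside the bounds.
    """
    if not (lower <= center <= upper) or lower >= upper:
        raise ValueError("require lower <= center <= upper and lower < upper")
    rng = rng or random.Random()
    samples = []
    while len(samples) < n:
        candidate = rng.uniform(lower, upper)
        if rng.random() <= triangular_membership(candidate, lower, center, upper):
            samples.append(candidate + rng.gauss(0.0, noise_std))
    return samples


# Illustrative bounds only, e.g. a temperature-like feature under one fault class.
virtual = sample_virtual(10.0, 14.0, 20.0, n=5, noise_std=0.1, rng=random.Random(0))
```

In this sketch, one such call would be made per feature and per fault class, and the resulting virtual samples would be labeled with that class before training a classifier.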

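Chinese Summary: Knowledge-guided virtual sample diffusion generation method for enhancing information completeness and zero-shot fault diagnosis in building thermal systems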

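Authors: Zhe SUN¹, Qiwei YAO¹, Ling SHI¹, Huaqiang JIN³, Yingjie XU¹, Peng YANG¹, Han XIAO¹, Dongyu CHEN⁴, Panpan ZHAO⁵, Xi SHEN¹,²
Affiliations: ¹College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310023, China; ²College of Information Science and Engineering, Jiaxing University, Jiaxing 314001, China; ³College of Education, Zhejiang University of Technology, Hangzhou 310023, China; ⁴College of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; ⁵Hefei General Machinery Research Institute Company Limited, Hefei 230031, China
Objective: Industry increasingly adopts data-driven technologies to advance autonomous learning and intelligent decision-making, but is challenged by the problem of "small samples in big data". This paper proposes transforming domain knowledge generated by a large model into virtual datasets, with the aim of significantly reducing dependence on limited real samples and achieving zero-shot fault diagnosis.
Innovations: 1. A large language model is used to acquire domain knowledge for sample generation, greatly reducing the cost of knowledge acquisition; 2. The acquired knowledge guides the boundary design of mega-trend diffusion (MTD), and the Monte Carlo method is used to sample the diffusion function, producing information-rich virtual samples; 3. A noise injection technique is introduced that effectively raises the information entropy of the samples, thereby enhancing the robustness of neural networks trained on them.
Methods: 1. Prompt engineering is used to improve the accuracy of the knowledge acquired from the large language model (LLM); 2. Virtual samples are generated with an improved MTD, and noise is introduced to improve robustness; 3. A neural network is trained on the virtual samples to obtain the fault diagnosis model.
Conclusions: Experimental results show that a fault diagnosis model trained only on virtual samples reaches an accuracy of 72.80%, with generalization significantly better than that of traditional small-sample supervised learning, which supports the quality and information completeness of the generated virtual samples.

Key words: Information completeness; Large language models; Virtual sample generation; Knowledge-guided; Building air conditioning systems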

DOI: 10.1631/jzus.A2400560

On-line Access: 2025-10-25
Received: 2024-12-05
Revision Accepted: 2025-03-17
Crosschecked: 2025-10-27

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE
