Journal of Zhejiang University SCIENCE A

ISSN 1673-565X(Print), 1862-1775(Online), Monthly

Virtual sample diffusion generation method guided by large language model-generated knowledge for enhancing information completeness and zero-shot fault diagnosis in building thermal systems

Abstract: In the era of big data, industry increasingly relies on data-driven technologies for autonomous learning and intelligent decision-making. However, a “small samples in big data” problem arises when datasets lack the information needed to cover complex scenarios, which limits model adaptability. Enhancing data completeness is therefore essential. Knowledge-guided virtual sample generation converts domain knowledge into large virtual datasets, reducing dependence on scarce real samples and enabling zero-shot fault diagnosis. Taking building air conditioning systems as a case study, we used a large language model (LLM) to acquire the domain knowledge needed for sample generation. This substantially lowers knowledge acquisition costs and establishes a generalized knowledge-acquisition framework for engineering applications. The acquired knowledge guided the design of the diffusion boundaries in mega-trend diffusion (MTD), and the Monte Carlo method was used to sample from the diffusion function to create information-rich virtual samples. In addition, a noise-adding technique was introduced to increase the information entropy of these samples, thereby improving the robustness of neural networks trained on them. Experimental results showed that a diagnostic model trained exclusively on virtual samples achieved an accuracy of 72.80%, generalizing significantly better than conventional small-sample supervised learning. This demonstrates the quality and completeness of the generated virtual samples.
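The MTD and Monte Carlo steps in the abstract can be illustrated with a minimal Python sketch. It follows the commonly cited mega-trend diffusion formulation (centre CL = (min + max)/2, skewness-weighted lower and upper bounds with phi = 1e-20, and a triangular membership function peaking at CL) and draws virtual samples by acceptance-rejection under that membership function. The abstract does not state how LLM output is turned into boundaries, so the zero-shot branch simply assumes the knowledge arrives as an (LB, CL, UB) triple; `hypothetical_knowledge`, the feature name `feature_x` and all numeric values are illustrative assumptions, not the authors' data or method.

```python
import math
import random
from typing import Dict, List, Tuple


def mtd_bounds(samples: List[float], phi: float = 1e-20) -> Tuple[float, float, float]:
    """Return (LB, CL, UB) of the mega-trend diffusion domain for one attribute."""
    if len(samples) < 2 or min(samples) == max(samples):
        raise ValueError("MTD needs at least two distinct values")
    lo, hi = min(samples), max(samples)
    cl = (lo + hi) / 2.0                      # centre of the observed range
    n_l = sum(1 for x in samples if x < cl)   # samples below the centre
    n_u = sum(1 for x in samples if x > cl)   # samples above the centre
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / (len(samples) - 1)
    skew_l, skew_u = n_l / (n_l + n_u), n_u / (n_l + n_u)
    lb = cl - skew_l * math.sqrt(-2.0 * var / n_l * math.log(phi))
    ub = cl + skew_u * math.sqrt(-2.0 * var / n_u * math.log(phi))
    # The diffusion domain must still contain every observed sample.
    return min(lb, lo), cl, max(ub, hi)


def membership(x: float, lb: float, cl: float, ub: float) -> float:
    """Triangular MTD membership: 1 at CL, falling linearly to 0 at LB and UB."""
    if lb <= x <= cl:
        return 1.0 if cl == lb else (x - lb) / (cl - lb)
    if cl < x <= ub:
        return (ub - x) / (ub - cl)
    return 0.0


def monte_carlo_sample(lb: float, cl: float, ub: float, n: int,
                       rng: random.Random) -> List[float]:
    """Acceptance-rejection: keep x ~ U(LB, UB) with probability M(x)."""
    out: List[float] = []
    while len(out) < n:
        x = rng.uniform(lb, ub)
        if rng.random() < membership(x, lb, cl, ub):
            out.append(x)
    return out


rng = random.Random(0)

# Few-shot case: diffusion domain estimated from a handful of real readings.
few_real = [11.8, 12.1, 12.4, 13.0]
lb, cl, ub = mtd_bounds(few_real)
virtual = monte_carlo_sample(lb, cl, ub, 200, rng)

# Zero-shot case (hypothetical): no real fault samples exist, so the
# (LB, CL, UB) triple is assumed to come from LLM-acquired knowledge.
hypothetical_knowledge: Dict[str, Tuple[float, float, float]] = {
    "feature_x": (10.0, 12.0, 15.0),
}
k_lb, k_cl, k_ub = hypothetical_knowledge["feature_x"]
virtual_zero_shot = monte_carlo_sample(k_lb, k_cl, k_ub, 200, rng)

print(f"LB={lb:.2f}, CL={cl:.2f}, UB={ub:.2f}")  # prints LB=10.08, CL=12.40, UB=14.04
```

The diffusion domain widens well beyond the four observed values, which is what lets MTD-style generators fill in unobserved regions; acceptance-rejection is used here only because it is the simplest correct way to sample a triangular membership function.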

Key words: Information completeness; Large language models (LLMs); Virtual sample generation; Knowledge-guided; Building air conditioning systems

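Chinese Summary: Knowledge-guided virtual sample diffusion generation method for enhancing information completeness and zero-shot fault diagnosis in building thermal systems

Authors: Zhe SUN1, Qiwei YAO1, Ling SHI1, Huaqiang JIN3, Yingjie XU1, Peng YANG1, Han XIAO1, Dongyu CHEN4, Panpan ZHAO5, Xi SHEN1,2
Affiliations: 1 College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310023, China; 2 College of Information Science and Engineering, Jiaxing University, Jiaxing 314001, China; 3 College of Education, Zhejiang University of Technology, Hangzhou 310023, China; 4 School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; 5 Hefei General Machinery Research Institute, Hefei 230031, China
Objective: Industry increasingly adopts data-driven technologies to promote autonomous learning and intelligent decision-making, but faces the challenge of "small samples in big data". This paper proposes converting domain knowledge generated by a large language model into virtual datasets, with the aim of substantially reducing reliance on limited real samples and achieving zero-shot fault diagnosis.
Innovations: 1. A large language model is used to acquire domain knowledge for sample generation, greatly reducing knowledge acquisition costs; 2. The acquired knowledge guides the boundary design of mega-trend diffusion (MTD), and the Monte Carlo method is used to sample the diffusion function, producing information-rich virtual samples; 3. A noise-injection technique effectively increases the information entropy of the samples, thereby enhancing the robustness of neural networks trained on them.
Method: 1. Prompt engineering is used to improve the accuracy of the knowledge obtained from the large language model (LLM); 2. Virtual samples are generated with an improved MTD, and noise is introduced to improve robustness; 3. A neural network is trained on the virtual samples to obtain the fault diagnosis model.
Conclusion: Experimental results show that the fault diagnosis model trained only on virtual samples achieves an accuracy of 72.80%, and its generalization ability clearly exceeds that of conventional small-sample supervised learning methods, which confirms the quality and information completeness of the generated virtual samples.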

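Key words: Information completeness; Large language model; Virtual sample generation; Knowledge-guided; Building air conditioning system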
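The noise-adding step mentioned in the abstract and summary can be pictured with a small sketch that measures Shannon entropy on a fixed-range histogram before and after Gaussian jitter is added to highly redundant virtual samples. The histogram entropy estimator, the bin count, the range [10, 15] and the noise level `sigma` are all assumptions chosen for illustration; the paper's actual noise model and entropy measure may differ.

```python
import math
import random
from collections import Counter
from typing import List


def histogram_entropy(values: List[float], lo: float, hi: float, bins: int = 20) -> float:
    """Shannon entropy (bits) of the histogram of `values` over [lo, hi]."""
    width = (hi - lo) / bins
    counts = Counter(
        min(int((v - lo) / width), bins - 1)   # clamp v == hi into the last bin
        for v in values if lo <= v <= hi       # values outside the range are ignored
    )
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def add_gaussian_noise(values: List[float], sigma: float, rng: random.Random) -> List[float]:
    """Return a copy of `values` with zero-mean Gaussian noise of std `sigma` added."""
    return [v + rng.gauss(0.0, sigma) for v in values]


rng = random.Random(42)
# Illustrative, highly redundant virtual samples: only two distinct values.
clean = [12.0] * 50 + [13.0] * 50
noisy = add_gaussian_noise(clean, sigma=0.2, rng=rng)

h_clean = histogram_entropy(clean, 10.0, 15.0)   # two equally full bins -> 1 bit
h_noisy = histogram_entropy(noisy, 10.0, 15.0)
print(f"entropy before: {h_clean:.2f} bits, after: {h_noisy:.2f} bits")
```

Spreading previously identical values across neighbouring bins raises the estimate, which matches the intuition that noise makes the virtual set less redundant. Too large a `sigma` would blur the fault signatures themselves, so in practice it would need tuning against validation accuracy.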


DOI: 10.1631/jzus.A2400560

On-line Access: 2025-10-25
Received: 2024-12-05
Revision Accepted: 2025-03-17
Crosschecked: 2025-10-27

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE