CLC number: TP39
On-line Access: 2022-08-22
Received: 2021-07-23
Revision Accepted: 2022-03-23
Crosschecked: 2022-08-29
https://orcid.org/0000-0002-0407-1522
https://orcid.org/0000-0001-7836-0522
Jingfa LIU, Fan LI, Ruoyao DING, Zi’ang LIU. Focused crawling strategies based on ontologies and simulated annealing methods for rainstorm disaster domain knowledge[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2100360
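Focused crawling strategies based on ontologies and simulated annealing methods for rainstorm disaster domain knowledge
1 Guangzhou Key Laboratory of Multilingual Intelligent Processing, Guangdong University of Foreign Studies, Guangzhou 510006, China
2 School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510006, China
3 School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
4 Faculty of Science, University of Alberta, Edmonton T6G2H6, Canada
Abstract: Focused crawling is an important method for obtaining effective domain knowledge from the massive, heterogeneous web. Most existing focused crawling techniques struggle to produce high-quality crawling results. The main difficulties are building the topic benchmark model, evaluating the topical relevance of hyperlinks, and designing the crawling strategy. This paper uses a domain ontology to build a topic benchmark model for a specific topic and proposes a new multiple filtering strategy based on local and global ontologies (MFSLG). To improve the accuracy of the topical relevance computed for unvisited hyperlinks, a comprehensive priority evaluation method (CPEM) based on web page text and link structure is proposed. In addition, the simulated annealing (SA) algorithm is used to keep the focused crawler from becoming trapped in a locally optimal search. This paper is the first to combine the SA algorithm, the MFSLG strategy, and the CPEM strategy in a focused crawler. It proposes two new focused crawling strategies based on ontology and SA (FCOSA): the FCOSA strategy based on the global ontology (FCOSA_G) and the FCOSA strategy based on local and global ontologies (FCOSA_LG), both aimed at retrieving web pages relevant to the rainstorm disaster topic. Experimental results show that the proposed crawling strategies outperform other focused crawling strategies on several performance metrics.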
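The abstract states that SA is used to keep the crawler out of locally optimal searches but does not give the acceptance rule. The following is a minimal sketch of the standard Metropolis acceptance step applied to link selection, not the authors' implementation. The function names `sa_accept` and `select_next_link`, the `{url: priority}` frontier structure, and the priority values are all hypothetical; in the paper the priorities would come from CPEM, whose formula is not reproduced here.

```python
import math
import random


def sa_accept(current_priority, candidate_priority, temperature, rng=random):
    """Metropolis rule (assumed here, not taken from the paper).

    A candidate that is at least as good is always accepted; a worse one is
    accepted with probability exp(-loss / temperature).
    """
    if candidate_priority >= current_priority:
        return True
    if temperature <= 0:
        return False
    loss = current_priority - candidate_priority
    return rng.random() < math.exp(-loss / temperature)


def select_next_link(frontier, temperature, rng=random):
    """Choose the next URL from a hypothetical frontier mapping url -> priority.

    Start from the greedy best link, propose one random alternative, and
    switch to it only if the Metropolis rule accepts the move.
    """
    best_url = max(frontier, key=frontier.get)
    candidate_url = rng.choice(sorted(frontier))
    if sa_accept(frontier[best_url], frontier[candidate_url], temperature, rng):
        return candidate_url
    return best_url
```

At a high temperature the crawler often follows a lower-priority link, which lets it leave a cluster of pages that only looks promising locally. As the temperature is lowered toward zero, the choice reduces to the greedy best-first pick.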