On-line Access: 2022-04-19
Received: 2021-07-23
Revision Accepted: 2022-03-23
Jingfa LIU, Fan LI, Ruoyao DING, Ziang LIU. Focused crawling strategies based on ontologies and simulated annealing method for rainstorm disaster domain knowledge[J]. Frontiers of Information Technology & Electronic Engineering, 2022, -1(-1): .
@article{FITEE.2100360,
title="Focused crawling strategies based on ontologies and simulated annealing method for rainstorm disaster domain knowledge",
author="Jingfa LIU and Fan LI and Ruoyao DING and Ziang LIU",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="-1",
number="-1",
pages="",
year="2022",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.2100360"
}
%0 Journal Article
%T Focused crawling strategies based on ontologies and simulated annealing method for rainstorm disaster domain knowledge
%A Jingfa LIU
%A Fan LI
%A Ruoyao DING
%A Ziang LIU
%J Frontiers of Information Technology & Electronic Engineering
%V -1
%N -1
%P
%@ 2095-9184
%D 2022
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2100360
TY - JOUR
T1 - Focused crawling strategies based on ontologies and simulated annealing method for rainstorm disaster domain knowledge
A1 - Jingfa LIU
A1 - Fan LI
A1 - Ruoyao DING
A1 - Ziang LIU
JO - Frontiers of Information Technology & Electronic Engineering
VL - -1
IS - -1
SP -
EP -
SN - 2095-9184
Y1 - 2022
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2100360
ER -
Abstract: The focused crawler is a crucial method for obtaining effective domain knowledge from massive heterogeneous networks. Most current focused crawling technologies, however, struggle to produce high-quality crawling results. The main difficulties are establishing topic benchmark models, assessing the topic relevance of hyperlinks, and designing crawling strategies. In this paper, we use a domain ontology to build a topic benchmark model for a specific topic and propose a novel multiple-filtering strategy based on local ontology and global ontology (MFSLG). A comprehensive priority evaluation method (CPEM) based on Web text and link structure is introduced to improve the computational precision of topic relevance for unvisited hyperlinks, and a simulated annealing (SA) method is used to keep the focused crawler from falling into local optima of the search. By incorporating SA into the focused crawler with the MFSLG and the CPEM for the first time, we propose two novel focused crawler strategies based on ontology and SA (FCOSAs), namely FCOSA with only global ontology (FCOSA_G) and FCOSA with both local ontology and global ontology (FCOSA_LG), to obtain topic-relevant webpages about rainstorm disasters from the network. Experimental results show that the proposed crawlers outperform other focused crawling strategies in the literature on different performance metrics.
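The abstract gives no algorithmic detail, so the following is only a generic sketch of how a simulated annealing acceptance rule can be applied when choosing the next hyperlink from a crawl frontier. It is not the authors' FCOSA procedure. The names `accept` and `sa_select`, the dictionary-based frontier, the fallback to the best-scoring link, and the geometric cooling factor of 0.9 are all assumptions made for illustration. The priority scores stand in for whatever a method such as CPEM would compute.

```python
import math
import random


def accept(current_score, candidate_score, temperature, rng=random):
    """Metropolis rule for maximisation (illustrative, not from the paper).

    An improving candidate is always accepted; a worse one is accepted
    with probability exp(delta / temperature).
    """
    delta = candidate_score - current_score
    if delta >= 0:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(delta / temperature)


def sa_select(frontier, current_score, temperature, rng=random):
    """Choose the next URL from `frontier`, a dict of url -> priority score.

    A random candidate is drawn; if the Metropolis rule rejects it, the
    best-scoring link is returned instead (an assumed fallback).
    """
    if not frontier:
        raise ValueError("frontier is empty")
    candidate = rng.choice(sorted(frontier))
    if accept(current_score, frontier[candidate], temperature, rng):
        return candidate
    return max(frontier, key=frontier.get)


# Hypothetical frontier with made-up priority scores.
frontier = {"page_a": 0.9, "page_b": 0.4, "page_c": 0.1}
rng = random.Random(0)
temperature, current = 1.0, 0.5
visited = []
while frontier:
    url = sa_select(frontier, current, temperature, rng)
    current = frontier.pop(url)   # score of the page just visited
    visited.append(url)
    temperature *= 0.9            # assumed geometric cooling schedule
```

At high temperature the crawler sometimes follows a lower-priority link, which is what lets it leave a locally promising but ultimately poor region of the Web graph; as the temperature falls, selection approaches plain best-first ordering.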