On-line Access: 2022-04-19

Received: 2021-07-23

Revision Accepted: 2022-03-23

Frontiers of Information Technology & Electronic Engineering (Journal of Zhejiang University SCIENCE C)

http://doi.org/10.1631/FITEE.2100360


Focused crawling strategies based on ontologies and simulated annealing method for rainstorm disaster domain knowledge


Author(s):  Jingfa LIU, Fan LI, Ruoyao DING, Ziang LIU

Affiliation(s):  Guangzhou Key Laboratory of Multilingual Intelligent Processing, Guangdong University of Foreign Studies

Corresponding email(s):   jfliu@nuist.edu.cn, bj2014_lifan@163.com

Key Words:  Focused crawler, Ontology, Priority evaluation, Simulated annealing, Rainstorm disaster


Jingfa LIU, Fan LI, Ruoyao DING, Ziang LIU. Focused crawling strategies based on ontologies and simulated annealing method for rainstorm disaster domain knowledge[J]. Frontiers of Information Technology & Electronic Engineering. http://doi.org/10.1631/FITEE.2100360


Abstract: 
Focused crawlers are a key means of obtaining useful domain knowledge from massive heterogeneous networks. Most current focused crawling techniques, however, struggle to return high-quality results. The main difficulties are establishing topic benchmark models, assessing the topic relevance of hyperlinks, and designing crawling strategies. In this paper, we use a domain ontology to build a topic benchmark model for a specific topic and propose a novel multiple-filtering strategy based on local ontology and global ontology (MFSLG). A comprehensive priority evaluation method (CPEM) based on Web text and link structure is introduced to compute the topic relevance of unvisited hyperlinks more precisely, and a simulated annealing (SA) method is used to keep the focused crawler from becoming trapped in local optima of the search. By incorporating SA into a focused crawler with the MFSLG and the CPEM for the first time, we propose two novel focused crawling strategies based on ontology and SA (FCOSAs): FCOSA with only global ontology (FCOSA_G) and FCOSA with both local ontology and global ontology (FCOSA_LG). Both are designed to obtain topic-relevant webpages about rainstorm disasters from the network. Experimental results show that the proposed crawlers outperform other focused crawling strategies from the literature on a range of performance metrics.
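
The abstract does not state the acceptance rule or cooling schedule used in the FCOSAs, so the following is only a generic sketch of how a simulated-annealing step might let a crawler occasionally follow a lower-priority hyperlink instead of always taking the greedy best. The function names (`accept`, `pick_next_link`), the dictionary-based frontier, and the example priorities are all hypothetical and are not taken from the paper.

```python
import math
import random


def accept(best_priority, candidate_priority, temperature, rng=random):
    """Metropolis-style acceptance test for a maximization problem.

    A candidate at least as good as the current best is always accepted.
    A worse candidate is accepted with probability exp(delta / T), where
    delta = candidate_priority - best_priority is negative.
    """
    if candidate_priority >= best_priority:
        return True
    if temperature <= 0:
        return False  # no randomness left: behave greedily
    delta = candidate_priority - best_priority
    return rng.random() < math.exp(delta / temperature)


def pick_next_link(frontier, temperature, rng=random):
    """Choose and remove the next URL from the frontier.

    `frontier` maps each unvisited URL to a priority score (assumed to be
    produced by some relevance measure; higher means more relevant).
    """
    best_url = max(frontier, key=frontier.get)
    candidate_url = rng.choice(list(frontier))
    if accept(frontier[best_url], frontier[candidate_url], temperature, rng):
        chosen = candidate_url
    else:
        chosen = best_url
    del frontier[chosen]
    return chosen


# Illustrative frontier with made-up priorities.
frontier = {"a": 0.9, "b": 0.2, "c": 0.5}
temperature = 1.0
while frontier:
    url = pick_next_link(frontier, temperature)
    temperature *= 0.9  # geometric cooling, an assumed schedule
```

At a high temperature the crawler in this sketch explores low-priority links fairly often; as the temperature falls it approaches purely greedy, highest-priority-first selection.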
