On-line Access: 2023-02-02

Received: 2022-07-22

Revision Accepted: 2023-01-06

Frontiers of Information Technology & Electronic Engineering (formerly Journal of Zhejiang University SCIENCE C)

http://doi.org/10.1631/FITEE.2200315


A new focused crawler using an improved tabu search algorithm incorporating ontology and host information


Author(s):  Jingfa LIU, Zhen WANG, Guo ZHONG, Zhihe YANG

Affiliation(s): School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510006, China

Corresponding email(s):   1007427607@qq.com

Key Words:  Focused crawler, Tabu search algorithm, Ontology, Host information, Priority evaluation


Jingfa LIU, Zhen WANG, Guo ZHONG, Zhihe YANG. A new focused crawler using an improved tabu search algorithm incorporating ontology and host information[J]. Frontiers of Information Technology & Electronic Engineering. https://doi.org/10.1631/FITEE.2200315


Abstract: 
To address incomplete topic description and repeated crawling of visited hyperlinks in traditional focused crawling methods, we propose a novel focused crawler using an improved tabu search algorithm with domain ontology and host information (FCITS_OH). A domain ontology is constructed by formal concept analysis to describe topics at the semantic and knowledge levels. To avoid crawling visited hyperlinks and to expand the search range, we present an improved tabu search (ITS) algorithm and a host information memory strategy. In addition, a comprehensive priority evaluation method based on Web text and link structure is designed to improve the assessment of topic relevance for unvisited hyperlinks. Experimental results on both the tourism and rainstorm disaster domains show that the proposed focused crawlers outperform traditional focused crawlers across different performance metrics.
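
The abstract does not give the details of the ITS algorithm or of the host information memory, so the following is only a generic sketch of how a tabu list and a per-host visit counter might steer a priority-driven crawl. It is not the authors' FCITS_OH method. The function `crawl`, the parameters `tabu_size` and `host_limit`, the in-memory `web` graph, and the hand-assigned `scores` are all hypothetical names and values introduced for illustration; a real crawler would fetch pages over the network and compute priorities from Web text and link structure.

```python
import heapq
from collections import Counter, deque
from urllib.parse import urlparse


def crawl(graph, seeds, score, tabu_size=100, host_limit=2, max_pages=10):
    """Visit URLs in descending score order, skipping tabu URLs and saturated hosts.

    Assumption of this sketch: `graph` maps a URL to its outgoing links and
    stands in for fetching and parsing real pages.
    """
    tabu = deque(maxlen=tabu_size)   # bounded memory of recently visited URLs
    host_visits = Counter()          # host memory: pages visited per host
    frontier = [(-score(u), u) for u in seeds]  # max-heap via negated priority
    heapq.heapify(frontier)
    visited = []

    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        host = urlparse(url).netloc
        if url in tabu or host_visits[host] >= host_limit:
            continue                 # tabu URL or over-visited host: skip it
        tabu.append(url)
        host_visits[host] += 1
        visited.append(url)
        for link in graph.get(url, []):
            if link not in tabu:
                heapq.heappush(frontier, (-score(link), link))
    return visited


# Toy link graph and hand-assigned relevance scores (illustrative values only).
web = {
    "http://a.example/1": ["http://a.example/2", "http://b.example/1"],
    "http://a.example/2": ["http://a.example/1", "http://a.example/3"],
    "http://b.example/1": ["http://a.example/1"],
}
scores = {
    "http://a.example/1": 0.9,
    "http://a.example/2": 0.8,
    "http://a.example/3": 0.7,
    "http://b.example/1": 0.5,
}
order = crawl(web, ["http://a.example/1"], lambda u: scores.get(u, 0.0))
print(order)
```

In this toy run the back-links to `http://a.example/1` are never re-queued because that URL is on the tabu list, and `http://a.example/3` is skipped once its host reaches `host_limit`, which pushes the crawl toward a different host. The bounded `deque` lets old entries expire, as in a classical tabu tenure; the membership test on it is linear, which is acceptable for a sketch but not for a large crawl.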

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2023 Journal of Zhejiang University-SCIENCE