Journal of Zhejiang University SCIENCE C

ISSN 1869-1951 (Print), 1869-196X (Online), Monthly

Importance of retrieving noun phrases and named entities from digital library content

Abstract: We present a novel approach for extracting noun phrases in general, and named entities in particular, from a digital repository of text documents. The problem of coreference resolution was divided into two subproblems: pronoun resolution and non-pronominal resolution. A rule-based technique was used for pronoun resolution, while a learning approach was used for non-pronominal resolution. For named entity resolution, the need for disambiguation arises mainly from polysemy and synonymy. The proposed approach addresses both problems with the help of WordNet and the Word Sense Disambiguation tool. To our knowledge, the proposed approach outperforms several baseline techniques, achieving a higher balanced F-measure, which is the harmonic mean of recall and precision. The improvements in system performance are due to the filtering of antecedents for the anaphor based on several linguistic disagreements, the use of a hybrid approach, and the extension of the feature vector to include more linguistic details in the learning technique.
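
For readers unfamiliar with the evaluation metric named in the abstract, the balanced F-measure can be sketched as follows. This is only an illustration of the standard harmonic-mean definition; the function name and the example values are assumptions made here and are not taken from the paper.

```python
def balanced_f_measure(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (balanced F-measure).

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    if precision + recall == 0:
        # Both scores are zero, so the harmonic mean is defined as zero here.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example values: a system with high precision but low recall
# is penalized more than an arithmetic mean would suggest.
score = balanced_f_measure(1.0, 0.5)
```

Because the harmonic mean is pulled toward the smaller of the two values, a system cannot reach a high balanced F-measure by improving precision alone or recall alone.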

Key words: Coreference resolution, Hybrid approach, Filtering, Rule-based and J48 algorithm


DOI: 10.1631/jzus.C1001003

CLC number: TP391

Downloaded: 2592

Clicked: 6625

Cited: 0

On-line Access: 2010-11-04

Received: 2010-09-01

Revision Accepted: 2010-09-16

Crosschecked: 2010-09-01

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE