|
Journal of Zhejiang University SCIENCE C
ISSN 1869-1951(Print), 1869-196x(Online), Monthly
2010 Vol.11 No.11 P.844-849
Importance of retrieving noun phrases and named entities from digital library content
Abstract: We present a novel approach for extracting noun phrases in general and named entities in particular from a digital repository of text documents. The problem of coreference resolution has been divided into two subproblems: pronoun resolution and non-pronominal resolution. A rule based-technique was used for pronoun resolution while a learning approach for non-pronominal resolution. For named entity resolution, disambiguation arises mainly due to polysemy and synonymy. The proposed approach fixes both problems with the help of WordNet and the Word Sense Disambiguation tool. The proposed approach, to our knowledge, outperforms several baseline techniques with a higher balanced F-measure, which is harmonic mean of recall and precision. The improvements in the system performance are due to the filtering of antecedents for the anaphor based on several linguistic disagreements, use of a hybrid approach, and increment in the feature vector to include more linguistic details in the learning technique.
Key words: Coreference resolution, Hybrid approach, Filtering, Rule based and J48 algorithm
References:
Open peer comments: Debate/Discuss/Question/Opinion
<1>
DOI:
10.1631/jzus.C1001003
CLC number:
TP391
Download Full Text:
Downloaded:
2814
Clicked:
7265
Cited:
0
On-line Access:
2024-08-27
Received:
2023-10-17
Revision Accepted:
2024-05-08
Crosschecked:
2010-09-01