Journal of Zhejiang University SCIENCE C 2010 Vol.11 No.11 P.844-849


Importance of retrieving noun phrases and named entities from digital library content

Author(s):  Ratna Sanyal, Kushal Keshri, Vidya Nand

Affiliation(s):  Indian Institute of Information Technology, Allahabad 211012, India

Corresponding email(s):   rsanyal@iiita.ac.in, iit2006031@iiita.ac.in, iit2006032@iiita.ac.in

Key Words:  Coreference resolution, Hybrid approach, Filtering, Rule based and J48 algorithm

We present a novel approach for extracting noun phrases in general, and named entities in particular, from a digital repository of text documents. The problem of coreference resolution is divided into two subproblems: pronoun resolution and non-pronominal resolution. A rule-based technique is used for pronoun resolution and a learning approach for non-pronominal resolution. For named entity resolution, the need for disambiguation arises mainly from polysemy and synonymy. The proposed approach addresses both problems with the help of WordNet and the Word Sense Disambiguation tool. To our knowledge, the proposed approach outperforms several baseline techniques, achieving a higher balanced F-measure, which is the harmonic mean of recall and precision. The improvements in system performance are due to three factors: filtering of candidate antecedents for the anaphor based on several linguistic disagreements, the use of a hybrid approach, and an extended feature vector that includes more linguistic details in the learning technique.
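The balanced F-measure named in the abstract can be illustrated with a minimal sketch. The function name `balanced_f_measure` and the sample values are assumptions made for illustration only; they are not taken from the paper's code or results.

```python
def balanced_f_measure(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (balanced F-measure).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    # The harmonic mean is undefined when both inputs are zero; return 0.0 by convention.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative values only (not results reported in the paper).
example_score = balanced_f_measure(precision=0.5, recall=1.0)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a system cannot reach a high balanced F-measure by maximizing recall alone or precision alone.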

