CLC number: TP391

Received: 2006-11-10

Revision Accepted: 2007-03-06


Journal of Zhejiang University SCIENCE A 2007 Vol.8 No.6 P.871-882

http://doi.org/10.1631/jzus.2007.A0871


A novel dependency language model for information retrieval


Author(s):  CAI Ke-ke, BU Jia-jun, CHEN Chun, QIU Guang

Affiliation(s):  School of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China

Corresponding email(s):   caikeke@zju.edu.cn, bjj@zju.edu.cn

Key Words:  Term dependency, Language modeling (LM), Retrieval model, Sentence retrieval


CAI Ke-ke, BU Jia-jun, CHEN Chun, QIU Guang. A novel dependency language model for information retrieval[J]. Journal of Zhejiang University Science A, 2007, 8(6): 871-882.

@article{cai2007dependency,
title="A novel dependency language model for information retrieval",
author="CAI Ke-ke and BU Jia-jun and CHEN Chun and QIU Guang",
journal="Journal of Zhejiang University Science A",
volume="8",
number="6",
pages="871--882",
year="2007",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/jzus.2007.A0871"
}

%0 Journal Article
%T A novel dependency language model for information retrieval
%A CAI Ke-ke
%A BU Jia-jun
%A CHEN Chun
%A QIU Guang
%J Journal of Zhejiang University SCIENCE A
%V 8
%N 6
%P 871-882
%@ 1673-565X
%D 2007
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.2007.A0871

TY  - JOUR
T1  - A novel dependency language model for information retrieval
A1  - CAI Ke-ke
A1  - BU Jia-jun
A1  - CHEN Chun
A1  - QIU Guang
JO  - Journal of Zhejiang University Science A
VL  - 8
IS  - 6
SP  - 871
EP  - 882
SN  - 1673-565X
Y1  - 2007
PB  - Zhejiang University Press & Springer
DO  - 10.1631/jzus.2007.A0871
ER  -


Abstract: 
This paper explores the use of term dependency in information retrieval (IR) and proposes a novel dependency retrieval model. The model extends the existing language modeling (LM) approach to IR by introducing dependency models for both the query and the document. Relevance between a document and a query is then evaluated by the Kullback-Leibler divergence between their dependency models. The paper introduces a novel hybrid dependency structure that allows various forms of dependency to be integrated within a single framework. A method based on pseudo relevance feedback is also introduced for constructing the query dependency model. The basic idea is to use query-relevant, top-ranking sentences extracted from the top documents at retrieval time as an augmented representation of the query, from which the relationships between query terms are identified. A Markov Random Field (MRF) based approach is presented to ensure the relevance of the extracted sentences; it uses the association features between query terms within a sentence to evaluate the relevance of each sentence. The dependency retrieval model was compared with other traditional retrieval models, and experiments indicated that it produces significant improvements in retrieval effectiveness.
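
As a rough illustration of the ranking criterion named in the abstract, the sketch below scores documents by the Kullback-Leibler divergence between a query model and a document model. It is not the authors' dependency model: it assumes plain unigram models and Jelinek-Mercer smoothing against the collection, and the function names, the `lam` parameter, and the toy documents are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram distribution over a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def smoothed_prob(term, doc_model, collection_model, lam):
    """Jelinek-Mercer interpolation of document and collection models."""
    return (1.0 - lam) * doc_model.get(term, 0.0) + lam * collection_model.get(term, 0.0)


def kl_divergence(query_model, doc_model, collection_model, lam=0.5):
    """KL(query || smoothed document), summed over the query's terms.

    Terms with zero smoothed probability (absent from both the document
    and the collection) are skipped to avoid division by zero; this is a
    simplification made for the sketch.
    """
    kl = 0.0
    for term, p_q in query_model.items():
        p_d = smoothed_prob(term, doc_model, collection_model, lam)
        if p_d > 0.0:
            kl += p_q * math.log(p_q / p_d)
    return kl


def rank(query_tokens, docs, lam=0.5):
    """Return (doc_id, divergence) pairs, lowest divergence first."""
    collection_model = unigram_model([t for toks in docs.values() for t in toks])
    query_model = unigram_model(query_tokens)
    scored = [
        (doc_id, kl_divergence(query_model, unigram_model(toks), collection_model, lam))
        for doc_id, toks in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": ["term", "dependency", "language", "model"],
    "d2": ["image", "retrieval", "color", "histogram"],
}
ranking = rank(["dependency", "model"], docs)
```

A lower divergence means the document model is closer to the query model, so the list is sorted in ascending order. In the paper's setting, the unigram distributions would be replaced by the query and document dependency models.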
