Improving the Retrieval Performance by Using the Distance-Based Bigrams

Pakinee Aimmanee
Thanaruk Theeramunkong

Abstract

In this paper, we discuss a new way to improve retrieval performance using a special type of bigram called the Distance-based Bigram (DB). A DB is a pair of words whose distance from each other in the text is greater than or equal to one, so the two words need not be adjacent. DBs allow us to find documents that express a phrase or a sentence differently from the query. The results show that DBs combined with unigrams perform significantly better than either unigrams or bigrams when only the first few correct documents are needed.
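
To make the definition of a DB concrete, here is a minimal Python sketch of how such word pairs might be extracted from a tokenized text. It is an illustration under stated assumptions, not the authors' implementation: the function name `distance_bigrams`, the `max_distance` cutoff, the convention that adjacent words have distance 1, and the choice to keep pairs in text order are all hypothetical, since the abstract does not specify them.

```python
from typing import List, Tuple


def distance_bigrams(tokens: List[str], max_distance: int = 3) -> List[Tuple[str, str, int]]:
    """Return ordered word pairs (left, right, distance).

    Assumptions (not taken from the paper): adjacent words have
    distance 1, and pairs farther apart than max_distance are ignored.
    """
    pairs = []
    for i, left in enumerate(tokens):
        # Pair the word at position i with each later word within the cutoff.
        for j in range(i + 1, min(i + max_distance + 1, len(tokens))):
            pairs.append((left, tokens[j], j - i))
    return pairs


# Hypothetical query and document that phrase the same idea differently.
query_pairs = distance_bigrams(["improve", "performance"], max_distance=2)
doc_pairs = distance_bigrams(["improve", "retrieval", "performance"], max_distance=2)

# Compare the word pairs while ignoring the exact distance.
shared = {(a, b) for a, b, _ in query_pairs} & {(a, b) for a, b, _ in doc_pairs}
```

In this sketch the pair ("improve", "performance") appears in the query at distance 1 and in the document at distance 2, so the two texts share a DB even though an ordinary adjacent bigram would not match. How distances are weighted and how DBs are combined with unigrams for ranking is left open here.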

How to Cite
Aimmanee, P., & Theeramunkong, T. (2009). Improving the Retrieval Performance by Using the Distance-Based Bigrams. ECTI Transactions on Electrical Engineering, Electronics, and Communications, 8(1), 106–112. https://doi.org/10.37936/ecti-eec.201081.172043
Section
Research Article
