Improving the Retrieval Performance by Using the Distance-Based Bigrams
Abstract
In this paper, we discuss a new way to improve retrieval performance using a special type of bigram called the Distance-based Bigram (DB). A DB is a pair of words whose distance from each other is greater than or equal to one. DBs allow us to find documents that express a phrase or a sentence differently from the query. The results show that DBs combined with unigrams perform significantly better than unigrams and bigrams when the first few correct documents are needed.
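To make the definition concrete, the following is a minimal sketch of how distance-based word pairs might be extracted and compared. It is an illustration under stated assumptions, not the method evaluated in the paper: the abstract does not say how distance is measured, so the sketch assumes distance is the difference between token positions (distance 1 being an ordinary adjacent bigram), and the `max_distance` cap and the function names `distance_bigrams` and `shared_pairs` are hypothetical.

```python
from typing import List, Set, Tuple


def distance_bigrams(tokens: List[str], max_distance: int = 3) -> List[Tuple[str, str, int]]:
    """Return (first, second, distance) for every ordered word pair
    whose positions differ by 1..max_distance.

    Assumption: distance = difference in token positions, so
    distance 1 corresponds to an ordinary adjacent bigram.
    """
    pairs = []
    for i, first in enumerate(tokens):
        for d in range(1, max_distance + 1):
            j = i + d
            if j >= len(tokens):
                break  # no further tokens to pair with
            pairs.append((first, tokens[j], d))
    return pairs


def shared_pairs(query: List[str], document: List[str], max_distance: int = 3) -> Set[Tuple[str, str]]:
    """Word pairs found in both query and document, ignoring the exact
    distance at which each pair occurs (an illustrative simplification)."""
    q = {(a, b) for a, b, _ in distance_bigrams(query, max_distance)}
    d = {(a, b) for a, b, _ in distance_bigrams(document, max_distance)}
    return q & d


query = ["improve", "performance"]
document = ["improve", "the", "retrieval", "performance"]

# Adjacent bigrams only: the document never has "improve performance".
print(shared_pairs(query, document, max_distance=1))

# Allowing a gap lets the pair match despite the different wording.
print(shared_pairs(query, document, max_distance=3))
```

With `max_distance=1` the query and document share no pair, while with `max_distance=3` they share ("improve", "performance"). This is the kind of differently worded match that the abstract attributes to DBs.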
Article Details
This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge.
- Creative Commons Copyright License
The journal allows readers to download and share all published articles as long as they cite them properly; however, readers may not modify the articles or use them commercially. These terms correspond to the Creative Commons CC BY-NC-ND license.
- Retention of Copyright and Publishing Rights
The journal allows the authors of the published articles to hold copyrights and publishing rights without restrictions.