Query term disambiguation for Web cross-language information retrieval using a search engine
Source: International Workshop on Information Retrieval with Asian Languages
Proceedings of the fifth international workshop on Information retrieval with Asian languages
Hong Kong, China
Pages: 25 - 32  
Year of Publication: 2000
ISBN:1-58113-300-6
Authors
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
SIGLINK: Hypertext, Hypermedia, and Web
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM Hong Kong Chapter : ACM Hong Kong Chapter Executive Committee
Publisher
ACM Press   New York, NY, USA
DOI: http://doi.acm.org/10.1145/355214.355218

ABSTRACT

With the worldwide growth of the Internet, research on Cross-Language Information Retrieval (CLIR) is receiving much attention. Existing CLIR approaches based on query translation require parallel or comparable corpora to disambiguate translated query terms. However, such language resources are not readily available. In this paper, we propose a disambiguation method for dictionary-based query translation that does not depend on such scarce language resources, yet achieves adequate retrieval effectiveness by using Web documents as a corpus and exploiting co-occurrence information between terms within that corpus. In our experiments, the method achieved 97% of the average precision obtained with manual translation.
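
The idea of choosing among dictionary translations by term co-occurrence in Web documents can be illustrated with a minimal sketch. The abstract does not specify the scoring function or the search engine interface, so everything below is an assumption made for illustration: the page counts in `HITS` are invented stand-ins for search-engine hit counts, `TOTAL_PAGES` is a hypothetical corpus size, and pointwise mutual information is used as one possible co-occurrence measure, not necessarily the one used in the paper.

```python
import itertools
import math

# Hypothetical corpus size and page counts (NOT real search-engine data).
# Each key is the set of terms a page must contain.
TOTAL_PAGES = 1_000_000
HITS = {
    frozenset(["bank"]): 50_000,
    frozenset(["shore"]): 8_000,
    frozenset(["interest"]): 40_000,
    frozenset(["curiosity"]): 5_000,
    frozenset(["bank", "interest"]): 12_000,
    frozenset(["bank", "curiosity"]): 100,
    frozenset(["shore", "interest"]): 150,
    frozenset(["shore", "curiosity"]): 20,
}


def hit_count(*terms):
    """Return the (hypothetical) number of pages containing all terms."""
    return HITS.get(frozenset(terms), 0)


def pmi(a, b):
    """Pointwise mutual information of two terms, estimated from page counts."""
    joint, count_a, count_b = hit_count(a, b), hit_count(a), hit_count(b)
    if joint == 0 or count_a == 0 or count_b == 0:
        return float("-inf")
    return math.log2(joint * TOTAL_PAGES / (count_a * count_b))


def disambiguate(candidates):
    """Pick one translation per query term so that the sum of pairwise
    PMI scores over the chosen translations is maximal.

    candidates: one list of candidate translations per source query term.
    """
    best, best_score = None, float("-inf")
    for combo in itertools.product(*candidates):
        score = sum(pmi(a, b) for a, b in itertools.combinations(combo, 2))
        if best is None or score > best_score:
            best, best_score = combo, score
    return best


# Two source query terms, each with two dictionary translations.
selected = disambiguate([["bank", "shore"], ["interest", "curiosity"]])
print(selected)
```

With these invented counts, the pair that co-occurs far more often than chance is preferred over the other combinations. The exhaustive search over all combinations is only practical for short queries; a real system would also need to obtain the counts from an actual search engine.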


REFERENCES

1   Kikui, G. Identifying the coding system and language of on-line documents using statistical language models. Transactions of IPSJ, 1997, 38(12), pp. 2440-2448.

2   Sugimoto, S., Maeda, A., Dartois, M., Ohta, J., Nakao, S., Sakaguchi, T. and Tabata, K. Experimental studies on an applet-based document viewer for multilingual WWW Documents -- Functional Extension of and Lessons Learned from Multilingual HTML. In Proceedings of the Second European Conference on Research and Advanced Technology for Digital Libraries (ECDL'98), Lecture Notes in Computer Science 1513, Springer-Verlag, 1998, pp. 199-214.

3   Bernard J. Jansen, Amanda Spink, Tefko Saracevic, Real life, real users, and real needs: a study and analysis of user queries on the web, Information Processing and Management: an International Journal, v.36 n.2, p.207-227, Jan. 1, 2000

4   Fujii, A. and Ishikawa, T. Cross-language information retrieval for technical documents. In Proceedings of the Joint ACL SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 1999, pp. 29-37.

5   Oard, D. W. Alternative approaches for cross-language text retrieval. In Electronic Working Notes of the AAAI Symposium on Cross-Language Text and Speech Retrieval, 1997.

6   Gregory Grefenstette, Cross-Language Information Retrieval, Kluwer Academic Publishers, Norwell, MA, 1998

7   Jian-Yun Nie, Michel Simard, Pierre Isabelle, Richard Durand, Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the Web, Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, p.74-81, August 15-19, 1999, Berkeley, California, United States

8   Maeda, A. and Uemura, S. Key technologies for multilingual information processing on WWW. In Proceedings of the Fourth International Symposium on Standardization of Multilingual Information Technology (MLIT-4), 1999, pp. 15-25.

9   Lin, C., Lin, W., Bian, G. and Chen, H. Description of the NTU Japanese-English cross-lingual information retrieval system used for NTCIR workshop. In Proceedings of the First NTCIR Workshop on Research in Japanese Text Retrieval and Term Recognition, 1999, pp. 145-148.

10   Jang, M., Myaeng, S. H. and Park, S. Y. Using mutual information to resolve query translation ambiguities and query term weighting. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL'99), 1999, pp. 223-229.

11   Lisa Ballesteros, W. Bruce Croft, Resolving ambiguity for cross-language retrieval, Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, p.64-71, August 24-28, 1998, Melbourne, Australia

12   Fatiha, S., Maeda, A., Yoshikawa, M. and Uemura, S. Integrating Dictionary-based and Statistical-based Approaches in Cross-Language Information Retrieval, IPSJ SIG Notes, 2000-DBS-121/2000-FI-Sg, 2000, pp. 61-68.

13   Ikeno, A., Murata, T., Shimohata, S. and Yamamoto, H. Machine translation using the Internet natural language resources. In Proceedings of World TELECOM99+ Interactive99 Forum, 1999.

14   Kenneth Ward Church, Patrick Hanks, Word association norms, mutual information and lexicography, Computational Linguistics, v.16 n.1, p.22-29, Mar. 1990

15   Kitamura, M. and Matsumoto, Y. Automatic extraction of translation patterns in parallel corpora. Transactions of IPSJ, 1997, 38(4), pp. 727-736. (in Japanese)

16   Dunning, T. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 1993, 19(1), pp. 61-74.

17   Kando, N., Kuriyama, K., Nozue, T., Eguchi, K., Kato, H., Hidaka, S. and Adachi, J. The NTCIR workshop: the first evaluation workshop on Japanese text retrieval and cross-lingual information retrieval. In Proceedings of the 4th International Workshop on Information Retrieval with Asian Languages (IRAL '99), 1999.

18   Matsumoto, Y., Kitauchi, A., Yamashita, T., Hirano, Y., Matsuda, H. and Asahara, M. Japanese morphological analysis system ChaSen version 2.0 manual 2nd edition. Technical Report NAIST-IS-TR99013, Nara Institute of Science and Technology, 1999.

19   Japan Electronic Dictionary Research Institute, Ltd. EDR electronic dictionary version 1.5 technical guide, Technical Report TR2-007, Japan Electronic Dictionary Research Institute, Ltd., 1996.
