ABSTRACT
With the worldwide growth of the Internet, research on Cross-Language Information Retrieval (CLIR) is receiving much attention. Existing CLIR approaches based on query translation require parallel or comparable corpora to disambiguate translated query terms. However, such language resources are not readily available. In this paper, we propose a disambiguation method for dictionary-based query translation that does not depend on such scarce language resources, yet achieves adequate retrieval effectiveness by using Web documents as a corpus and exploiting co-occurrence information between terms within that corpus. In our experiments, the method achieved 97% of the average precision obtained with manual translation.
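To make the idea of co-occurrence-based disambiguation concrete, the following is a minimal sketch, not the method evaluated in the paper. It assumes document-level co-occurrence counts and the Dice coefficient as the association measure; the abstract does not name the measure actually used. The toy corpus, the candidate lists, and the function names `dice` and `disambiguate` are all hypothetical illustrations.

```python
from itertools import combinations, product


def dice(a, b, docs):
    """Dice coefficient of terms a and b over document-level co-occurrence.

    Assumption: the abstract does not specify the association measure;
    Dice is used here only as a simple stand-in.
    """
    fa = sum(1 for d in docs if a in d)
    fb = sum(1 for d in docs if b in d)
    fab = sum(1 for d in docs if a in d and b in d)
    return 2.0 * fab / (fa + fb) if fa + fb else 0.0


def disambiguate(candidates, docs):
    """Pick one translation per source term so that the chosen
    translations co-occur with each other as strongly as possible.

    candidates: one list of dictionary translations per source query term.
    docs: target-language documents, each represented as a set of terms.
    """
    best, best_score = (), -1.0
    for combo in product(*candidates):
        # Sum the pairwise association over every pair in this combination.
        score = sum(dice(a, b, docs) for a, b in combinations(combo, 2))
        if score > best_score:
            best, best_score = combo, score
    return list(best)


# Hypothetical toy corpus standing in for Web documents.
docs = [
    {"bank", "interest", "rate", "loan"},
    {"bank", "interest", "deposit"},
    {"shore", "river", "fishing"},
    {"hobby", "fishing", "weekend"},
    {"interest", "rate", "bank"},
]

# Hypothetical dictionary output: two source terms, two candidates each.
candidates = [["bank", "shore"], ["interest", "hobby"]]

selected = disambiguate(candidates, docs)
print(selected)
```

Exhaustive search over all combinations is only practical for short queries with few candidates per term; a real system would need a cheaper selection strategy and co-occurrence statistics gathered from a much larger document collection.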