Xiao, Zhonghua z.xiao at lancaster.ac.uk
Mon Feb 16 15:14:30 CET 2009

Here is a recent chapter -

McEnery, Tony and Richard Xiao (2007) Parallel and comparable corpora: The state of play. In Y. Kawaguchi, T. Takagaki, N. Tomimori and Y. Tsuruga (eds.) Corpus-Based Perspectives in Linguistics. Amsterdam: John Benjamins. 131-145.


From: corpora-bounces at uib.no on behalf of Helena Blancafort
Sent: Mon 16/02/2009 12:46
To: CORPORA at UIB.NO
Subject: Re: [Corpora-List] STATE OF THE ART IN COMPARABLE CORPORA

> I would be grateful for any up-to-date information about state of the
> art in comparable corpus.


Here are some articles about comparable corpora that should be useful for a state-of-the-art overview.

Helena Blancafort
------------------
Syllabs
www.syllabs.com

Déjean, H., Gaussier, E. (2002). "Une nouvelle approche à l'extraction de lexiques bilingues à partir de corpus comparables". Lexicometrica, Alignement lexical dans les corpus multilingues, pp. 1-21.

Sadat, F., Yoshikawa, M. and Uemura, S. (2003). "Learning Bilingual Translations from Comparable Corpora to Cross-Language Information Retrieval: Hybrid Statistics-based and Linguistics-based Approach". Proceedings of the sixth international workshop on Information retrieval with Asian languages - Volume 11, pages 57-64.

E. MORIN and B. DAILLE (2004). Extraction de terminologies bilingues à partir de corpus comparables d'un domaine spécialisé. Traitement Automatique des Langues (TAL), 45:3, Hermès Lavoisier Sciences Publications, 2004. ISSN 1248-9433.

E. Morin, B. Daille, K. Takeuchi, and K. Kageura (2007). Bilingual Terminology Mining -- Using Brain, not brawn comparable corpora. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL'07) p. 664-671, Prague, Czech Republic, 2007.

B. DAILLE and E. MORIN. (2005) French-English terminology extraction from comparable corpora. In Proceedings IJCNLP 2005: Second International Joint Conference, Lecture Notes in Computer Science, vol. 3651/2005, p. 707-719, Springer-Verlag, 2005. ISBN 3-540-29172.

Yun-Chuang Chiao and Pierre Zweigenbaum. 2002. Looking for candidate translational equivalents in specialized, comparable corpora. In Proceedings of the 19th International Conference on Computational Linguistics (COLING'02), pages 1208-1212, Taipei, Taiwan.

Pascale Fung. 1998. A Statistical View on Bilingual Lexicon Extraction: From Parallel Corpora to Non-parallel Corpora. In David Farwell, Laurie Gerber, and Eduard Hovy, editors, Proceedings of the 3rd Conference of the Association for Machine Translation in the Americas (AMTA'98), pages 1-16, Langhorne, PA, USA. Springer.

Carol Peters and Eugenio Picchi. 1998. Cross-language information retrieval: A system for comparable corpus querying. In Gregory Grefenstette, editor, Cross-language information retrieval, chapter 7, pages 81-90. Kluwer.

Reinhard Rapp. 1999. Automatic Identification of Word Translations from Unrelated English and German Corpora. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL'99), pages 519-526, College Park, Maryland, USA.

Gamallo P., and J-R. Pichel (2008) "Learning Spanish-Galician Translation Equivalents Using a Comparable Corpus and a Bilingual Dictionary", Lecture Notes in Computer Science, vol. 4919, Springer-Verlag, (423-433). ISSN: 0302-9743.

Gamallo P. (2008) "Evaluating two different methods for the task of extracting bilingual lexicons from comparable corpora", In Proceedings of LREC 2008 Workshop on Comparable Corpora, Marrakech, Morocco, pp. 19-26. ISBN: 2-9517408-4-0.

Gamallo P. (2007) "Learning Bilingual Lexicons from Comparable English and Spanish Corpora", In Proceedings of Machine Translation Summit XI, Copenhagen, Denmark, pp. 191-198.

Gamallo P. and J.R. Pichel (2007) "Un método de extracción de equivalentes de traducción a partir de un corpus comparable castellano-gallego", Procesamiento del Lenguaje Natural, 39, pp. 241-248.

> -----Original Message-----
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On behalf of
> Eric Atwell
> Sent: Monday, 16 February 2009 09:54
> To: J.L. DeLucca
> Cc: CORPORA at uib.no
> For large comparable corpora for English/Russian,
> and links to other large comparable corpora and relevant publications, see
> http://corpus.leeds.ac.uk/
> I hope this is helpful
> Eric Atwell, Leeds University
> On Sun, 15 Feb 2009, J.L. DeLucca wrote:
> > Dear All,
> >
> > I would be grateful for any up-to-date information about state of the
> > art in comparable corpus.
> >
> > Best regards.
> > J. L. De Lucca
> > Universidad Politécnica de Valencia
> > Departamento de Linguistica Aplicada
> >
> >
> >
> >
> --
> Eric Atwell,
> Senior Lecturer, Language research group, School of Computing,
> Faculty of Engineering, UNIVERSITY OF LEEDS, Leeds LS2 9JT, England
> TEL: 0113-3435430 FAX: 0113-3435468 WWW/email: google Eric Atwell


