[Corpora-List] Invitation-based data selection tool for bilingual domain adaptation

Kamran, A. A.Kamran at uva.nl
Tue Jun 23 00:02:08 CEST 2015


Dear All,

I have implemented a data selection tool for domain adaptation based on the Invitation Model, as described in: Hoang, Cuong and Sima'an, Khalil (2014): Latent Domain Translation Models in Mix-of-Domains Haystack, Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics, http://www.aclweb.org/anthology/C14-1182.pdf

The tool is available in the following GitHub repository:

https://github.com/amirkamran/InvitationModel

The invitation-based data selection approach exploits in-domain data (both monolingual and bilingual) as a prior to guide word alignment and phrase pair estimates in a large mixed-domain corpus. As a by-product, it produces accurate estimates of P(D|e,f) for the mixed-domain sentence pairs (with D being either in-domain or out-of-domain), which can be used to rank the sentence pairs in the mixed-domain corpus according to their relevance to the in-domain corpus.
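To illustrate the ranking step only, here is a minimal Python sketch. It does not reproduce the tool's interface or the way the models are estimated in the paper. It assumes that log-likelihoods log P(e,f|D) under an in-domain and an out-of-domain model are already available for each sentence pair; the function names, the `prior_in` parameter and the example scores are all hypothetical.

```python
import math

def in_domain_posterior(logp_in, logp_out, prior_in=0.5):
    """Return P(D=in | e, f) by Bayes' rule.

    logp_in / logp_out are log P(e, f | D) under the in-domain and
    out-of-domain models (assumed to be given); prior_in is P(D=in)
    and must lie strictly between 0 and 1.
    """
    a = math.log(prior_in) + logp_in
    b = math.log(1.0 - prior_in) + logp_out
    m = max(a, b)  # subtract the max before exponentiating to avoid underflow
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb)

def rank_by_relevance(scored_pairs, prior_in=0.5):
    """Sort (pair_id, logp_in, logp_out) triples by P(D=in | e, f), highest first."""
    ranked = [(pid, in_domain_posterior(lp_in, lp_out, prior_in))
              for pid, lp_in, lp_out in scored_pairs]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked

# Hypothetical scores for three sentence pairs of a mixed-domain corpus.
example = [("pair-1", -42.0, -40.5), ("pair-2", -35.0, -39.0), ("pair-3", -50.0, -50.0)]
for pair_id, posterior in rank_by_relevance(example):
    print(pair_id, round(posterior, 3))
```

Selecting the top-ranked portion of such a list would then give the subset of the mixed-domain corpus considered most relevant to the in-domain data.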

This work was conducted at ILLC (Institute for Logic, Language and Computation, University of Amsterdam), https://www.illc.uva.nl, as part of the project "Data-Powered Domain-Specific Translation Services On Demand", supported by the grant "STW Open Technologieprogramma".

Regards,
Amir Kamran
Research Programmer
Institute for Logic, Language and Computation (ILLC)
University of Amsterdam


