[Corpora-List] Bilingual Dictionary from Comparable Corpora

javid dadashkarimi javiddadashkarimi at gmail.com
Mon Oct 6 11:09:53 CEST 2014


Hi Ramesh,

Excuse me if I did not explain clearly. In Statistical Machine Translation or Cross-lingual Information Retrieval (CLIR), parallel corpora (sentence-aligned corpora) and comparable corpora (document-aligned corpora whose documents are not precise translations of each other, as they are in parallel corpora, but are on the same topic) are useful resources for translating queries written in a different language from the documents. Indeed, these tasks extract words in the target language that are translations of a source-language word, each with a different probability.

So we have a comparable corpus in which each document in the source language is on the same topic as some documents in the target language:

(D_s0 → D_t1, D_t2, ..., D_tk),
(D_s1 → D'_t1, D'_t2, ..., D'_tk),
...,
(D_sm → D"_t1, D"_t2, ..., D"_tk).

Best,
Javid
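
As an illustration of the setup described above, here is a minimal sketch of one simple baseline: counting how often a source word and a target word appear in the same aligned document group, then normalising the counts into p(target word | source word). This is not what Moses does and is not a method proposed in this thread; the function name `cooccurrence_lexicon`, the input layout, and the toy English/French data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def cooccurrence_lexicon(aligned_groups):
    """Estimate p(target_word | source_word) from document-aligned groups.

    aligned_groups: iterable of (source_doc, target_docs), where source_doc
    is a list of tokens and target_docs is a list of token lists that are
    on the same topic as source_doc. (Hypothetical input layout.)
    """
    counts = defaultdict(Counter)
    for source_doc, target_docs in aligned_groups:
        source_vocab = set(source_doc)
        # Pool the vocabulary of every target document aligned to this source document.
        target_vocab = set()
        for doc in target_docs:
            target_vocab.update(doc)
        # Each (source word, target word) pair is counted once per aligned group.
        for s in source_vocab:
            counts[s].update(target_vocab)

    lexicon = {}
    for s, target_counts in counts.items():
        total = sum(target_counts.values())
        # Normalise the group co-occurrence counts into a distribution over target words.
        lexicon[s] = {t: c / total for t, c in target_counts.items()}
    return lexicon


# Toy data, invented for illustration only.
toy_groups = [
    (["cat", "sleeps"], [["chat", "dort"], ["le", "chat"]]),
    (["cat", "eats"], [["chat", "mange"]]),
]
lexicon = cooccurrence_lexicon(toy_groups)
best = max(lexicon["cat"], key=lexicon["cat"].get)
print(best, lexicon["cat"][best])
```

Raw co-occurrence like this favours frequent target words, so in practice the counts would usually be replaced by an association score or combined with a seed dictionary; the sketch only shows how the document-level alignment can be turned into translation probabilities.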

On Mon, Oct 6, 2014 at 1:44 AM, Krishnamurthy, Ramesh < r.krishnamurthy at aston.ac.uk> wrote:


> hi javid
>
> i think you and i have different ideas about what a 'dictionary' is. :)
>
> i think perhaps you just want to find 'word/phrase-equivalents' in
> comparable corpora in
> different languages?
>
> i don't know enough about computational linguistics, but i *suspect*
> that both SketchEngine and Tshwanelex are for 'fuller' dictionaries,
> eg with collocational, grammatical, semantic, phraseological info, etc
> for each entry.... but they can probably be used with a bilingual lookup
> (eg Wordnet) to link items in the comparable corpora...?
>
> best
> ramesh
>
>
>
> ________________________________
> From: Jörg Tiedemann [Jorg.Tiedemann at lingfil.uu.se]
> Sent: 06 October 2014 09:02
> To: javid dadashkarimi
> Cc: Krishnamurthy, Ramesh; corpora at uib.no
> Subject: Re: [Corpora-List] Bilingual Dictionary from Comparable Corpora
>
>
> Maybe you want to have a look at alignment tools for comparable corpora
> such as:
> - http://www.accurat-project.eu
> - http://yalign.machinalis.com
>
> I haven't used these tools myself but I would be interested to hear if
> they work for you.
>
> Good luck!
> Jörg
>
>
> **********************************************************************************
> Jörg Tiedemann
> jorg.tiedemann at lingfil.uu.se
> Dep. of Linguistics and Philology
> http://stp.lingfil.uu.se/~joerg/
> Uppsala University tel: +46 (0)18 - 471 1412
> Box 635, SE-751 26 Uppsala/SWEDEN fax: +46 (0)18 - 471 1094
>
>
>
> On Oct 5, 2014, at 7:00 PM, javid dadashkarimi wrote:
>
> Dear Ramesh,
> I only want to extract a dictionary from an aligned bilingual corpus. I
> know that Moses can do this for parallel, sentence-aligned corpora,
> but can tools like SketchEngine or Tshwanelex extract such
> knowledge?
> Best,
> Javid
>
> On Sun, Oct 5, 2014 at 7:23 PM, Krishnamurthy, Ramesh <
> r.krishnamurthy at aston.ac.uk> wrote:
> hi javid
> not sure quite what you want,
> but i'd suggest contacting the
> people at SketchEngine
> http://www.sketchengine.co.uk/
> and Tshwanelex
> http://tshwanedje.com/tshwanelex/
> best
> ramesh
> -------------
> Date: Sat, 4 Oct 2014 15:11:02 +0330
> From: javid dadashkarimi <javiddadashkarimi at gmail.com>
> Subject: [Corpora-List] Bilingual Dictionary from Comparable Corpora
> To: corpora at uib.no,
> gate-users-request at lists.sourceforge.net
>
> Hi,
> Is there any tool for extracting a probabilistic bilingual dictionary from a
> bilingual comparable corpus? Does Moses support such a task?
> Best,
> Javid
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>