Hi Javid

yes, i am familiar with parallel corpora and comparable corpora. :)
...but for me, a 'dictionary' means something very different to 'an aligning tool for comparable corpora'.... :)

best
ramesh

________________________________
From: javid dadashkarimi [javiddadashkarimi at gmail.com]
Sent: 06 October 2014 10:09
To: Krishnamurthy, Ramesh
Cc: Jörg Tiedemann; corpora at uib.no
Subject: Re: [Corpora-List] Bilingual Dictionary from Comparable Corpora

Hi Ramesh,

Excuse me if I did not explain carefully. In Statistical Machine Translation or Cross-lingual Information Retrieval (CLIR), parallel corpora (sentence-aligned corpora) and comparable corpora (document-aligned corpora whose documents are not exact translations of each other, as they are in parallel corpora, but are on the same topic) are useful resources for translating queries written in a language different from that of the documents. Indeed, these tasks extract target-language words that are translations of a source-language word, each with a different probability. So we have a comparable corpus in which each source-language document is on the same topic as some target-language documents:

(D_s^0 → D_t1, D_t2, .., D_tk), (D_s^1 → D'_t1, D'_t2, .., D'_tk), .., (D_s^m → D"_t1, D"_t2, .., D"_tk).

Best,
Javid
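To make the setup above concrete, here is a minimal sketch of one naive way to get word-level translation candidates with probabilities from document-aligned data: count how often a source word and a target word occur in aligned documents, then normalize per source word. It is only an illustration, not a method proposed in this thread; the function name `cooccurrence_dictionary`, the input layout, and the toy documents are all assumptions.

```python
from collections import Counter, defaultdict


def cooccurrence_dictionary(aligned_docs):
    """Estimate p(target word | source word) from document-level alignments.

    aligned_docs: list of (source_tokens, [target_doc_tokens, ...]) pairs,
    i.e. one source document linked to k topically related target documents.
    (Hypothetical input layout, chosen for this sketch.)
    """
    pair_counts = defaultdict(Counter)
    for src_tokens, tgt_docs in aligned_docs:
        src_vocab = set(src_tokens)
        tgt_vocab = set()
        for doc in tgt_docs:
            tgt_vocab.update(doc)
        # Every source word co-occurs once with every target word
        # found in the documents aligned to this source document.
        for s in src_vocab:
            for t in tgt_vocab:
                pair_counts[s][t] += 1

    # Normalize the counts so that the candidates for each source word sum to 1.
    table = {}
    for s, counts in pair_counts.items():
        total = sum(counts.values())
        table[s] = {t: c / total for t, c in counts.items()}
    return table


if __name__ == "__main__":
    # Toy data (invented): two English documents, each aligned to one French document.
    toy = [
        (["house", "red"], [["maison", "rouge"]]),
        (["house", "big"], [["maison", "grande"]]),
    ]
    for word, candidates in cooccurrence_dictionary(toy).items():
        print(word, sorted(candidates.items(), key=lambda kv: -kv[1]))
```

Raw co-occurrence like this is very noisy on real comparable corpora, since frequent words co-occur with everything; it only shows the shape of the input and of the probabilistic dictionary being asked for.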

i think you and i have different ideas about what a 'dictionary' is. :)

i think perhaps you just want to find 'word/phrase-equivalents' in comparable corpora in different languages?

i don't know enough about computational linguistics, but i *suspect* that both SketchEngine and Tshwanelex are for 'fuller' dictionaries, eg with collocational, grammatical, semantic, phraseological info, etc for each entry.... but they can probably be used with a bilingual lookup (eg Wordnet) to link items in the comparable corpora...?

Maybe you want to have a look at alignment tools for comparable corpora such as:
- http://www.accurat-project.eu
- http://yalign.machinalis.com

I haven't used these tools myself but I would be interested to hear if they work for you.

Good luck! Jörg


Dear Ramesh,

I only want to extract a dictionary from an aligned bilingual corpus. I know that Moses can do this for parallel, sentence-aligned corpora, but do tools like SketchEngine or Tshwanelex extract this kind of knowledge?

Best,
Javid
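For the sentence-aligned case mentioned here, a probabilistic word dictionary can be estimated with a word-alignment model. The sketch below is a toy version of IBM Model 1 trained with EM; it is not Moses' own code, and the function name `ibm_model1`, the iteration count, and the toy sentence pairs are assumptions made for illustration.

```python
from collections import defaultdict


def ibm_model1(sentence_pairs, iterations=10):
    """Toy IBM Model 1: estimate p(target word | source word) with EM.

    sentence_pairs: list of (source_tokens, target_tokens) for aligned sentences.
    """
    src_vocab = {s for src, _ in sentence_pairs for s in src}
    tgt_vocab = {t for _, tgt in sentence_pairs for t in tgt}

    # Start from a uniform distribution over target words for each source word.
    prob = {s: {t: 1.0 / len(tgt_vocab) for t in tgt_vocab} for s in src_vocab}

    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)

        # E-step: distribute each target word over the source words of its sentence.
        for src, tgt in sentence_pairs:
            for t in tgt:
                norm = sum(prob[s][t] for s in src)
                for s in src:
                    frac = prob[s][t] / norm
                    count[s][t] += frac
                    total[s] += frac

        # M-step: renormalize the expected counts per source word.
        for s in src_vocab:
            if total[s] > 0:
                for t in tgt_vocab:
                    prob[s][t] = count[s][t] / total[s]
    return prob


if __name__ == "__main__":
    # Toy German-English sentence pairs (invented for this sketch).
    pairs = [
        (["das", "Haus"], ["the", "house"]),
        (["das", "Buch"], ["the", "book"]),
        (["ein", "Buch"], ["a", "book"]),
    ]
    table = ibm_model1(pairs)
    for s in sorted(table):
        best = max(table[s], key=table[s].get)
        print(s, "->", best, round(table[s][best], 3))
```

This needs sentence-level alignment to work, which is exactly what a comparable corpus lacks; that is why the alignment tools for comparable corpora suggested elsewhere in this thread would have to come first.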

Hi,

Is there any tool for extracting a probabilistic bilingual dictionary from a bilingual comparable corpus? Does Moses support such a task?

Best,
Javid


