[Corpora-List] Bilingual Dictionary from Comparable Corpora

Krishnamurthy, Ramesh r.krishnamurthy at aston.ac.uk
Mon Oct 6 15:26:42 CEST 2014

Hi Javid,

yes, i am familiar with parallel corpora and comparable corpora. :) ...but for me, a 'dictionary' means something very different to 'an aligning tool for comparable corpora'... :)

best
ramesh

________________________________
From: javid dadashkarimi [javiddadashkarimi at gmail.com]
Sent: 06 October 2014 10:09
To: Krishnamurthy, Ramesh
Cc: Jörg Tiedemann; corpora at uib.no
Subject: Re: [Corpora-List] Bilingual Dictionary from Comparable Corpora

Hi Ramesh,

Excuse me if I did not explain carefully. In statistical machine translation (SMT) or cross-lingual information retrieval (CLIR), parallel corpora (sentence-aligned corpora) and comparable corpora (document-aligned corpora, whose documents are not precise translations of each other as in parallel corpora, but are on the same topic) are useful resources for translating queries that are in a different language from the documents. These tasks extract target-language words that are translations of a source-language word, each with a different probability. So we have a comparable corpus in which each source-language document is on the same topic as some target-language documents:

$D^{s}_{0} \rightarrow (D^{t}_{1}, D^{t}_{2}, \ldots, D^{t}_{k})$,
$D^{s}_{1} \rightarrow (D^{\prime t}_{1}, D^{\prime t}_{2}, \ldots, D^{\prime t}_{k})$,
...,
$D^{s}_{m} \rightarrow (D^{\prime\prime t}_{1}, D^{\prime\prime t}_{2}, \ldots, D^{\prime\prime t}_{k})$.

Best,
Javid
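As a rough illustration of the setup Javid describes, the sketch below turns document-aligned groups like the ones above into a toy probabilistic dictionary. It is not a method proposed in this thread: it swaps in a simple Dice-coefficient association score computed over aligned units (each source document pooled with its k target documents), then normalises each source word's scores into a distribution over target words. The function name `dice_translation_table` and the toy tokens are hypothetical; real comparable corpora would need stop-word filtering, frequency thresholds and far more data.

```python
from collections import Counter
from itertools import product


def dice_translation_table(aligned_units):
    """Toy probabilistic dictionary P(t | s) from document-aligned units.

    aligned_units: list of (source_doc, [target_doc_1, ..., target_doc_k]),
    where every document is a list of tokens. (Hypothetical helper.)
    """
    src_df, tgt_df, joint_df = Counter(), Counter(), Counter()
    for src_doc, tgt_docs in aligned_units:
        src_vocab = set(src_doc)
        # Pool the k target documents aligned to this source document.
        tgt_vocab = {tok for doc in tgt_docs for tok in doc}
        src_df.update(src_vocab)
        tgt_df.update(tgt_vocab)
        # Count each (source word, target word) pair once per aligned unit.
        joint_df.update(product(src_vocab, tgt_vocab))

    # Dice association: 2 * df(s, t) / (df(s) + df(t)).
    scores = {}
    for (s, t), n in joint_df.items():
        scores.setdefault(s, {})[t] = 2 * n / (src_df[s] + tgt_df[t])

    # Normalise each row so the scores form a distribution over target words.
    table = {}
    for s, row in scores.items():
        total = sum(row.values())
        table[s] = {t: v / total for t, v in row.items()}
    return table


# Toy data: three aligned units, one target document each.
units = [
    (["cat", "sat"], [["gato", "sento"]]),
    (["cat", "ran"], [["gato", "corrio"]]),
    (["dog", "ran"], [["perro", "corrio"]]),
]
table = dice_translation_table(units)
print(max(table["cat"], key=table["cat"].get))  # gato
```

Normalising Dice scores is only a heuristic; the resulting numbers are not maximum-likelihood translation probabilities, but they do give a ranked, probability-like candidate list per source word.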

On Mon, Oct 6, 2014 at 1:44 AM, Krishnamurthy, Ramesh <r.krishnamurthy at aston.ac.uk> wrote:

hi javid

i think you and i have different ideas about what a 'dictionary' is. :)

i think perhaps you just want to find 'word/phrase-equivalents' in comparable corpora in different languages?

i don't know enough about computational linguistics, but i *suspect* that both SketchEngine and Tshwanelex are for 'fuller' dictionaries, e.g. with collocational, grammatical, semantic and phraseological information for each entry... but they can probably be used together with a bilingual lookup (e.g. WordNet) to link items in the comparable corpora...?

best ramesh
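One common way to combine a bilingual lookup resource with comparable corpora, assumed here rather than taken from the thread, is the context-vector approach: count each word's neighbours within a window, map a source word's context vector into the target language through a seed lexicon, and rank target words by cosine similarity. The sketch below uses a tiny hand-made seed lexicon in place of a resource such as WordNet; the function names, the window size and the data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Count the words co-occurring within +/- `window` positions of each word."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    vecs[w][sent[j]] += 1
    return vecs


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_translations(word, src_vecs, tgt_vecs, seed_lexicon):
    """Rank target words by similarity to the seed-translated context of `word`."""
    projected = Counter()
    for ctx, n in src_vecs.get(word, {}).items():
        # Context words not covered by the seed lexicon are simply dropped.
        if ctx in seed_lexicon:
            projected[seed_lexicon[ctx]] += n
    scores = {t: cosine(projected, vec) for t, vec in tgt_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


src = context_vectors([["the", "cat", "drinks", "milk"], ["the", "dog", "eats", "meat"]])
tgt = context_vectors([["el", "gato", "bebe", "leche"], ["el", "perro", "come", "carne"]])
seed = {"the": "el", "drinks": "bebe", "milk": "leche", "eats": "come", "meat": "carne"}
print(rank_translations("cat", src, tgt, seed)[0])  # gato
```

In practice the raw counts are usually replaced by an association measure and the candidate list is restricted to words outside the seed lexicon; both are left out here to keep the sketch short.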

________________________________
From: Jörg Tiedemann [Jorg.Tiedemann at lingfil.uu.se]
Sent: 06 October 2014 09:02
To: javid dadashkarimi
Cc: Krishnamurthy, Ramesh; corpora at uib.no
Subject: Re: [Corpora-List] Bilingual Dictionary from Comparable Corpora

Maybe you want to have a look at alignment tools for comparable corpora such as:

- http://www.accurat-project.eu
- http://yalign.machinalis.com

I haven't used these tools myself but I would be interested to hear if they work for you.

Good luck!
Jörg


Jörg Tiedemann jorg.tiedemann at lingfil.uu.se

Dep. of Linguistics and Philology http://stp.lingfil.uu.se/~joerg/

Uppsala University tel: +46 (0)18 - 471 1412

Box 635, SE-751 26 Uppsala/SWEDEN fax: +46 (0)18 - 471 1094

On Oct 5, 2014, at 7:00 PM, javid dadashkarimi wrote:

Dear Ramesh,

I only want to extract a dictionary from an aligned bilingual corpus. I know that Moses can do this for parallel, sentence-aligned corpora, but can tools like SketchEngine or Tshwanelex extract such knowledge?

Best,
Javid
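For the sentence-aligned case Javid mentions, Moses' standard training pipeline derives its lexical translation probabilities from word alignments produced by GIZA++, which implements the IBM models. The following is only a toy IBM Model 1 EM loop written to show the idea; it omits the NULL source word and everything else a real aligner does, and it is not Moses or GIZA++ code. The function name `ibm_model1` and the English/German sentence pairs are illustrative.

```python
from collections import defaultdict


def ibm_model1(sentence_pairs, iterations=10):
    """Toy IBM Model 1: estimate t(f | e) by EM from sentence-aligned pairs.

    sentence_pairs: list of (source_tokens, target_tokens).
    Returns a dict mapping (target_word, source_word) -> t(f | e).
    Unlike real implementations, no NULL source word is added.
    """
    tgt_vocab = {f for _, fs in sentence_pairs for f in fs}
    # Start from a uniform t(f | e) for every target word.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count c(f, e)
        total = defaultdict(float)  # expected count c(e)
        for es, fs in sentence_pairs:
            for f in fs:
                # E-step: share f's alignment mass among the source words.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: renormalise so that sum over f of t(f | e) is 1 for each e.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return dict(t)


pairs = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]
t = ibm_model1(pairs)
for e in ["the", "house", "book", "a"]:
    best = max((f for f, e2 in t if e2 == e), key=lambda f: t[(f, e)])
    print(e, "->", best)  # the -> das, house -> haus, book -> buch, a -> ein
```

On a comparable corpus there are no sentence pairs to feed into this loop, which is why the thread turns to document-level association and alignment tools instead.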

On Sun, Oct 5, 2014 at 7:23 PM, Krishnamurthy, Ramesh <r.krishnamurthy at aston.ac.uk> wrote:

hi javid

not sure quite what you want, but i'd suggest contacting the people at SketchEngine http://www.sketchengine.co.uk/ and Tshwanelex http://tshwanedje.com/tshwanelex/

best
ramesh

-------------
Date: Sat, 4 Oct 2014 15:11:02 +0330
From: javid dadashkarimi <javiddadashkarimi at gmail.com>
Subject: [Corpora-List] Bilingual Dictionary from Comparable Corpora
To: corpora at uib.no, gate-users-request at lists.sourceforge.net

Hi,

Is there any tool for extracting a probabilistic bilingual dictionary from a bilingual comparable corpus? Does Moses support such a task?

Best,
Javid

