[Corpora-List] Bilingual Dictionary from Comparable Corpora

Emad Mohamed emohamed at umail.iu.edu
Sun Oct 5 23:51:50 CEST 2014


Hi Javid,

If you have a parallel corpus, you can use an alignment tool such as GIZA++ or fast_align to get word alignments. You can run the alignment in both directions to get a bidirectional dictionary. If your corpus is comparable (which I understand as not completely parallel, but I may be wrong here), then a tool like hunalign can help extract the parallel sentences, which you can then use for building your dictionary.

HTH,
Emad
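
To illustrate the dictionary-building step, here is a minimal sketch that turns word alignments into relative-frequency translation probabilities. It assumes the alignments are already available in the "i-j" (Pharaoh) format, one line per sentence pair. The sentence pairs, the links, and the names lexical_table and flip are made up for illustration and are not part of any aligner. In practice the reverse direction would come from a second aligner run; here the toy links are simply flipped.

```python
from collections import Counter, defaultdict


def lexical_table(sentence_pairs, alignments):
    """Estimate p(target | source) by relative frequency over aligned word pairs."""
    pair_counts = Counter()
    source_counts = Counter()
    for (src, tgt), links in zip(sentence_pairs, alignments):
        src_tokens = src.split()
        tgt_tokens = tgt.split()
        for link in links.split():
            # Each link "i-j" ties source token i to target token j.
            i, j = map(int, link.split("-"))
            s, t = src_tokens[i], tgt_tokens[j]
            pair_counts[(s, t)] += 1
            source_counts[s] += 1
    table = defaultdict(dict)
    for (s, t), count in pair_counts.items():
        table[s][t] = count / source_counts[s]
    return dict(table)


def flip(links):
    """Swap source and target indices in one alignment line."""
    return " ".join(f"{j}-{i}" for i, j in (l.split("-") for l in links.split()))


# Toy data (hypothetical): three sentence pairs with hand-written alignments.
pairs = [
    ("the house", "das Haus"),
    ("the book", "das Buch"),
    ("the man", "der Mann"),
]
links = ["0-0 1-1", "0-0 1-1", "0-0 1-1"]

# Forward dictionary: p(German | English).
forward = lexical_table(pairs, links)
# Reverse dictionary: p(English | German), using flipped links as a stand-in
# for a real reverse-direction alignment run.
reverse = lexical_table([(t, s) for s, t in pairs], [flip(l) for l in links])

for word, translations in forward.items():
    print(word, translations)
```

Keeping the two directions as separate tables lets you later filter entries, for example by keeping only word pairs that score well in both.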

On Sun, Oct 5, 2014 at 1:00 PM, javid dadashkarimi < javiddadashkarimi at gmail.com> wrote:


> Dear Ramesh,
> I only want to extract a dictionary from an aligned bilingual corpus. I
> know that Moses can do this for a parallel, sentence-aligned corpus,
> but do tools like SketchEngine or TshwaneLex extract such
> knowledge?
> Best,
> Javid
>
> On Sun, Oct 5, 2014 at 7:23 PM, Krishnamurthy, Ramesh <
> r.krishnamurthy at aston.ac.uk> wrote:
>
>> hi javid
>> not sure quite what you want,
>> but i'd suggest contacting the
>> people at SketchEngine
>> http://www.sketchengine.co.uk/
>> and Tshwanelex
>> http://tshwanedje.com/tshwanelex/
>> best
>> ramesh
>> -------------
>> Date: Sat, 4 Oct 2014 15:11:02 +0330
>> From: javid dadashkarimi <javiddadashkarimi at gmail.com>
>> Subject: [Corpora-List] Bilingual Dictionary from Comparable Corpora
>> To: corpora at uib.no, gate-users-request at lists.sourceforge.net
>>
>> Hi,
>> Is there any tool for extracting a probabilistic bilingual dictionary from
>> a bilingual comparable corpus? Does Moses support such a task?
>> Best,
>> Javid
>>
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>

-- Emad Mohamed


