[Corpora-List] Request

maxwell maxwell at umiacs.umd.edu
Tue Apr 24 18:03:52 CEST 2012


On Tue, 24 Apr 2012 17:16:15 +0200, Serge Heiden <slh at ens-lyon.fr> wrote:
> Have a look at TXM 0.6 (free AND open-source):
> https://sourceforge.net/projects/txm
> It handles the display of right-to-left writing systems*.
> You can check this in the demo portal:
> http://txm.risc.cnrs.fr/demo/?locale=en
> in which 'ONUAR' is a small sample corpus of UNO-based Arabic texts.
> (Build a lexicon, double-click on a word line, then double-click
> on a KWIC line to get to the text edition.)
>
> Best,
> Serge
> ____________________
>
> (*) This is still far from perfect (not to mention the poor Arabic
> tokenization, etc.).
> It is done nearly automagically by the technology we use
> (Java+JavaScript GWT, or Java Eclipse RCP/SWT for the desktop version).

In case anyone else is working on the Dhivehi language: there is a bug in Java that, as far as we have been able to determine, prevents proper rendering of the Thaana script used for Dhivehi. Thaana has long had its own Unicode block, but Java seems not to recognize that Thaana is written right-to-left. The bug does not affect Arabic script. I have not checked other right-to-left scripts, such as Syriac.

Mike Maxwell

University of Maryland
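For readers trying to narrow down the Thaana problem described above: the message does not say which part of Java fails (character classification, the bidi algorithm, font lookup, or the Swing/SWT text layout), so what follows is only a diagnostic sketch, not Maxwell's method. It asks two questions for a few sample strings: does `Character.getDirectionality` give the first character a strong right-to-left bidi class (R or AL), and does `java.text.Bidi` resolve the paragraph's base direction as right-to-left? The class name `RtlCheck` and its helper methods are hypothetical, and a current JDK may well behave differently from the 2012 JDK the message refers to.

```java
import java.text.Bidi;

/**
 * Hypothetical diagnostic: reports how the running JDK classifies
 * a few right-to-left scripts. It does not reproduce any rendering bug.
 */
class RtlCheck {

    /** True if Character.getDirectionality reports bidi class R or AL for c. */
    static boolean isStrongRtl(char c) {
        byte d = Character.getDirectionality(c);
        return d == Character.DIRECTIONALITY_RIGHT_TO_LEFT          // class R (e.g. Hebrew)
            || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;  // class AL (e.g. Arabic)
    }

    /** True if java.text.Bidi resolves the paragraph's base direction as right-to-left. */
    static boolean paragraphIsRtl(String text) {
        // DEFAULT_LEFT_TO_RIGHT: base direction comes from the first strong
        // character, falling back to LTR if there is none.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return !bidi.baseIsLeftToRight();
    }

    public static void main(String[] args) {
        String[][] samples = {
            {"Arabic", "\u0627\u0644\u0639\u0631\u0628\u064A\u0629"},
            // sample Thaana letters interleaved with vowel signs
            {"Thaana", "\u078B\u07A8\u0788\u07AC\u0780\u07A8"},
            // first Syriac letters (alaph, beth, gamal)
            {"Syriac", "\u0710\u0712\u0713"},
        };
        for (String[] s : samples) {
            String name = s[0];
            String text = s[1];
            System.out.printf("%-7s first char U+%04X strongRTL=%b paragraphRTL=%b%n",
                name, (int) text.charAt(0), isStrongRtl(text.charAt(0)), paragraphIsRtl(text));
        }
    }
}
```

On Java 11 or later it can be run directly with `java RtlCheck.java`. If both checks come out `true` for Thaana but the text still displays left-to-right, the fault more likely lies in the rendering layer (fonts or the UI toolkit) than in Java's Unicode tables.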


