maxwell at umiacs.umd.edu
Tue Apr 24 18:03:52 CEST 2012
On Tue, 24 Apr 2012 17:16:15 +0200, Serge Heiden <slh at ens-lyon.fr> wrote:
> Have a look at TXM 0.6 (free AND open-source):
> https://sourceforge.net/projects/txm 
> It handles right-to-left writing systems display*.
> You can check in the demo portal:
> http://txm.risc.cnrs.fr/demo/?locale=en 
> in which 'ONUAR' is a small sample corpus of UNO-based Arabic texts.
> (build a lexicon, double-click on a word line, then double-click
> on a KWIC line to get to the text edition)
> (*) Even if this is far from perfect (let alone the bad Arabic
> tokenization, etc.).
> This is done nearly automagically by the technology we use
> or Java Eclipse RCP/SWT for the desktop version)
In case anyone else is working on the Dhivehi language, there's a bug in
Java which (as far as we have been able to discover) prevents proper
rendering of the Thaana script used for Dhivehi. Thaana has long had a
Unicode block, but Java seems not to recognize that Thaana is written
right-to-left. The bug does not affect Arabic script. I have not checked
other right-to-left scripts, such as Syriac.
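
For anyone who wants to check their own Java installation, here is a minimal
diagnostic sketch. It is not a fix and not a reproduction of the rendering
problem itself; it only asks Java which bidirectional class it assigns to a
few characters and whether java.text.Bidi thinks a string needs bidi
processing. The class name ThaanaBidiCheck and the helper names bidiClass and
needsBidi are made up for illustration. The Unicode data assigns Thaana
letters the bidi class AL, so anything else printed for U+0780 would point
to the character data rather than the rendering layer.

```java
import java.text.Bidi;

public class ThaanaBidiCheck {

    /** Short label for the bidi class Java reports for a code point. */
    public static String bidiClass(int codePoint) {
        switch (Character.getDirectionality(codePoint)) {
            case Character.DIRECTIONALITY_LEFT_TO_RIGHT:
                return "L";
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT:
                return "R";
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC:
                return "AL";
            default:
                return "other";
        }
    }

    /** True if java.text.Bidi considers the string to need bidi analysis. */
    public static boolean needsBidi(String text) {
        char[] chars = text.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    public static void main(String[] args) {
        // Three Thaana letters (U+0780..U+0782), chosen only as sample input.
        String thaana = "\u0780\u0781\u0782";
        // One Arabic letter (alef) for comparison, since Arabic is unaffected.
        String arabic = "\u0627";

        System.out.println("Java version:      " + System.getProperty("java.version"));
        System.out.println("U+0780 bidi class: " + bidiClass(0x0780));
        System.out.println("U+0627 bidi class: " + bidiClass(0x0627));
        System.out.println("Thaana needs bidi: " + needsBidi(thaana));
        System.out.println("Arabic needs bidi: " + needsBidi(arabic));
    }
}
```

If the Thaana and Arabic lines agree, the character data is probably fine and
the problem lies further along, in text layout or in the fonts.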
University of Maryland