[Corpora-List] "Multi-encoded" corpora

Martin Wynne martin.wynne at oucs.ox.ac.uk
Wed Oct 8 13:44:27 CEST 2008


Albretch Mueller wrote:
> ~
> I was browsing around the BAWE corpus info previously posted here and
> when I noticed all texts are in PDF format (!), it made me wonder...

Oh no, they're not! The corpus is composed of text files, with a choice of text encodings. None of it is in PDF. The package of files that can be downloaded from the OTA also includes some prose documentation in PDF to accompany the corpus.

Martin


