[Corpora-List] "Multi-encoded" corpora
martin.wynne at oucs.ox.ac.uk
Wed Oct 8 13:44:27 CEST 2008
Albretch Mueller wrote:
> I was browsing around the BAWE corpus info previously posted here and
> when I noticed all texts are in PDF format (!), it made me wonder...
Oh no, they're not! The corpus is composed of text files, with a choice of
text encodings. None of it is in PDF files. There is some prose
documentation in PDF files to accompany the corpus in the package of
files which can be downloaded from the OTA.