[Corpora-List] BNC raw text

Grzegorz Chrupała pitekus at gmail.com
Thu Sep 22 11:38:00 CEST 2005

Hi Robert,
Why don't you just extract the plain text from the marked-up files? It
should be pretty trivial if you use some SGML library.
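
As a rough illustration, here is a minimal Python sketch of that idea. It
uses the standard library's html.parser module as a lenient stand-in for a
real SGML parser; that choice is an assumption made for this example, not
something the BNC distribution prescribes. The names TextExtractor,
sgml_to_text and skip_tags are made up for the example, and the "header"
tag passed in the usage line is only a guess at where the metadata lives.
Entities that are not standard HTML ones should simply be left in the text
as they are.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect character data and ignore tags (hypothetical helper)."""

    def __init__(self, skip_tags=()):
        # convert_charrefs=True turns references such as &amp; into text.
        super().__init__(convert_charrefs=True)
        # HTMLParser lowercases tag names, so compare in lowercase.
        self.skip_tags = {t.lower() for t in skip_tags}
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Entering an element whose content should be dropped (e.g. metadata).
        if tag in self.skip_tags:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.skip_tags and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self.skip_depth == 0:
            self.chunks.append(data)


def sgml_to_text(markup, skip_tags=()):
    """Return the text content of `markup` with whitespace collapsed."""
    parser = TextExtractor(skip_tags)
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.chunks).split())


if __name__ == "__main__":
    sample = "<header><p>metadata</p></header><s n=1><w AT0>The <w NN1>cat <w VVD>sat<c PUN>.</s>"
    print(sgml_to_text(sample, skip_tags=("header",)))
```

To process a whole file you would read it into a string and pass it to
sgml_to_text; the skip_tags argument has to be adjusted to whatever element
actually wraps the metadata in your copy of the corpus.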
Grzegorz Chrupała ♦ pithekos.net

On 22/09/05, Robert Rittman <robert.rittman at gmail.com> wrote:

> I am working with the British National Corpus - World Edition CD-ROM. The CD
> does not contain the raw text of the 4,000+ documents. It only contains
> tagged text in SGML format (including metadata). Does anyone know where I
> can obtain the raw (untagged) text in plain text format?
> Thank you,
> Robert Rittman
> PhD Candidate
> School of Communication, Information and Library Studies
> Rutgers University



