[Corpora-List] SCANNED TEXTS ARE VALID FOR CORPORA PURPOSES?

Angus Grieve-Smith grvsmth at panix.com
Fri Aug 1 17:43:11 CEST 2008


On Thu, 31 Jul 2008, J.L. DeLucca wrote:


> In the digital world there are digital libraries like "Gallica,
> Bibliothèque nationale de France digital library" that work with
> scanned texts with no OCR treatment, and e-book projects that work with
> full texts. Well, I want to know if you would consider scanned texts with
> no OCR treatment to be digital corpora, especially the oldest texts.

I think that the term "digital corpus" implies searchability and taggability. In that sense, scanned page images with no OCR treatment are not digital corpora. That said, they are certainly rich sources of texts that can often be OCRed into digital corpora. I have used many of the Gallica texts in this way for my current project.

-Angus B. Grieve-Smith

Linguistics Department

University of New Mexico

grvsmth at unm.edu

grvsmth at panix.com
