[Corpora-List] pdfs/ OCR question

Hunter, Duncan D.I.Hunter at warwick.ac.uk
Mon Dec 11 15:47:00 CET 2006


Quick question about PDFs/OCR:

Some text is copied from a PDF file and pasted into a text or Word file. It contains errors: for example, 'the' has become 'die' (you notice that in the original PDF the 't' and 'h' are quite close together). At what stage has this misrecognition/miscopying occurred?
Where does the OCR take place? Presumably the OCR functionality is part of the PDF reader software itself?

Can anything be done to deal with the problem?

Duncan Hunter

