[Corpora-List] converting PDFs to ASCII or text-only files without clumps
christian.chiarcos at web.de
Wed Jun 16 14:21:28 CEST 2010
Sorry for the confusion: the word *more* in my mail was a leftover. No
comparison with Tika was intended. It referred to the original first line of
my mail, which mentioned ps2ascii, but I removed that line because
ps2ascii is not really an option, either for special characters or for
the clumps problem.
> *Comment off list*
> FYI: Tika provides an XHTML representation of the input. Just for my own
> interest, could you explain why you think it is a more suitable option?