Maybe this will come in handy as well: http://poi.apache.org/
> -----Original Message-----
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Tino
> Sent: 09 February 2012 13:08
> To: Josep M. Fontana
> Cc: corpora at uib.no
> Subject: Re: [Corpora-List] Tools for batch conversion Word to UTF-8.
> Modern MS Word .docx files are ZIPs with XML documents, which don't require
> much scripting to extract plain text from.
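
To illustrate the .docx point above, here is a minimal sketch in Python using only the standard library. It assumes the main body text lives in word/document.xml inside the ZIP and that collecting the w:t text runs of each w:p paragraph is good enough; the function name docx_to_text is my own, not part of any library. Tabs, line breaks within a paragraph, headers, footers and footnotes are ignored.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the body text of a .docx file, one line per paragraph.

    `source` may be a file path or a binary file-like object.
    (Hypothetical helper name, for illustration only.)
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join all text runs (w:t) that belong to this paragraph
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```

To get a UTF-8 file, write the result with an explicit encoding, for example open("out.txt", "w", encoding="utf-8").write(docx_to_text("in.docx")), and loop over the files in a directory for batch use.
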
> Older .doc files will need a trip through some tool. It is possible to use
> OpenOffice/LibreOffice in headless mode for this, and OOo/LO's Office reader gets
> most of the doc format right.
> -- Tino Didriksen
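
For the headless OpenOffice/LibreOffice route described above, a batch run could be sketched roughly as follows. The assumptions here are mine: that the binary is called soffice and is on the PATH, and that the export filter string "txt:Text (encoded):UTF8" selects UTF-8 plain text, which is how it is usually written as far as I know; check both against your installed version. The helper names are made up for the example.

```python
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir, soffice="soffice"):
    """Build the command line for one headless conversion to UTF-8 text.

    Assumption: "txt:Text (encoded):UTF8" is the filter string for
    UTF-8 plain text in the installed LibreOffice/OpenOffice.
    """
    return [
        soffice,
        "--headless",
        "--convert-to", "txt:Text (encoded):UTF8",
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def convert_all(in_dir, out_dir, soffice="soffice"):
    """Convert every *.doc file in in_dir, writing .txt files to out_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for doc in sorted(Path(in_dir).glob("*.doc")):
        # check=True raises CalledProcessError if the converter fails
        subprocess.run(build_convert_command(doc, out_dir, soffice), check=True)
```

Calling convert_all("word_files", "utf8_text") would then process a whole directory, one soffice invocation per file.
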
> On Thu, Feb 9, 2012 at 12:38, Josep M. Fontana <josepm.fontana at upf.edu> wrote:
> Does anyone here know of a good free application to batch convert Word
> documents to UTF-8? (Linux, OS X or Windows, it doesn't matter)