[Corpora-List] Q: Hyphenation removal

Angus Grieve-Smith grvsmth at panix.com
Fri Aug 17 17:39:28 CEST 2012

On 8/16/2012 7:37 AM, Roland Schäfer wrote:
> are there any tools to remove hard-coded "hyphe- nation" from texts (or
> papers describing principled solutions to the problem).

I'm sure that there's something out there and that someone on the list will know where to find it.

I don't know about German, but in English there is significant ambiguity: in many words the hyphen is optional. Fortunately for your purposes, I believe the differences in meaning are small enough that in those cases you could probably just remove all the hyphens. Some hyphens are even typographically motivated: "antiinflammatory" exists, but it is used less often than "anti-inflammatory," apparently because people are uncomfortable writing two "i"s in a row in the middle of an English word.
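
To make the ambiguity concrete, here is a minimal sketch of a lexicon-based approach in Python. It is not an existing tool, only an illustration: the names LEXICON, BROKEN, rejoin and dehyphenate are made up for this example, and the tiny word set stands in for a real word list or corpus frequency table. It assumes the line-break hyphen shows up as a hyphen followed by whitespace, as in "hyphe- nation".

```python
import re

# Hypothetical toy lexicon; in practice this would be a large word list
# or frequency counts drawn from a clean corpus.
LEXICON = {"hyphenation", "removal", "anti-inflammatory"}

# A word fragment, a hyphen, whitespace, then another word fragment.
# Ordinary hyphenated words such as "hard-coded" have no whitespace
# after the hyphen, so they are never touched.
BROKEN = re.compile(r"(\w+)-\s+(\w+)")

def rejoin(match):
    left, right = match.group(1), match.group(2)
    joined = left + right
    hyphenated = left + "-" + right
    if joined.lower() in LEXICON:
        # The closed-up form is a known word: drop the hyphen.
        return joined
    if hyphenated.lower() in LEXICON:
        # Only the hyphenated form is known: keep the hyphen, drop the space.
        return hyphenated
    # Neither form is known (e.g. "pre- and post-"): leave the text alone.
    return match.group(0)

def dehyphenate(text):
    return BROKEN.sub(rejoin, text)

print(dehyphenate('hard-coded "hyphe- nation" from texts'))
print(dehyphenate("an anti- inflammatory drug"))
print(dehyphenate("pre- and post-processing"))
```

Preferring the closed-up form whenever it is in the lexicon follows the "remove all the hyphens" suggestion above. Anything the lexicon cannot decide is left unchanged, so those cases could be logged and checked by hand.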

Maybe someone with more experience in this area can elaborate.


-Angus B. Grieve-Smith

grvsmth at panix.com

