[Corpora-List] English on-line sentence and word tokenization

WHITELOCK, Pete pete.whitelock at oup.com
Thu Apr 10 11:29:26 CEST 2014


I tried a search for "incremental tokenization" and found this:

http://www.english-linguistics.de/fr/teaching/ws09-10/i2cl/slides/lecture10.pdf

I think it's relevant; you may find more detail in Frank Richter's papers.

Pete Whitelock, PhD
Principal Language Engineer, Technology
Academic Dictionaries
Oxford University Press
Gt. Clarendon St.
OX2 6DP
United Kingdom

From: corpora-bounces at uib.no On Behalf Of Phil Gooch
Sent: 10 April 2014 10:15
To: Hugo Mougard
Cc: corpora at uib.no
Subject: Re: [Corpora-List] English on-line sentence and word tokenization

I think Clinithink does something along these lines, but it is a commercial product.

http://clinithink.com/

Phil

On Thu, Apr 10, 2014 at 9:59 AM, Hugo Mougard <mog at crydee.eu> wrote:

Dear all,

I'm looking for pointers to work on on-line tokenization (especially at the sentence level, but the word level also interests me). By on-line I mean "while the text is being typed". My searches so far have turned up nothing useful, probably because "on-line" is mostly used in a different sense (e.g. being on the internet).

Best,
Hugo
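To make the definition of on-line tokenization above concrete, here is a minimal, hypothetical sketch in Python. It is not Clinithink's method or the approach in the linked lecture slides; the class name `IncrementalTokenizer`, the token regex and the sentence-end heuristic are all assumptions for illustration. The core idea is that a token is only committed once whitespace follows it, because the fragment currently being typed may still grow ("tok" becoming "token"); everything after the last space is reported as provisional. It assumes append-only input, i.e. no backspace or mid-text edits.

```python
import re
from dataclasses import dataclass, field

# Assumed token scheme: a word is a run of word characters (with optional
# internal apostrophes, as in "don't"); every other non-space character is
# its own punctuation token.
TOKEN_RE = re.compile(r"\w+(?:'\w+)*|[^\w\s]")
SENTENCE_END = {".", "!", "?"}


@dataclass
class IncrementalTokenizer:
    """Hypothetical append-only tokenizer for text that arrives as it is typed."""

    buffer: str = ""  # trailing text that may still change (uncommitted)
    sentence: list[str] = field(default_factory=list)  # tokens of the open sentence

    def feed(self, chunk: str) -> list[list[str]]:
        """Append newly typed text; return any sentences that are now complete."""
        self.buffer += chunk
        # Only text up to the last whitespace is stable: the fragment after it
        # may still grow, so it stays provisional in the buffer.
        last_ws = max((i for i, c in enumerate(self.buffer) if c.isspace()), default=-1)
        stable, self.buffer = self.buffer[: last_ws + 1], self.buffer[last_ws + 1 :]

        done = []
        for m in TOKEN_RE.finditer(stable):
            tok = m.group()
            self.sentence.append(tok)
            # Naive heuristic: ".", "!" or "?" directly followed by whitespace
            # closes a sentence. It splits wrongly after abbreviations like "Dr.".
            if tok in SENTENCE_END and stable[m.end()].isspace():
                done.append(self.sentence)
                self.sentence = []
        return done

    def provisional(self) -> list[str]:
        """Tokens of the open sentence plus the uncommitted tail (may still change)."""
        return self.sentence + TOKEN_RE.findall(self.buffer)

    def flush(self) -> list[str]:
        """Commit everything at end of input and return the last open sentence."""
        rest = self.provisional()
        self.buffer, self.sentence = "", []
        return rest


if __name__ == "__main__":
    tok = IncrementalTokenizer()
    for key in "Hi there. How ar":  # simulate one keystroke at a time
        for sent in tok.feed(key):
            print("sentence:", sent)  # sentence: ['Hi', 'there', '.']
    print("provisional:", tok.provisional())  # provisional: ['How', 'ar']
```

Because decisions are deferred until whitespace arrives, "3.5" is never split into a sentence boundary, while the partially typed "ar" is shown but not yet committed. Supporting deletions and edits, which any real editor needs, would presumably require re-tokenizing from the last stable boundary before the edit point; the sketch does not attempt this.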




