In practice, the best next step is to find a friend who is good with Python, Perl, Ruby or another text-processing language with good regular-expression support. Force your friend to sit down with you and look in detail at exactly how the transcription in your corpus is formatted, then devise a regular expression that catches most of the boundaries you want. The result will probably be closely tied to the specifics of your corpus and will probably not be perfect, but it will be a start.
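As a rough illustration of what such a regular expression might look like, here is a minimal Python sketch. It assumes a hypothetical transcription format in which each line starts with a speaker label such as "A:", pauses are written "(.)" or "(1.5)", and utterance-final intonation is marked with ".", "?" or "!". It only approximates C-units (roughly, an independent clause plus its modifiers) by splitting after final punctuation and at a pause followed by "and", "but" or "so". The conventions, the conjunction list and the function name `segment` are all illustrative assumptions to be replaced once you have inspected your own corpus.

```python
import re

# Hypothetical transcription conventions (adjust to your corpus):
#   "A:" / "B:" at the start of a line marks a speaker turn,
#   "(.)" marks a short pause and "(1.5)" a timed pause,
#   ".", "?" and "!" mark utterance-final intonation.
PAUSE = r"\((?:\.|\d+(?:\.\d+)?)\)"

# Candidate boundary: whitespace after utterance-final punctuation, or a pause
# immediately followed by a coordinating conjunction that may open a new clause.
BOUNDARY = re.compile(
    rf"(?<=[.?!])\s+"
    rf"|\s*{PAUSE}\s*(?=(?:and|but|so)\b)"
)

# A speaker label such as "A:" followed by the rest of the turn.
TURN = re.compile(r"^([A-Z]):\s*(.*)$")


def segment(transcript):
    """Return a list of (speaker, unit) pairs; lines without a speaker label are skipped."""
    units = []
    for line in transcript.splitlines():
        m = TURN.match(line.strip())
        if not m:
            continue
        speaker, text = m.groups()
        for chunk in BOUNDARY.split(text):
            # Remove any remaining pause markers and normalise whitespace.
            chunk = " ".join(re.sub(PAUSE, " ", chunk).split())
            if chunk:
                units.append((speaker, chunk))
    return units


if __name__ == "__main__":
    sample = "A: we went to the shop (.) and then we came home. did you go?\nB: no (1.5) but I wanted to"
    for speaker, unit in segment(sample):
        print(speaker, unit)
```

On the sample this yields five units: "we went to the shop", "and then we came home.", "did you go?" for A, and "no", "but I wanted to" for B. Whether a pause plus "and" really starts a new C-unit depends on the definition you follow, which is exactly the kind of corpus-specific decision the regular expression has to encode.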
On 21/02/2008, Su Qi Apple <applesuqi at yahoo.co.uk> wrote:
> Dear All
> I am just beginning to study corpus linguistics, working in particular with a
> corpus of spoken English. Could anyone tell me whether they know of any
> tagging programs that can indicate C-units as opposed to
> I look forward to your replies.
> Apple Su Qi
> Corpora mailing list
> Corpora at uib.no