[Corpora-List] corpora list crawler

Tasty Minerals tastyminerals at gmail.com
Sat Jun 11 13:11:56 CEST 2016


Dear CL,

Some time ago I needed to search for a specific corpus in CL. Manual look-up took a lot of time, so I wrote a simple script, "ccrawl", to search through the Corpora archives. The script first attempts to sync with the current CL and stores the retrieved data locally. After that, you can search CL with:

    python2 ccrawl.py -f corpus

The script can search through CL threads as well as emails (if deep sync was used). It can also index the older CL archives dating back to 1995. Hope it comes in handy.

More details: Git: https://github.com/tastyminerals/ccrawl

CL archives 2004-2016: http://mailman.uib.no//public/corpora/
Older CL archives 1995-2004: http://clu.uni.no/corpora/old.html

Best,
Pavel


