[Corpora-List] Keywords Generator (fwd)

Jason Baldridge jbaldrid at mail.utexas.edu
Mon Feb 18 17:12:39 CET 2008


Check out the website for the "Words in a Haystack" course that Katrin Erk taught in Fall 2007 on methods and tools for working with corpora (and using a lot of Python to do it):

http://comp.ling.utexas.edu/courses/2007/corpora07/schedule.html

See the slides and links on the schedule page.

Also, we have a wiki page for Python tips (both simple and more advanced stuff):

http://comp.ling.utexas.edu/wiki/doku.php/python_tips
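
For the original question further down the thread (a parallel list of frequencies for each keyword in both corpora), here is a minimal Python sketch. Treat it as an illustration under assumptions rather than a finished tool: the function names word_counts and parallel_frequencies and the two sample sentences are made up for the example, and the tokenization simply mirrors the tr pipeline quoted below (split on whitespace, lowercase, strip ASCII punctuation).

```python
import string
from collections import Counter


def word_counts(text):
    """Count words roughly as the tr pipeline does: split on whitespace,
    lowercase, and delete ASCII punctuation. Empty tokens are dropped."""
    strip_punct = str.maketrans("", "", string.punctuation)
    tokens = (tok.lower().translate(strip_punct) for tok in text.split())
    return Counter(tok for tok in tokens if tok)


def parallel_frequencies(keywords, counts_a, counts_b):
    """Return one row per keyword:
    (word, count in A, per-million rate in A, count in B, per-million rate in B)."""
    total_a = sum(counts_a.values()) or 1  # avoid dividing by zero on an empty corpus
    total_b = sum(counts_b.values()) or 1
    rows = []
    for word in keywords:
        a, b = counts_a[word], counts_b[word]  # a Counter returns 0 for unseen words
        rows.append((word, a, 1e6 * a / total_a, b, 1e6 * b / total_b))
    return rows


# Made-up sample sentences standing in for the two corpora; in practice,
# read each corpus from a file and pass its text to word_counts().
general = word_counts("The court was closed. The market was open, the court empty.")
law = word_counts("The court held that the plaintiff may appeal. The Court agreed.")

for row in parallel_frequencies(["court", "plaintiff"], general, law):
    print("%-10s %5d %10.1f %5d %10.1f" % row)
```

The per-million rates are there because raw counts are only directly comparable when the two corpora are the same size.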

Jason

On Feb 18, 2008 10:03 AM, True Friend <true.friend2004 at gmail.com> wrote:


> Thanks. I'd like to learn something, especially Python-related stuff.
>
>
> On Feb 18, 2008 8:16 PM, Jason Baldridge <jbaldrid at mail.utexas.edu> wrote:
>
> > If you'd like to learn more detail about the *nix commands and learn how
> > to roll your own, check out Chapter 3 of Chris Brew and Mark Moens's book
> > draft:
> > http://www.ling.ohio-state.edu/~cbrew/2007/spring/684.02/dilbook.pdf
> >
> > We also have a tips and tricks wiki for UT Austin's compling lab that
> > includes some notes on Unix commands:
> >
> >
> > http://comp.ling.utexas.edu/wiki/doku.php/tips_and_tricks#handy_unix_commands
> >
> > Also, on a related note, we put Peyton Todd's corpus linguistics
> > compilation (posted to the Corpora list some time ago) on our wiki and
> > added to it:
> >
> > http://comp.ling.utexas.edu/wiki/doku.php/corpus_linguistics
> >
> > Others are welcome to add to the wiki if they wish.
> >
> > Jason
> >
> >
> > On Feb 18, 2008 8:44 AM, Trevor Jenkins <trevor.jenkins at suneidesis.com>
> > wrote:
> >
> > > On Mon, 18 Feb 2008, True Friend <true.friend2004 at gmail.com> asked for
> > > help:
> > >
> > > Antconc has a word frequency count feature. Why not use that?
> > >
> > > Ben Allison has given you a UNIX solution. Here's mine
> > >
> > > tr -s "[:space:]" "\n" <Sense\ and\ Sensibility.txt |
> > >   tr "[:upper:]" "[:lower:]" | tr -d "[:punct:]" | sort | uniq -c |
> > >   sort -n > SS-list
> > >
> > > Change "Sense\ and\ Sensibility.txt" and "SS-list" to whatever your own
> > > files are called. You can tell what I've been playing with recently. ;-)
> > >
> > > The difference between mine and Ben's is mine relies solely upon
> > > standard filters that should be available on every UNIX machine. You
> > > might not have Perl installed, which is required by Ben's version. Of
> > > course, you might not have the GNU version of textutils, which I'm
> > > relying upon. We're both sorting on ascending frequency.
> > >
> > > > Hi Folks
> > > > I need a program/script (even a *nix one) that can provide the
> > > > frequencies of a word list in two corpora. I made this list by
> > > > comparing two word lists, one from general English (specifically of
> > > > Pakistani origin) and one from law English (also of Pakistani origin).
> > > > I now want to present these keywords with their frequencies in both
> > > > corpora as proof that these words are more frequent in law. The
> > > > keywords were generated by Antconc.
> > > > Is there any script/tool that can generate a parallel list of the
> > > > frequencies of each word in both corpora?
> > > > Regards
> > > > M Shakir Aziz
> > > > A Corpus Linguistics Student
> > > > Pakistan
> > > >
> > > > --
> > > > محمد شاکر عزیز
> > >
> > >
> > > Regards, Trevor
> > >
> > > <>< Re: deemed!
> > >
> > > _______________________________________________
> > > Corpora mailing list
> > > Corpora at uib.no
> > > http://mailman.uib.no/listinfo/corpora
> > >
> >
> >
> >
> > --
> > Jason Baldridge
> > Assistant Professor, Department of Linguistics
> > The University of Texas at Austin
> > http://comp.ling.utexas.edu/jbaldrid
> >
> >
>
>
> --
> محمد شاکر عزیز
>
>

--
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://comp.ling.utexas.edu/jbaldrid