[Corpora-List] Metrics for corpus "parseability"

Sandra Kuebler skuebler at indiana.edu
Mon Feb 4 23:27:56 CET 2008


There is related work on the ambiguity of grammars induced from treebanks: Anna Corazza, Alberto Lavelli, and Giorgio Satta used conditional cross-entropy to measure it. This may help to at least abstract away from the parser :)

Sandra

On Feb 4, 2008, at 5:21 PM, Miles Osborne wrote:


> Chris Brew suggested I actually explain what it is I meant: here
> is a sample paper on phase transitions in solving problems like 3-sat:
>
> http://www.sciencemag.org/cgi/content/abstract/264/5163/1297
>
> Props to Chris!
>
> Miles
>
> On 04/02/2008, Miles Osborne <miles at inf.ed.ac.uk> wrote:
> I must confess, the idea that a corpus can be described in terms of
> "parseability" sounds a little ill-founded to me. The choice of a
> particular parsing algorithm may dictate which examples are hard to
> process, as will the underlying grammar, etc.
>
> What would be interesting (read: hard) would be to look at the
> work on phase transitions in 3-SAT problems and the like. So, are
> there underlying graph-related characteristics of parsing which
> make certain sentences intrinsically hard to process, and in
> particular, can these characteristics be framed in a manner that is
> independent of the actual parser?
>
> Miles
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora

Sandra Kuebler
Indiana University
Department of Linguistics
Memorial Hall 322
1021 E. Third Street
Bloomington IN 47405
USA
phone: (812) 855-3268
fax: (812) 855-5363
email: skuebler at indiana.edu
