[Corpora-List] Metrics for corpus "parseability"

Miles Osborne miles at inf.ed.ac.uk
Mon Feb 4 19:37:28 CET 2008


I must confess that the idea of describing a corpus in terms of "parseability" sounds a little ill-founded to me. The choice of parsing algorithm may dictate which examples are hard to process, as will the underlying grammar, and so on.

What would be interesting (read: hard) would be to look at the work on phase transitions in 3-SAT problems and the like. So, are there underlying graph-related characteristics of parsing that make certain sentences intrinsically hard to process, and, in particular, can these characteristics be framed in a manner that is independent of the actual parser?
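For readers unfamiliar with the 3-SAT phase-transition work, here is a minimal sketch of the classic experiment. It generates uniform random 3-SAT instances at several clause-to-variable ratios (m/n), solves them with a plain DPLL solver, and records both the fraction that are satisfiable and the number of recursive solver calls as a crude cost measure. The widely reported result is that satisfiability drops sharply, and solver effort tends to peak, near a ratio of roughly 4.2 to 4.3. All function names and parameter values below are illustrative assumptions. This is a SAT experiment, not a parsing metric; whether an analogous parser-independent measure exists for sentences is exactly the open question raised above.

```python
import random
from typing import Dict, List, Optional, Tuple

Clause = Tuple[int, ...]  # literal v means "x_v is true", -v means "x_v is false"


def random_3sat(n_vars: int, n_clauses: int, rng: random.Random) -> List[Clause]:
    """Uniform random 3-SAT: each clause uses 3 distinct variables, each negated with prob. 1/2."""
    clauses = []
    for _ in range(n_clauses):
        chosen = rng.sample(range(1, n_vars + 1), 3)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in chosen))
    return clauses


def simplify(clauses: List[Clause], lit: int) -> Optional[List[Clause]]:
    """Make `lit` true: drop satisfied clauses and delete -lit elsewhere.
    Returns None if some clause becomes empty (a conflict)."""
    out = []
    for c in clauses:
        if lit in c:
            continue
        reduced = tuple(l for l in c if l != -lit)
        if not reduced:
            return None
        out.append(reduced)
    return out


def dpll(clauses: List[Clause], stats: Dict[str, int]) -> bool:
    """Plain DPLL with unit propagation; stats['calls'] counts recursive calls."""
    stats["calls"] += 1
    # Unit propagation: repeatedly assign literals forced by one-literal clauses.
    while True:
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        propagated = simplify(clauses, unit)
        if propagated is None:
            return False
        clauses = propagated
    if not clauses:
        return True  # every clause satisfied
    lit = clauses[0][0]  # naive branching: first literal of the first clause
    for choice in (lit, -lit):
        reduced = simplify(clauses, choice)
        if reduced is not None and dpll(reduced, stats):
            return True
    return False


def sweep(n_vars: int = 20,
          ratios: Tuple[float, ...] = (2.0, 3.0, 4.0, 4.3, 5.0, 6.0),
          trials: int = 30,
          seed: int = 0) -> List[Tuple[float, float, float]]:
    """For each ratio m/n, return (ratio, fraction satisfiable, mean DPLL calls)."""
    rng = random.Random(seed)
    results = []
    for r in ratios:
        m = int(round(r * n_vars))
        n_sat, total_calls = 0, 0
        for _ in range(trials):
            stats = {"calls": 0}
            if dpll(random_3sat(n_vars, m, rng), stats):
                n_sat += 1
            total_calls += stats["calls"]
        results.append((r, n_sat / trials, total_calls / trials))
    return results


if __name__ == "__main__":
    for r, frac, cost in sweep():
        print(f"m/n = {r:.1f}  P(sat) = {frac:.2f}  mean DPLL calls = {cost:.1f}")
```

With only 20 variables the transition is blurred by finite-size effects, so the printed numbers should be read as a qualitative illustration; larger n sharpens it at the cost of much longer runs.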

Miles

-- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.


