On Thu, Feb 2, 2012 at 10:25 AM, Karen Fort <karen.fort at inist.fr> wrote:
> Hi all,
> I did not find the time to clarify my question, and have since received a lot
> of very interesting answers and references.
> Thank you all for this!
> In fact, I should have said that I'm looking for the number of ambiguous
> word tokens in terms of POS in an English corpus, for example from the Penn
> TreeBank. One solution would be to compute this myself from the Brown
> corpus, but I was curious if there was a ref. on this.
> I found this reference for French, which reports that 60% of the French
> tokens in their corpus were unambiguous in terms of POS:
> Tzoukermann, E.; Radev, D. R. & Gale, W. A. (1999). Tagging French without
> lexical probabilities -- combining linguistic knowledge and statistical
> learning. In Ken Church, Susan Armstrong, P. I. E. T. & Yarowsky, D. (eds.),
> Natural Language Processing Using Very Large Corpora. Kluwer Academic.
> Of course, it all depends on the number of tags, how fine-grained they are,
> and so on. It only gives a very rough idea and should be taken in context,
> obviously. But that's all I need.
> On 26/01/2012 10:39, Eckhard Bick wrote:
>> Hello again,
>> I forgot to add that the ambiguous word tokens in my English test run
>> amounted to 49.8%.
>> On 2012-01-25 20:33, FORT, Karen wrote:
>>> Hi all,
>>> I need to find this information (the proportion of ambiguous words in
>>> English and their frequency).
>>> For example, we know that in French 8% of the words represent 30% of the
>>> Of course, it's very rough, but I only need a rough idea.
>>> Can somebody help me with this (of course, I searched for a ref but could
>>> not find anything precise)?
>>> Thank you in advance,
>>> Karën FORT
>>> Ingénieure/Engineer et/and doctorante/PhD student
>>> INIST-CNRS / LIPN
>>> 2, allée de Brabois
>>> 54500 Vandoeuvre-lès-Nancy
>>> Bureau/Office: H112
>>> +33 (0)3 83 50 46 36
>>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>>> Corpora mailing list
>>> Corpora at uib.no
> Karën FORT
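
For the do-it-yourself route mentioned above (counting from a tagged corpus such as Brown), here is a minimal sketch. It rests on one assumption that should be stated loudly: a word form counts as ambiguous if it occurs with more than one POS tag in the corpus itself, which is not necessarily the lexicon-based definition behind the 49.8% figure quoted in the thread. The function name `ambiguous_token_ratio` and the toy data are hypothetical, for illustration only.

```python
from collections import defaultdict


def ambiguous_token_ratio(tagged_tokens, lowercase=True):
    """Share of tokens whose word form occurs with more than one POS tag.

    Assumption: ambiguity is judged only from the tags observed in this
    corpus, not from an external lexicon or morphological analyser.
    """
    tagged_tokens = list(tagged_tokens)
    if not tagged_tokens:
        return 0.0

    def key(word):
        return word.lower() if lowercase else word

    # First pass: collect the set of tags seen for each word form.
    tags_by_word = defaultdict(set)
    for word, tag in tagged_tokens:
        tags_by_word[key(word)].add(tag)

    # Second pass: count the tokens whose word form has several tags.
    ambiguous = sum(
        1 for word, _ in tagged_tokens if len(tags_by_word[key(word)]) > 1
    )
    return ambiguous / len(tagged_tokens)


# Toy data, invented for illustration: "can" is seen as NOUN and VERB,
# so 2 of the 5 tokens are ambiguous.
sample = [
    ("The", "DET"),
    ("can", "NOUN"),
    ("can", "VERB"),
    ("rust", "VERB"),
    ("the", "DET"),
]
print(ambiguous_token_ratio(sample))
```

With NLTK installed and the Brown corpus downloaded, the same function should accept the (word, tag) pairs from `nltk.corpus.brown.tagged_words()`. The resulting figure will depend heavily on the tagset and on whether case is folded, as already noted in the thread.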
-- 
Kevin Bretonnel Cohen, PhD
Biomedical Text Mining Group Lead, Computational Bioscience Program,
U. Colorado School of Medicine
303-916-2417 (cell) 303-377-9194 (home)
http://compbio.ucdenver.edu/Hunter_lab/Cohen