[Corpora-List] How to evaluate a synthetic text corpus?

Emiel van Miltenburg C.W.J.vanMiltenburg at tilburguniversity.edu
Tue Apr 19 14:42:08 CEST 2022


Hi Jayr,

I second what Dave said, and I’ll add some shameless self-promotion from my side as well. You may find the following papers helpful:

van der Lee, Chris, Albert Gatt, Emiel van Miltenburg, and Emiel Krahmer. "Human evaluation of automatically generated text: Current trends and best practice guidelines." Computer Speech & Language 67 (2021): 101151.

Van Miltenburg, Emiel, Miruna Clinciu, Ondřej Dušek, Dimitra Gkatzia, Stephanie Inglis, Leo Leppänen, Saad Mahamood et al. "Underreporting of errors in NLG output, and what to do about it." In Proceedings of the 14th International Conference on Natural Language Generation, pp. 140-153. 2021.

Best wishes, Emiel

On 19 Apr 2022, at 14:00, Jayr Alencar Pereira <jap2 at cin.ufpe.br> wrote:

Thank you so much, David Howcroft.

Your contributions give me direction on how to evaluate my proposal. I think a human evaluation is the most appropriate approach. I already have sentences constructed by humans, but since I used them as the basis for automatically producing more, using them in the evaluation could introduce bias. Based on your paper, I think the next challenging step is to define the evaluation criteria.

Thank you.

On Wed, 13 Apr 2022 at 10:08, David Howcroft <dave.howcroft at gmail.com> wrote:

Hi Jayr,

Unfortunately there are not good automated metrics for evaluating natural language generation (NLG) in general, though there may be some tools you can use to assess certain aspects of texts in certain contexts.

The 'gold standard' is to do some kind of human evaluation where you ask your participants to assess the quality of the text along whatever dimensions are most important for your task, usually including at least some assessment of 'fluency' (e.g. grammaticality, comprehensibility, clarity, naturalness, etc) and some assessment of adequacy (e.g. semantic/content accuracy w.r.t. the input, truthfulness, coherence, etc).

Shameless self-promotion: I led an effort a couple of years ago to look at how NLG evaluation has been done by the research community over the last 20 years (https://aclanthology.org/2020.inlg-1.23/), which might prove helpful.

Novikova et al. 2017 looked at automated metrics in a systematic way: http://aclweb.org/anthology/D17-1238

You can find pointers to work on human evaluations through the HumEval workshops (https://humeval.github.io/) and ReproGen shared tasks (https://reprogen.github.io/).

Happy to talk more offline if you would like :)

Peace,
Dave
----
David M. Howcroft
https://www.davehowcroft.com

On Wed, Apr 13, 2022 at 1:06 PM Jayr Alencar Pereira <jap2 at cin.ufpe.br> wrote:

Hi, everybody.

I collected a list of 700 example sentences from domain specialists and used it as a basis for generating 9,000 new sentences with a generative language model. Now I am looking for methods to evaluate the quality of the generated corpus.

I have trained an n-gram language model on the generated corpus and measured its perplexity on the specialists' sentences. The results are good, but I suspect the corpus should also be evaluated with other methods.
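[Editor's note: for concreteness, the perplexity check described above could be sketched roughly like this, using a pure-Python add-one-smoothed bigram model; the toy sentences below are invented for illustration and are not from the actual corpora.]

```python
# Minimal sketch: train a bigram model on the generated corpus, then
# compute add-one-smoothed perplexity on held-out specialist sentences.
import math
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over sentences padded with <s>/</s>."""
    uni, bi = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    return uni, bi

def perplexity(uni, bi, corpus):
    """Add-one (Laplace) smoothed bigram perplexity; lower = better fit."""
    vocab = len(uni)
    log_prob, n_tokens = 0.0, 0
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        for w1, w2 in zip(toks, toks[1:]):
            p = (bi[(w1, w2)] + 1) / (uni[w1] + vocab)
            log_prob += math.log(p)
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)

# Toy stand-ins for the 9,000 generated and 700 specialist sentences
generated = [["the", "valve", "controls", "flow"],
             ["the", "pump", "moves", "fluid"]]
specialist = [["the", "valve", "controls", "flow"]]

uni, bi = train_bigram(generated)
print(round(perplexity(uni, bi, specialist), 2))
```

In practice one would use a toolkit such as NLTK's `nltk.lm` with higher-order n-grams and better smoothing, but the logic is the same: low perplexity of the generated-corpus model on the human-written sentences suggests the synthetic text matches their distribution.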

If you have any related research, please let me know.

Thank you in advance.

--
** Pax et bonum

Jayr Alencar Pereira
PhD student, Center of Informatics, Federal University of Pernambuco, Recife - Brazil
Homepage: http://jayr.clubedosgeeks.com.br
GitHub: https://github.com/jayralencar
CV Lattes: http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K8561724U9

_______________________________________________
Corpora mailing list
Corpora at uib.no
https://mailman.uib.no/listinfo/corpora
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora




