[Corpora-List] [External] Re: Peer review

Mcenery, Tony a.mcenery at lancaster.ac.uk
Tue Dec 29 13:03:01 CET 2020

Hi All,

a very interesting discussion. A few additional notes, largely from a UK perspective:

1.) The Linguistic Society of America has passed a resolution urging that research outputs such as corpora should be viewed as scholarly outputs in their own right: https://www.linguisticsociety.org/resource/resolution-recognizing-scholarly-merit-language-documentation
2.) In the UK, our national research assessment exercise allows corpora to be submitted to it. Any corpora submitted are assessed on a par with other research outputs, e.g. journal articles, books etc.
3.) The UK research councils certainly view corpora as research outputs and require that those produced with their support are duly catalogued and reported.
4.) In terms of peer review, when seeking support for corpus construction from funders, the plans and need for a corpus are assessed through peer review, i.e. you make a grant application and that gets assessed. Likewise, at the end of a grant the corpus itself may be subject to review, though end-of-grant reviews fluctuate over time in the UK between being more and less formal.
5.) As Mark notes, it is often the case that people write an account of a corpus and the decisions made in building it - that output (e.g. a journal article) is, of course, peer reviewed.
6.) I think that archives such as ELRA and LDC do check the corpora that they distribute, though those checks tend to be formal rather than conceptual in my experience, e.g. if the corpus uses XML, they check that the XML parses.

Of course, many corpora are also, in effect, given a post-publication peer review, i.e. people use, edit and critique the data.

On a separate, though related, note the issue with promotions committees that Mark notes also applies, at times, to researchers who produce software packages. I suspect the dynamic there is very similar to that discussed already, with certain disciplines being more open to recognising software as a research output than others. Points 3-5 above (at least) apply to producers of software as much as they do to producers of corpora.

Hope this helps,

Tony

________________________________
From: corpora-bounces at uib.no <corpora-bounces at uib.no> on behalf of Angelo Salatino <aas88ie at gmail.com>
Sent: 29 December 2020 09:32
To: Khurshid Ahmad <kahmad at scss.tcd.ie>
Cc: corpora at uib.no <corpora at uib.no>; Mark Davies <Mark_Davies at byu.edu>
Subject: [External] Re: [Corpora-List] Peer review

Dear Khurshid, Hugh, Mark, all,

I am very much enjoying this conversation. I work primarily between the fields of Science of Science and the Semantic Web, mostly applying semantic web technologies to advance the Science of Science. With regard to this conversation, the International Semantic Web Conference, which is the premier conference of the field, has a "Resource track" where we can submit resources such as datasets, ontologies, vocabularies, software and others. The process is identical to that of any other track: submissions go through peer review. Indeed, one of my most cited papers is "The computer science ontology: a large-scale taxonomy of research areas", which presents the largest ontology of research topics in the field of Computer Science and was published through the track mentioned above.

In this regard, the community of Science of Science and Digital Libraries is very active. I am personally involved in a Special Issue on Scientific Knowledge Graphs and Research Impact Assessment (at the OA journal QSS, MIT Press), where we are seeking high-quality and innovative solutions for the production of Scientific Knowledge Graphs (SKGs). Call for papers: https://www.mitpressjournals.org/pb-assets/pdfs/Calls%20for%20Papers/QSS_CFP_2020.pdf

Similarly, I am also involved in the 1st International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment, co-located with The Web Conference 2021. Call for papers: https://sci-k.github.io/#call-for-papers

From the call for papers of the latter, you can read that we are also looking for data papers and software papers; for simplicity, I reproduce the relevant excerpt from the CfP here.

"In accordance with Open Science principles, research papers may also be in the form of data papers and software papers (short or long papers). The former present the motivation and methodology behind the creation of data sets that are of value to the community; e.g., annotated corpora, benchmark collections, training sets. The latter present software functionality, its value for the community, and its application to a non-specialist reader. To enable reproducibility and peer-review, authors will be requested to share the DOIs of the data sets and the software products described in the articles and thoroughly describe their construction and reuse."

I hope you find this information useful, and I wish you all a happy and healthy new year.

Best regards,
Angelo

Angelo Salatino

Research Associate and Associate Lecturer

Knowledge Media Institute,

The Open University

Twitter: @angelosalatino

Web: https://salatino.org

On Tue, 29 Dec 2020 at 03:45, Khurshid Ahmad <kahmad at scss.tcd.ie> wrote:

Dear Hugh

'Peer review' is very important: one measure of the impact of your scholarship in this digital era is the number of downloads your corpus or corpora receive. As Mark has rightly suggested, computer science folks are more receptive to this idea. You might report the downloads as a measure of the esteem in which your colleagues hold your work. The number of hits is a key measure of ranking employed by search engines, and in Google ranking the 'fancy hits', the number of people looking up to your website, is critical for a higher ranking. In another domain, mass communications, the downloads may indicate your reputation.

I am not much in favour of the so-called are journal publications. In some branches of engineering and physics, the publication of your research in a 'letters' or rapid communications journal is regarded more highly, and in yet other disciplines a monograph is essential.

Whatever happens, please keep up the good work and promote data-driven research.

--- Best wishes

Khurshid Ahmad, PhD, FBCS, FTCD, CITP
Professor of Computer Science
School of Computer Science and Statistics
Trinity College
Dublin 2
IRELAND

Phone: 00353 1 896 8429 (Labs: 00 353 1 8968435)
Fax: 353 1 677 2204
Webpage: www.cs.tcd.ie/khurshid.ahmad

On 2020-12-28 18:59, Mark Davies wrote:
>>> Are there alternative models (and/or vocabulary) being used for
> discussing how the compilation of a corpus is part of one's scientific
> output?
> I've created a number of corpora [1] that have been widely used by
> researchers. But I've worked in a College of Humanities where rank and
> status committees are typically dominated by people in literary and
> cultural studies, where the only thing they really understand is the
> all-important journal article. (Even peer-reviewed conference papers
> are usually suspect in their eyes.) And they would never understand,
> for example, data from something like Google Analytics, which provides
> concrete data on the number of people actually using the corpora [2],
> or the number of citations in Google Scholar [3].
> So for each of the corpora that I've created, I've tried to make sure
> that I do have journal articles [4] that explain the creation and use
> of the corpora. Of course if you're in a college that includes
> computer science, for example, they will probably be more open-minded
> to the intrinsic value of creating corpora / large datasets that are
> widely used by other researchers.
> Mark Davies
> ============================================
> Mark Davies
> Professor Emeritus of Linguistics
> https://www.mark-davies.info
> ** Corpus design and use // Linguistic databases **
> ** Historical linguistics // Language variation **
> ** English, Spanish, and Portuguese **
> ============================================
> ________________________________________
> From: corpora-bounces at uib.no <corpora-bounces at uib.no> on behalf of
> Hugh Paterson III <sil.linguist at gmail.com>
> Sent: Friday, December 25, 2020 5:52 PM
> To: corpora at uib.no
> Subject: [Corpora-List] Peer review
> Greetings,
> Peer-reviewed publication is an important part of academic advancement
> in many job situations. I am not seeing any discussion in the
> literature on how corpora are being "peer-reviewed" (I'm using Google
> Scholar). Are there alternative models (and/or vocabulary) being used
> for discussing how the compilation of a corpus is part of one's
> scientific output? Any recent papers on this issue? I see some recent
> literature discussion on data citation, and software citation, but
> these don't address the peer-review aspect, and don't specifically
> address corpora.
> all the best,
> - hugh paterson III
> Links:
> ------
> [1] https://www.english-corpora.org/
> [2] https://www.english-corpora.org/users.asp
> [3] https://scholar.google.com/citations?user=8-LRgUIAAAAJ&hl=en&oi=ao
> [4] https://www.mark-davies.info/vita.pdf
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> https://mailman.uib.no/listinfo/corpora
