[Corpora-List] Corpora of theatre play-texts

Frank Fischer frank.fischer at zentr.uni-goettingen.de
Mon Aug 31 12:26:17 CEST 2020


Thanks for pointing to our platform, https://dracor.org/. At the moment we host 11 XML-TEI-encoded corpora of theatre plays (and counting). Some are in-house corpora; others are derived from freely available sources (and enhanced with various kinds of annotations) or are run by colleagues who asked us to host them:

* German Drama Corpus (501 plays, 18th–20th century)
* Russian Drama Corpus (210 plays, 18th–20th century)
* Italian Drama Corpus (139 plays, 15th–19th century) – via Biblioteca italiana
* Swedish Drama Corpus (68 plays) – via Dramawebben
* Calderón Drama Corpus (54 plays) – maintained by University of Tübingen, Institute of Romance Languages and Literatures
* Greek Drama Corpus (39 plays) – via Perseus
* Shakespeare Drama Corpus (37 plays) – via Folger
* Roman Drama Corpus (36 plays) – via Perseus, heavily enhanced by us
* Spanish Drama Corpus (25 plays) – via BETTE
* Alsatian Drama Corpus (7 plays) – maintained by Pablo Ruiz Fabo at University of Strasbourg
* Tatar Drama Corpus (3 plays)

All corpora are encoded in TEI and freely available through GitHub: https://github.com/dracor-org

The platform itself, https://dracor.org/ (still in public beta, but the first stable version is about to be released), facilitates access to various slices of all corpora to make corpus work easier. All this is done through an API, which is documented here: https://dracor.org/documentation/api

For example, if you don't need the XML markup and only want the speaker text, here is how (using Schiller's first play, "The Robbers", as an example): https://dracor.org/api/corpora/ger/play/schiller-die-raeuber/spoken-text

Only stage directions (same play): https://dracor.org/api/corpora/ger/play/schiller-die-raeuber/stage-directions
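As a rough sketch of how the two URLs above could be used from a script, here is a small Python example using only the standard library. The helper names (slice_url, fetch_text, main) are made up for illustration, and the URL pattern is simply copied from the two links above; it may change between API versions, so please check the API documentation before relying on it.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL as it appears in the links above (may differ in later API versions).
API_BASE = "https://dracor.org/api"


def slice_url(corpus, play, text_slice):
    """Build the URL for one text slice of one play, following the links above."""
    return f"{API_BASE}/corpora/{quote(corpus)}/play/{quote(play)}/{quote(text_slice)}"


def fetch_text(url, timeout=30.0):
    """Download a URL and decode the body as UTF-8 text (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def main():
    """Fetch both slices of "The Robbers" and report a rough size for each."""
    for name in ("spoken-text", "stage-directions"):
        text = fetch_text(slice_url("ger", "schiller-die-raeuber", name))
        print(name, len(text.split()), "whitespace-separated tokens")


# Call main() to run the example; nothing is downloaded on import.
```

Keeping the URL construction separate from the download makes it easy to check the URLs without touching the network.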

You can also download text slices (or full texts) from all plays of all corpora in one go via the API.
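If you prefer to script this, one possible approach (not necessarily the "one go" route the API itself offers) is to loop over the plays of a corpus and request the same slice for each. The sketch below is Python with the standard library only. Please note the assumptions: that a request to .../corpora/{corpus} returns a JSON listing of plays under a key "dramas" or "plays", each with a "name" field, is my guess about the response shape and should be checked against the API documentation; the function names are made up for illustration.

```python
import json
from urllib.request import urlopen

# Base URL as it appears in the example links (may differ in later API versions).
API_BASE = "https://dracor.org/api"


def play_names(corpus_listing):
    """Extract play identifiers from a parsed corpus listing.

    ASSUMPTION: the listing is a JSON object with a list of plays under the
    key "dramas" or "plays", and each entry has a "name" field.
    """
    entries = corpus_listing.get("dramas") or corpus_listing.get("plays") or []
    return [entry["name"] for entry in entries]


def slice_urls(corpus, names, text_slice="spoken-text"):
    """Build one slice URL per play, following the pattern of the example links."""
    return [f"{API_BASE}/corpora/{corpus}/play/{name}/{text_slice}" for name in names]


def download_corpus_slices(corpus, text_slice="spoken-text", timeout=30.0):
    """Fetch the chosen slice for every play in a corpus (needs network access).

    ASSUMPTION: GET {API_BASE}/corpora/{corpus} returns the listing described above.
    """
    with urlopen(f"{API_BASE}/corpora/{corpus}", timeout=timeout) as response:
        listing = json.load(response)
    names = play_names(listing)
    texts = {}
    for name, url in zip(names, slice_urls(corpus, names, text_slice)):
        with urlopen(url, timeout=timeout) as response:
            texts[name] = response.read().decode("utf-8")
    return texts
```

For a large corpus this issues one request per play, so it is worth adding a short pause between requests to be kind to the server.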

We also automatically extract co-occurrence networks (previewed on the website, but also downloadable in CSV, GEXF, GraphML formats): https://dracor.org/ger/schiller-die-raeuber
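To give an idea of what can be done with a downloaded network file, here is a minimal Python sketch that reads an edge list in CSV form and sums the edge weights per character. The column names (Source, Type, Target, Weight) are an assumption about the CSV layout, so please adjust them to the header of the file you actually download; the sample rows are invented for illustration and do not come from any play.

```python
import csv
import io
from collections import defaultdict


def weighted_degrees(csv_text):
    """Sum edge weights per node from an undirected edge list in CSV form.

    ASSUMPTION: the CSV has the columns Source, Target and Weight.
    """
    degrees = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        weight = int(row["Weight"])
        degrees[row["Source"]] += weight
        degrees[row["Target"]] += weight
    return dict(degrees)


# Toy edge list, invented for illustration; not taken from any play.
SAMPLE = """Source,Type,Target,Weight
a,Undirected,b,3
a,Undirected,c,1
b,Undirected,c,2
"""

print(weighted_degrees(SAMPLE))  # prints {'a': 4, 'b': 5, 'c': 3}
```

The GEXF and GraphML downloads carry the same information and can be opened directly in tools such as Gephi if you would rather not write code.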

There are many more API functions and we collect new ideas and bug reports via GitHub tickets.

If you want to attach a TEI-encoded corpus to our platform, feel free to contact me. If you want to know more about the concept behind the platform ("Programmable Corpora"), here's our paper presented at DH2019: https://dev.clariah.nl/files/dh2019/boa/0268.html

Oh, and in reply to Olaia's initial post: the EMOTHE project at the Universidad de Valencia could be interesting. They have (parallel) editions of plays in EN, ES, FR, IT: https://emothe.uv.es/biblioteca/

Sorry for the lengthy post ;)
Frank

On 30.08.20 at 15:55, Angus B. Grieve-Smith wrote:
> I wasn't able to get to that site, but I found their GitHub. They have
> plays in Tatar and Alsatian!  Not representative samples, as far as I
> can tell.  Thanks, Anastasia!
>
> https://github.com/dracor-org
>
>
> On 8/28/2020 1:11 PM, Anastasia Bonch-Osmolovskaya wrote:
>> I think Drama Corpora Project may be useful for you
>> https://dracor.org
>>
>> Kind regards
>> Anastasia
>>
>> Fri, 28 Aug 2020 at 17:35, <evalacroix at free.fr
>> <mailto:evalacroix at free.fr>>:
>>
>>     Thanks for the French corpus, Angus!
>>     DTA (Deutsches Textarchiv) offers several corpora of German theatre
>>     plays, with facsimiles and modern text transcriptions (see section
>>     "Drama":
>>     http://www.deutschestextarchiv.de/list/browse?genre=Belletristik)
>>
>>     Download
>>     XML (TEI P5) · HTML · Text
>>     TCF (text annotation layer)
>>     TCF (tokenized, serialized, lemmatized, normalized)
>>     XML (TEI P5 incl. att.linguistic)
>>
>>     Kind regards
>>     Eva
>>
>>     ----- Original message -----
>>     From: "Angus B. Grieve-Smith" <grvsmth at panix.com
>>     <mailto:grvsmth at panix.com>>
>>     To: corpora at uib.no <mailto:corpora at uib.no>
>>     Sent: Friday, 28 August 2020 15:56:56
>>     Subject: Re: [Corpora-List] Corpora of theatre play-texts
>>
>>     I have created a monolingual corpus of French plays from the early
>>     nineteenth century:
>>
>>     https://github.com/grvsmth/theatredeparis
>>
>>
>>     On Tue, July 28, 2020 3:29 pm, OLAIA ANDALUZ wrote:
>>     > Hi,
>>     >
>>     > I was wondering if you know corpora that consist of theatre
>>     > play-texts. I am especially looking for parallel corpora
>>     > (English-Spanish), but monolingual corpora and other language
>>     > combinations are also interesting.
>>     >
>>     > I have found these theatre corpora so far:
>>     > Enhanced Shakespearean Corpus[1]
>>     > Drama Corpora Project[2]
>>     > Biblioteca Electrónica Textual del Teatro en Español de 1868-1936
>>     > (BETTE)[3]
>>     >
>>     > Thank you very much for your help!
>>     >
>>     > Best wishes,
>>     > Olaia
>>     >
>>     >
>>     > Links:
>>     > ---------
>>     > [1] http://wp.lancs.ac.uk/shakespearelang/project-resources/data/
>>     > [2] https://dracor.org/
>>     > [3] https://github.com/GHEDI/BETTE
>>     > _______________________________________________
>>     > UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>>     > Corpora mailing list
>>     > Corpora at uib.no <mailto:Corpora at uib.no>
>>     > https://mailman.uib.no/listinfo/corpora
>>     >
>>
>>
>>     --                             -Angus B. Grieve-Smith
>>     grvsmth at panix.com <mailto:grvsmth at panix.com>
>>
>>
>>
>>     --
>>
>>     Eva Schaeffer-Lacroix
>>     http://didaktik.hautetfort.com
>>     Tel.: 06 64 68 21 92


