[Corpora-List] What do you call...

Hugh Paterson III sil.linguist at gmail.com
Thu Apr 16 15:15:40 CEST 2020


To summarize the responses to my email query of a few weeks back: I got two replies, both thoughtful and divergent in their views.

One suggested that "wordlists" are not "texts" and that a collection of such "lists" does not constitute a corpus. The other suggested that a "wordlist" could be a highly specialized kind of text and that corpora are composed of "texts". Both replies mentioned "sub-corpora" as a term used in discourse about parts of a large corpus. My presumption is that a sub-corpus would include several "texts" that meet some logical grouping criterion, and that it is smaller than the "whole unit" being used for a given research purpose. The replies indicated that there might not be a hard and fast definition for any of these terms.

Thank you to those of you who responded!

all the best, - Hugh

On Tue, Mar 31, 2020 at 4:35 PM Hugh Paterson III <sil.linguist at gmail.com> wrote:

> Greetings,
> I'm looking for an opinion on terms used related to typologies of corpora.
> Some bodies of strings in a "text" are without annotated structure. (They
> may have language informed structure (such as sentence or clause patterns),
> but they are in essence a glob of text). Is this a corpus? Or must a corpus
> also have some annotated informatic structure? — such as a corpus of
> newspaper articles where each article is annotated for its beginning and
> end. Some researchers have used the terminology 'a corpus of texts',
> indicating that each component part of a corpus is some independent body of
> words known as "a text".
> If I have 15 bi-lingual lists (a highly structured format of 'text') which
> are in the format of Language A - B; where language A is the same across
> the lists, but Language B is different in each list, and I were to be able
> to cite each list independently, or the compilation all together, how would
> I terminologically refer to the part-whole relationship? Can a list be a
> 'text of a corpus'?
> Is each list a corpus, or is the whole collection a corpus? Or can the
> term corpus be expected to apply to both part and whole?
> Any citable examples of corpora which contain component parts comprised of
> lists — especially bilingual wordlists, would be appreciated.
> If you want to reply off list, in 5-6 days I'll post an anonymized summary.
> Extra: In general, within the corpus-using sciences, how much parsing and
> annotation is required before an annotated corpus warrants the term
> 'database'?
> all the best,
> - Hugh Paterson III