[Corpora-List] What do you call...

Hugh Paterson III sil.linguist at gmail.com
Tue Mar 31 16:35:18 CEST 2020


I'm looking for opinions on the terms used for typologies of corpora.

Some bodies of strings in a "text" have no annotated structure. They may have linguistically informed structure (such as sentence or clause patterns), but they are in essence a glob of text. Is this a corpus? Or must a corpus also have some annotated informatic structure, such as a corpus of newspaper articles where each article is annotated for its beginning and end? Some researchers have used the terminology 'a corpus of texts', indicating that each component part of a corpus is an independent body of words known as "a text".

If I have 15 bilingual lists (a highly structured format of 'text') in the format Language A - Language B, where Language A is the same across the lists but Language B is different in each list, and I want to be able to cite each list independently or the compilation as a whole, how would I terminologically refer to the part-whole relationship? Can a list be a 'text of a corpus'?

Is each list a corpus, or is the whole collection a corpus? Or can the term corpus be expected to apply to both part and whole?

Any citable examples of corpora whose component parts consist of lists, especially bilingual wordlists, would be appreciated.

If you want to reply off list, in 5-6 days I'll post an anonymized summary.

Extra: In general, within the corpus-using sciences, how much parsing and annotation is required before an annotated corpus warrants the term 'database'?

All the best,
- Hugh Paterson III
