[Corpora-List] Identifying relevant key-n-grams (in analogy to keywords)

Rodrigo Esteves de Lima Lopes rll307 at unicamp.br
Mon Nov 22 14:45:01 CET 2021


Dear Robert,

You might also have a look at this:

Bondi, Marina & Mike Scott (eds.). 2010. Keyness in Texts. Amsterdam/Philadelphia: John Benjamins Publishing Company. 251 pp. ISBN 978-90272-8766-3.
<https://journals.openedition.org/asp/4932>

All the best,
Rodrigo

On Fri, 19 Nov 2021 at 16:13, Robert Fuchs <robert.fuchs.dd at googlemail.com> wrote:


> Dear all,
>
> We are comparing a reference corpus and a target corpus in order to
> identify keywords and key phrases on a particular topic that is prominent
> in the target corpus but not in the reference corpus. We use log ratio and
> statistical significance to identify candidate keywords, i.e. 1-grams,
> and then go through them manually to identify those that are relevant to
> the topic at hand (e.g. unemployment and labour relations). We remove
> items that are not relevant, for example those tied to a random event
> such as a particular sports tournament that took place during the period
> of the target corpus.
>
> In addition, we are looking at n-grams with n greater than 1, and we're
> not sure how to decide which n-grams are relevant. For example,
> “unemployment causes poverty” is certainly relevant. On the other hand,
> “unemployment is”, “the unemployed are” and “unemployment causes” are not
> relevant.
>
> I would be interested in hearing about any established practices for
> distinguishing relevant from non-relevant n-grams, or more generally any
> thoughts on how this can be done in a principled way rather than by
> making ad hoc decisions. A solution we have considered so far is to
> exclude n-grams that consist only of function words plus a single content
> word that we have already identified as a relevant keyword/1-gram. Beyond
> this simple solution, we were wondering whether there are more advanced
> approaches to this problem.
>
> Thanks and best
> Robert
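
For concreteness, the two steps described above (scoring candidates by log ratio plus a significance test, then dropping n-grams that add only function words to a known keyword) could be sketched roughly as follows. This is only an illustration under assumptions, not the procedure actually used: the function names, the tiny FUNCTION_WORDS set and the 0.5 smoothing value for zero frequencies are all hypothetical choices, and log-likelihood is swapped in as one common significance statistic since the message does not say which test was applied.

```python
import math

# Hypothetical, deliberately tiny function-word list (an assumption for this
# sketch); a real study would use a fuller stop list or a POS tagger.
FUNCTION_WORDS = {"the", "a", "an", "is", "are", "was", "were",
                  "of", "in", "on", "and", "or", "to", "that"}


def log_ratio(f_target, n_target, f_ref, n_ref, smoothing=0.5):
    """Binary log of the ratio of relative frequencies.

    A zero frequency is replaced by `smoothing` (an assumed convention)
    so that the ratio stays defined.
    """
    rel_target = (f_target or smoothing) / n_target
    rel_ref = (f_ref or smoothing) / n_ref
    return math.log2(rel_target / rel_ref)


def log_likelihood(f_target, n_target, f_ref, n_ref):
    """Two-cell log-likelihood statistic for a frequency comparison.

    Values above 3.84 correspond to p < 0.05 on a chi-square
    distribution with one degree of freedom.
    """
    total = f_target + f_ref
    expected_target = n_target * total / (n_target + n_ref)
    expected_ref = n_ref * total / (n_target + n_ref)
    ll = 0.0
    for observed, expected in ((f_target, expected_target),
                               (f_ref, expected_ref)):
        if observed > 0:  # a zero cell contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def is_trivial_ngram(ngram, keywords, function_words=FUNCTION_WORDS):
    """True if the n-gram is one known keyword plus function words only."""
    tokens = ngram.lower().split()
    content = [t for t in tokens if t not in function_words]
    return len(content) == 1 and content[0] in keywords


if __name__ == "__main__":
    keywords = {"unemployment", "unemployed"}
    candidates = ["unemployment causes poverty", "unemployment is",
                  "the unemployed are", "unemployment causes"]
    kept = [c for c in candidates if not is_trivial_ngram(c, keywords)]
    print(kept)
```

Note that under this heuristic "unemployment is" and "the unemployed are" are dropped, but "unemployment causes" survives, because "causes" is a content word. Catching that case would need an extra criterion, which is exactly the open question raised in the message.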
>
> --
> Prof. Dr. Robert Fuchs (JP) | Department of English Language and
> Literature/Institut für Anglistik und Amerikanistik | University of Hamburg
> | Überseering 35, 22297 Hamburg, Germany | Room 07076 |
> https://uni-hamburg.academia.edu/RobertFuchs |
> https://sites.google.com/site/rflinguistics/
>
>
> Mailing list on varieties of English/World Englishes/ENL-ESL-EFL.
> Subscribe here: https://groups.google.com/forum/#!forum/var-eng/join
> Are you a non-native speaker of English? Please help us by taking this
> short survey on when and how you use the English language:
> https://lamapoll.de/englishusageofnonnativespeakers-1/
>