[Corpora-List] Daily keywords (re Ukraine and other current events) in the 14.7 billion word NOW Corpus

Mark Davies mark.davies at english-corpora.org
Wed Mar 16 21:20:09 CET 2022


Many of us have been focused on the tragic events in Ukraine, and we believe it is important that people from all sides of the conflict have access to recent and relevant corpus-based data regarding these events.

To meet this need, we have recently developed a new tool at English-Corpora.org, which allows researchers, teachers, and students to track changes in news reporting about the events in Ukraine -- or any other event in the news -- almost in real time.

This tool is available in the NOW Corpus (https://www.english-corpora.org/now). The NOW Corpus currently contains about 14.7 billion words of text, grows by about 200-220 million words per month (about 7-8 million words per day), and is based on thousands of newspapers and magazines in 20 English-speaking countries.

Using NOW, it is possible (as of yesterday) to quickly and easily find keywords for a particular day, to see what topics are in the news. For example, for March 15 (yesterday) these keywords include {no-fly, anti-war, bombardment, oligarch, nuclear-armed, besieged, shelters, invaders, homicide, fleeing, refugee, refugees, corridors, humanitarian}.

There are many other options, such as limiting by part of speech, comparing the keywords from different countries, showing or hiding proper nouns, and limiting and sorting by frequency, the number of texts, the number of countries, or the frequency compared to earlier dates. From the basic keyword lists, users can click for re-sortable concordance lines, to see how the keywords are used in context.
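For readers who want a feel for what "frequency compared to earlier dates" can mean in practice, here is a minimal sketch in Python. It is not the method used by the NOW Corpus, which this message does not describe; it simply ranks words by the ratio of their per-million frequency on one day to their per-million frequency in an earlier reference period. The function name `daily_keywords` and the parameters `min_freq`, `smoothing`, and `top_n` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def daily_keywords(day_tokens, reference_tokens, min_freq=2, smoothing=1.0, top_n=10):
    """Rank words by how much more frequent they are on one day than in
    an earlier reference period (illustrative only, not the NOW method)."""
    day_total = len(day_tokens)
    ref_total = len(reference_tokens)
    if day_total == 0 or ref_total == 0:
        return []

    day_counts = Counter(day_tokens)
    ref_counts = Counter(reference_tokens)

    scored = []
    for word, count in day_counts.items():
        # Skip words that are too rare on the target day to be reliable.
        if count < min_freq:
            continue
        # Normalize both counts to frequency per million words.
        day_pm = count * 1_000_000 / day_total
        ref_pm = ref_counts[word] * 1_000_000 / ref_total
        # Add-constant smoothing keeps the ratio finite for words that
        # never occur in the reference period.
        ratio = (day_pm + smoothing) / (ref_pm + smoothing)
        scored.append((word, count, ratio))

    # Highest ratio first; ties are broken alphabetically.
    scored.sort(key=lambda item: (-item[2], item[0]))
    return scored[:top_n]


if __name__ == "__main__":
    # Toy data, invented for this example.
    day = ["refugees"] * 3 + ["the"] * 5 + ["corridors"] * 2
    earlier = ["the"] * 50 + ["market"] * 10 + ["refugees"]
    for word, count, ratio in daily_keywords(day, earlier):
        print(word, count, round(ratio, 2))
```

In this toy data, a word that is absent from the earlier period ranks above one that merely became more frequent, and a function word such as "the" falls to the bottom. A real system would also need tokenization, filtering by part of speech, and a minimum number of texts or countries, as in the options listed above.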

For a brief overview, please see https://www.english-corpora.org/now/files/NOW-keywords-by-date.pdf

============================================
Mark Davies
english-corpora.org
mark-davies.org
============================================


