[Corpora-List] Monolingual Tagalog corpora

Miloš Jakubíček milos.jakubicek at sketchengine.co.uk
Tue Jun 4 20:49:44 CEST 2019


Dear Ella,

We are currently working on a lexicographic project for Tagalog, as part of which we built a 150M web corpus last year and a 300M web corpus this year. The latter is not yet public, but if you would like early access, let me know. At the moment we are mainly working on lemmatization and PoS tagging of the corpus; once that is done, the corpus will become part of Sketch Engine.

All the best, Milos Jakubicek

CEO, Lexical Computing Brno, CZ | Brighton UK http://www.lexicalcomputing.com http://www.sketchengine.co.uk

On Tue, 4 Jun 2019 at 17:49, Ella Rabinovich <ellarabi at gmail.com> wrote:


> Hello everyone,
>
> We are looking for monolingual Tagalog dataset(s), other than Wikipedia,
> for a project investigating patterns of code-switching between English and
> Tagalog.
>
> Any help would be greatly appreciated!
>
> Ella Rabinovich
> University of Toronto, Canada
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> https://mailman.uib.no/listinfo/corpora
>


