[Corpora-List] Urdu corpus

Eric Atwell E.S.Atwell at leeds.ac.uk
Fri Aug 28 19:23:35 CEST 2020

I am not an expert on SketchEngine, but I know you can get a free 1-month trial licence, and it is free to researchers, teachers and students from academic institutions in the EU.

For more info on their pricing, see https://www.sketchengine.eu/price-list/

eric atwell, Leeds University (non-EU after BREXIT ...)

________________________________
From: corpora-bounces at uib.no <corpora-bounces at uib.no> on behalf of Fatima Tul Zuhra <fzuhra at cs.qau.edu.pk>
Sent: 28 August 2020 16:56
To: Dan Zeman <zeman at ufal.mff.cuni.cz>
Cc: corpora at uib.no <corpora at uib.no>
Subject: Re: [Corpora-List] Urdu corpus

Thanks to all the responders.

Eric: What I know of SketchEngine is that it is not free. Is that right?

Daniel: The UD Urdu dataset has some 5000+ sentences in dependency tree format. What I need is a much larger corpus of plain Urdu text. There is one, but it is rather expensive. Is there a freely available Urdu plain-text corpus?

Best regards.

On Fri, Aug 28, 2020 at 8:42 PM Dan Zeman <zeman at ufal.mff.cuni.cz> wrote:

Plus the Urdu Universal Dependencies treebank: https://universaldependencies.org/treebanks/ur_udtb/index.html

Best, Dan
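A note for readers who want plain text out of the UD treebank mentioned above: UD files are in CoNLL-U format, where each sentence usually carries a "# text = ..." comment line holding the raw sentence. The following is a minimal Python sketch, not an official UD tool. The function name and the SAMPLE data are made up for illustration, and the sample token lines are shortened to two columns (real files have ten tab-separated columns).

```python
def plain_text_from_conllu(lines):
    """Return one plain-text sentence per CoNLL-U sentence block.

    Prefers the '# text = ...' comment; if a block has none, falls back
    to joining the FORM column (2nd field) of ordinary token lines.
    """
    sentences = []
    text, forms = None, []

    def flush():
        # Store the finished sentence, if the block contained anything.
        if text is not None:
            sentences.append(text)
        elif forms:
            sentences.append(" ".join(forms))

    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("# text = "):
            text = line[len("# text = "):]
        elif line.startswith("#"):
            continue  # other comment lines, e.g. '# sent_id = ...'
        elif line.strip() == "":
            flush()  # a blank line ends a sentence block
            text, forms = None, []
        else:
            cols = line.split("\t")
            # Skip multiword-token ranges (e.g. '1-2') and empty nodes ('1.1').
            if len(cols) >= 2 and cols[0].isdigit():
                forms.append(cols[1])
    flush()  # last block may not be followed by a blank line
    return sentences


# Hypothetical two-sentence sample (placeholder tokens, not real treebank data).
SAMPLE = (
    "# sent_id = 1\n"
    "# text = yah ek misal hai\n"
    "1\tyah\n"
    "\n"
    "1\tdusra\n"
    "2\tjumla\n"
)

if __name__ == "__main__":
    for sentence in plain_text_from_conllu(SAMPLE.splitlines()):
        print(sentence)
```

To use it on a downloaded treebank file, pass an open file object (opened with encoding="utf-8") instead of SAMPLE.splitlines(). With some 5000+ sentences this yields only a small plain-text corpus, so it does not replace a large web corpus.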

On 28.08.2020 at 17:27, Eric Atwell wrote:

Fatima,

you can search the 50-million-word Urdu Web Corpus on the SketchEngine website https://www.sketchengine.eu/urwac-urdu-corpus/

You can also use SketchEngine to collect your own specialised Urdu text corpus.

You can download Urdu corpora from the web, e.g. google "Urdu corpus download" or search for "Urdu" in the www.kaggle.com datasets

for example:

The Holy Quran
Urdu Language Speech Emotional Corpus
or https://www.kaggle.com/bitlord/urdu-language-speech-dataset
Urdu Monolingual Corpus
Urdu-Nepali-English Parallel Corpus
English-Urdu Parallel Corpus
Urdu-Nepali Parallel Corpus
Urdu / Hindi News Headlines
Urdu Movie Reviews
iNLTK Urdu News
Urdu Wikipedia
Language Identification dataset
urdu sentiment twitter dataset
Urdu Speech Dataset (audio files)

Eric Atwell, Professor of Artificial Intelligence for Language

PhD tutor; online MSc AI programme leader

School of Computing, Uni of LEEDS, LS2 9JT, UK

http://www.comp.leeds.ac.uk/eric https://www.edubots.eu

________________________________
From: corpora-bounces at uib.no <corpora-bounces at uib.no> on behalf of Fatima Tul Zuhra <fzuhra at cs.qau.edu.pk>
Sent: 28 August 2020 15:37
To: corpora at uib.no <corpora at uib.no>
Subject: [Corpora-List] Urdu corpus


I would like to know whether there is a freely downloadable Urdu corpus.

Thanks in anticipation.


-- Fatima Tuz Zuhra Ph.D. Scholar, Quaid i Azam University Islamabad, Pakistan.

_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
https://mailman.uib.no/listinfo/corpora
