[Corpora-List] Urdu corpus

Eric Atwell E.S.Atwell at leeds.ac.uk
Fri Aug 28 17:27:14 CEST 2020


Fatima,

You can search the 50-million-word Urdu Web Corpus on the SketchEngine website: https://www.sketchengine.eu/urwac-urdu-corpus/

You can also use SketchEngine to collect your own specialised Urdu text corpus.

You can also download Urdu corpora from the web, e.g. google "Urdu corpus download" or search for "Urdu" in the datasets at www.kaggle.com,

for example:

The Holy Quran

https://www.kaggle.com/zusmani/the-holy-quran

Urdu Language Speech Emotional Corpus

https://github.com/siddiquelatif/URDU-Dataset

or https://www.kaggle.com/bitlord/urdu-language-speech-dataset

Urdu Monolingual Corpus

https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-65A9-5

Urdu-Nepali-English Parallel Corpus

http://www.cle.org.pk/software/ling_resources/UrduNepaliEnglishParallelCorpus.htm

English-Urdu Parallel Corpus

http://ufal.ms.mff.cuni.cz/umc/005-en-ur/

Urdu-Nepali Parallel Corpus

https://www.kaggle.com/rtatman/urdunepali-parallel-corpus

Urdu / Hindi News Headlines

https://www.kaggle.com/adnanzaidi/urdu-news-headlines

Urdu Movie Reviews

https://www.kaggle.com/akkefa/imdb-dataset-of-50k-movie-translated-urdu-reviews

iNLTK Urdu News

https://www.kaggle.com/disisbig/urdu-news-dataset

Urdu Wikipedia

https://www.kaggle.com/disisbig/urdu-wikipedia-articles

Language Identification dataset

https://www.kaggle.com/zarajamshaid/language-identification-datasst

Urdu Sentiment Twitter Dataset

https://www.kaggle.com/raheelabibi/urdu-sentiment-data

Urdu Speech Dataset (audio files)

https://www.kaggle.com/hazrat/urdu-speech-dataset
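
Once you have downloaded one of the plain-text corpora above, a quick word-frequency count is a reasonable first sanity check. The following is only a minimal sketch using the Python standard library: it assumes the corpus is a single UTF-8 text file, and the function names are made up for illustration. The `\w+` tokeniser is deliberately naive: it splits on punctuation such as the Urdu full stop, but it will also split words at diacritics or zero-width joiners, so a proper Urdu tokeniser would be needed for serious work.

```python
import re
from collections import Counter
from pathlib import Path

# Naive tokeniser: runs of Unicode letters/digits. Urdu punctuation such as
# the full stop (U+06D4) is not matched, so it is dropped from the tokens.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Split raw text into a list of word tokens."""
    return TOKEN_RE.findall(text)


def word_frequencies(text):
    """Count how often each token occurs in the given text."""
    return Counter(tokenize(text))


def corpus_frequencies(path, encoding="utf-8"):
    """Read a whole corpus file (assumed to be plain UTF-8 text) and count its tokens."""
    return word_frequencies(Path(path).read_text(encoding=encoding))


if __name__ == "__main__":
    # Small in-memory sample, so the sketch runs without any downloaded file.
    sample = "یہ ایک کتاب ہے۔ یہ ایک قلم ہے۔"
    for word, count in word_frequencies(sample).most_common(3):
        print(word, count)
```

For a downloaded file you would call `corpus_frequencies("urdu.txt")`, where `urdu.txt` stands in for whatever the extracted corpus file is actually called.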

Eric Atwell, Professor of Artificial Intelligence for Language

PhD tutor; online MSc AI programme leader

School of Computing, Uni of LEEDS, LS2 9JT, UK

http://www.comp.leeds.ac.uk/eric https://www.edubots.eu

________________________________
From: corpora-bounces at uib.no <corpora-bounces at uib.no> on behalf of Fatima Tul Zuhra <fzuhra at cs.qau.edu.pk>
Sent: 28 August 2020 15:37
To: corpora at uib.no <corpora at uib.no>
Subject: [Corpora-List] Urdu corpus

Hi,

I want to know whether there is an Urdu corpus that is freely downloadable.

Thanks in anticipation.

Regards.

--
Fatima Tuz Zuhra
Ph.D. Scholar, Quaid i Azam University Islamabad, Pakistan.


