[Corpora-List] Urdu corpus

Dan Zeman zeman at ufal.mff.cuni.cz
Fri Aug 28 17:39:23 CEST 2020


Plus the Urdu Universal Dependencies treebank: https://universaldependencies.org/treebanks/ur_udtb/index.html

Best, Dan

On 28.08.2020 at 17:27, Eric Atwell wrote:
> Fatima,
>
> you can search the 50-million-word Urdu Web Corpus on the SketchEngine
> website
> https://www.sketchengine.eu/urwac-urdu-corpus/
> You can also use SketchEngine to collect your own specialised Urdu
> text corpus.
>
> You can download Urdu corpora from the web, e.g. search Google for
> "Urdu corpus download" or search for "Urdu" in the www.kaggle.com datasets
>
> for example:
>
> The Holy Quran
> https://www.kaggle.com/zusmani/the-holy-quran
>
> Urdu Language Speech Emotional Corpus
> https://github.com/siddiquelatif/URDU-Dataset
>  or https://www.kaggle.com/bitlord/urdu-language-speech-dataset
>
> Urdu Monolingual Corpus
> https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-65A9-5
>
> Urdu-Nepali-English Parallel Corpus
> http://www.cle.org.pk/software/ling_resources/UrduNepaliEnglishParallelCorpus.htm
>
> English-Urdu Parallel Corpus
> http://ufal.ms.mff.cuni.cz/umc/005-en-ur/
>
> Urdu-Nepali Parallel Corpus
> https://www.kaggle.com/rtatman/urdunepali-parallel-corpus
>
> Urdu / Hindi News Headlines
> https://www.kaggle.com/adnanzaidi/urdu-news-headlines
>
> Urdu Movie Reviews
> https://www.kaggle.com/akkefa/imdb-dataset-of-50k-movie-translated-urdu-reviews
>
> iNLTK Urdu News
> https://www.kaggle.com/disisbig/urdu-news-dataset
>
> Urdu Wikipedia
> https://www.kaggle.com/disisbig/urdu-wikipedia-articles
>
> Language Identification dataset
> https://www.kaggle.com/zarajamshaid/language-identification-datasst
>
> urdu sentiment twitter dataset
> https://www.kaggle.com/raheelabibi/urdu-sentiment-data
>
> Urdu Speech Dataset (audio files)
> https://www.kaggle.com/hazrat/urdu-speech-dataset
>
>
> Eric Atwell, Professor of Artificial Intelligence for Language
>  PhD tutor; online MSc AI programme leader
>  School of Computing, Uni of LEEDS, LS2 9JT, UK
>   http://www.comp.leeds.ac.uk/eric https://www.edubots.eu
>
>
>
> ------------------------------------------------------------------------
> *From:* corpora-bounces at uib.no <corpora-bounces at uib.no> on behalf of
> Fatima Tul Zuhra <fzuhra at cs.qau.edu.pk>
> *Sent:* 28 August 2020 15:37
> *To:* corpora at uib.no <corpora at uib.no>
> *Subject:* [Corpora-List] Urdu corpus
> Hi,
>
> I want to know if there exists some Urdu corpus that is freely
> downloadable?
>
> Thanks in anticipation.
>
> Regards.
>
> --
> Fatima Tuz Zuhra
> Ph.D. Scholar,
> Quaid i Azam University Islamabad, Pakistan.
