[Corpora-List] Urdu corpus

Fatima Tul Zuhra fzuhra at cs.qau.edu.pk
Fri Aug 28 17:56:25 CEST 2020


Thanks to all the responders.

Eric: As far as I know, SketchEngine is not free. Is that right?

Daniel: The UD Urdu dataset has some 5000+ sentences in dependency tree format. What I need is a fairly large corpus of plain Urdu text. There is one, but it is rather expensive. I wonder if there is a freely available plain-text Urdu corpus?

Best regards.

On Fri, Aug 28, 2020 at 8:42 PM Dan Zeman <zeman at ufal.mff.cuni.cz> wrote:


> Plus the Urdu Universal Dependencies treebank:
> https://universaldependencies.org/treebanks/ur_udtb/index.html
>
> Best,
> Dan
>
>
> On 28.08.2020 at 17:27, Eric Atwell wrote:
>
> Fatima,
>
> you can search the 50-million-word Urdu Web Corpus on the SketchEngine
> website
> https://www.sketchengine.eu/urwac-urdu-corpus/
> You can also use SketchEngine to collect your own specialised Urdu text
> corpus.
>
> You can download Urdu corpora from the web, e.g. google "Urdu corpus download"
> or search "Urdu" in www.kaggle.com datasets
>
> for example:
>
> The Holy Quran
> https://www.kaggle.com/zusmani/the-holy-quran
>
> Urdu Language Speech Emotional Corpus
> https://github.com/siddiquelatif/URDU-Dataset
> or https://www.kaggle.com/bitlord/urdu-language-speech-dataset
>
> Urdu Monolingual Corpus
>
> https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-65A9-5
>
> Urdu-Nepali-English Parallel Corpus
>
> http://www.cle.org.pk/software/ling_resources/UrduNepaliEnglishParallelCorpus.htm
>
> English-Urdu Parallel Corpus
> http://ufal.ms.mff.cuni.cz/umc/005-en-ur/
>
> Urdu-Nepali Parallel Corpus
> https://www.kaggle.com/rtatman/urdunepali-parallel-corpus
>
> Urdu / Hindi News Headlines
> https://www.kaggle.com/adnanzaidi/urdu-news-headlines
>
> Urdu Movie Reviews
>
> https://www.kaggle.com/akkefa/imdb-dataset-of-50k-movie-translated-urdu-reviews
>
> iNLTK Urdu News
> https://www.kaggle.com/disisbig/urdu-news-dataset
>
> Urdu Wikipedia
> https://www.kaggle.com/disisbig/urdu-wikipedia-articles
>
> Language Identification dataset
> https://www.kaggle.com/zarajamshaid/language-identification-datasst
>
> Urdu Sentiment Twitter Dataset
> https://www.kaggle.com/raheelabibi/urdu-sentiment-data
>
> Urdu Speech Dataset (audio files)
> https://www.kaggle.com/hazrat/urdu-speech-dataset
>
>
> Eric Atwell, Professor of Artificial Intelligence for Language
> PhD tutor; online MSc AI programme leader
> School of Computing, Uni of LEEDS, LS2 9JT, UK
> http://www.comp.leeds.ac.uk/eric https://www.edubots.eu
>
>
>
> ------------------------------
> *From:* corpora-bounces at uib.no <corpora-bounces at uib.no>
> on behalf of Fatima Tul Zuhra <fzuhra at cs.qau.edu.pk>
> *Sent:* 28 August 2020 15:37
> *To:* corpora at uib.no <corpora at uib.no>
> *Subject:* [Corpora-List] Urdu corpus
>
> Hi,
>
> I want to know if there exists some Urdu corpus that is freely
> downloadable?
>
> Thanks in anticipation.
>
> Regards.
>
> --
> Fatima Tuz Zuhra
> Ph.D. Scholar,
> Quaid i Azam University Islamabad, Pakistan.
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> https://mailman.uib.no/listinfo/corpora
>
>

--
Fatima Tuz Zuhra


