[Corpora-List] Urdu corpus

Eric Atwell E.S.Atwell at leeds.ac.uk
Fri Aug 28 19:23:35 CEST 2020


I am not an expert on SketchEngine, but I know you can get a free 1-month trial licence, and it is free to researchers, teachers and students from academic institutions in the EU.

For more info on their pricing, see https://www.sketchengine.eu/price-list/

Eric Atwell, Leeds University (non-EU after Brexit ...)

________________________________
From: corpora-bounces at uib.no <corpora-bounces at uib.no> on behalf of Fatima Tul Zuhra <fzuhra at cs.qau.edu.pk>
Sent: 28 August 2020 16:56
To: Dan Zeman <zeman at ufal.mff.cuni.cz>
Cc: corpora at uib.no <corpora at uib.no>
Subject: Re: [Corpora-List] Urdu corpus

Thanks to all the responders.

Eric: As far as I know, SketchEngine is not free. Is that right?

Daniel: The UD Urdu dataset has some 5000+ sentences in dependency tree format. What I need is a much larger corpus of plain Urdu text. There is one, but it is rather expensive. Is there a freely available Urdu plain-text corpus?

Best regards.
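
On the point about the UD Urdu treebank being in dependency tree format: CoNLL-U files normally carry each sentence's raw text in a `# text = ...` comment line, so the plain sentences can be pulled out without touching the trees, although 5000+ sentences is still far from a large corpus. A minimal Python sketch, assuming the file content has already been read into a string; the function name `conllu_to_plain_text` and the two-sentence sample are made up for illustration, and the fallback of joining the FORM column with spaces only approximates the original spacing:

```python
def conllu_to_plain_text(conllu: str) -> list[str]:
    """Return one plain-text string per sentence in a CoNLL-U document."""
    sentences = []
    text, forms = None, []
    # The extra empty line guarantees that the last sentence is flushed.
    for line in conllu.splitlines() + [""]:
        line = line.strip()
        if not line:
            # A blank line ends a sentence: prefer the "# text" comment,
            # otherwise fall back to the space-joined word forms.
            if text is not None or forms:
                sentences.append(text if text is not None else " ".join(forms))
            text, forms = None, []
        elif line.startswith("# text = "):
            text = line[len("# text = "):]
        elif not line.startswith("#"):
            cols = line.split("\t")
            # Keep plain word lines only; skip multiword-token ranges
            # (IDs like 1-2) and empty nodes (IDs like 1.1).
            if len(cols) > 1 and cols[0].isdigit():
                forms.append(cols[1])
    return sentences


# Hypothetical sample, not taken from the actual treebank.
SAMPLE = (
    "# sent_id = demo-1\n"
    "# text = یہ ایک مثال ہے\n"
    "1\tیہ\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "2\tایک\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "3\tمثال\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "4\tہے\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "\n"
    "# sent_id = demo-2\n"
    "1\tشکریہ\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "2\t۔\t_\t_\t_\t_\t_\t_\t_\t_\n"
)

if __name__ == "__main__":
    for sentence in conllu_to_plain_text(SAMPLE):
        print(sentence)
```

For a real treebank file, the string would come from something like `open(path, encoding="utf-8").read()`, where `path` points at a downloaded `.conllu` file.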

On Fri, Aug 28, 2020 at 8:42 PM Dan Zeman <zeman at ufal.mff.cuni.cz> wrote:

Plus the Urdu Universal Dependencies treebank: https://universaldependencies.org/treebanks/ur_udtb/index.html

Best, Dan

On 28.08.2020 at 17:27, Eric Atwell wrote:

Fatima,

you can search the 50-million-word Urdu Web Corpus on the SketchEngine website https://www.sketchengine.eu/urwac-urdu-corpus/

You can also use SketchEngine to collect your own specialised Urdu text corpus.

You can download Urdu corpora from the web, e.g. google "Urdu corpus download" or search for "Urdu" in the www.kaggle.com datasets

for example:

The Holy Quran

https://www.kaggle.com/zusmani/the-holy-quran

Urdu Language Speech Emotional Corpus

https://github.com/siddiquelatif/URDU-Dataset

or https://www.kaggle.com/bitlord/urdu-language-speech-dataset

Urdu Monolingual Corpus

https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-65A9-5

Urdu-Nepali-English Parallel Corpus

http://www.cle.org.pk/software/ling_resources/UrduNepaliEnglishParallelCorpus.htm

English-Urdu Parallel Corpus

http://ufal.ms.mff.cuni.cz/umc/005-en-ur/

Urdu-Nepali Parallel Corpus

https://www.kaggle.com/rtatman/urdunepali-parallel-corpus

Urdu / Hindi News Headlines

https://www.kaggle.com/adnanzaidi/urdu-news-headlines

Urdu Movie Reviews

https://www.kaggle.com/akkefa/imdb-dataset-of-50k-movie-translated-urdu-reviews

iNLTK Urdu News

https://www.kaggle.com/disisbig/urdu-news-dataset

Urdu Wikipedia

https://www.kaggle.com/disisbig/urdu-wikipedia-articles

Language Identification dataset

https://www.kaggle.com/zarajamshaid/language-identification-datasst

urdu sentiment twitter dataset

https://www.kaggle.com/raheelabibi/urdu-sentiment-data

Urdu Speech Dataset (audio files)

https://www.kaggle.com/hazrat/urdu-speech-dataset

Eric Atwell, Professor of Artificial Intelligence for Language

PhD tutor; online MSc AI programme leader

School of Computing, Uni of LEEDS, LS2 9JT, UK

http://www.comp.leeds.ac.uk/eric https://www.edubots.eu

________________________________
From: corpora-bounces at uib.no <corpora-bounces at uib.no> on behalf of Fatima Tul Zuhra <fzuhra at cs.qau.edu.pk>
Sent: 28 August 2020 15:37
To: corpora at uib.no <corpora at uib.no>
Subject: [Corpora-List] Urdu corpus

Hi,

I want to know whether there is a freely downloadable Urdu corpus.

Thanks in anticipation.

Regards.

-- Fatima Tuz Zuhra Ph.D. Scholar, Quaid i Azam University Islamabad, Pakistan.

_______________________________________________ UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora Corpora mailing list Corpora at uib.no<mailto:Corpora at uib.no> https://mailman.uib.no/listinfo/corpora


-- Fatima Tuz Zuhra


