[Corpora-List] PersianQuAD: The Native Question Answering Dataset for the Persian Language

Arefeh Kazemi arefeh_kazemi at yahoo.com
Fri Mar 11 05:37:20 CET 2022


Dear colleagues,

We are pleased to announce the release of PersianQuAD: The Native Question Answering Dataset for the Persian Language.

PersianQuAD was created at the University of Isfahan, Iran. It contains about 20,000 questions and answers written by native annotators on a set of Persian Wikipedia articles. The answer to each question is a segment of the corresponding article text. We trained three versions of a deep learning-based QA system on PersianQuAD. The best system achieves an F1 score of 82.97%, which suggests that PersianQuAD is well suited to training deep learning-based QA systems. Human performance on PersianQuAD is significantly better (96.49%), demonstrating that the dataset is challenging and that there is still plenty of room for improvement.

PersianQuAD and all QA models trained on it are freely available and can be downloaded from: https://github.com/BigData-IsfahanUni/PersianQuAD
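For readers unfamiliar with the metric: when answers are text segments, F1 is commonly computed as token overlap between the predicted segment and the reference segment. The sketch below assumes that SQuAD-style definition and plain whitespace tokenization. It is not the evaluation script shipped with PersianQuAD, which may normalize text differently, and the function name token_f1 is hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer span.

    Assumption: whitespace tokenization with no text normalization.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers count as a match; one empty answer is a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: each shared token counts at most as often
    # as it occurs in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score under this definition would be the average of token_f1 over all questions, usually taking the maximum over the reference answers when a question has more than one.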

Please cite the following paper if you use the dataset in your research:

A. Kazemi, J. Mozafari and M. A. Nematbakhsh, "PersianQuAD: The Native Question Answering Dataset for the Persian Language," in IEEE Access, doi: 10.1109/ACCESS.2022.3157289.
https://doi.org/10.1109/ACCESS.2022.3157289


