[Corpora-List] PersianQuAD: The Native Question Answering Dataset for the Persian Language

Arefeh Kazemi arefeh_kazemi at yahoo.com
Thu Mar 10 15:16:23 CET 2022


Dear colleagues,

We are pleased to announce the release of PersianQuAD: The Native Question Answering Dataset for the Persian Language.

PersianQuAD was created at the University of Isfahan, Iran. It contains about 20,000 questions and answers written by native annotators on a set of Persian Wikipedia articles. The answer to each question is a segment of the corresponding article text. We trained three versions of a deep-learning-based QA system on PersianQuAD. The best system achieves an F1 score of 82.97%, which indicates that PersianQuAD is well suited to training deep-learning-based QA systems. Human performance on PersianQuAD is considerably higher (96.49%), which indicates that the dataset is challenging and leaves plenty of room for future improvement.

PersianQuAD and all QA models trained on it are freely available and can be downloaded from the GitHub repository BigData-IsfahanUni/PersianQuAD.
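
For readers unfamiliar with the F1 figure quoted for span-based QA: this announcement does not say how the score is computed, so the following is only a minimal sketch that assumes the token-overlap F1 commonly used for SQuAD-style datasets. The function name token_f1 is hypothetical, and the whitespace tokenization is a simplification (no Persian-specific normalization is applied).

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer span.

    Assumption: whitespace tokenization, no text normalization.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: a shared token counts only as often as it
    # appears in both the prediction and the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this assumed metric, a prediction that covers only half of the reference tokens but adds no wrong ones has precision 1.0 and recall 0.5. A dataset-level score would then typically be the average of the per-question values.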


Please cite the following paper if you use the dataset in your research:

A. Kazemi, J. Mozafari and M. A. Nematbakhsh, "PersianQuAD: The Native Question Answering Dataset for the Persian Language," in IEEE Access, doi: 10.1109/ACCESS.2022.3157289.

