[Corpora-List] Summary of the answers re annotation of two documents in parallel

Irina Temnikova irina.temnikova at gmail.com
Tue May 16 20:15:08 CEST 2017

Dear all,

as promised & as requested by several people, here is the summary of the responses to my question, posted on May 2, 2017, about a tool for annotating two documents in parallel. We are VERY thankful for ALL responses! I hope I have not forgotten any suggestions.

We could not get EXACTLY what we wanted to achieve, but we are coming up with an acceptable solution.

What we wanted to achieve is to be able to open two annotation windows one next to the other (ideally containing parallel or comparable text), so our linguist annotators could edit both texts, compare them manually and mark issues (many different categories, which are created while annotating) in both of them. We wanted the output in XML and the interface to be VERY user-friendly, as our annotators have no experience at all in using annotation tools.

1) GATE, BRAT, etc. allow comparing two already created annotations, but not annotating two documents at the same time.
2) It was also suggested that we open two instances of a web-based annotation tool (like Brat or WebAnno) at the same time. Both systems are very user-friendly; however, we would prefer the two annotations to be somehow related.
3) Several tools were suggested for parallel visualization of two documents: http://diffuse.sourceforge.net/index.html, http://wanthalf.saga.cz/intertext. However, they do not offer easy annotation, e.g. annotators would have to add mark-up manually. Also, http://www.delightedbeauty.org/vvv allows a very nice automatic comparison of different translations of the same source text.
4) Apparently the CLaRK system (http://bultreebank.org/clark/index.html) allows displaying two or more documents in parallel, synchronized according to the documents' structures or specified rules, and annotating them, and it produces XML output. However, we do not think that the system is friendly enough for our annotators ;)
5) A tool was suggested that displays two documents and assigns a limited number of relationships between sentences in the two documents. It was hard to set it up so that the annotators could add new categories while annotating.
6) MDSWriter (http://www.aclweb.org/anthology/P/P16/P16-4017.pdf) supports multiple complex tasks for constructing multi-document summaries (including editing the documents, annotating, etc.). However, it did not do exactly what we needed.
7) The Sanchay tool (https://ufal.mff.cuni.cz/pbml/102/art-singh.pdf) allows annotating two open documents, as well as manual alignment correction. We are still studying it.
8) EHost (https://code.google.com/archive/p/ehost/wikis/wiki_Version.wiki) appears to be unmaintained.

We are still working on our solution, but we plan to split the editing/alignment correction task from the annotation task, and to use e.g. GATE to annotate the parallel segments by listing both of them in the same window, one under the other.
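As a rough illustration of the "one under the other" idea, here is a minimal sketch in Python. It assumes the segments have already been aligned and are available as (source, target) pairs. The function name and the element names ("document", "pair", "segment") are hypothetical and not taken from GATE or any other tool mentioned above; the resulting XML would simply be one possible input file for whatever annotation tool is chosen.

```python
import xml.etree.ElementTree as ET


def build_stacked_document(pairs):
    """Return an XML string in which each aligned pair is listed
    with the source segment directly above the target segment.

    `pairs` is an iterable of (source_text, target_text) tuples.
    All element and attribute names here are illustrative assumptions.
    """
    root = ET.Element("document")
    for index, (source_text, target_text) in enumerate(pairs, start=1):
        # One <pair> per aligned segment pair, numbered from 1.
        pair = ET.SubElement(root, "pair", id=str(index))
        # Source first, target second, so they appear one under the other.
        ET.SubElement(pair, "segment", side="source").text = source_text
        ET.SubElement(pair, "segment", side="target").text = target_text
    # ElementTree escapes characters such as "&" and "<" on serialization.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    example_pairs = [
        ("First source sentence.", "First target sentence."),
        ("Second source sentence.", "Second target sentence."),
    ]
    print(build_stacked_document(example_pairs))
```

Keeping both segments of a pair inside one parent element means that annotations made on either side stay tied to the same pair, which is the "somehow related" property we were missing when opening two separate tool instances.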

Kind regards, and thank you again for all your inputs!


-- *Irina P. Temnikova, B.A., M.A., Ph.D.*

*Postdoctoral Researcher*

Arabic Language Technologies Research Group

Qatar Computing Research Institute

Hamad Bin Khalifa University (HBKU)

The Research and Development Complex (RDC)

P.O. Box 5825

Doha, Qatar

Mob: +974 33320188

Tel: +974 ...

www.qcri.qa

*If you want to build a ship, don't drum up the men to gather wood, divide the work and give orders. Instead, teach them to yearn for the vast and endless sea. (Antoine de Saint-Exupery)*
