[Corpora-List] Release of the FetchProt Corpus

Kristofer Franzén franzen at sics.se
Fri Sep 23 17:41:00 CEST 2005

Dear colleagues,

I am pleased to announce the first release of the FetchProt corpus.
It is based on 177 full-text journal articles from the biological domain
analyzed for experiments on proteins to validate tyrosine kinase activity.
The 177 filled template files contain 591 experiments on wild types and
82 different mutants of 77 proteins.
Apart from the template files, the corpus includes text versions of the
articles with the analyzed content tagged, as a reference to where in the
article the information in the template is to be found.
The proteins and experiments are, among other things, linked to UniProt
identity codes and Gene Ontology molecular function codes.

The corpus has been compiled within the FetchProt project, a
collaboration between the Swedish Institute of Computer Science (SICS),
the Center for Genomics and Bioinformatics at Karolinska Institutet (CGB/KI)
and Metamatrix AB, and has received partial funding from VINNOVA, the
Swedish Agency for Innovation Systems.
The aim of the project is to build a system that aids in populating the
EXProt database of proteins with experimentally verified functions, by
means of information extraction from full-text scientific journal papers.

More information on the corpus and its analysis can be found in the
documentation at

The corpus is free to download from the project homepage at

Best regards,

Kristofer Franzén
Swedish Institute of Computer Science
