[Corpora-List] NaCTeM Metabolite and Enzyme corpus

Paul Thompson Paul.Thompson at manchester.ac.uk
Fri Nov 16 15:21:04 CET 2012

Recently, the field of systems biology has begun to model and simulate metabolic networks, requiring knowledge of the set of molecules involved. While genomics and proteomics technologies are able to supply the macromolecular parts list, the metabolites are less easily assembled. Most metabolites are known and reported through the scientific literature, rather than through large-scale experimental surveys. Thus, it is important to recover them from the literature.

We are pleased to announce the availability of the NaCTeM Metabolite and Enzyme corpus: http://www.nactem.ac.uk/metabolite-corpus/

The corpus is intended for training text mining systems to recognise metabolites and enzymes. It consists of 296 MEDLINE abstracts that have been manually annotated by domain experts.

The following paper provides more details about the corpus and a system trained to recognise metabolites automatically:

Nobata, C., Dobson, P., Iqbal, S. A., Mendes, P., Tsujii, J., Kell, D. B. and Ananiadou, S. (2011). Mining Metabolites: Extracting the Yeast Metabolome from the Literature. Metabolomics, 7(1), 94-101. (Available at: http://www.springerlink.com/content/e1727327007hx663/)


Paul Thompson
Research Associate
School of Computer Science
National Centre for Text Mining
Manchester Institute of Biotechnology
University of Manchester
131 Princess Street
Manchester
M1 7DN
UK
Tel: 0161 306 3091
http://personalpages.manchester.ac.uk/staff/Paul.Thompson/
