[Corpora-List] Using version control software in corpus construction

David Graff graff at ldc.upenn.edu
Mon Mar 29 17:34:10 CEST 2010


Just a couple comments to supplement the excellent discussion so far... I think the issues at hand might fall under two distinct functions:

- keeping a release history for the corpus as a whole

- keeping an audit trail of changes to specific elements in the corpus

A version control system is an obvious solution for the first, while a relational database can be a much easier solution for the second (assuming the necessary infrastructure is in place for maintaining a DB-mediated corpus).
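To make the second function concrete, here is a minimal sketch of a DB-mediated audit trail, using Python's built-in sqlite3 module. The table names (token, token_audit), the column names and the helper functions are all hypothetical, chosen only for illustration; a real corpus database would have its own schema and would probably also record who made each change.

```python
import sqlite3


def open_corpus_db(path=":memory:"):
    """Open a toy corpus DB with an audit table (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        -- One row per corpus element; here, a token with an annotation tag.
        CREATE TABLE IF NOT EXISTS token (
            id   INTEGER PRIMARY KEY,
            form TEXT NOT NULL,
            tag  TEXT NOT NULL
        );
        -- One row per change to a token's tag: the audit trail itself.
        CREATE TABLE IF NOT EXISTS token_audit (
            audit_id   INTEGER PRIMARY KEY AUTOINCREMENT,
            token_id   INTEGER NOT NULL,
            old_tag    TEXT,
            new_tag    TEXT,
            changed_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
        -- The trigger logs every update that actually changes the tag.
        CREATE TRIGGER IF NOT EXISTS token_tag_audit
        AFTER UPDATE OF tag ON token
        WHEN OLD.tag <> NEW.tag
        BEGIN
            INSERT INTO token_audit (token_id, old_tag, new_tag)
            VALUES (OLD.id, OLD.tag, NEW.tag);
        END;
    """)
    return conn


def history(conn, token_id):
    """Return the (old_tag, new_tag) pairs for one token, oldest first."""
    rows = conn.execute(
        "SELECT old_tag, new_tag FROM token_audit "
        "WHERE token_id = ? ORDER BY audit_id",
        (token_id,),
    )
    return rows.fetchall()
```

With this in place, an ordinary UPDATE on the token table leaves a record behind without the annotator having to do anything extra, and history(conn, some_id) retrieves the per-element trail. The same result could be had with application-level logging; the trigger is just one way to keep the bookkeeping inside the database.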

A thorough and meticulous corpus manager (with adequate schedule and budget) would want both. Others would choose one or the other based on what matters most in the given situation.

Of course, there's a third type of issue:

- keeping track of changes in the table/XML structures that organize the corpus

but this is just a matter of maintaining the release history and audit trail of the database schema and/or DTD for the corpus.

Best regards,

-----------
David Graff
graff at ldc.upenn.edu
Linguistic Data Consortium
3600 Market St., Suite 810
Philadelphia, PA 19104


