Just a couple of comments to supplement the excellent discussion so far. I think the issues at hand fall under two distinct functions:
- keeping a release history for the corpus as a whole
- keeping an audit trail of changes to specific elements in the corpus
A version control system is an obvious solution for the first, while a relational database can be a much easier solution for the second (assuming the necessary infrastructure is in place for maintaining a DB-mediated corpus).
A thorough and meticulous corpus manager (with adequate schedule and budget) would want both. Others would choose one or the other based on what matters most in the given situation.
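To make the second function a bit more concrete, here is a minimal sketch of a DB-side audit trail, using Python's standard sqlite3 module. The table names (token, token_audit), the columns, and the helper functions are hypothetical, chosen only for illustration; a real corpus database would have its own schema and would probably also record who made each change.

```python
import sqlite3


def open_corpus_db(path=":memory:"):
    """Create a toy corpus table plus an audit table (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE token (
            id   INTEGER PRIMARY KEY,
            text TEXT NOT NULL,
            tag  TEXT NOT NULL
        );

        -- One row per change to an annotation: the audit trail.
        CREATE TABLE token_audit (
            audit_id   INTEGER PRIMARY KEY AUTOINCREMENT,
            token_id   INTEGER NOT NULL,
            old_tag    TEXT,
            new_tag    TEXT,
            changed_at TEXT DEFAULT CURRENT_TIMESTAMP
        );

        -- Log every real change of the tag column; no-op updates are skipped.
        CREATE TRIGGER token_tag_audit
        AFTER UPDATE OF tag ON token
        WHEN OLD.tag <> NEW.tag
        BEGIN
            INSERT INTO token_audit (token_id, old_tag, new_tag)
            VALUES (OLD.id, OLD.tag, NEW.tag);
        END;
        """
    )
    return conn


def tag_history(conn, token_id):
    """Return the (old_tag, new_tag) pairs for one token, oldest first."""
    rows = conn.execute(
        "SELECT old_tag, new_tag FROM token_audit "
        "WHERE token_id = ? ORDER BY audit_id",
        (token_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = open_corpus_db()
    conn.execute("INSERT INTO token (id, text, tag) VALUES (1, 'run', 'NN')")
    conn.execute("UPDATE token SET tag = 'VB' WHERE id = 1")
    print(tag_history(conn, 1))
```

Because the trigger lives in the database, every update is logged no matter which tool performs it. A dump of such a database can still be checked into a version control system at each release, which covers the first function as well.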
Of course, there's a third type of issue:
- keeping track of changes in the table/XML structures that organize the corpus
but this is just a matter of maintaining the release history and audit trail of the database schema and/or DTD for the corpus.
Best regards,
David Graff
graff at ldc.upenn.edu
Linguistic Data Consortium
3600 Market St., Suite 810
Philadelphia, PA 19104