[Corpora-List] Source code corpora

Alexandre Rafalovitch arafalov at gmail.com
Thu Nov 20 20:24:59 CET 2008

On Thu, Nov 20, 2008 at 2:21 PM, Klaus Guenther <klaus.guenther at split.uni-bamberg.de> wrote:
> So the main issue is finding code that can reliably be attributed to an
> author in an unmodified form and discovering details that are not
> attributable to the project's coding standard. I know of no such corpus.

This sounds like an interesting preliminary research project, then: the inverse of 'keeping to the coding standards'.

Take a set of source code repositories and determine whether all contributions fall below or above some threshold of similarity. Something based on self-organisation, perhaps, followed by comparing the number of clusters with the number of actual developers.
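
A minimal sketch of how that experiment might look, in Python. Everything in it is an assumption made for illustration: character trigrams as the style feature, cosine similarity as the measure, single-link clustering at a fixed threshold, and the function names themselves. A real study would need contributions already attributed to individual authors and a far richer feature set.

```python
from collections import Counter
from math import sqrt


def trigram_profile(code: str) -> Counter:
    """Count character trigrams; whitespace is kept because layout is a style cue."""
    return Counter(code[i:i + 3] for i in range(len(code) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def count_clusters(snippets: list[str], threshold: float) -> int:
    """Single-link clustering: merge any two snippets whose similarity
    meets the threshold, then return the number of resulting groups."""
    profiles = [trigram_profile(s) for s in snippets]
    parent = list(range(len(snippets)))  # union-find forest, one root per cluster

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if cosine(profiles[i], profiles[j]) >= threshold:
                parent[find(i)] = find(j)

    return len({find(i) for i in range(len(snippets))})
```

Given a list of snippets and a parallel list of known authors, one would compare count_clusters(snippets, threshold) with len(set(authors)). If the coding standard dominates, the clusters should collapse to far fewer than the number of developers; if personal style survives, the two numbers should be closer.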

Personal blog: http://blog.outerthoughts.com/
Research group: http://www.clt.mq.edu.au/Research/
Hmm.


