[Corpora-List] Source code corpora

Alexandre Rafalovitch arafalov at gmail.com
Thu Nov 20 18:09:23 CET 2008

Wouldn't any source code repository with a version control system give you that automatically? The commit history tells you exactly which code was contributed and by whom.

E.g. the SourceForge, Apache or Linux kernel collections.
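
As a rough sketch of how author-labelled samples might be pulled out of such a repository: the snippet below assumes the project is checked out with Git (only one of several possible systems) and groups the lines of a file by the author that `git blame --line-porcelain` reports. The function names `lines_by_author` and `blame_file` are made up for illustration.

```python
import subprocess
from collections import defaultdict
from typing import Dict, List


def lines_by_author(porcelain: str) -> Dict[str, List[str]]:
    """Group source lines by author, given `git blame --line-porcelain` text.

    In that format every blamed line comes with a block of header fields
    (including one starting with "author "), followed by the source line
    itself prefixed with a single TAB character.
    """
    grouped = defaultdict(list)
    author = None
    for raw in porcelain.split("\n"):
        if raw.startswith("author "):
            # Header field naming the author of the line that follows.
            author = raw[len("author "):]
        elif raw.startswith("\t") and author is not None:
            # Content line: drop the leading TAB added by git blame.
            grouped[author].append(raw[1:])
    return dict(grouped)


def blame_file(repo_dir: str, path: str) -> Dict[str, List[str]]:
    """Run git blame on one file of a local checkout (hypothetical helper)."""
    result = subprocess.run(
        ["git", "blame", "--line-porcelain", "--", path],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return lines_by_author(result.stdout)
```

Note that blame credits each line to whoever last changed it, so the labels are only an approximation of authorship; files with a single committer are probably the safer samples.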

http://www.koders.com/ might be a good way to search, if you are trying to narrow down to a particular area.


Alex.
Personal blog: http://blog.outerthoughts.com/
Research group: http://www.clt.mq.edu.au/Research/

On Thu, Nov 20, 2008 at 1:28 AM, <sdb at cs.rmit.edu.au> wrote:
> Dear colleagues,
> My research relates to authorship attribution of source code (that is,
> determining the owner of anonymous work samples based upon other work
> samples where authors are known).
> I'm looking for recommendations for source code corpora for this task
> for any programming language. For the corpora to be useful, authorship
> has to be identified.
