I am not familiar with METER or PAN, so I'm not sure whether the following will help, but I have used a free program called Wcopyfind 2.6, which is freely available on the web. I used it to find text reuse in a small corpus of newspaper articles I had built on one particular subject. It worked very quickly, and there are various settings to choose from (e.g. you can have it bridge 3 or more words that are not exact matches). The same web site also has a very user-friendly guide that explains all the settings. I had the program scan some 73 articles (from plain text files), and the results came back in perhaps 5 seconds. The program reports how many words are shared and what percentage of reuse there is in each comparison, and you can also view the results for each pair side by side. This was all very helpful, because a few of the articles showed around 95% text reuse; had I kept them in the corpus, they would have skewed the statistical results I was working on. Note, though, that you need copies of all the articles, essays, or other texts you want to scan (i.e. it does not check the Internet).
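Wcopyfind's internals aren't covered here, so the following is only a minimal Python sketch of the general idea behind this kind of pairwise comparison, not Wcopyfind's actual algorithm. It assumes matching on shared word 6-grams (the value 6 and the 90% cut-off are hypothetical choices), reports the number and percentage of words in one text covered by phrases also found in the other, and flags near-duplicate pairs you might drop from a corpus. It does not do the bridging of imperfect matches described above.

```python
import itertools
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return the set of word n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def reuse(a, b, n=6):
    """Return (words, percent) of tokens in `a` covered by n-grams that also occur in `b`."""
    b_grams = ngrams(b, n)
    covered = set()
    for i in range(len(a) - n + 1):
        if tuple(a[i:i + n]) in b_grams:
            covered.update(range(i, i + n))  # mark every word of the shared phrase
    pct = 100.0 * len(covered) / len(a) if a else 0.0
    return len(covered), pct


def compare_corpus(docs, n=6, threshold=90.0):
    """Compare every pair of documents (name -> text) and flag pairs at or above `threshold`%.

    Reuse is asymmetric (a short text can be fully contained in a long one),
    so both directions are computed and the larger percentage is reported.
    """
    toks = {name: tokenize(text) for name, text in docs.items()}
    flagged = []
    for x, y in itertools.combinations(sorted(toks), 2):
        _, pct_xy = reuse(toks[x], toks[y], n)
        _, pct_yx = reuse(toks[y], toks[x], n)
        best = max(pct_xy, pct_yx)
        if best >= threshold:
            flagged.append((x, y, round(best, 1)))
    return flagged


# Hypothetical mini-corpus for illustration only.
docs = {
    "a.txt": "the council voted on tuesday to approve the new budget for the city schools",
    "b.txt": "the council voted on tuesday to approve the new budget for the city schools after debate",
    "c.txt": "local weather remained mild throughout the week with light rain expected",
}
print(compare_corpus(docs))
```

With this toy input, only the `a.txt`/`b.txt` pair is flagged: every word of `a.txt` appears in a shared 6-word phrase, whereas `b.txt` is only 87.5% covered in the other direction.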
Kindest regards, Linda
________________________________
From: Muhammad Adeel <nawabadeel at gmail.com>
To: corpora at uib.no
Sent: Tue, March 2, 2010 4:46:43 PM
Subject: [Corpora-List] Corpora for text reuse and plagiarism detection
Does anyone know of corpora for text reuse and plagiarism detection apart from METER and the 1st PAN plagiarism detection competition corpora?
--
Regards,
Adeel