[Corpora-List] Corpora for text reuse and plagiarism detection

Linda Bawcom linda.bawcom at sbcglobal.net
Wed Mar 3 00:45:36 CET 2010


Dear Adeel,

I am not familiar with METER or PAN, so I'm not sure whether the following will be helpful, but I have used a free program called Wcopyfind 2.6, which you can access at the URL below. I used it to find text reuse in a small corpus I created of newspaper articles on one particular subject. It worked very quickly, and there are various settings to choose from (e.g. you can have it bridge 3 or more words that are not 100% matches). There is also a very user-friendly guide on the same web site that explains all the settings. I had the program scan some 73 articles (from plain text files), and it took perhaps 5 seconds to produce the results. The program tells you how many words are reused and what percentage of reuse there is in each comparison. You can also get a side-by-side view of the results for each pair. This was all very helpful because a few of the articles were around 95% text reuse, so if I had kept them in the corpus, they would have skewed the statistical results I was working on. But you do need to have all the articles or essays or whatever you want to scan (i.e. it does not check the Internet).

http://plagiarism.phys.virginia.edu/Wsoftware.html
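
For anyone who would rather script this kind of check, here is a rough Python sketch of the general idea: compare every pair of documents in a local collection and report the percentage of shared word sequences. It is not Wcopyfind's actual algorithm, which I have not inspected; the function names, the 6-word phrase length, and the 90% threshold are all assumptions chosen for illustration, and it does no bridging of imperfect matches.

```python
import re
from itertools import combinations


def word_ngrams(text, n=6):
    """Return the set of lowercase word n-grams found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def reuse_percent(text_a, text_b, n=6):
    """Percentage of text_a's n-grams that also occur in text_b."""
    grams_a = word_ngrams(text_a, n)
    if not grams_a:
        return 0.0  # text_a is shorter than n words
    grams_b = word_ngrams(text_b, n)
    return 100.0 * len(grams_a & grams_b) / len(grams_a)


def flag_reuse(docs, threshold=90.0, n=6):
    """Compare every pair in docs (a dict of name -> text).

    Returns (name_a, name_b, percent) for each pair whose reuse, taken in
    whichever direction is higher, is at or above the threshold.
    """
    flagged = []
    for (name_a, text_a), (name_b, text_b) in combinations(docs.items(), 2):
        pct = max(reuse_percent(text_a, text_b, n),
                  reuse_percent(text_b, text_a, n))
        if pct >= threshold:
            flagged.append((name_a, name_b, pct))
    return flagged
```

Calling flag_reuse on a dict that maps file names to article texts returns the pairs that are near-duplicates, which can then be reviewed and dropped from the corpus before running any statistics. Like the program above, it only sees the texts you give it.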

Kindest regards, Linda

________________________________
From: Muhammad Adeel <nawabadeel at gmail.com>
To: corpora at uib.no
Sent: Tue, March 2, 2010 4:46:43 PM
Subject: [Corpora-List] Corpora for text reuse and plagiarism detection

Hi Everyone,

Does anyone know of corpora for text reuse and plagiarism detection, apart from METER and the PAN 1st plagiarism detection competition corpora?

-- Regards Adeel


