By now most plagiarism detection services are aware of concerns about copyright and similar issues, and it's quite possible to use them without having the submitted content added to their repository when the owner doesn't want it added. Of course this should be verified for any given service, but it's a mistake to assume that all materials submitted to these services are copied, stored, and made available to others.
Anyway, I think it's very reasonable to use these services for reviewing (and in the classroom), provided it's done responsibly. They are in fact an example of NLP in action, which I think is nice to see.
Cordially, Ted
On Fri, Oct 7, 2011 at 4:42 AM, Vlado Keselj <vlado at cs.dal.ca> wrote:
>
>> > there is also the practice that some
>> > of us have of running papers we are going to review through
>> > commercial (or otherwise) plagiarism detection services.
>>
>> You may not realize it, but you do *not* have the right to do that.
>> These services retain anything you submit to them, which is not something
>> you can authorize for a not-yet-published paper you don't have copyright
>> to.
>>
>> And it's extremely annoying for an author to be rejected because "that
>> has already been published", when in practice the previous version of
>> the paper was rejected at another conference and you have improved
>> it since. Incompetent reviewers who say that something has already been
>> published without giving a citation are annoying enough as it
>> is.
>
> I would like to agree with this comment. (Thanks Galibert for expressing
> it so clearly.) While checking for plagiarism in submitted papers is
> justified, it is alarming that a paper would be submitted to a commercial
> service, like the ones mentioned. I do not even use them with student
> papers, because of students' justified objections.
>
> I guess one can see a positive side to it: authors can always be happy.
> Even if their paper was rejected and they did not get to contribute to the
> science in an open way, they made an anonymous contribution to the wealth
> of a company. :-)
>
> On the research side, I think that it is an interesting research problem
> to describe a model where a paper can be checked for plagiarism without
> communicating the full paper, using only a subset of its n-grams, or of
> its substrings in general.
> (Another solution is that a company agrees to check a paper for
> plagiarism, but not to keep it in their repository.)
>
>
> Regards,
> Vlado
>
-- Ted Pedersen http://www.d.umn.edu/~tpederse