[Corpora-List] Comparing n-grams / authorship

Trevor Jenkins trevor.jenkins at suneidesis.com
Wed Apr 18 20:35:01 CEST 2012


You might also want to have your colleague look at the forensic linguistics work of John Olsson, and of Malcolm Coulthard and Alison Johnson. Olsson has done much work on author attribution from very small samples. He has also undertaken larger-scale studies of plagiarism.

Regards, Trevor.

<>< Re: deemed!

Sent from my iPad

On 17 Apr 2012, at 20:47, Mark Davies <Mark_Davies at byu.edu> wrote:


> I am sending the following question on behalf of a colleague at BYU. Thanks in advance for any suggestions you have; I'll forward them to the researcher who is working on this problem.
>
> Mark Davies, BYU
>
> -------------------------------------------
>
>
> I am working with a 250,000-word text. Within this text there are two chapters, A and B (1,200 and 2,400 words respectively). The authorship of these two chapters is unknown, but we have reason to believe that the author(s) of A and B have a relationship that is different from the majority of the rest of the book. Two 4-grams, three 6-grams, one 7-gram, one 8-gram, and one 9-gram are shared by chapters A and B and appear nowhere else in the book. Intuitively it seems like there is a unique relationship between chapters A and B.
>
> The question is:
>
> Is there a statistical method of measuring whether the types of n-grams above establish a reasonable probability that the two texts are linked?
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
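
One way to make the quoted question concrete is an empirical baseline: count the n-grams that two spans share and that occur nowhere else in the book, then see how often randomly placed spans of the same lengths do at least as well. The sketch below is only an illustration under stated assumptions, not a method proposed in this thread. It assumes the book is already tokenised into a list of words, and the names ngram_counts, exclusive_shared, and empirical_p are hypothetical. As a simplification, random spans may overlap the real chapters, and n-grams that straddle a span boundary count as occurring "elsewhere".

```python
import random
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def exclusive_shared(book, span_a, span_b, n, book_counts=None):
    """Return the n-grams found in both spans and nowhere else in the book.

    Spans are (start, end) token offsets into `book`, end exclusive.
    """
    if book_counts is None:
        book_counts = ngram_counts(book, n)
    a = ngram_counts(book[span_a[0]:span_a[1]], n)
    b = ngram_counts(book[span_b[0]:span_b[1]], n)
    # An n-gram is exclusive when all of its occurrences in the book
    # are accounted for by the two spans.
    return {g for g in a.keys() & b.keys() if book_counts[g] == a[g] + b[g]}


def empirical_p(book, span_a, span_b, n, trials=1000, seed=0):
    """Compare the observed count with random span pairs of equal length.

    Returns (observed_count, p), where p is the share of random pairs
    with at least as many exclusive shared n-grams (add-one smoothed).
    """
    len_a = span_a[1] - span_a[0]
    len_b = span_b[1] - span_b[0]
    if len_a + len_b > len(book):
        raise ValueError("spans are too long to place without overlap")
    rng = random.Random(seed)
    book_counts = ngram_counts(book, n)
    observed = len(exclusive_shared(book, span_a, span_b, n, book_counts))
    hits = 0
    for _ in range(trials):
        # Redraw until the two random spans do not overlap each other.
        while True:
            sa = rng.randrange(0, len(book) - len_a + 1)
            sb = rng.randrange(0, len(book) - len_b + 1)
            if sa + len_a <= sb or sb + len_b <= sa:
                break
        pair = exclusive_shared(
            book, (sa, sa + len_a), (sb, sb + len_b), n, book_counts
        )
        if len(pair) >= observed:
            hits += 1
    return observed, (hits + 1) / (trials + 1)
```

Running this once per n-gram length (4, 6, 7, 8, 9 in the question) gives one estimate per length, so some correction for multiple comparisons would be needed before reading much into any single value. Long n-grams also contain shorter ones, so the counts at different lengths are not independent.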
