[Corpora-List] Getting articles from newspapers to compile a corpus

Angus Grieve-Smith grvsmth at panix.com
Fri Nov 30 04:35:54 CET 2012

On 11/29/2012 4:28 PM, Linda Bawcom wrote:
> Because so many newspapers get their information from the same news
> services, I found a few articles that I had to discard because of an
> over 80% similarity ratio and of course that skews statistics.

Good point! Some newspapers will abridge the wire stories more than others, so it might be useful to find a way to choose the longest version.
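
One way this could be sketched in Python, purely as an illustration: compare the articles pairwise, treat any pair above the 80% ratio as versions of the same wire story, and keep only the longest one. The function names and the threshold parameter below are my own inventions, and I am using difflib's SequenceMatcher from the standard library as a stand-in for whatever similarity measure was actually used.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio between two texts, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def keep_longest_versions(articles, threshold=0.8):
    """Drop near-duplicates, keeping the longest version of each story.

    The 0.8 default mirrors the 80% figure mentioned above; it is an
    assumption, not a recommended setting.
    """
    kept = []
    # Visit the longest texts first, so that the first version of a
    # story we accept is always the least abridged one.
    for text in sorted(articles, key=len, reverse=True):
        if all(similarity(text, other) <= threshold for other in kept):
            kept.append(text)
    return kept


if __name__ == "__main__":
    wire = ("The city council voted on Tuesday to approve the new budget "
            "for public transport. Officials said the plan will add bus "
            "routes across the northern districts.")
    abridged = ("The city council voted on Tuesday to approve the new budget "
                "for public transport. Officials said the plan will add bus "
                "routes.")
    other = "Local football club wins the regional cup after a penalty shootout."
    for article in keep_longest_versions([abridged, wire, other]):
        print(len(article), article[:40])
```

Two caveats: a heavily abridged version can fall below the threshold simply because it is so much shorter than the full story, and pairwise comparison gets slow on a large corpus, so this is probably only practical for a few thousand articles.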


-Angus B. Grieve-Smith

grvsmth at panix.com

