[Corpora-List] Getting articles from newspapers to compile a corpus

Sérgio Matos aleixomatos at ua.pt
Thu Nov 29 20:39:07 CET 2012


This may be helpful: http://sing.ei.uvigo.es/jarvest/index.html
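
For reference, the crawl described in the question below (start from the newspaper's URL, follow links on the same site, keep the text of each page) can be roughed out with the Python standard library alone. This is only a minimal sketch and not jarvest's API: the names PageParser, fetch and crawl and the max_pages cap are invented for illustration, and it assumes UTF-8 pages with ordinary HTML links.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects visible text and href targets from one HTML page."""

    SKIP = {"script", "style"}  # content of these tags is not article text

    def __init__(self):
        super().__init__()
        self.links = []
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def fetch(url):
    """Download a page as text (hypothetical default; assumes UTF-8)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch, max_pages=50):
    """Breadth-first crawl restricted to the start URL's host.

    Returns a dict mapping each visited URL to its extracted text.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    texts = {}
    while queue and len(texts) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        parser = PageParser()
        parser.feed(html)
        texts[url] = " ".join(parser.chunks)
        for href in parser.links:
            # Resolve relative links and drop #fragments before comparing.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return texts
```

Calling crawl("http://www.eltiempo.com") would return a dictionary from URL to page text. The text still includes menus and other navigation, so a real corpus would need a boilerplate-removal step on top, and the site's robots.txt should be checked before crawling at any scale.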

--
Sérgio Matos
IEETA, Universidade de Aveiro

On Thursday, November 29, 2012 at 6:21 PM, Matías Guzmán wrote:


> Hi all,
>
> I was wondering if anyone knows how to get every available article from online newspapers and magazines. I was thinking of something like giving a program the URL of the newspaper (e.g. http://www.eltiempo.com) and getting the text from all the pages on that site. Is that possible?
>
> Thanks a lot,
>
> Matías



