[Corpora-List] Getting articles from newspapers to compile a corpus

Alberto Simões albie at alfarrabio.di.uminho.pt
Thu Nov 29 19:34:12 CET 2012


Dear Matías,

I think you want a web crawler, something like wget (available on Linux), which can download a whole site recursively.
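
If you would rather script it yourself, here is a minimal sketch of the same idea in Python, using only the standard library. It is just an illustration, not a finished tool: the names PageParser, fetch_html and crawl are mine, it assumes pages are UTF-8, it only follows links on the same host as the start URL, and it does not check robots.txt or pause between requests, which you would want to add before pointing it at a real newspaper.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects visible text and href targets from one HTML page."""

    SKIP = {"script", "style"}  # content of these tags is not article text

    def __init__(self):
        super().__init__()
        self.links = []
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def fetch_html(url):
    """Download one page and decode it (UTF-8 is an assumption here)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=50):
    """Breadth-first crawl restricted to the host of start_url.

    Returns a dict mapping each visited URL to the text found on it.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    texts = {}
    while queue and len(texts) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        parser = PageParser()
        parser.feed(html)
        texts[url] = " ".join(parser.chunks)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return texts
```

Calling crawl("http://www.eltiempo.com", max_pages=100) would return the text of up to 100 pages from that host. Note that this keeps all visible text, menus and footers included, so you would still need a cleaning step to isolate the article body.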

HTH

On 29/11/12 18:21, Matías Guzmán wrote:
> Hi all,
>
> I was wondering if anyone knows how to get every possible article from
> online newspapers and magazines. I was thinking something like giving a
> program the URL of the newspaper (e.g. www.eltiempo.com
> <http://www.eltiempo.com>) and getting the text from all pages therein.
> Is that possible?
>
> Thanks a lot,
>
> Matías


