[Corpora-List] Getting articles from newspapers to compile a corpus
Angus B. Grieve-Smith
grvsmth at panix.com
Sat Dec 1 20:17:03 CET 2012
On 11/29/2012 10:52 PM, True Friend wrote:
> I have a related question: News websites (these days) are using AJAX,
> which hides links while generating them via JavaScript.
> See this page
> <http://www.nation.com.pk/pakistan-news-newspaper-daily-english-online/opinions/editorials>
> for example. Apparently this is the archive page for all Editorials on
> the newspaper website, but only a few are shown, and the user has to click
> on "Show more news" under the given stories to get a few more previous
> editorials. Would an html crawler be able to bypass this and get all
> links hidden on this page?
>
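It is possible. A "Show more news" link just makes the page's script
send another HTTP request, so anyone with enough programming skill
could write a crawler that sends the same requests and gives the AJAX
website the information it's looking for. In practice, those requests
may be so obfuscated that working them out is not worth the time and
effort.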
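
As a rough illustration of that idea, here is a minimal Python sketch. It
assumes you have already found, in the browser's network inspector, the
URL that the "Show more news" link requests, and that this URL accepts a
"page" query parameter and returns an HTML fragment. Both are assumptions
for the sake of the example, not something I have checked against the
site in question, and the names (crawl_archive, http_fetch, the "page"
parameter) are made up. The sketch asks for page after page and stops
when a page yields no links it has not already seen.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html, base_url):
    """Return absolute URLs for all links found in the fragment."""
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def http_fetch(endpoint, page):
    """Fetch one batch of stories over HTTP.

    Hypothetical: assumes the endpoint takes a 'page' query parameter.
    A real site may use a different parameter, a POST body, or a token.
    """
    url = endpoint + "?" + urlencode({"page": page})
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_archive(endpoint, fetch=http_fetch, max_pages=1000):
    """Request successive pages until one adds no new links."""
    ordered = []   # links in the order they were first seen
    seen = set()   # same links, for fast membership tests
    for page in range(1, max_pages + 1):
        html = fetch(endpoint, page)
        added = False
        for link in extract_links(html, endpoint):
            if link not in seen:
                seen.add(link)
                ordered.append(link)
                added = True
        if not added:
            break  # nothing new: assume we reached the end of the archive
    return ordered
```

The fetch function is passed in as an argument so that the paging logic
can be tried on canned HTML before it is pointed at a live server. If the
site expects cookies, a session token, or a JSON payload, only http_fetch
should need to change.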
--
Angus B. Grieve-Smith
grvsmth at panix.com