[Corpora-List] Wikipedia data, JWPL - A Java-based Wikipedia API released

sai deepak tsaideepak at gmail.com
Mon Apr 27 08:40:02 CEST 2009


I am T. Sai Deepak, pursuing my M.Tech at IIT Roorkee. I am presently working on "Paraphrase Detection".

For my work I need to access Wikipedia. I found your API very useful, but I am not able to download the Wikipedia data, since it is served over an FTP connection that requires authentication.

Is there any other way to download this data?

As mentioned in the JWPL software documentation, I have downloaded the Wikipedia data from http://download.wikimedia.org/backup-index.html

The three archives which I have downloaded are:

* [LANGCODE]wiki-[DATE]-pages-articles.xml.bz2

* [LANGCODE]wiki-[DATE]-pagelinks.sql.gz

* [LANGCODE]wiki-[DATE]-categorylinks.sql.gz
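
To make the naming concrete, here is a minimal sketch of how the three patterns above expand for a given language code and dump date, and how one might check that all three files are present before running an import. The helper names `archive_names` and `missing_archives`, and the example values `en` and `20090427`, are illustrative assumptions; they are not part of JWPL.

```python
import os

# The three archive name patterns listed above, with [LANGCODE] and [DATE]
# written as format placeholders.
ARCHIVE_PATTERNS = [
    "{lang}wiki-{date}-pages-articles.xml.bz2",
    "{lang}wiki-{date}-pagelinks.sql.gz",
    "{lang}wiki-{date}-categorylinks.sql.gz",
]


def archive_names(lang, date):
    """Return the three dump file names for a language code and dump date."""
    return [pattern.format(lang=lang, date=date) for pattern in ARCHIVE_PATTERNS]


def missing_archives(lang, date, directory="."):
    """Return the expected archive names that are not found in `directory`."""
    return [
        name
        for name in archive_names(lang, date)
        if not os.path.isfile(os.path.join(directory, name))
    ]


if __name__ == "__main__":
    # Example values only: English Wikipedia, a hypothetical dump date.
    for name in archive_names("en", "20090427"):
        print(name)
    print("missing:", missing_archives("en", "20090427"))
```

All three files should carry the same language code and the same date, so a check like this can catch a mixed or incomplete download.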

But for most pages I get the error "Page not available", even though the page exists in Wikipedia. Could you please suggest a solution to this problem?


Regards,
T. Sai Deepak
M.Tech CSE
IIT Roorkee
