There is another post on a similar topic on the Corpora list. You can find it here: http://mailman.uib.no/public/corpora/2010-September/011285.html
I am working with a Wikipedia dump and found that the following tool is also suitable: code.google.com/p/wikixmlj/
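In case a rough illustration of the task helps, here is a minimal sketch in Python that uses only the standard library. It is not how wikiprep or wikixmlj work internally; the function names (iter_pages, strip_markup) and the file name in the usage comment are my own placeholders, and the regular expressions only cover the most common markup (templates, links, references, bold/italic, headings). Tables and other constructs would need a real parser.

```python
import re
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the XML namespace prefix, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Stream (title, wikitext) pairs from a MediaWiki XML dump.

    `source` is a file name or a binary file object. iterparse reads the
    dump incrementally, so the whole file is never loaded at once.
    """
    title, text = None, ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            elem.clear()  # release the content of the finished page


def strip_markup(wikitext):
    """Crude, regex-based removal of common wiki markup (assumption: good
    enough for a first pass, not a full wikitext parser)."""
    text = re.sub(r"<!--.*?-->", "", wikitext, flags=re.DOTALL)       # comments
    text = re.sub(r"<ref[^>]*/>", "", text)                           # <ref ... />
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)  # <ref>...</ref>
    # Remove innermost {{templates}} repeatedly to handle nesting.
    previous = None
    while previous != text:
        previous = text
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]", r"\1", text)
    # [http://example.org label] -> label
    text = re.sub(r"\[https?://[^\s\]]+\s*([^\]]*)\]", r"\1", text)
    text = re.sub(r"'{2,}", "", text)                                 # bold / italic
    text = re.sub(r"^=+\s*(.*?)\s*=+\s*$", r"\1", text, flags=re.MULTILINE)  # headings
    text = re.sub(r"<[^>]+>", "", text)                               # leftover tags
    return text.strip()


# Usage (the file name is hypothetical):
# for title, wikitext in iter_pages("enwiki-pages-articles.xml"):
#     print(title, strip_markup(wikitext)[:200])
```

For anything beyond a quick experiment, a dedicated tool such as the ones mentioned in this thread is likely to give cleaner output than these regular expressions.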
Regards,
Nasrin Baratalipour, Natural Language and Text Processing Laboratory (http://ece.ut.ac.ir/NLP), School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
On Wed, Jun 20, 2012 at 10:16 PM, Rahma Sellami <rahma.sellami at gmail.com> wrote:
> Hello,
>
> I downloaded a Wikipedia dump in XML format, and I want to strip the
> Wikipedia markup to extract the plain text.
> I found the tool wikiprep and installed it, but I do not know which
> script removes the Wikipedia markup.
>
> Thanks
> --
>
> RAHMA Sellami
> PhD Computer Science Student
> http://sites.google.com/site/rahmasellami/
> Faculty of Economic Sciences and management of Sfax
> ANLP Research Group
> http://sites.google.com/site/anlprg
>
> MIRACL Laboratory
> www.miracl.rnu.tn
>
> Email: rahma.sellami at gmail.com
>
>