[Corpora-List] Law and judgement cases corpus wanted

Craig Pfeifer craig.pfeifer at gmail.com
Thu Mar 3 21:49:53 CET 2016


The US Supreme Court opinions are available online: http://www.supremecourt.gov/opinions/opinions.aspx

You can download the opinions in batches via wget: *wget -r --accept=pdf http://www.supremecourt.gov/opinions/slipopinion/<two-digit year>*

*This will get other documents too, but the opinions will be in there.*
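
If you want to loop over several years, here is a minimal Python sketch that builds the same wget command per year. The function names are hypothetical, and mapping a four-digit year to the two-digit path segment with a simple modulo is my assumption about how the site is laid out.

```python
import subprocess

BASE_URL = "http://www.supremecourt.gov/opinions/slipopinion/"


def wget_command(year: int) -> list[str]:
    """Build the wget invocation for one year of slip opinions.

    Assumption: the path segment is the last two digits of the year.
    """
    two_digit_year = f"{year % 100:02d}"
    return ["wget", "-r", "--accept=pdf", BASE_URL + two_digit_year]


def download(year: int) -> None:
    """Run wget for one year (requires wget on the PATH and network access)."""
    subprocess.run(wget_command(year), check=True)
```

Calling download(2015) would run the command above for the "15" directory; as noted, the recursive fetch will also pick up PDFs that are not opinions.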

The opinions are published as PDFs, and the text can be extracted with Apache Tika (https://tika.apache.org/).
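
One way to do the extraction from Python is to talk to a running Tika server over HTTP. This is only a sketch under assumptions: it presumes you have started tika-server locally on its default port 9998, and the helper names are made up for illustration.

```python
import urllib.request

# Assumption: a tika-server instance is running locally on the default port.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(pdf_bytes: bytes, url: str = TIKA_URL) -> urllib.request.Request:
    """Prepare a PUT request that asks Tika for the plain text of a PDF."""
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        method="PUT",
        headers={"Accept": "text/plain", "Content-Type": "application/pdf"},
    )


def extract_text(pdf_path: str, url: str = TIKA_URL) -> str:
    """Send one PDF to the Tika server and return the extracted text."""
    with open(pdf_path, "rb") as pdf_file:
        request = build_tika_request(pdf_file.read(), url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Running the Tika command-line app with its text output option over each file should work just as well; the server route simply avoids starting a JVM per document.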

There's a bit of cleanup of the text after that, but nothing a little regex cannot handle.
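
To give an idea of the kind of cleanup I mean, here is a rough Python sketch. The specific rules (rejoining words hyphenated across line breaks, dropping lines that hold only a page number, collapsing blank lines) are assumptions about typical PDF extraction noise, not an exact recipe, so adjust them to what you actually see in the output.

```python
import re


def clean_extracted_text(text: str) -> str:
    """Apply a few regex fixes to text extracted from a PDF."""
    # Rejoin words split by a hyphen at a line break: "judg-\nment" -> "judgment".
    # Caveat: this also merges genuine compounds broken at the hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop lines that contain nothing but a number (assumed to be page numbers).
    text = re.sub(r"(?m)^[ \t]*\d+[ \t]*\n", "", text)
    # Strip trailing spaces and tabs from every line.
    text = re.sub(r"[ \t]+$", "", text, flags=re.M)
    # Collapse runs of three or more newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```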

Craig

On Thu, Mar 3, 2016 at 2:03 PM Kiril Simov <kivs at bultreebank.org> wrote:


> Dear Nathan,
>
> You could try the following document collection from EUCases European
> Project:
>
> http://download.webclark.org/EUCasesLOD/
>
> There are XML documents with the corresponding texts.
>
> You could find documentation at http://www.eucases.eu/
>
> With best regards,
>
> Kiril
>
> *From:* Nathan Hu <nathan3dvrlab at gmail.com>
> *Sent:* Thursday, March 03, 2016 7:50 PM
> *To:* corpora at uib.no
> *Subject:* [Corpora-List] Law and judgement cases corpus wanted
>
> Hi,
>
> Does anyone know where I can find a corpus of law or judgement cases?
>
> Are there any open resources?
>
> Best,
> Nathan
>
> ------------------------------
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>