[Corpora-List] Update to the PubMed Central Article Datasets

Demner Fushman, Dina (NIH/NLM/LHC) [E] ddemner at mail.nih.gov
Thu Sep 23 23:18:34 CEST 2021

Dear all,

To enhance machine access to biomedical literature and to drive impactful analyses and reuse, the National Library of Medicine (NLM) has made the following updates to the distribution of the two largest PMC Article Datasets <https://www.ncbi.nlm.nih.gov/pmc/tools/textmining/>, the PMC Open Access Subset and the Author Manuscript Dataset:

* To support retrieval of individual uncompressed article XML and plain text files for efficient cloud computing, these datasets are now available through the Amazon Web Services (AWS) Registry of Open Data as part of AWS’s Open Data Sponsorship Program (ODP). Learn more: <https://ncbiinsights.ncbi.nlm.nih.gov/2021/09/01/pubmed-central-article-datasets-cloud/>

* The PMC FTP service has been restructured to provide bulk retrieval packages of the articles' XML and plain text in these datasets, organized as baseline and daily incremental files. These new packages are available now alongside the previous bulk packages; the previous packages will be moved in November 2021 and deleted in March 2022. Learn more: <https://www.ncbi.nlm.nih.gov/pmc/about/new-in-pmc/#2021-09-21>

If you have questions or feedback, please contact NLM directly at pubmedcentral at ncbi.nlm.nih.gov<mailto:pubmedcentral at ncbi.nlm.nih.gov>
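With the baseline-plus-incremental layout of the restructured FTP packages, a local mirror should only need the newest baseline and the incremental files published after it. The following minimal Python sketch shows that selection step. It assumes a hypothetical file-naming pattern, `<dataset>.<baseline|incr>.<YYYY-MM-DD>.tar.gz`, chosen purely for illustration. The real package names and directory layout are described at the "Learn more" link above and should be substituted. It also assumes that a baseline dated D already contains every update up to D.

```python
import re
from datetime import date

# HYPOTHETICAL naming scheme, for illustration only; not NLM's documented one.
PACKAGE_RE = re.compile(
    r"^(?P<dataset>[\w-]+)\.(?P<kind>baseline|incr)\.(?P<day>\d{4}-\d{2}-\d{2})\.tar\.gz$"
)


def packages_to_fetch(names):
    """Return the newest baseline followed by every later incremental, oldest first."""
    parsed = []
    for name in names:
        m = PACKAGE_RE.match(name)
        if m is None:
            continue  # ignore file lists, checksums, READMEs, etc.
        parsed.append((date.fromisoformat(m["day"]), m["kind"], name))

    baselines = [p for p in parsed if p[1] == "baseline"]
    if not baselines:
        raise ValueError("no baseline package found")
    base_day, _, base_name = max(baselines)  # latest baseline by date

    # Assumption: incrementals dated on or before the baseline are already folded into it.
    incrementals = sorted(p for p in parsed if p[1] == "incr" and p[0] > base_day)
    return [base_name] + [name for _, _, name in incrementals]
```

Parsing dates out of the names, rather than trusting the order of a directory listing, keeps the result correct even when the listing is unsorted. Applying the returned packages in order lets newer versions of an article overwrite older ones.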

Posted on behalf of:
Rebecca Orris, PhD
Literature Program Special Projects
NCBI/National Library of Medicine
National Institutes of Health
