I am working on a web crawling, scraping, and corpus construction tool that can be run from Python, from R, or on the command line.
It is currently used daily in production, notably to build monitor corpora at the ZDL/DWDS (where I work), in the Internet Archive's sandcrawler project, and at SciencesPo's médialab.
Documentation: https://trafilatura.readthedocs.io/ Software: https://github.com/adbar/trafilatura
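As a quick illustration, a single command suffices on the command line to download a page and print the extracted text (the URL is just a placeholder):

    trafilatura -u "https://example.org/article"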
The software combines crawling, downloading, extraction, and format conversion functions. The latter two can be used in combination with the crawling, rendering, and archiving tools mentioned in this thread: HTML files (with or without JavaScript rendered) serve as input, from which the article/main text, comments, and metadata are extracted. The resulting information can be exported as TXT, CSV, JSON, or XML.
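Here is a minimal sketch of the corresponding Python usage (the URL is a placeholder, and the output format and comment handling are options you can adjust):

    import trafilatura

    # fetch a page and extract the main text plus metadata
    # (example.org is just a placeholder URL)
    downloaded = trafilatura.fetch_url("https://example.org/article")
    if downloaded is not None:
        result = trafilatura.extract(
            downloaded,
            output_format="json",    # also "txt", "csv", or "xml"
            include_comments=True,   # keep user comments if present
        )
        print(result)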
Concerning JavaScript and interaction with webpages, you could have a look at Puppeteer or its Python port pyppeteer: https://github.com/puppeteer/puppeteer/ https://github.com/pyppeteer/pyppeteer
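To give an idea of how the two fit together, here is a sketch (not production code) that renders a page headlessly with pyppeteer and then passes the resulting HTML to trafilatura; again, the URL is a placeholder:

    import asyncio
    import trafilatura
    from pyppeteer import launch

    async def render(url):
        # run a headless browser so that JavaScript-generated
        # content is present in the returned HTML
        browser = await launch()
        page = await browser.newPage()
        await page.goto(url)
        html = await page.content()
        await browser.close()
        return html

    # any JavaScript-heavy page would do here
    html = asyncio.run(render("https://example.org/article"))
    print(trafilatura.extract(html))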
Finally, here are complete examples of interaction with web archives: https://github.com/GLAM-Workbench/web-archives
I hope this helps! Best, Adrien