I fixed some of the character encoding bugs people were seeing. That should take care of the "â€" bugs. I also fixed the inline image width with a CSS tweak. @dmcoco84 thanks for the heads up! Also, thanks to those who have donated.
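For anyone wondering where "â€" comes from: it is what UTF-8 punctuation (curly quotes, dashes) typically looks like when the bytes get decoded as Windows-1252 somewhere along the way. Here is a minimal sketch of one common repair, assuming that is the cause; it is an illustration, not necessarily the exact fix applied to the archive, and `fix_mojibake` is a made-up name.

```python
def fix_mojibake(text):
    """Undo UTF-8 text that was wrongly decoded as Windows-1252.

    Assumption: the garbling is exactly one wrong decode step.
    If the round trip fails, the text is returned unchanged.
    """
    try:
        # Recover the original bytes, then decode them the right way.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text was never garbled, or it is damaged beyond
        # this simple repair (some bytes have no cp1252 mapping).
        return text


# "don" + a-circumflex, euro sign, trademark sign + "t" is a garbled "don’t"
garbled = "don\u00e2\u20ac\u2122t"
repaired = fix_mojibake(garbled)
```

Clean text, including legitimate accented characters, fails the round trip and is left untouched, so running this over already-correct posts should be safe.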
For those curious, it was not as simple as pointing a generic web scraper at the site. Because of the way that basejumper.com was structured, there were "duplicate" pages based on sorting, pagination, etc. Also, I wanted to make sure I got the incidents section, which required being logged in.
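To show the "duplicate" page problem: the same thread can be reachable under several URLs that differ only in query parameters. A rough sketch of how such URLs could be collapsed to one key, using only the Python standard library. The parameter names in `IGNORED_PARAMS` are hypothetical, not the real basejumper.com ones, and this is not necessarily how the archive scraper handled it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that change presentation but not content.
IGNORED_PARAMS = {"sort", "order", "view"}


def canonical_url(url):
    """Return a normalized URL so presentation-only variants compare equal."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    # Drop the fragment and rebuild the query in a stable order.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


seen = set()
for url in [
    "https://example.com/forum/thread?id=5&sort=asc",
    "https://example.com/forum/thread?sort=desc&id=5",
]:
    seen.add(canonical_url(url))
# Both variants collapse to a single entry in `seen`.
```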
So I wrote a structured scraper that parsed each page of the forums, threads, and individual posts. Raw data was written to JSON files. Attachments were also downloaded. The classic bj emojis obviously needed to be preserved too.
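Roughly, "structured" here means each thread ends up as one record with its posts nested inside. A small sketch of what that could look like in Python. The field names below are made up for illustration and are not the actual layout of the JSON files in the archive.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Post:
    # Hypothetical fields; the real archive may use different names.
    author: str
    posted_at: str
    body_html: str
    attachments: list = field(default_factory=list)


@dataclass
class Thread:
    thread_id: int
    forum: str
    title: str
    posts: list = field(default_factory=list)


def thread_to_json(thread):
    """Serialize a thread and its nested posts to a JSON string."""
    # ensure_ascii=False keeps non-ASCII characters readable in the file.
    return json.dumps(asdict(thread), indent=2, ensure_ascii=False)


example = Thread(
    thread_id=1,
    forum="General",
    title="Hello",
    posts=[Post(author="someone", posted_at="2020-01-01", body_html="<p>hi</p>")],
)
serialized = thread_to_json(example)
```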
Then I made a quick webapp to render the JSON data as HTML. I exported those rendered HTML files to Amazon S3, fronted by CloudFront. There was some hacking to get the Content-Type to be correct on S3.
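On the Content-Type part: S3 serves each object with whatever Content-Type was stored at upload time, so pages whose keys have no file extension need it set explicitly or browsers may download them instead of rendering them. A sketch of one way to pick the value, assuming extensionless keys are HTML pages. The function name and the default are assumptions, not the actual export script.

```python
import mimetypes


def upload_args_for(key, default="text/html"):
    """Build the extra upload arguments for one S3 object key.

    Assumption: any key whose type cannot be guessed from its
    extension is a rendered HTML page.
    """
    guessed, _encoding = mimetypes.guess_type(key)
    return {"ContentType": guessed or default}


page_args = upload_args_for("forum/thread-123")   # no extension
image_args = upload_args_for("attachments/photo.png")
```

With boto3, a dict like this can be passed as `ExtraArgs` to the S3 client's `upload_file` call.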
Here are downloadable archives of the rendered HTML and the structured JSON data. @Colm @sfzombie13
https://basejumper.net/bj-html-v1.zip (820mb) https://basejumper.net/bj-json-v1.zip (50mb)