08-28-2014, 10:52 AM
I've been trying to recover as much of WonderfulWaterloo.com as I can over the past few days, and I've set up a central repository on GitHub so we can organise all the scraped content and eventually import it into WRC.
Check it out, and let me know if you have any other ideas about how to recover that data. The more the merrier!
https://github.com/samnabi/wonderful-waterloo-recovery
Here's a section of the README:
Quote: How can I help?
1. Upload pages/images from your local cache. If you visited WonderfulWaterloo.com shortly before it was taken offline, you may still have local copies of those pages saved in your browser's cache. Use one of the methods below to save anything you can find related to wonderfulwaterloo.com. Even if it seems irrelevant, don't delete it.
- Chrome - use this cache viewing tool: http://www.sensefulsolutions.com/2012/0 ... y-way.html
- Firefox - use this extension: https://addons.mozilla.org/en-US/firefo ... cheviewer/
- Internet Explorer - go to Tools > Internet Options > General tab > Temporary Internet Files > Settings button > View Files button
- Safari - use this app: http://echoone.com/filejuicer/
Once you've located your local cache of WonderfulWaterloo pages/assets, please submit them as a pull request so I can add them here. Or, if you're not familiar with git, get a hold of me at sam@samnabi.com.
2. Use Warrick to scrape various web archives. Warrick is a command-line Perl utility for recovering websites from your local cache, archive.org, Google cache, Bing cache, and other sources. I have already used this tool with some success, but multiple attempts from different machines may deliver more results.
Please submit a pull request for any data you recover with Warrick. There will be a lot of duplicates, but we can cross that bridge when we get there. No need to filter stuff out on your end.
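For when we do get to the duplicates, here's a rough Python sketch of one way to spot them: group files by the SHA-256 hash of their contents, so byte-for-byte copies end up together regardless of filename. This is only an illustration, not something that's in the repository; the function names `file_digest` and `find_duplicates` are made up for this example, and it only catches exact copies, not pages that differ by a timestamp or an ad.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large images don't need to fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group files under root by content hash.

    Returns a dict mapping digest -> list of paths, keeping only
    groups that contain two or more byte-identical files.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        # Skip git's own bookkeeping files (hypothetical choice for this sketch).
        if ".git" in path.parts:
            continue
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicates("path/to/your/checkout")` gives you the groups of identical files, which you can then print or review by hand. It only reports; nothing gets deleted.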