

Recovering old WW threads
#1
I've been trying to recover as much of WonderfulWaterloo.com as I can over the past few days, and I've set up a central repository on GitHub so we can organise all the scraped content and eventually import it to WRC.

Check it out, and let me know if you have any other ideas about how to recover that data. The more the merrier!

https://github.com/samnabi/wonderful-waterloo-recovery

Here's a section of the README:

Quote: How can I help?

1. Upload pages/images from your local cache. If you visited WonderfulWaterloo.com recently before it was taken offline, you may have local copies of those pages saved in your browser's cache. Use one of the methods below to save anything you can related to wonderfulwaterloo.com. Even if it seems irrelevant, don't delete it.

Once you've located your local cache of WonderfulWaterloo pages/assets, please submit them as a pull request so I can add them here. Or, if you're not familiar with git, get a hold of me at sam@samnabi.com.

2. Use Warrick to scrape various web archives. Warrick is a command-line Perl utility for recovering websites from your local cache, archive.org, Google cache, Bing cache, and other sources. I have already used this tool with some success, but multiple attempts from different machines may deliver more results.

Please submit a pull request for any data you recover with Warrick. There will be a lot of duplicates, but we can cross that bridge when we get there. No need to filter stuff out on your end.
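For when we do get to that bridge, here is a minimal sketch of one way the duplicates could be spotted later: group the recovered files by a hash of their contents. This is not part of the repository or of Warrick; the function names are made up for illustration, and it only catches byte-for-byte identical files, so two scrapes of the same page with different timestamps or ads in the markup would not match.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root: Path) -> Dict[str, List[Path]]:
    """Group files under root by content hash.

    Only groups holding two or more files are returned, so every
    entry in the result is a set of byte-identical copies.
    """
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Calling find_duplicates(Path("recovered")) on a hypothetical folder of merged pull requests would return a dictionary mapping each shared hash to the list of identical files, which could then be reviewed before anything is deleted.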
#2
Has there been much success in trying to recover any of the old data from WW?
#3
Not a whole lot. Still working behind the scenes to get more data, but I think I've got all the low-hanging fruit already. I'm hoping to get it archived properly with some help from UW.
#4
I've updated the GitHub page with more ways that people can help recover data from WW. Please do check it out if you can lend a hand.

So far, 5955 individual posts across 126 threads have been recovered from the forums, along with 471 images.

The posts can be accessed as a big JSON file here: https://github.com/samnabi/wonderful-wat...23453.json

The images can be accessed here: https://github.com/samnabi/wonderful-wat...set_images

Still looking for support from the universities to archive these properly.
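For anyone who wants to poke at the JSON file, here is a rough sketch of how the posts might be tallied per thread. The layout of the file is an assumption on my part: the code assumes a top-level array of post objects and a field named "thread" on each one, and both the field name and the posts.json path are placeholders that would need to be adjusted to match the real file.

```python
import json
from collections import Counter
from pathlib import Path


def load_posts(path):
    """Load the posts file, assuming it holds a top-level JSON array."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array of posts")
    return data


def posts_per_thread(posts, thread_key="thread"):
    """Count posts per thread.

    thread_key is a hypothetical field name; posts that lack it are
    counted under "(unknown)" rather than being dropped.
    """
    return Counter(
        post.get(thread_key, "(unknown)")
        for post in posts
        if isinstance(post, dict)
    )


if __name__ == "__main__":
    posts_file = Path("posts.json")  # placeholder path, not the real file name
    if posts_file.exists():
        counts = posts_per_thread(load_posts(posts_file))
        for thread, count in counts.most_common(10):
            print(f"{count:5d}  {thread}")
```

If the assumptions hold, the totals should add up to the post count mentioned above, which makes for a quick sanity check on the download.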
#5
Thanks so much for all of your hard work in recovering these. While we might not ever have them integrated on WRC, at least they're not totally lost.

About Waterloo Region Connected

Launched in August 2014, Waterloo Region Connected is an online community that brings together all the things that make Waterloo Region great. Waterloo Region Connected provides user-driven content fueled by a lively discussion forum covering topics like urban development, transportation projects, heritage issues, businesses and other issues of interest to those in Kitchener, Waterloo, Cambridge and the four Townships - North Dumfries, Wellesley, Wilmot, and Woolwich.
