The Hazel's Picture Gallery recovery story so far.
It occurs to me, especially after meeting Ashley, Chloe, Kate, Katherine, and Erin this afternoon, that not everyone who follows my LJ knows what this server recovery thing is all about. Allow me to explain. (The regulars, you know all this and can skip this post if you like.)
I bought my first digital camera several years ago, and started taking it with me to conventions. In fact I use my camera at most of the significant events in my life, especially activities that involve people or things I don't see every day. I upload my pictures to my website; I have a website for myself and library_lynn at www.boston-baden.com. (Bostonbaden.com, without the dash, is the domain I use when I give people my e-mail address. E-mail works either way for my primary e-mail addresses -- hazel or chaz, and hazelchaz works too -- and you can get to the website with or without the dash too; it'll redirect you if it needs to.)
The pictures I upload are cut down into two sizes in addition to the high-resolution images; the high-res images aren't always put online, but a viewing-size image (approx 640x480) and a thumbnail image (about 100 pixels high and wide) go with them. The web pages themselves are plain ordinary HTML pages: originally generated by a program and hand-edited, then later generated by a program using a crude DB_File database, and now generated by programs that use a MySQL database. (Dreamhost makes it easy to do MySQL databases. I love 'em.) I haven't finished rewriting the programs, though, so part of the unfolding story here has to do with my software work for the site.
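For the curious, here is a minimal sketch of the arithmetic behind those two sizes. It is not the actual program that builds the gallery; the function names are made up for illustration, and it assumes the sizes are "fit inside a box, keep the proportions" limits rather than exact dimensions.

```python
# Hypothetical helper names; the 640x480 and 100x100 boxes come from the
# approximate sizes mentioned above.
VIEWING_BOX = (640, 480)
THUMBNAIL_BOX = (100, 100)


def fit_within(width, height, max_width, max_height):
    """Scale (width, height) down to fit inside the box, keeping proportions."""
    # Use the tighter of the two ratios; cap at 1.0 so small images are
    # never enlarged.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def gallery_sizes(width, height):
    """Return the viewing-size and thumbnail dimensions for one original."""
    return {
        "viewing": fit_within(width, height, *VIEWING_BOX),
        "thumbnail": fit_within(width, height, *THUMBNAIL_BOX),
    }
```

With these assumptions, a 1600x1200 original comes out as a 640x480 viewing image and a 100x75 thumbnail, while a portrait shot gets a viewing image 480 pixels tall.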
Last summer, after several years of taking and uploading photos, I'd passed the 25,000 mark -- and then we had the great server crash on the machine in Minneapolis that had been hosting my domain. Yikes! The fellow who ran it thought I had backups for everything; I had thought (for quite some time, actually) that a new mondo backup unit was just a short while away from coming online, and I hadn't the disk space to save my photos. (Ironically, it didn't occur to me to copy them onto a machine with CD-ROM burning capability, even though I'd been burning CD-ROMs full of MP3s from all of my CDs for the home music project.)
So I signed up for a paid hosting account on Dreamhost and started rebuilding from what backups were available. By winter, the dead easy work had already been done: the stuff that was easily rescued off of the old computer, the files that I had copies of in one form or another (mostly from 2000-2002, plus a scattering of recent photos that hadn't been deleted off of my primary machine), and so forth.
The disk drive from the old server has been subjected to a professional-grade recovery process, by somebody who works in the data-recovery field and knows his stuff. (Thanks, dspisak!) Many people donated money against the possibility of our having to pay a thousand dollars to a data recovery company; the first one to explicitly say she'd put in $20 was Naomi Fisher, and when I put the tip jar on the website the money came rushing in. Some of that money went to the software Dan needed, and the rest is being applied against the hosting fees at Dreamhost. We have two years prepaid at this point, and I really appreciate the donations. (I need to start writing thank-you notes, in fact. But that's another story.)
The data recovered from the old server, however, is generally in the form of unidentified, unmarked, uncaptioned, unnamed and undated JPEG files. And a great many of them seem to be corrupted to boot. At first I was worried that we were seeing corrupted data from the drive; now I believe that the recovery program may not have completely understood the filesystem on the server, and that the data are all there and just need some work in a hex editor.
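As an illustration of the kind of thing one looks for in a hex editor, here is a rough first-pass check. It is only a sketch, not the procedure the recovery crew actually followed, and the function names are hypothetical. It relies on one well-known fact: a JPEG file begins with the bytes FF D8 (the start-of-image marker) and normally ends with FF D9 (the end-of-image marker).

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def jpeg_marker_report(data):
    """Report where the start and end markers sit in a blob of bytes."""
    return {
        "starts_with_soi": data.startswith(SOI),
        "first_soi_offset": data.find(SOI),   # -1 if the marker is absent
        "ends_with_eoi": data.endswith(EOI),
        "last_eoi_offset": data.rfind(EOI),   # -1 if the marker is absent
    }


def looks_intact(data):
    """Crude triage: markers in the expected places, nothing more."""
    report = jpeg_marker_report(data)
    return report["starts_with_soi"] and report["ends_with_eoi"]
```

A file that fails this test is not necessarily lost: if the markers turn up at some other offset, the picture may just have stray bytes in front of or behind it. Passing the test does not prove the image decodes cleanly, either; it only sorts files into "probably fine" and "needs a closer look".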
A great deal of work has been done on that front, notably by testerscot, bovil, johno, and lately gvdub. (My apologies if I've left out any of the hex editor hacking crew; I'm typing this from memory.) So the result of that is hundreds and thousands of perfectly good, ready-to-reinstate photos; we just have to figure out where they go on the server.
Meanwhile. Several (at least half a dozen) people have brightly suggested that the Wayback Machine at the Internet Archive might magically solve all my problems. Well, sort of, a little, not as much as you might expect. First of all, while the Wayback Machine has been saving copies of every web page it can lay its spider paws on, that doesn't always include all of the images on the pages, or the images that are linked from the pages. And they don't get every page, although they do index and save a lot.
I knew using the Wayback Machine would basically get a number of my thumbnail images, and a smaller number of the larger images. I'd held off investigating what I could get back, because I knew I would often just get a thumbnail and not the matching large image -- and if I just restored the thumbnail to a page, you wouldn't be able to tell that the full size picture was still missing. This lack, however, is something I can deal with on the new MySQL platform. And, even just a thumbnail helps, because the other part of the recovery project is yielding up perfectly good large images. So now the task of dovetailing the results from the two approaches begins.
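To make the bookkeeping idea concrete, here is a small sketch of how a database can remember that a thumbnail is back while its full-size partner is still missing. It is an illustration only: the table and column names are invented, and it uses Python's built-in sqlite3 module as a stand-in for MySQL so the example runs anywhere.

```python
import sqlite3


def open_catalog():
    """Create an in-memory catalog; every name here is hypothetical."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE photos ("
        " page TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " thumb_file TEXT,"
        " large_file TEXT,"   # stays NULL until a full-size image turns up
        " PRIMARY KEY (page, name))"
    )
    return db


def record_thumbnail(db, page, name, thumb_file):
    """Note a thumbnail recovered from an archived copy of a page."""
    db.execute(
        "INSERT INTO photos (page, name, thumb_file) VALUES (?, ?, ?)",
        (page, name, thumb_file),
    )


def attach_large(db, page, name, large_file):
    """Pair a repaired full-size image with its existing thumbnail row."""
    db.execute(
        "UPDATE photos SET large_file = ? WHERE page = ? AND name = ?",
        (large_file, page, name),
    )


def still_missing_large(db):
    """List the photos that have a thumbnail but no full-size image yet."""
    rows = db.execute(
        "SELECT page, name FROM photos"
        " WHERE large_file IS NULL ORDER BY page, name"
    )
    return rows.fetchall()
```

Because the gap is recorded in the database rather than hidden on a static page, a restored thumbnail never quietly masks a missing photo; one query lists what is still outstanding.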
I think that's the gist of it. I go to about 12 conventions a year, science fiction and animé now that I've started our little animelosangeles convention; I take a lot of photos by sf fan standards, but I'm strictly small time compared to the people snapping photos at animé conventions. I am now about two or three conventions behind, plus a few parties and picnics, in my picture-uploading. This is because I have other projects: apart from the website software I mentioned and the data recovery, there's a big convention (or a medium-sized one, if you come from an animé convention viewpoint) that I'm head webmaster for, and... well. You're up to date. Stay tuned here for further updates.
Any questions?