Jun. 21st, 2007

hazelchaz: (Default)

I thought I might explain something about how the Hazel's Picture Gallery recovery process is coming along. I just spent a couple of hours working on the website, and our "Photos Lost" statistic still sits at 44% restored, with 16,221 of 29,108 photos still lost. It hasn't gone up at all tonight. So what have I been doing?

Well, I've been sorting JPEGs uploaded from the dead drive recovery discs into directories. They get uploaded in directories of 100 files each. The first thing I do is skim through them to sort them out by event.

( Here's two sets of directories... )

The "Initial Upload" shows what a collection of directories (in this case, from batch 38) looks like before I start any analysis. The first program I run on the server sorts out the corrupted files and renames all the directories.

You'll see that the "Before" list just has "jpg-19-" followed by a number, and then either nothing else, "bad", or all or part of a datestamp such as 2003, 2003-01, or 2003-01-11. The "After" list shows what it looks like after I've gone through and split up some of them. (These are actual snapshots of some of my current directories -- I've gone through the 17's but not the 19's, and thought you might be interested in the gory details of the work I'm doing.) You can see that I'm not done with this phase for batch 17. For example, directory jpg-17-10 doesn't have any kind of identification. That means that if it holds original images with timestamps, they're not all from the same year, or else there aren't any original high-resolution images in it at all. I have to go and look at them to see what I can see.
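For the curious, the naming rule can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual program running on the server: the function names `date_suffix` and `directory_name` are made up, the suffix being joined on with a hyphen is an assumption, and it assumes each image's date has already been read from somewhere.

```python
from datetime import date


def date_suffix(dates):
    """Return the longest datestamp shared by every date in the directory.

    Gives "2003-01-11" if all dates fall on the same day, "2003-01" if they
    only share a month, "2003" if they only share a year, and "" otherwise
    (including when no dates are known at all).
    """
    stamps = [d.isoformat() for d in dates]  # each one looks like "YYYY-MM-DD"
    if not stamps:
        return ""
    # Try the full day first, then the month, then the year.
    for length in (10, 7, 4):
        prefix = stamps[0][:length]
        if all(stamp.startswith(prefix) for stamp in stamps):
            return prefix
    return ""


def directory_name(batch, number, dates, corrupted=False):
    """Build a name such as "jpg-17-10" plus an optional suffix (hypothetical format)."""
    base = f"jpg-{batch}-{number}"
    if corrupted:
        return f"{base}-bad"  # assumption: "bad" is appended with a hyphen
    suffix = date_suffix(dates)
    return f"{base}-{suffix}" if suffix else base


# A directory whose dated images span two different years gets no datestamp.
mixed_years = [date(2003, 1, 11), date(2004, 5, 29)]
print(directory_name(17, 10, mixed_years))
```

With this rule, a directory whose dated images span more than one year, or that has no dated images at all, keeps its bare name, which is the jpg-17-10 situation described above.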

(Why does one batch go up to 17, one to 45, and one to 31? Because each batch represents about 300-400 MB, and on the original data recovery DVDs they were directories full of thousands of files. I split them up into 100-file directories, and sorted them out by file extension -- JPG, GIF, etc. -- before they're uploaded to the server.)
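The splitting step can be sketched the same way. Again, this is a guess at the shape of it rather than the real script: `plan_directories` is a hypothetical name, numbering the directories from 1 is an assumption, and it only plans which file names would go in which directory without touching any files.

```python
from collections import defaultdict
from pathlib import PurePath


def plan_directories(filenames, batch, chunk_size=100):
    """Group file names by extension, then cut each group into chunks.

    Returns a dict mapping a directory name such as "jpg-17-1" to the
    list of file names that would go into it.
    """
    by_extension = defaultdict(list)
    for name in sorted(filenames):
        ext = PurePath(name).suffix.lower().lstrip(".") or "noext"
        if ext == "jpeg":
            ext = "jpg"  # treat .jpeg and .jpg as the same kind of file
        by_extension[ext].append(name)

    plan = {}
    for ext, names in by_extension.items():
        for start in range(0, len(names), chunk_size):
            number = start // chunk_size + 1  # assumption: numbering starts at 1
            plan[f"{ext}-{batch}-{number}"] = names[start:start + chunk_size]
    return plan


# 250 JPEGs and one GIF in batch 17: three JPEG directories and one GIF directory.
files = [f"f{i:04d}.JPG" for i in range(250)] + ["a.gif"]
plan = plan_directories(files, batch=17)
for directory, contents in plan.items():
    print(directory, len(contents))
```

A batch with more files simply ends up with more numbered directories, which is why the highest number differs from one batch to the next.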

If you want to see more about the process, I've made a copy of the jpg-17-23 directory with more annotations. Details enough to make your eyes glaze over... And there's a picture of neo_serenity in there, by the way. But that photo wasn't lost (or has already been restored). Part of the challenge here is identifying which photos from the old hard drive don't need to be restored because they're already in place -- so I potentially have to go through all 29 thousand, not just the 16 thousand that are missing!

Anyhow, what I was saying the other day was that if I can restore fifteen photos in fifteen minutes, then I might be able to get 60 photos restored in an hour. And if I work on it for an hour each night, five nights a week, then each week I can get that percentage up by 1% or more, and in 54 short weeks I'll be done. That is, if the base "if" assumption holds...

(Where does the "15 photos" come from? I happened to be working on some Baycon 2004 photos, from one subdirectory. By the time I got done, I had reinstated fourteen photos to part 23 and part 24. I was working on those in the first place because right now I'm trying to give priority to the 2004 photos; that year's photos were the hardest hit in the crash/recovery.)
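Here is that back-of-the-envelope estimate worked out in Python, using only the numbers quoted above. The function name `recovery_estimate` and its parameter names are made up for illustration, and it assumes the 15-photos-in-15-minutes pace holds the whole way.

```python
def recovery_estimate(lost=16_221, total=29_108,
                      photos_per_quarter_hour=15,
                      hours_per_night=1, nights_per_week=5):
    """Work out the weekly pace and the number of weeks left at that pace."""
    per_hour = photos_per_quarter_hour * 4
    per_week = per_hour * hours_per_night * nights_per_week
    return {
        "percent_restored_now": 100 * (total - lost) / total,
        "photos_per_week": per_week,
        "percent_gained_per_week": 100 * per_week / total,
        "weeks_to_finish": lost / per_week,
    }


estimate = recovery_estimate()
for label, value in estimate.items():
    print(f"{label}: {value:.2f}")
```

At that pace it comes to 300 photos a week, a bit over 1% of the 29,108 total, and 16,221 divided by 300 is roughly 54 weeks.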

Chaz Boston Baden
