That Time My Encrypted RAID Failed

It was early December, 2012. The world was all a-flutter about the impending reset of the Mayan calendar, but I was unconcerned, streaming music from my 8TB RAID5. Then the music stopped, at 30 seconds or so. I tried a few more files: some worked but only for a short while, some didn't load at all. This was an md RAID5 set of four 2TB disks, with a LUKS encrypted volume inside. Somehow, the area of disk 1 holding the critical "key material" had been corrupted, so the encryption key existed only in memory and was lost when I rebooted. I made three mistakes around that time:

Any sane person would write the data off as lost, but I decided to hold out a sliver of hope that it was only one disk that had gone bad, and that I'd just failed to arrange things properly. I didn't want to futz with the disks any further, so any more work on them would have to wait until a disk large enough to hold all the images was available.

Fast forward to May 2015, and the release of that crazy SMR 8TB disk from Seagate. I ran out and pre-ordered one, seeing my chance, then set to dd'ing the RAID5 member partitions over to the 8TB disk. (Yes, I should've used ddrescue, but I got lucky, and the disks were physically fine.) Then I wrote a script to run over the images of disks 1, 2 and 3 (in the order I'd left them in), trying four different RAID5 layouts, four different chunk sizes and 24 permutations of the disk order. Note that I deliberately assembled each RAID from 3 out of 4 disks, to prevent a rebuild overwriting anything.
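For the curious, the search space looks roughly like this. This is a minimal Python sketch, not the original script: the `/dev/loopN` device names, the `/dev/md0` target and the "rs" layout code are assumptions made for illustration, and it only builds the mdadm command lines rather than running them. It also leaves out the metadata version and data offset, which would have to match the original array too.

```python
from itertools import permutations, product

# Short layout codes as they appear in the header file names.
# "rs" is assumed by analogy; only la, ls and ra show up in the log excerpt.
LAYOUTS = {
    "la": "left-asymmetric",
    "ls": "left-symmetric",
    "ra": "right-asymmetric",
    "rs": "right-symmetric",
}
CHUNKS_KIB = (64, 128, 256, 512)


def candidates(disks):
    """Yield (name, mdadm argv) for every slot order, layout and chunk size.

    'g' marks the gap: the slot given to mdadm as "missing", so the array
    stays degraded and no rebuild can start.
    """
    slots = ["g"] + [str(d) for d in disks]
    for order, (code, layout), chunk in product(
        permutations(slots), LAYOUTS.items(), CHUNKS_KIB
    ):
        name = f"luks.{''.join(order)}.{code}.{chunk}"
        # Hypothetical loop devices, one per disk image.
        members = ["missing" if s == "g" else f"/dev/loop{s}" for s in order]
        argv = [
            "mdadm", "--create", "/dev/md0", "--assume-clean",
            "--level=5", "--raid-devices=4",
            f"--layout={layout}", f"--chunk={chunk}",
            *members,
        ]
        yield name, argv
```

Calling `dict(candidates([1, 3, 4]))` gives 24 x 4 x 4 = 384 entries for that one choice of three disks, each keyed by a name in the same style as the log, such as `luks.g431.ls.64`.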

That script ran on disk combinations 1/2/3, 1/3/4 and 2/3/4, and generated 384 LUKS header backups. Then I ran cryptsetup luksAddKey against each of those backups, using a file containing 24 possible variations on the passphrase I'd used to set up the encryption. So that's 9,216 attempts, most of which came back with "No key available with this passphrase". But one attempt out of all of those looked different:

luks.g413.ls.512: No key available with this passphrase.
luks.g413.ls.64: No key available with this passphrase.
luks.g431.la.128: No key available with this passphrase.
luks.g431.la.256: No key available with this passphrase.
luks.g431.la.512: No key available with this passphrase.
luks.g431.la.64: No key available with this passphrase.
luks.g431.ls.128: No key available with this passphrase.
luks.g431.ls.256: No key available with this passphrase.
luks.g431.ls.512: No key available with this passphrase.
luks.g431.ls.64:
Trying passphrase: In a hole in the ground, there lived a hobbit!
luks.1g23.ra.128: No key available with this passphrase.
luks.1g23.ra.256: No key available with this passphrase.
luks.1g23.ra.512: No key available with this passphrase.
luks.1g23.ra.64: No key available with this passphrase.

The degraded RAID set that didn't contain disk 2 was the right set. It turns out that the order I left the disks in all those years ago was 2/4/3/1, and the layout and chunk size were the defaults. Amazingly, the only thing that had been corrupted was the LUKS header on disk "2", and all the encrypted blocks were fine.

Earlier today, I bought a second 8TB disk to copy all the data off; the first 8TB will be repurposed as its mirror. Ten minutes ago, I finished the song that was so rudely aborted by the Mayan apocalypse.

Lesson learned: if you're going to encrypt your disks, keep a backup of the master key somewhere.