Today, it seems, the IT gremlins have been out to get me. At work I have a desktop computer (personal hardware) consisting of a Ryzen 7 1700, 16GB RAM, a 240GB Intel M.2 SATA SSD (540 series) and a 4TB Western Digital HDD.
The machine has been pretty reliable, though not rock-solid: in particular, compiling gcc sometimes segfaulted for reasons unknown (the RAM checks out okay according to memtest86), but for what I was doing, it mostly ran fine. I put up with the minor niggles with a view to solving them another day. Today though, I come in and find X has crashed.
Okay, no big deal, re-start the display manager, except that crashed too.
Hmm, okay, log in under my regular user account and try startx: No dice, there’s no room on
Ahh, that might explain a few things, we clean up some log files, truncate a 500MB file, manage to free up 50GB (!).
The machine dual-boots two OSes: Debian 9 and Gentoo. It’s been running the latter for about 12 months now. I used Debian 9 to get things rolling so I could use the machine at work (I did try Ubuntu 16.04, but it didn’t like my machine), and later used that to get Gentoo running before switching over. So there was a 40GB partition on the SSD that had a year-old install of Debian that wasn’t being used. I figured I’d ditch it, and re-locate my Gentoo partition to occupy that space.
So I pull out an Ubuntu 18.04 disc, boot that up, and get gparted going. It’s happily copying, until WHAM, I was hit with an I/O error:
Clicking any of the three buttons resulted in the same message. Brilliant. I had just copied over the first 15GB of the partition, so the Debian install would be hosed (I was deleting it anyway), but my Gentoo root partition should still be there intact at its old location. Of course the partition table was updated, so no rolling back there. At this point, I couldn’t do anything with the SSD, it had completely stalled, and I just had to cut my losses and kill
I managed to make some room on the 4TB drive by shuffling some partitions around so I could install Ubuntu 18.04 there. My /home partition was btrfs on the 4TB drive (first partition); the rest of that drive was LVM. I just shrank my /home down by 40GB and slipped it in there. The boot-loader didn’t install (no EFI partition), but who cares, I know enough of grub to boot from the DVD and bootstrap the machine that way. At first it wouldn’t boot because, in their wisdom, they created the root partition with a @ subvolume. I worked around that by making the @ subvolume the default.
Then there was momentary panic when the /home partition I had specified lacked my usual files. Turned out, they had created a @home subvolume on my existing /home partition. Why? Who knows? Debian/Ubuntu seem to do strange things with btrfs which do nothing but complicate matters and I do not understand the reasoning. Editing /etc/fstab to remove the subvolume argument for /home and re-booting fixed that.
I set up an LVM volume that would receive a DD dump of the mangled partition to see what could be saved. GNU’s ddrescue managed to recover most of the raw partition, so now I just had to find where the start was. If I had the output of fdisk -l from before I started, I’d be right, but I didn’t have that foresight. (Maybe if I had just formatted an LVM volume and DD’d the root fs before involving gparted? Never mind!)
I figured there’d be some kind of magic bytes I could “grep” for. Something that would tell me “BTRFS was here”. Sure enough, the information is stashed in the superblock. At 0x00010040 from the start of the partition, I should see the magic bytes 5f 42 48 52 66 53 5f 4d (ASCII _BHRfS_M). I just needed to grep for these. To speed things up I made an educated guess on the start-location. The screenshot says the old partition was about 37.25GB in size, so that was a hint to maybe try skipping that bit and see what could be seen.
Sure enough, I found what looked to be the superblock:
    root@vk4msl-ws:~# dd if=/dev/mapper/scratch-rootbackup skip=38100 count=200 bs=1M | hexdump -C | grep '5f 42 48 52 66 53 5f 4d'
    02e10040 5f 42 48 52 66 53 5f 4d 9d 30 0d 02 00 00 00 00 |_BHRfS_M.0......|
    06e00040 5f 42 48 52 66 53 5f 4d 9d 30 0d 02 00 00 00 00 |_BHRfS_M.0......|
    200+0 records in
    200+0 records out
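Piping hexdump into grep works here because the magic sits at a 16-byte-aligned offset (0x40), so it always lands at the start of a hexdump line. A more general search over the raw bytes might look something like the Python below. This is only a rough sketch: find_magic and its parameters are names made up for illustration, not part of any btrfs tool, and start and length are in bytes rather than dd-style blocks.

```python
BTRFS_MAGIC = b"_BHRfS_M"   # bytes 5f 42 48 52 66 53 5f 4d
MAGIC_OFFSET = 0x10040      # superblock at 0x10000, magic 0x40 bytes into it


def find_magic(path, magic=BTRFS_MAGIC, start=0, length=None, chunk=1 << 20):
    """Yield the absolute byte offset of every occurrence of `magic` in `path`.

    `start` and `length` (both in bytes) limit the search window, much like
    dd's skip= and count=. Hypothetical helper, for illustration only.
    """
    overlap = len(magic) - 1
    with open(path, "rb") as f:
        f.seek(start)
        pos = start          # absolute offset of the first byte of `buf`
        remaining = length
        tail = b""
        while remaining is None or remaining > 0:
            want = chunk if remaining is None else min(chunk, remaining)
            data = f.read(want)
            if not data:
                break
            if remaining is not None:
                remaining -= len(data)
            buf = tail + data
            i = buf.find(magic)
            while i != -1:
                yield pos + i
                i = buf.find(magic, i + 1)
            # Keep the last few bytes so a match straddling two reads is found.
            tail = buf[-overlap:] if overlap else b""
            pos += len(buf) - len(tail)
```

For the search above, the equivalent call would be along the lines of find_magic("/dev/mapper/scratch-rootbackup", start=38100 * 2**20, length=200 * 2**20). It reports positions from the start of the image rather than relative to the skipped region, so subtracting MAGIC_OFFSET from a hit gives a candidate filesystem start directly.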
Some other probes seemed to confirm this: my quarry started 38146MB into the now-merged partition (the first hit, at 0x02e10040, is 46MB plus the 0x10040 magic offset past the 38100MB I skipped). I start copying that to a new LVM volume in the hope of being able to mount it:
root@vk4msl-ws:~# dd if=/dev/mapper/scratch-rootbackup of=/dev/mapper/scratch-gentoo--root bs=1M skip=38146
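The skip=38146 figure falls out of the two grep hits. As a rough cross-check in Python (the function and constant names are invented for this sketch, and it assumes the second hit is the superblock mirror that btrfs keeps 64MiB into the filesystem):

```python
MIB = 1024 * 1024
SUPERBLOCK_OFFSET = 0x10000   # primary btrfs superblock, 64 KiB into the filesystem
MIRROR_OFFSET = 0x4000000     # first superblock mirror, 64 MiB into the filesystem
MAGIC_IN_SUPERBLOCK = 0x40    # the magic sits 0x40 bytes into a superblock


def filesystem_start(skip_mib, hit_offset, superblock_offset=SUPERBLOCK_OFFSET):
    """Byte offset of the filesystem start within the image.

    skip_mib   -- the skip= value given to dd (with bs=1M)
    hit_offset -- the offset hexdump printed for the magic bytes
    """
    return skip_mib * MIB + hit_offset - superblock_offset - MAGIC_IN_SUPERBLOCK


primary = filesystem_start(38100, 0x02E10040)
mirror = filesystem_start(38100, 0x06E00040, MIRROR_OFFSET)

print(primary // MIB, primary % MIB)   # 38146 0
print(primary == mirror)               # True
```

Both hits pointing at the same start, on an exact MiB boundary, is a decent sign that this is a real superblock and not a stray copy of the magic bytes sitting in some file.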
Whilst waiting for this to complete, I double-checked my findings by inspecting the other fields. From the screenshot, I know my filesystem UUID was 6513682e-7182-4474-89e6-c0d1c71966ad. Looking at the superblock, sure enough I see that listed:
    root@vk4msl-ws:~# dd if=/dev/scratch/gentoo-root bs=$(( 0x10000 )) skip=1 count=1 | hexdump -C
    1+0 records in
    1+0 records out
    00000000 5f f9 98 90 00 00 00 00 00 00 00 00 00 00 00 00 |_...............|
    65536 bytes (66 kB, 64 KiB) copied, 0.000116268 s, 564 MB/s
    00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
    00000020 65 13 68 2e 71 82 44 74 89 e6 c0 d1 c7 19 66 ad |e.h.q.Dt......f.|
    00000030 00 00 01 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
    00000040 5f 42 48 52 66 53 5f 4d 9d 30 0d 02 00 00 00 00 |_BHRfS_M.0......|
    00000050 00 00 32 da 32 00 00 00 00 00 02 00 00 00 00 00 |..2.2...........|
    00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
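In that dump the UUID is the 16 bytes at 0x20 and the magic is the 8 bytes at 0x40. Reading the UUID by eye is error-prone, so a small Python helper could do the comparison instead. This is a sketch only: read_btrfs_fsid is a hypothetical name, and the field offsets are simply the ones visible in the hexdump above.

```python
import uuid

SUPERBLOCK_OFFSET = 0x10000   # primary btrfs superblock
FSID_OFFSET = 0x20            # filesystem UUID, 16 bytes, within the superblock
MAGIC_OFFSET = 0x40           # "_BHRfS_M", 8 bytes, within the superblock


def read_btrfs_fsid(path, fs_start=0):
    """Return the filesystem UUID found at `fs_start`, or None if there is
    no btrfs magic where a primary superblock would be."""
    with open(path, "rb") as f:
        f.seek(fs_start + SUPERBLOCK_OFFSET)
        sb = f.read(MAGIC_OFFSET + 8)
    if len(sb) < MAGIC_OFFSET + 8:
        return None
    if sb[MAGIC_OFFSET:MAGIC_OFFSET + 8] != b"_BHRfS_M":
        return None
    return uuid.UUID(bytes=sb[FSID_OFFSET:FSID_OFFSET + 16])
```

Comparing str(read_btrfs_fsid("/dev/scratch/gentoo-root")) against the UUID noted beforehand would then be a one-liner. Passing fs_start lets the same check run against the un-trimmed image at a candidate offset, before committing to a long dd.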
Looks promising! After an agonising wait, the dd finishes. I can check the filesystem:
    root@vk4msl-ws:~# btrfsck /dev/scratch/gentoo-root
    Checking filesystem on /dev/scratch/gentoo-root
    UUID: 6513682e-7182-4474-89e6-c0d1c71966ad
    checking extents
    checking free space cache
    block group 111690121216 has wrong amount of free space
    failed to load free space cache for block group 111690121216
    block group 161082245120 has wrong amount of free space
    failed to load free space cache for block group 161082245120
    checking fs roots
    checking csums
    checking root refs
    found 107544387643 bytes used, no error found
    total csum bytes: 99132872
    total tree bytes: 6008504320
    total fs tree bytes: 5592694784
    total extent tree bytes: 271663104
    btree space waste bytes: 1142962475
    file data blocks allocated: 195274670080
     referenced 162067775488
Okay, it complained that the free space was wrong (which I’ll blame on gparted prematurely growing the partition), but the data is there! This is confirmed by mounting the volume and listing its contents:
    root@vk4msl-ws:~# mount /dev/scratch/gentoo-root /mnt/
    root@vk4msl-ws:~# ls /mnt/ -l
    total 4
    drwxr-xr-x 1 root root 1020 Oct 7 14:13 bin
    drwxr-xr-x 1 root root 18 Jul 21 2017 boot
    drwxr-xr-x 1 root root 16 May 28 10:29 dbus-1
    drwxr-xr-x 1 root root 1686 May 31 2017 dev
    drwxr-xr-x 1 root root 3620 Oct 19 18:53 etc
    drwxr-xr-x 1 root root 0 Jul 14 2017 home
    lrwxrwxrwx 1 root root 5 Sep 17 09:20 lib -> lib64
    drwxr-xr-x 1 root root 1156 Oct 7 13:59 lib32
    drwxr-xr-x 1 root root 4926 Oct 13 05:13 lib64
    drwxr-xr-x 1 root root 70 Oct 19 11:52 media
    drwxr-xr-x 1 root root 28 Apr 23 13:18 mnt
    drwxr-xr-x 1 root root 336 Oct 9 07:27 opt
    drwxr-xr-x 1 root root 0 May 31 2017 proc
    drwx------ 1 root root 390 Oct 22 06:07 root
    drwxr-xr-x 1 root root 10 Jul 6 2017 run
    drwxr-xr-x 1 root root 4170 Oct 9 07:57 sbin
    drwxr-xr-x 1 root root 10 May 31 2017 sys
    drwxrwxrwt 1 root root 6140 Oct 22 06:07 tmp
    drwxr-xr-x 1 root root 304 Oct 19 18:20 usr
    drwxr-xr-x 1 root root 142 May 17 12:36 var
    root@vk4msl-ws:~# cat /mnt/etc/gentoo-release
    Gentoo Base System release 2.4.1
Yes, I’ll be backing this up properly RIGHT NOW. But, my data is back, and I’ll be storing this little data recovery technique for next time.
The real lessons here are:
- KEEP RELIABLE BACKUPS! You never know when something will fail.
- Catch the copy process before it starts overwriting your source data! If there’s no overlap between the old and new locations, you’re fine, but if there is and it starts overwriting the start of your original volume, it’s all over red rover! You might be lucky with a superblock back-up, but don’t bet on it!
- Make note of the filesystem type and its approximate location. The fact that I knew roughly where to look, and what sort of filesystem I was looking for, meant I could look for magic bytes that say “I’m a BTRFS filesystem”. The magic bytes for EXT4, XFS, etc. will differ, but the same concepts apply: you just have to look up the documentation on how your particular filesystem structures its data.
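To make that last point concrete, here is a small Python sketch of the same check for a few common Linux filesystems. FS_MAGIC and identify_filesystem are illustrative names only. The offsets are the usual documented ones (btrfs magic at 0x10040, XFS magic at byte 0, and the shared ext2/3/4 magic 0xEF53 stored little-endian at byte 0x438), but do check them against your filesystem’s own documentation before relying on them.

```python
# Where each filesystem keeps its magic, relative to the start of the
# filesystem: name -> (byte offset, magic bytes).
FS_MAGIC = {
    "btrfs": (0x10040, b"_BHRfS_M"),
    "xfs": (0x0, b"XFSB"),
    # ext2, ext3 and ext4 share one magic number, 0xEF53, stored
    # little-endian 0x38 bytes into a superblock that starts at byte 1024.
    "ext2/3/4": (0x438, b"\x53\xef"),
}


def identify_filesystem(path, fs_start=0):
    """Return the names from FS_MAGIC whose magic is present when a
    filesystem is assumed to start at byte `fs_start` of `path`."""
    found = []
    with open(path, "rb") as f:
        for name, (offset, magic) in FS_MAGIC.items():
            f.seek(fs_start + offset)
            if f.read(len(magic)) == magic:
                found.append(name)
    return found
```

An eight-byte magic like the btrfs one is a strong signal. The two-byte ext magic is not: scanning a whole disk for it will throw up plenty of false positives, so other superblock fields need checking as well before trusting a hit.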