Raspbian and btrfs

I've just got a brand new Raspberry Pi 4. For now I'm just playing around with it a bit. Until openSUSE Leap becomes available, I'm using Raspbian Buster, which comes with ext4 by default. Since I want to have snapshots, the first thing I want to do is convert the existing root partition to btrfs. So let's do this.

0. Get Raspbian

First, flash Raspbian to an SD card and boot it. I also recommend running a system update after booting into Raspbian. There are plenty of tutorials on the internet that are probably far better than what I can write.

1. Prepare initramfs

In Raspbian, btrfs is included as a module. To make the kernel mount a btrfs root filesystem, we need to build a corresponding initramfs. First, install the necessary tools:
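A minimal sketch of the install step. The package names (initramfs-tools and btrfs-progs) are my assumption for Raspbian Buster, and install_tools is just a hypothetical helper name:

```shell
# Packages assumed for Raspbian Buster: initramfs-tools provides mkinitramfs,
# btrfs-progs provides the btrfs userspace tools.
PACKAGES="initramfs-tools btrfs-progs"

# Hypothetical helper: refresh the package lists, then install the tools.
install_tools() {
    sudo apt-get update && sudo apt-get install -y $PACKAGES
}
```

Calling install_tools on the Pi then does the actual installation.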

Now we add the btrfs module to /etc/initramfs-tools/modules
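One way to do this without opening an editor is sketched below. The helper name add_initramfs_module is hypothetical; it only appends the module if it is not listed yet, so running it twice does no harm:

```shell
# Hypothetical helper: add a module name to an initramfs-tools modules file,
# but only if it is not already listed on a line of its own.
add_initramfs_module() {
    module="$1"
    modules_file="$2"
    if ! grep -qx "$module" "$modules_file" 2>/dev/null; then
        echo "$module" >> "$modules_file"
    fi
}
```

As root on the Pi, this would be called as add_initramfs_module btrfs /etc/initramfs-tools/modules.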

Next, build the initramfs
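A sketch of the build step, assuming the image should be written to /boot/initramfs-btrfs.gz. That file name is my assumption, and whatever name you pick has to match the one used in /boot/config.txt. build_initramfs is a hypothetical helper name:

```shell
# Output name is an assumption; it must match the name used in /boot/config.txt.
INITRAMFS_IMAGE="/boot/initramfs-btrfs.gz"

# Hypothetical helper: build an initramfs for the currently running kernel.
build_initramfs() {
    sudo mkinitramfs -o "$INITRAMFS_IMAGE" "$(uname -r)"
}
```

Running build_initramfs on the Pi creates the image for the kernel that is currently booted.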

And tell the bootloader to load the initramfs, by editing /boot/config.txt
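The bootloader picks up the initramfs through an "initramfs" line in config.txt. A small sketch, assuming the image is called initramfs-btrfs.gz (my assumption) and using the hypothetical helper enable_initramfs:

```shell
# Hypothetical helper: append an "initramfs" directive to a config.txt file
# unless a directive for the same image is already present.
enable_initramfs() {
    image="$1"          # file name relative to /boot (assumed name)
    config_file="$2"
    line="initramfs $image followkernel"
    if ! grep -qxF "$line" "$config_file" 2>/dev/null; then
        echo "$line" >> "$config_file"
    fi
}
```

As root on the Pi: enable_initramfs initramfs-btrfs.gz /boot/config.txt. The followkernel keyword tells the firmware to place the initramfs in memory right after the kernel image.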

And then reboot the device to check that everything is set up properly.
If the boot succeeds, shut down the Raspberry Pi and take the SD card to another computer. If you run into trouble at this stage, a filename is probably wrong and you should still be able to recover. Otherwise, just start from scratch; at this point nothing is really lost.

2. Convert ext4 rootfs to btrfs

In my case I insert the SD card into my laptop. The SD card gets recognised as /dev/mmcblk0 and contains the following partitions:

To convert the filesystem to btrfs, we do the following steps:

  1. Optional: Make sure the rootfs is clean (run fsck)
  2. Convert ext4 to btrfs using btrfs-convert
  3. Mount new btrfs root
  4. Edit /etc/fstab
  5. Edit /boot/cmdline.txt

On my system, I have to do the following steps
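Steps 1 to 3 can be sketched as follows. The root partition /dev/mmcblk0p2 and the mount point /mnt are assumptions, and run and convert_rootfs are hypothetical helpers. Since btrfs-convert rewrites the partition in place, the sketch only prints the commands unless DRY_RUN=0 is set:

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper covering steps 1 to 3 of the list above.
convert_rootfs() {
    root_part="$1"     # ext4 root partition, assumed to be /dev/mmcblk0p2
    mount_point="$2"   # where to mount the converted filesystem
    run sudo fsck.ext4 -f "$root_part"      # optional: check the ext4 filesystem
    run sudo btrfs-convert "$root_part"     # convert ext4 to btrfs in place
    run sudo mount -t btrfs "$root_part" "$mount_point"
}
```

convert_rootfs /dev/mmcblk0p2 /mnt prints the three commands for review; DRY_RUN=0 convert_rootfs /dev/mmcblk0p2 /mnt actually runs them. Make sure the partition is not mounted before converting.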

Now we edit /etc/fstab and change ext4 to btrfs. We also need to disable the filesystem check by setting the last two fields of the btrfs line to 0.

IMPORTANT: Set the last two settings in /etc/fstab to 0 and 0. The last 0 is especially important for btrfs root, since fsck and btrfs do not go so well together.
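The fstab change can also be scripted. A sketch, assuming the root entry is the line whose mount point is "/"; fstab_to_btrfs is a hypothetical helper that prints the modified file instead of editing it in place:

```shell
# Hypothetical helper: print a copy of an fstab file in which the root ("/")
# entry uses btrfs and has both the dump and the fsck pass field set to 0.
# Comment lines and all other entries are printed unchanged.
fstab_to_btrfs() {
    awk '$1 !~ /^#/ && $2 == "/" { $3 = "btrfs"; $5 = 0; $6 = 0 } { print }' "$1"
}
```

With the new root mounted on /mnt (an assumption), fstab_to_btrfs /mnt/etc/fstab > fstab.new writes the result to a new file, which can be reviewed and then copied back over /mnt/etc/fstab.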

Lastly, we edit /boot/cmdline.txt. We need to replace rootfstype=ext4 with rootfstype=btrfs and set fsck.repair=no.

IMPORTANT: It is crucial to set fsck.repair=no. I was stuck at some weird "mounting failed: Invalid argument" errors, because the system wanted to perform a fsck and failed.
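Both cmdline.txt changes are simple substitutions. A sketch with the hypothetical helper cmdline_to_btrfs, which prints the modified line rather than editing the file in place:

```shell
# Hypothetical helper: print a cmdline.txt with the root filesystem type set
# to btrfs and the boot-time fsck repair switched off.
cmdline_to_btrfs() {
    sed -e 's/rootfstype=ext4/rootfstype=btrfs/' \
        -e 's/fsck\.repair=[a-z]*/fsck.repair=no/' "$1"
}
```

For example, cmdline_to_btrfs cmdline.txt > cmdline.new, run in the mounted boot partition. Note that the sketch only rewrites an existing fsck.repair entry; if your cmdline.txt has none, add fsck.repair=no by hand.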

3. Now the fun starts

This is only the kickoff. Now the fun things, like subvolumes, snapshots etc., start.

Have a lot of fun! 🙂

Caveats

  • After a kernel update, you will need to run mkinitramfs again. It's probably best to only do manual kernel updates (even for security updates), as otherwise your Raspi might not be able to boot again.

Additional notes

Check these notes in case something went wrong. They emphasise the steps I had to take to make this work.

  • Fsck caused me a lot of trouble. In case you run into "mounting failed: Invalid argument" errors, check that you have disabled fsck in /etc/fstab (the last zero) and in /boot/cmdline.txt
  • Apparently btrfs-convert doesn't change the UUID. If you find yourself with "device not found" or similar errors, this behaviour might have changed and you will need to update the UUIDs in your configuration
  • After a kernel update you will need to run mkinitramfs again. Keep that in mind (and maybe disable auto-updates)

Common pitfalls

Crappy image of the console output with the "mounting ... failed: invalid argument" error

I got this error message when I forgot to edit cmdline.txt. Make sure you have configured /boot/cmdline.txt correctly (especially rootfstype=btrfs and fsck.repair=no).

Backup KVM machines using btrfs snapshots

I just wrote a small Bash script for creating offline-backups of a bunch of virtual machines on a server using btrfs snapshots.

The script shuts down all running KVM machines, waits until they are down, creates a (read-only) btrfs snapshot and spins the machines back up. Altogether this takes less than a minute. After the process I have an image of all KVM machines in the state they were in when shut down. This is then suitable for storing the machine image files on a different machine, to have a complete working state of the machines. This is part of my backup strategy (mostly against hardware failure) for one of our general purpose servers at work.

The KVM instances need to be in a btrfs subvolume, otherwise it doesn't work.

See the script as gist on GitHub. You will need to make some adjustments and probably test it a couple of times until it works nicely.

btrfs being notably slow in HDD RAID6

Disclaimer: I don't want to blame anyone. I love btrfs and would very much like to get it running well. This documents a use case where I have some issues.

I am setting up a new server for our telescope systems. The server acts as hypervisor for a handful of virtual machines that control the telescope and instruments and provide some small services such as a file server. More or less just very basic services, nothing fancy, nothing exotic.

The server has 8 HDDs in total, configured as RAID6 and connected via a Symbios Logic MegaRAID SAS-3 controller. I would like to set up btrfs on the disks, which turns out to be horribly slow. The decision for btrfs came from resilience considerations, especially the possibility of creating snapshots and conveniently transferring them to another storage system. I am using openSUSE Leap 15.0, because I wanted a kernel that is not too old and because openSUSE and btrfs should work nicely together (a small dig at Red Hat, who might rethink their strategy of abandoning btrfs in the first place).

The whole project now runs into problems, because in comparison to xfs or ext4, btrfs performs horribly in this environment. And by horribly I mean a factor of 10 and more!

The problem

The results as text are listed below. Scroll to the end of the section

I use the tool dbench to measure IO throughput on the RAID. When comparing btrfs to ext4 or xfs, I notice that the overall throughput is about a factor of 10 (!!) lower, as can be seen in the following figure:

Throughput measurement using dbench - xfs's throughput is 11.60 times as high as btrfs !!!

sda1 and sda2 are obviously on the same disk and were created with default parameters, as the following figure demonstrates.

My first suspicion was that cow (copy-on-write) could be the reason for the performance issue. So I tried to disable cow, first by setting chattr +C and then by remounting the filesystem with nodatacow. Neither made a noticeable difference, as shown in the next two figures.

Same dbench run as before, with chattr +C. No noteworthy difference
Same dbench run as in the beginning, but with nodatacow mount option. Still, negligible difference

Hmmmm, looks like something's fishy with the filesystem. For completeness I also wanted to swap /dev/sda1 and /dev/sda2 (in case something is wrong with the partition alignment) and also include ext4 for comparison. So I reformatted /dev/sda1 with btrfs (previously the xfs filesystem) and /dev/sda2 with ext4 (previously the btrfs partition). The results stayed the same, although there are some minor differences between ext4 and xfs (not discussed here). The order of magnitude difference between btrfs and ext4/xfs remained.

Swapping /dev/sda1 and /dev/sda2 and testing dbench on ext4 yields the same results: btrfs performs badly on this configuration

So, here are the results from the figures.

/dev/sda1 - xfs                  Throughput 7235.15 MB/sec  10 procs
/dev/sda2 - btrfs                Throughput 623.654 MB/sec  10 procs
/dev/sda2 - btrfs (chattr +C)    Throughput 609.842 MB/sec  10 procs
/dev/sda2 - btrfs (nodatacow)    Throughput 606.169 MB/sec  10 procs
/dev/sda1 - btrfs                Throughput 636.441 MB/sec  10 procs
/dev/sda2 - ext4                 Throughput 7130.93 MB/sec  10 procs
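
The factor quoted in the figure caption can be recomputed from these numbers. A small sketch; throughput_ratio is a hypothetical helper name:

```shell
# Hypothetical helper: ratio of two throughput figures, rounded to two decimals.
throughput_ratio() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.2f\n", fast / slow }'
}

throughput_ratio 7235.15 623.654   # xfs vs. btrfs, first run: prints 11.60
throughput_ratio 7130.93 636.441   # ext4 vs. btrfs, after the swap: prints 11.20
```

So the gap stays at roughly a factor of 11 regardless of which partition holds btrfs.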

And here is also the text file with the detailed initial test series. In the beginning I said something like a factor of 10 and more. This relates to the results in the text file. When setting dbench to use synchronous IO operations, btrfs gets even worse.

Other benchmarks

Phoronix did a comparison between btrfs, ext4, f2fs and xfs on Linux 4.12, Linux 4.13 and Linux 4.14 that also revealed differences between the filesystems of varying magnitude.

As far as I can interpret their results, they remain rather inconclusive. Sequential reads perform fastest on btrfs, while sequential writes are apparently horrible (a factor of 4.3 difference).

Resizing a btrfs partition

This is a simple note to myself, in case I need to do this again: how to resize a btrfs partition to its maximum size (full capacity).

  1. Resize the partition using parted
  2. Resize the btrfs filesystem using btrfs filesystem resize max

    Would it make sense to create an alias, so that btrfs filesystem resize 100% or other percentages would also work?

In a nutshell example
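
A sketch of both steps, assuming the filesystem lives on partition 2 of /dev/sda and is mounted on /mnt/data (device, partition number and mount point are all assumptions; run and grow_btrfs are hypothetical helpers). The commands are only printed unless DRY_RUN=0 is set:

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: grow a partition to the end of the disk, then grow
# the (mounted) btrfs filesystem on it to fill the partition.
grow_btrfs() {
    disk="$1"          # whole-disk device, e.g. /dev/sda (assumption)
    part_number="$2"   # number of the partition that holds btrfs
    mount_point="$3"   # where the btrfs filesystem is mounted
    run sudo parted "$disk" resizepart "$part_number" 100%
    run sudo btrfs filesystem resize max "$mount_point"
}
```

grow_btrfs /dev/sda 2 /mnt/data prints the two commands; with DRY_RUN=0 they are executed. Note that btrfs filesystem resize takes the mount point, not the device.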

Detailed example

First resize the partition (I use parted for that purpose)

Done.