podman - "Found incomplete layer" error

Scroll down to Solution if you found this page via Google and are in need of help.

While migrating a couple of services to MicroOS, I somehow managed to completely jam my podman container engine:

# podman image ls
WARN[0000] Found incomplete layer "236fcd368394d7094f40012a131c301d615722e60b25cb459efa229a7242041b", deleting it 
Error: stat /var/lib/containers/storage/btrfs/subvolumes/236fcd368394d7094f40012a131c301d615722e60b25cb459efa229a7242041b: no such file or directory

Basically every podman command resulted in this error; even listing the images was not possible. sigh.
A quick Google search didn’t reveal anything helpful, so I did what every random dude on the internet does next: I asked Reddit for help.

But I was so fortunate that this seems to be one of those issues that are rare enough that nobody has encountered them before. another sigh

Ultimately, I solved it myself by just removing the containers directory (/var/lib/containers). A quick surgical operation: let’s just dump the broken stuff and let podman recreate it with shiny new configuration and cache files, fresh from the upstream repositories.

Solution

If you’re having this issue right now: remove the whole /var/lib/containers directory.
CAUTION: This will remove all of your container volumes (unless they are stored elsewhere). Don’t do this if you have volume data that you want to keep! You have been warned.

CREATE A BACKUP BEFORE PROCEEDING - All your container volumes will be lost irrecoverably. Don’t trust a random dude from the internet with advice on your data. Have a backup.
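
A minimal sketch of such a backup, assuming your named volumes live in the default rootful location /var/lib/containers/storage/volumes. The helper name backup_dir and the archive path in the example are my own inventions for illustration, not anything podman ships; adjust both to your setup.

```shell
#!/bin/sh
# Hypothetical helper (not part of podman): archive a directory into a
# gzip-compressed tarball before deleting anything.
# Usage: backup_dir SOURCE_DIR ARCHIVE_PATH
backup_dir() {
    src=$1
    archive=$2
    if [ ! -d "$src" ]; then
        echo "backup_dir: $src is not a directory" >&2
        return 1
    fi
    # -C switches into the parent directory so the archive holds relative paths.
    tar -C "$(dirname "$src")" -czf "$archive" "$(basename "$src")"
}

# Example (run as root; the archive path is an assumption):
# backup_dir /var/lib/containers/storage/volumes /root/podman-volumes-backup.tar.gz
```

Check the archive with tar -tzf before you delete anything. A backup you have not looked at is only a hope.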

WARNING: All your container volumes in /var/lib/containers will be destroyed!
If you don't have all volumes in custom locations, you might lose your data. Have a backup!

# umount /var/lib/containers/storage/btrfs
# rm -rf /var/lib/containers
# mkdir -p /var/lib/containers/storage/btrfs

Done. I restarted my podman systemd units, they pulled fresh images and mounted the data volumes from /srv, and I had everything back online in no time. Phew, dodged a bullet.

Use the overlay storage driver to prevent the issue from arising

To avoid this issue in the first place, use the overlay storage driver instead of btrfs in /etc/containers/storage.conf:

# Default Storage Driver
driver = "overlay"

At least until the issue is solved in podman itself.
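
To double-check which driver a storage.conf file actually selects, something like the following sketch may help. The function name configured_driver is hypothetical, and the parsing is deliberately simple: it takes the first uncommented driver = line and assumes the value is written the way the default file writes it.

```shell
#!/bin/sh
# Hypothetical helper: print the storage driver configured in a storage.conf file.
# Usage: configured_driver [PATH_TO_STORAGE_CONF]
configured_driver() {
    conf=${1:-/etc/containers/storage.conf}
    # Take the first uncommented "driver = ..." line; strip the key and quotes.
    sed -n 's/^[[:space:]]*driver[[:space:]]*=[[:space:]]*"\{0,1\}\([^"#[:space:]]*\)"\{0,1\}.*$/\1/p' "$conf" \
        | head -n 1
}

# Example:
# configured_driver                      # reads /etc/containers/storage.conf
# To see what podman itself reports as the active driver:
# podman info --format '{{.Store.GraphDriverName}}'
```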

How I got there

By grepping for the layer 236fcd368394d7094f40012a131c301d615722e60b25cb459efa229a7242041b, I figured out that the layer and container information is stored in two JSON files:

/var/lib/containers/storage/btrfs-containers/containers.json
/var/lib/containers/storage/btrfs-layers/layers.json
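
The search itself can be sketched roughly like this. The wrapper name find_layer_refs is made up for illustration; underneath it is just a recursive grep for the layer ID, with the storage root defaulting to the path from the error message.

```shell
#!/bin/sh
# Hypothetical helper: list files under a storage root that mention a layer ID.
# Usage: find_layer_refs LAYER_ID [STORAGE_ROOT]
find_layer_refs() {
    layer_id=$1
    root=${2:-/var/lib/containers/storage}
    # -r recurse, -l print only file names, -s hide errors for unreadable
    # files, -F treat the ID as a fixed string rather than a regex.
    grep -rlsF -- "$layer_id" "$root"
}

# Example (run as root):
# find_layer_refs 236fcd368394d7094f40012a131c301d615722e60b25cb459efa229a7242041b
```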

Since all of my container data is stored in /srv, I could risk an attempt to just remove the directory and start with an empty podman cache/directory. So I created a VM snapshot and then went ahead to just kill the directory.

It worked nicely, except for the /var/lib/containers/storage/btrfs being busy, but a simple unmount and another rm -r later the directory was gone. I re-created the /var/lib/containers/storage/btrfs directory in case podman expects it to be there when creating subvolume mounts and then started the systemd services again.

What was the culprit?

I’m not sure. The issue happened while I was upgrading the hypervisor, which involved a couple of reboots. My working hypothesis is that podman got interrupted during an image pull in such a way that the JSON file was out of sync with the btrfs filesystem, and this made podman stumble.
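
If that hypothesis holds, the mismatch should be visible by comparing the layer IDs recorded in layers.json with the subvolume directories on disk. Here is a rough sketch of such a check. The function name list_missing_layers is hypothetical, and the grep-based extraction assumes layer IDs appear as "id":"<64 hex characters>" pairs in the file; treat it as a diagnostic aid, not a faithful JSON parser.

```shell
#!/bin/sh
# Hypothetical helper: print layer IDs recorded in layers.json that have no
# matching directory under the btrfs subvolumes directory.
# Usage: list_missing_layers [LAYERS_JSON] [SUBVOLUMES_DIR]
list_missing_layers() {
    layers_json=${1:-/var/lib/containers/storage/btrfs-layers/layers.json}
    subvol_dir=${2:-/var/lib/containers/storage/btrfs/subvolumes}
    # Pull out every "id":"<64 hex chars>" pair, then keep only the hex ID.
    grep -oE '"id"[[:space:]]*:[[:space:]]*"[0-9a-f]{64}"' "$layers_json" \
        | grep -oE '[0-9a-f]{64}' \
        | while read -r id; do
            # Report IDs that the JSON file knows about but the disk does not.
            [ -d "$subvol_dir/$id" ] || echo "$id"
        done
}

# Example (run as root):
# list_missing_layers
```

Any ID it prints is a layer that podman believes exists but that has no subvolume behind it, which would match the stat error from the beginning of this post.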

Update 2023-02-11

~~I’m not certain enough to file a bug for it but could resolve the issue, which for me at this point is more important, since I want to continue with the ongoing migration process~~.

I filed #16882 for this issue. It was suggested that I use the overlay driver instead of btrfs, and this resolved the problem for me. No issues in over two months now.