Using the Btrfs filesystem in combination with Snapper to allow system rollbacks without an external live system was pioneered by openSUSE almost a decade ago. This configuration creates Btrfs snapshots periodically and whenever package management transactions, such as updates, are performed. When a rollback is required, it can be performed simply by booting into one of the read-only snapshots and issuing a Snapper command, so that on the next boot the system is in the same state as the chosen snapshot.
This post describes the rollback experience on an Arch system with the openSUSE Btrfs/Snapper configuration, which had been installed using An Arch Linux Installation on a Btrfs Filesystem with Snapper for System Snapshots and Rollbacks. The issue that necessitated the rollback, as well as an update to the rollback process described in that article, is also discussed.
A previous article on this site described a process to install Arch with the Btrfs filesystem and Snapper to manage Btrfs snapshots and system rollbacks. The layout of the Btrfs partition was modeled after the one openSUSE uses by default. While the subvolume layout is complex and nested to multiple levels, it allows the system to be rolled back simply by booting into a read-only snapshot selected from the GRUB menu and issuing a single snapper command. Some Arch users, in YouTube videos and on the Arch wiki itself, cite the complexity of the subvolume layout and the need to set the default root subvolume in /etc/fstab and the GRUB configuration as reasons not to configure an Arch Btrfs system in the openSUSE manner. They instead document a simple single-level Btrfs subvolume layout, which requires booting an external live ISO and executing a series of Btrfs commands to perform a rollback by replacing the current system subvolume with one of the Btrfs snapshots.
For me, the ease of the rollbacks outweighs the complexity of the subvolume layout, which has no practical drawbacks because it is completely transparent in normal use. Also, the issue of setting the default subvolume is handled by simple modifications to /etc/fstab and to the GRUB configuration files /etc/grub.d/10_linux and /etc/grub.d/20_linux_xen during the installation process (addressed in An Arch Linux Installation on a Btrfs Filesystem with Snapper for System Snapshots and Rollbacks).
To aid the discussion of the rollback, the image of the configuration from the installation article is reproduced below. The most important characteristics of the configuration are:
After a year of using the Arch installation with the Btrfs/Snapper configuration described above and in the installation guide, I had to roll back the system. The immediately apparent issue that necessitated the rollback was that, after an update, Optimus Manager would not load the NVIDIA driver kernel module. I use Optimus Manager on the Arch system to set the graphics mode automatically on the Lenovo Legion 5i Pro: discrete graphics if the external power supply is plugged in, hybrid graphics if running on battery. (See Nvidia Optimus on Linux for a description of NVIDIA Optimus, the NVIDIA driver kernel module, its options, and Optimus Manager.) The issue is illustrated in the following image, which shows two Konsole windows in which the problem is evident. The window on the bottom shows the output of
sudo optimus-manager --print-mode
which indicates that Optimus Manager was not able to set the graphics mode. In the same window, an attempt to run nvidia-smi results in output indicating that the NVIDIA driver is not running. The reason is apparent in Optimus Manager's log of the current graphics switching attempt, shown in the Konsole window on top. Two points in this log hold clues to the problem. The first indicates that an attempt by an Optimus Manager subprocess to load the NVIDIA module with
modprobe nvidia NVreg_UsePageAttributesTable=1 NVreg_DynamicPowerManagement=0x02
fails with a SIGSEGV; at the second point, Optimus Manager reiterates that an attempt to load the NVIDIA module with
modprobe resulted in an error.
After some DuckDuckGo-ing, I discovered the root cause -- a new security feature for Intel processors implemented in the Linux kernel was incorporated into a new version of the Arch LTS kernel (I only use the LTS because it is the default set by GRUB and I never bothered to change it). The image below shows the Reddit post which mentions the issue and the solution.
The security feature in question is Indirect Branch Tracking, described in the Phoronix article pictured below as
Indirect Branch Tracking is part of Intel Control-Flow Enforcement Technology (CET) with Tigerlake CPUs and newer. IBT provides indirect branch protection to defend against JOP/COP attacks by ensuring indirect calls land on an ENDBR instruction.
The solution provided in the Reddit post was to add the kernel parameter
ibt=off to the kernel command line, a solution that I was able to verify worked on my Arch installation using the LTS kernel.
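For readers who want to apply the same workaround on a GRUB-based system, the following is a minimal sketch of one way to add the parameter. It is not taken from the Reddit post. The helper name add_kernel_param is hypothetical, and the sketch assumes GNU sed and a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub. The output path /boot/grub/grub.cfg is the usual Arch location, but treat it as an assumption for your own system.

```shell
#!/bin/sh
# add_kernel_param PARAM FILE
# Hypothetical helper: appends PARAM to the GRUB_CMDLINE_LINUX_DEFAULT line
# of FILE unless it is already present as a separate word on that line.
add_kernel_param() {
    param="$1"
    file="$2"
    # Nothing to do if the parameter is already inside the quoted value
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${param}( [^\"]*)?\"" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote (GNU sed -i)
    sed -i -E "s/^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"/\1 ${param}\"/" "$file"
}

# Typical use on a real system, run as root (paths are assumptions):
#   add_kernel_param ibt=off /etc/default/grub
#   grub-mkconfig -o /boot/grub/grub.cfg
```

The helper is idempotent, so running it a second time leaves the file unchanged. Editing /etc/default/grub by hand achieves the same result.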
Two methods of rollback were described in the installation guide article. In the first, the rollback is initiated from the current read-write system, even if it has the issues that motivate the rollback. In the second, the method documented by openSUSE, the rollback is initiated from a read-only snapshot of the system, booted by selecting it from the GRUB menu. Both were demonstrated to work in the article. However, during an attempt to use the first method, an issue prevented the rollback from completing successfully, and in the rush to roll back the system immediately, I decided to just use the second method, from a read-only snapshot (the SUSE way), which worked as it should.
Performing the rollback allowed me to go back to a perfectly working system and determine the reason for the issue. I went through several cycles of rollback and update before I discovered the actual solution to the problem. After the first rollback I thought that, although unlikely, the problem with the NVIDIA module not loading might have been related to a recent intervention required on Arch due to the replacement of the base-devel package group by a meta-package. When the issue persisted after updating the reverted system, I performed a second rollback, still working on the assumption that the problem was related to base-devel, this time also reinstalling the base-devel meta-package. When the issue was still present, I investigated further and found the errors reported by Optimus Manager and nvidia-smi. The Optimus Manager error specifically led me to the issue as described in the Reddit post pictured above, in which the poster stated that a working solution would be to disable the new IBT feature with the kernel command line parameter
ibt=off.
After adding
ibt=off to the GRUB command line in /etc/default/grub and updating the GRUB configuration with grub-mkconfig, I performed a third rollback. The state of the system after the rollback is depicted in the following image, in which three Konsole windows are shown.
One window shows the output of
snapper list
in which all of the snapshots currently on the system are listed. This command was not executed immediately after the rollback, but after subsequent package management transactions had been performed. It still shows the relevant information, such as the two snapshots created for each rollback: snapshots 814 and 815 for the first rollback (the snapshot number is the leftmost column), snapshots 817 and 818 for the second rollback, and snapshots 832 and 833 for the third rollback. Note that the asterisk next to snapshot number 833 indicates that this snapshot is the currently mounted subvolume and that it is the one that will be mounted at the next boot. (See man snapper and An Arch Linux Installation on a Btrfs Filesystem with Snapper for System Snapshots and Rollbacks.)
A second window shows the output of
btrfs subvolume list /
in which all of the Btrfs subvolumes on the partition within the top level subvolume (Subvolume ID 5) are listed. The output lists each subvolume by subvolume ID and also indicates the parent subvolume by ID and the subvolume's path. Snapshot 833, the currently mounted subvolume at /, is near the bottom of the output as
ID 1102 gen 38026 top level 257 path @/.snapshots/833/snapshot
The third window shows that the boot configuration reflects the current snapshot, with the kernel path
/@/.snapshots/833/snapshot/boot/vmlinuz-linux-lts
and the mount option
rootflags=subvol=@/.snapshots/833/snapshot
The initial ram disk paths also reflect the current snapshot path. As with more conventional filesystems, the partition containing the installation is specified by its UUID (root=UUID= ...).
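The same cross-check can be done from a terminal rather than by reading the windows by eye. The sketch below is illustrative only and is not part of the original procedure. The helper names subvol_from_cmdline and subvol_id_for_path are hypothetical, and the sketch assumes the kernel command line carries a rootflags=subvol=... option and that btrfs subvolume list prints the path as the last field, as in the output line quoted above.

```shell
#!/bin/sh
# subvol_from_cmdline CMDLINE
# Hypothetical helper: prints the subvol= value found in the rootflags=
# parameter of a kernel command line string.
subvol_from_cmdline() {
    # One parameter per line, then keep only the subvol= value of rootflags
    printf '%s\n' "$1" | tr ' ' '\n' |
        sed -n 's/^rootflags=.*subvol=\([^,]*\).*/\1/p'
}

# subvol_id_for_path PATH
# Hypothetical helper: reads `btrfs subvolume list` output on stdin and
# prints the subvolume ID of the entry whose path (last field) equals PATH.
subvol_id_for_path() {
    awk -v p="$1" '$NF == p { print $2 }'
}

# Typical use on a running system:
#   sv=$(subvol_from_cmdline "$(cat /proc/cmdline)")
#   sudo btrfs subvolume list / | subvol_id_for_path "$sv"
#   sudo btrfs subvolume get-default /
```

If the ID printed by the second command matches the ID reported by btrfs subvolume get-default, then the booted snapshot is also the one that will be mounted at the next boot.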
The actual process is summarized below.
sudo snapper -c root -v rollback -d "Third rollback due to instability after Arch packaging changes to 'base-devel' and more likely due to NVIDIA change <...>"
In this command, the global option -c specifies the name of the snapper configuration; here the default configuration, named root, is specified. The -v option requests verbose output. The -d option specifies a comment or description of the rollback transaction, which is displayed in the output of snapper list (as shown in the image earlier in the article depicting the state of the system after the rollback).
One of the advanced capabilities of a Linux system with Btrfs is the filesystem's support for system snapshots, made possible by its Copy-on-Write design. When this is paired with Snapper and SUSE/openSUSE's Btrfs subvolume layout, the system can be rolled back easily with a single snapper rollback command.