The Transactional Update Guide

Thorsten Kukuk

<kukuk@thkukuk.de>

Version 0.1, 15. December 2017

Abstract

This documentation describes how transactional updates with btrfs work, what an
administrator needs to know about the system setup, and what a packager needs to
know when preparing a package.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

1. Introduction

    1.1. Description
    1.2. Definition
    1.3. Why transactional updates?

2. How it works

    2.1. Filesystem
    2.2. Update
    2.3. Commands used

3. Setup of system
4. Porting to other systems
5. Files
6. Author/acknowledgments
7. Copyright information for this document

Chapter 1. Introduction

1.1. Description

transactional-update is an application that applies intrusive updates to a
running system atomically, without influencing the running system, taking the
system down for a longer period, or blocking the boot process. It is not a
package manager: although it is implemented for zypper with RPMs, it can be
changed to use other package managers and package formats. The idea behind this
is that you can continue to use your existing packages and tool chain to
deliver and apply updates.

To achieve this, transactional-update creates a new btrfs snapshot for every
update and updates this snapshot to the latest version of the product. Since
snapshots contain only the difference between two versions and are thus usually
very small, this is very space efficient. It also means you can have more
parallel installations than just two bootable root partitions.

1.2. Definition

A transactional update is a kind of update that:

  • is atomic

      □ the update does not influence your running system.

      □ you can power off your machine at any time. If you power it on again,
        you have either your unmodified old state or the complete new one.

  • can be rolled back

      □ if the upgrade fails or if the newer software version is not compatible
        with your infrastructure, you can quickly restore the situation as it
        was before the upgrade.

1.3. Why transactional updates?

Linux distributions have had working update mechanisms for many years, so why
do we need something new? Different users have different requirements. There is
the desktop user on a very stable distribution, for whom the current update
mechanism is good enough. But there are also the bleeding-edge distribution
with rolling updates and the enterprise customer with critical applications,
both of which have different requirements.

Distributions with "rolling" updates face the problem of how to apply intrusive
updates to a running system without breaking the update mechanism itself.
Examples are the migration from SysV init to systemd, or a major version update
of the desktop while the desktop is running. Such an update will very likely
kill the currently running desktop, which kills the update process and leaves
the system in a broken, undefined state. Additionally, if an update breaks such
a system, there needs to be a quick way to roll the system back to the last
working state.

On mission-critical systems, the update is not allowed to interrupt the running
services. On such systems, interrupting running services is more expensive than
a scheduled reboot. The system also always needs to be in a defined state,
which means that either the updates are applied without error or no change is
made at all. For example, if a post-install script of an RPM fails, the system
is in an undefined state, which should never happen.

Sometimes new versions of the kernel or of other software are incompatible
with your hardware or other software. In this case, there should be a quick and
easy way to roll back to the state before the update was applied.

There are other solutions for the above problems, such as downloading all RPMs
upfront and applying them during the boot phase. But this blocks users from
using their PC when there is something urgent to do.

Chapter 2. How it works

2.1. Filesystem

Transactional updates use the snapshot functionality of btrfs. Btrfs is a
general-purpose Copy-on-Write (CoW) filesystem. Its main feature here is that
it provides subvolumes. A subvolume looks like a directory but behaves like a
mount point: it can be accessed from the parent subvolume like a directory, or
it can be mounted on another directory of the same filesystem. Snapshots are
created from existing subvolumes, excluding any other subvolumes nested inside
them, and, when created with snapper, are read-only by default.

In theory this can be implemented with any CoW filesystem, as long as it
provides snapshot functionality.
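
As a rough illustration of the subvolume and snapshot behaviour described
above, the following sketch lists the btrfs commands that create a subvolume, a
nested subvolume, and a read-only snapshot. The paths under /mnt/demo, the
DRY_RUN switch, and the function names are assumptions made for this example
and do not come from transactional-update itself. By default the script only
prints the commands, since running them needs root and a btrfs filesystem.

```shell
#!/bin/sh
# Sketch only: paths, DRY_RUN and function names are hypothetical examples.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

demo_snapshot() {
    base=$1
    # A subvolume looks like a directory inside its parent.
    run btrfs subvolume create "$base/root"
    # A nested subvolume; a snapshot of "root" does not include its contents.
    run btrfs subvolume create "$base/root/data"
    # -r makes the snapshot read-only; plain btrfs snapshots are writable
    # unless this option is given.
    run btrfs subvolume snapshot -r "$base/root" "$base/root-snap"
    # Show the read-only flag of the new snapshot.
    run btrfs property get "$base/root-snap" ro
}

demo_snapshot /mnt/demo
```

Setting DRY_RUN=0 would execute the commands instead of printing them; this
should only be tried on a scratch btrfs filesystem.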

2.2. Update

List of snapshots

At the beginning, there is a list of old snapshots, each one based on the
previous one, and the newest one is the current root filesystem.

List of snapshots with new read-only Clone of current root filesystem

In the first step, a new read-only snapshot of the current root filesystem will
be created.

List of snapshots with a read-write Clone of current root filesystem

In the second step we switch the snapshot from read-only to read-write, so that
we can update it.

List of snapshots with a read-write Clone of current root filesystem, which
will be updated with zypper.

In the third step the snapshot will be updated. This can be zypper up or zypper
dup.

List of snapshots with the clone again read-only.

In the fourth step the snapshot will be changed back to read-only, so that the
data cannot be modified anymore.

List of snapshots with the read-only Clone the new default.

The last step is to mark the updated snapshot as the new root filesystem. This
is the atomic step: if the power had been pulled before this point, the
unchanged old system would have booted. From now on, the new, updated system
will boot.

List of snapshots with the current root filesystem as newest at the end.

After the reboot, the newly prepared snapshot is the new root filesystem. If
something bad happens, we can roll back to any of the older snapshots.

List of snapshots with a read-write Clone of current root filesystem, which
will be updated with zypper.

If we don't reboot and call transactional-update again, a new snapshot will be
created and updated. This new snapshot is again based on the currently running
root filesystem, not on newer snapshots. Newer snapshots cannot be used as the
base for the next snapshot, since we don't know whether they work. It could be
that the admin found out that a newer snapshot did not boot and rolled back. If
we always based new snapshots on the latest one, the system could end up in a
non-working, non-fixable state.
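
The distinction between the running snapshot and the one that boots next can be
sketched in a few lines of shell. This is only an illustration: the helper
names are hypothetical, and the path layout "<prefix>/.snapshots/<number>/
snapshot" is an assumption based on snapper's usual defaults. The sketch takes
two subvolume paths and reports whether an updated snapshot is waiting for a
reboot.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the layout
# <prefix>/.snapshots/<number>/snapshot is an assumed snapper default.

# Extract the snapshot number from a subvolume path; prints nothing
# if the path does not follow the assumed layout.
snapshot_number() {
    printf '%s\n' "$1" | sed -n 's|.*/\.snapshots/\([0-9][0-9]*\)/snapshot$|\1|p'
}

# Compare the running snapshot ($1) with the default one ($2).
update_state() {
    running=$(snapshot_number "$1")
    default=$(snapshot_number "$2")
    if [ -z "$running" ] || [ -z "$default" ]; then
        echo "unknown"
    elif [ "$running" = "$default" ]; then
        echo "up-to-date: running snapshot $running is the default"
    else
        echo "pending: snapshot $default boots next; new snapshots still start from $running"
    fi
}

update_state "@/.snapshots/5/snapshot" "@/.snapshots/7/snapshot"
```

On a real system the two paths could presumably be taken from the subvol mount
option of / and from the output of btrfs subvolume get-default /, but that
parsing is left out here.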

2.3. Commands used

In the end, creating and updating a snapshot takes only a few commands:

  • SNAPSHOT_ID=`snapper create -p -d "Snapshot Update"`

  • btrfs property set ${SNAPSHOT_DIR} ro false

  • zypper -R ${SNAPSHOT_DIR} up|patch|dup

  • btrfs property set ${SNAPSHOT_DIR} ro true

  • btrfs subvol set-default ${SNAPSHOT_DIR}

    or with a read-write root filesystem:

    snapper rollback ${SNAPSHOT_ID}

  • systemctl reboot
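
The commands above can be tied together in a small script. The following is a
minimal sketch, not the actual transactional-update implementation. It assumes
that the snapshot with number N lives in /.snapshots/N/snapshot (the list above
does not say how SNAPSHOT_DIR is derived from SNAPSHOT_ID), and the DRY_RUN
switch, the function names, and the placeholder number 42 are made up for this
example. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch only. Assumptions not stated in the guide: the snapshot path layout
# /.snapshots/<id>/snapshot, the DRY_RUN switch, and the function names.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode; otherwise execute it and abort on the
# first failure, so that a failed update never becomes the default.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@" || exit 1
    fi
}

# Map a snapper snapshot number to its directory (assumed layout).
snapshot_dir() {
    echo "/.snapshots/$1/snapshot"
}

transactional_update_sketch() {
    zypper_cmd=${1:-up}          # up, patch or dup
    if [ "$DRY_RUN" = "1" ]; then
        snapshot_id=42           # placeholder number for the dry run
    else
        # -p prints the number of the newly created snapshot.
        snapshot_id=$(snapper create -p -d "Snapshot Update") || return 1
    fi
    dir=$(snapshot_dir "$snapshot_id")
    run btrfs property set "$dir" ro false   # make the snapshot writable
    run zypper -R "$dir" "$zypper_cmd"       # update inside the snapshot
    run btrfs property set "$dir" ro true    # freeze the snapshot again
    run btrfs subvol set-default "$dir"      # atomic switch for next boot
    echo "Reboot to activate snapshot $snapshot_id"
}

transactional_update_sketch up
```

The default subvolume is changed only after every earlier command has
succeeded, which mirrors the atomic step described in section 2.2.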


Chapter 3. Setup of system

Read-only root filesystem or Read-Write filesystem? Requirements for RPMs, what
is allowed and what not. Config files in /etc with overlayfs. Special handling
for passwd, shadow, group. Rollback. Strict separation of data and
applications.

Chapter 4. Porting to other systems

You need a CoW filesystem (or anything else with snapshots and rollback); apart
from that, this should work with every package manager.

Chapter 5. Files

/usr/include/security/pam_appl.h

    Header file with interfaces for Linux-PAM applications.

/usr/include/security/pam_misc.h

    Header file for useful library functions for making applications easier to
    write.

Chapter 6. Author/acknowledgments

This document was written by Thorsten Kukuk <kukuk@suse.com> with many
contributions from ...

Chapter 7. Copyright information for this document

