Post Mortem on openSUSE Leap 15.0 update problems

Marcus Meissner
Hi folks,

This is a post mortem on the 15.0 Update problems from last Friday / weekend.

On Friday morning German time, an otherwise inconspicuous but incorrect
configuration change (to enable gcc8 to build for i586) caused a full wipe
of the 15.0 update project and projects linking to it.

The 15.0 update project on our download servers got emptied due to this.

This was spotted an hour later, and we went into recovery mode.

First we restored a previous static state, which was made available on
Friday afternoon, 2 hours after we noticed the problem.

As there is no backup or snapshotting of binaries we had to reestablish
the binary state in a more manual way.

We used the existing 15.0 update binaries left over by the wipe to
reestablish all existing updates.

Reestablishing them, however, caused the packages to be re-signed. The
content is still binary identical, but the cryptographic signature is
different. It is made with the same key, but a different random nonce
produced a different signature value.
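
To illustrate the effect, here is a minimal sketch, not the actual RPM
format or the OBS signing code. It assumes a toy package layout of a
fixed-size signature block followed by the payload, and uses random
bytes as a stand-in for a signature made with a fresh nonce. The names
build_package and SIG_LEN are made up for this example.

```python
import hashlib
import os

SIG_LEN = 64  # hypothetical size of the toy signature block


def build_package(payload: bytes, signature: bytes) -> bytes:
    # Toy layout (assumption): signature block first, then the payload.
    return signature + payload


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


payload = b"identical package content"

# Random bytes stand in for two signatures over the same payload,
# made with the same key but different nonces.
original = build_package(payload, os.urandom(SIG_LEN))
resigned = build_package(payload, os.urandom(SIG_LEN))

# The payload after the signature block is byte-for-byte the same ...
payload_same = original[SIG_LEN:] == resigned[SIG_LEN:]
# ... but a checksum over the whole file is not.
digest_same = sha256_hex(original) == sha256_hex(resigned)

print("payload identical:", payload_same)
print("file digest identical:", digest_same)
```

Both files also have the same name and the same length, so only a
checksum over the whole file tells them apart.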

As the filename was still the same, not all mirrors synced the newly
released packages, which caused "bad digest" messages on repositories
until Monday morning.

Then we changed the update project layout to include a "rpms/" directory,
so every mirror refetches the RPMs and will only offer the correct RPMs.
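
Why the new directory helps can be sketched as follows. This is a
simplified model, not how any particular mirror software works: it
assumes a mirror that only fetches paths it does not already have.
The function names and the file paths are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def sync_by_path(origin: dict, mirror: dict) -> list:
    """Fetch only paths missing on the mirror; return the fetched paths."""
    fetched = []
    for path, data in origin.items():
        if path not in mirror:
            mirror[path] = data
            fetched.append(path)
    return fetched


def stale_paths(origin: dict, mirror: dict) -> list:
    """Paths whose mirror copy does not match the origin checksum."""
    return sorted(
        path for path, data in origin.items()
        if sha256_hex(mirror.get(path, b"")) != sha256_hex(data)
    )


old_rpm = b"sig-A" + b"payload"   # package as first released
new_rpm = b"sig-B" + b"payload"   # same payload, re-signed

mirror = {"x86_64/foo.rpm": old_rpm}

# Same path as before: the mirror sees nothing new and keeps the old file.
same_layout = {"x86_64/foo.rpm": new_rpm}
fetched_same = sync_by_path(same_layout, mirror)
stale_same = stale_paths(same_layout, mirror)

# New path under "rpms/": the mirror has to fetch the file.
new_layout = {"rpms/x86_64/foo.rpm": new_rpm}
fetched_new = sync_by_path(new_layout, mirror)
stale_new = stale_paths(new_layout, mirror)

print(fetched_same, stale_same)
print(fetched_new, stale_new)
```

In this model the first sync fetches nothing and leaves a copy that
fails the checksum, while the sync against the new layout fetches the
re-signed file and leaves nothing stale.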

As of Monday 10:00 am everything should be working as expected again.

Note that every RPM released has binary-identical content to the
ones before; only the checksum is different due to re-signing.

We have taken some lessons from this:

- more careful use of dangerous config options, of course

- establish a binary backup and snapshot strategy for the OBS, which is
  currently not in place (due to lack of budget).

- we will fix several bugs that made this problem worse than necessary

We are sorry for the inconvenience caused by this problem.

Thanks especially to Adrian who did most of the fixing.

Ciao, Marcus (for openSUSE Maintenance)
