Apt Purge os-prober!

If you are getting kernel errors like

EXT4-fs (sda2): unable to read superblock
EXT4-fs (sda2): unable to read superblock
EXT4-fs (sda2): unable to read superblock
FAT-fs (sda2): bogus number of reserved sectors
FAT-fs (sda2): bogus number of reserved sectors
qnx4: no qnx4 filesystem (no root dir).

for almost every storage-like device in /dev/, you know something has to be wrong! However, we ignored those messages for months since everything seemed to be fine. Still, I was concerned.

Yesterday I decided to look into this again. Among the errors quoted above, there was a kernel warning saying

>>>WARNING<<< Wrong ufstype may corrupt your filesystem,

which didn’t sound any better. Finally, I found Debian Bug #788062 “os-prober corrupts LVs/partitions while being mounted inside a VM”. And indeed, our suspicious log entries start with

"debug: running /usr/lib/os-probes/50mounted-tests on /dev/sda2"

On kernel updates or manual update-grub runs, this bug might corrupt your storage or switch your file systems to read-only, without giving you any idea where to look for the problem.
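To see which devices os-prober has been poking at, one could scan the logs for lines like the one quoted above. This is only a minimal sketch: the function name probed_devices is made up for illustration, and it assumes the log lines look like our example, which may differ on other setups.

```python
import re

# Matches entries such as:
#   debug: running /usr/lib/os-probes/50mounted-tests on /dev/sda2
PROBE_LINE = re.compile(r"running (/usr/lib/os-probes/\S+) on (/dev/[\w/.-]+)")


def probed_devices(lines):
    """Map each probed device to the set of os-prober scripts run on it.

    Hypothetical helper; the log format is assumed from the single
    example entry quoted in this post.
    """
    found = {}
    for line in lines:
        match = PROBE_LINE.search(line)
        if match:
            script, device = match.groups()
            found.setdefault(device, set()).add(script)
    return found
```

Feeding it the lines of a syslog file (for example via open(path, errors="replace")) gives a quick overview of the devices that were touched.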

And the moral of this story: apt purge os-prober on your servers and don’t expect debian-boot to react to such a bug within 6 months.

Update: I have noticed that the general behavior was reported to the Debian BTS back in Dec 2014, but it was considered “entirely cosmetic”. It has also been on Launchpad (Ub***u) since 2014. However, back then no severe impact had been reported.

GitLab

We have been running GitLab CE via the “omnibus” (whatever this is) Debian package since its availability in May 2015. Due to GitLab’s version policy we are constantly upgrading our installation. So far, we have only run into minor problems with this approach. Recent examples are:

  • Backup broke (workaround available, fixed after two days)
  • Admin page broke (workaround available)

What gives me confidence in our setup are the very short reaction times on the GitLab bug tracker. This includes fast fixes via new versions and the availability of workarounds. However, for more critical infrastructure it would be wise to delay non-critical updates for some weeks.

A tale of bytes and strings in python3’s smtplib

The one feature that made me move almost all my projects from python2 to python3 is the vastly improved handling of encodings. In python2, I was never sure if I needed to throw in a .decode or a .encode and with which arguments to make things work. All my üs and äs would end up as weird characters, so I would try an .encode, which sometimes solved it and sometimes made it weirder yet. So I would try .decode instead, which then sometimes solved it and sometimes didn’t. It was not fun.

Now, for python3, the story is much better and cleaner, since I either have unicode strings, which I can print and everything, or I have bytes, which are just bytes and need to be decoded before they can be treated as strings. Standard library functions in python3 take and return either strings or bytes. Take for example the open() call: depending on the mode, the file object it returns reads and writes bytes or strings. If I try to write bytes to a file opened in text mode, I get a TypeError. So everything is warm and nice and I get type errors if I do stupid things, and then I immediately know if I have to decode or encode.
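A small sketch of that behavior, using only the standard library. The helper name try_write is made up for this example; it writes to a throwaway temporary file and reports what happened.

```python
import os
import tempfile


def try_write(mode, payload):
    """Write payload to a fresh temporary file opened in the given mode.

    Returns "ok" on success, or the name of the exception raised when
    the payload type does not fit the file mode.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, mode) as handle:
            handle.write(payload)
        return "ok"
    except TypeError as exc:
        return type(exc).__name__
    finally:
        os.remove(path)


print(try_write("w", "text"))   # str into a text-mode file: ok
print(try_write("wb", b"raw"))  # bytes into a binary-mode file: ok
print(try_write("w", b"raw"))   # bytes into a text-mode file: TypeError
print(try_write("wb", "text"))  # str into a binary-mode file: TypeError
```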

So I wrote a small program which takes a mail on stdin and passes it via LMTP to dovecot, using python3’s smtplib. Everything worked, no type errors anywhere, I even tested it by sending some weird characters in an email. It worked. I deployed to the hemio mail server. A few days later, I got an SMS in the morning: We are losing mails! Just silently dropping them. WHAT? That’s of course the worst possible thing you can do as a mailserver. After shutting down the mail server to prevent further breakage, I checked the log to see what was happening. The traceback I saw gave me flashbacks to python2:

Traceback (most recent call last):
  File "/usr/local/lib/lda-lmtp.py", line 163, in <module>
    exitcode = main(args)
  File "/usr/local/lib/lda-lmtp.py", line 57, in main
    msg = sys.stdin.read()
  File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13114: ordinal not in range(128)

What? How do I get a UnicodeDecodeError? I thought I was passing unicode strings around all the time, why decode? Checking the documentation of smtplib.SMTP.sendmail, it says:

…msg may be a string containing characters in the ASCII range, or a byte string. A string is encoded to bytes using the ascii codec…

So: smtplib.SMTP.sendmail wants bytes. However, if you pass a string instead, it will silently .encode it using the ‘ascii’ codec. WHY? One of the features of python3 is that you have to consciously decide if you want to encode or decode, instead of the willy-nilly casting/one-type-fits-all of python2. But, helpful as ever, smtplib just ascii-encodes your msg for you. Which will barf on interesting characters. Which ended up just dropping mails. Not nice.
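To make the quoted sentence from the documentation concrete, here is a rough imitation of that one step. The function encode_like_sendmail is a hypothetical stand-in written for this post, not smtplib's actual code, and it leaves out anything else sendmail does to the message.

```python
def encode_like_sendmail(msg):
    """Hypothetical helper mirroring the documented handling of msg.

    A str is encoded with the ascii codec, bytes pass through untouched.
    """
    if isinstance(msg, str):
        # This is the step that fails on any non-ASCII character.
        return msg.encode("ascii")
    return msg


encode_like_sendmail("plain ascii")              # fine
encode_like_sendmail("Grüße".encode("utf-8"))    # fine, already bytes
try:
    encode_like_sendmail("Grüße")                # str with non-ASCII content
except UnicodeEncodeError as exc:
    print("refused:", exc.reason)
```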

The fix was easy: I re-opened stdin in binary mode and just read in the mails in binary directly, such that my program never has to think about encodings and strings. But I am very confused why smtplib is going out of its way to confuse python3 developers. If you can only deal with bytes, just accept bytes. Throw a traceback if you are given strings. Don’t silently try ascii-encoding a given string. It hurts, it loses mails and virtual kittens die!
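Roughly, the bytes-only approach looks like the sketch below. This is not the actual lda-lmtp.py; the function names, and the sender, recipients and socket_path parameters, are placeholders chosen for this example.

```python
import io
import smtplib
import sys


def read_raw_message(stream=None):
    """Read the whole message as bytes, skipping locale-dependent decoding."""
    if stream is None:
        stream = sys.stdin
    # Text streams expose their underlying binary stream as .buffer;
    # a stream that is already binary is used as it is.
    binary = getattr(stream, "buffer", stream)
    return binary.read()


def deliver(raw_message, sender, recipients, socket_path):
    """Hand the untouched bytes to an LMTP server on a Unix socket.

    socket_path is a placeholder; smtplib.LMTP treats a host that
    starts with "/" as the path of a Unix socket.
    """
    with smtplib.LMTP(socket_path) as lmtp:
        # sendmail() returns a dict of refused recipients.
        return lmtp.sendmail(sender, recipients, raw_message)
```

Since the message never becomes a str, neither the locale of the calling process nor the implicit ascii step in sendmail gets a chance to interfere.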

And why didn’t my testing catch that earlier? Well, my mail program, claws-mail, encodes all outgoing mails in 7bit-printable encoding automatically, so I never actually tested 8bitmime. -_-

PostgreSQL Arrays

Before I start the week, let’s wrap up the weekend. I was hacking on HamSql and got into trouble with PostgreSQL arrays again. Recently, I stumbled over misleading documentation for array operators. I wanted to report this issue and remembered that the PostgreSQL community is working without bug trackers. But don’t be scared, I got a really fast and kind reaction on the mailing list, and as a result the 9.5 documentation covers those pitfalls explicitly. This weekend I hit the PostgreSQL “array lower bound feature”. The index of PostgreSQL arrays starts at 1, but that’s only a default. You can set the lower bound to any number you like. I forgot this feature immediately after reading the docs and would have ignored it forever if PostgreSQL internals did not use it sometimes. It comes as no surprise that many client libraries don’t know how to handle arbitrary lower bounds for arrays. Unfortunately, the postgresql-simple Haskell library is no exception. It took me some time to realize the problem, as the issued error message was not that helpful.
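What trips up client libraries is the bounds decoration that PostgreSQL puts in front of the text form of such arrays, for example [0:1]={1,2}. Below is a minimal sketch of how a client could peel it off before parsing the rest. It is written in Python for illustration only, the name split_bounds is made up, and it has nothing to do with the actual postgresql-simple code.

```python
import re

_DECORATION = re.compile(r"(?:\[-?\d+:-?\d+\])+")
_DIMENSION = re.compile(r"\[(-?\d+):(-?\d+)\]")


def split_bounds(literal):
    """Split an array literal into its explicit bounds and its body.

    Returns (bounds, body), where bounds is a list of (lower, upper)
    tuples, one per dimension, and empty if the literal carries no
    decoration. Parsing the body itself is left to the caller.
    """
    literal = literal.strip()
    if not literal.startswith("["):
        return [], literal
    decoration, separator, body = literal.partition("=")
    if not separator or not _DECORATION.fullmatch(decoration):
        raise ValueError("malformed bounds decoration: %r" % literal)
    bounds = [(int(lo), int(hi)) for lo, hi in _DIMENSION.findall(decoration)]
    return bounds, body


print(split_bounds("[0:1]={1,2}"))  # ([(0, 1)], '{1,2}')
print(split_bounds("{1,2}"))        # ([], '{1,2}')
```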

While trying to work around this bug I hit another array function corner case. The documentation states for array concatenation “the result retains the lower bound subscript of the left-hand operand’s outer dimension”. Hence, I should be able to fix the problem using something like '{0}'::int[] || '[0:1]={1,2}'. And this works just fine. So let’s just take an empty array on the left side: '{}'::int[] || '[0:1]={1,2}'. Booom! This acts as identity and leaves the bounds untouched. I am using ARRAY(SELECT UNNEST(...)) to reset the lower bound now. Not sure if I should report this || operator issue too.

Preseeding with Consequences

For VM installs, we use a preseed which sets the debconf priority to critical. This disables questions from the installer where we agree with the default anyway (or where we have set an appropriate default via preseed).

After using this method for ages (literally), Mika noted that on the installed machines the debconf priority is still critical. This implies that apt-get installs did not ask us much. For example, MySQL/MariaDB is installed having a root user with empty password! Especially since the notion of critical seems to be a bit inconsistent between packages, this might be a serious pitfall.

Probably this didn’t have bad effects on our systems, but we have switched to priority medium (dpkg-reconfigure debconf).

WordPress 4.1

Finally, WordPress 4.1 with virtual host support is running on Debian Jessie with a custom WP_CONTENT_DIR. By default, wp-content/themes is symlinked to /var/lib/wordpress/wp-content/themes/, which can be replaced when using custom themes.

Let’s see how long this lasts. WordPress is changing its folder structure and config options for uploads quite frequently.