Welcome to Gentoo Universe, an aggregation of weblog articles on all topics written by Gentoo developers. For a more refined aggregation of Gentoo-related topics only, you might be interested in Planet Gentoo.

Disclaimer:
Views expressed in the content published here do not necessarily represent the views of Gentoo Linux or the Gentoo Foundation.
   
September 17, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)
Passwords, password managers, and family life (September 17, 2018, 05:04 UTC)

Somehow, I always end up spending time writing about passwords whenever I so much as broach the subject on Twitter.

In this case, I’ve been asking around about password managers, as after many years with LastPass I want to reconsider if there is a better alternative, particularly as my needs have changed (or rather, are going to, in the not too distant future).

One of the things I’m looking for is a password manager that can generate diceware/xkcd-style passwords: a set of words in a certain language that are easy to say over (say) the phone, and to type on systems where there is no password manager app. The reason for this is that there are a few places in which I need to be able to give the password to someone else who might not otherwise be trusted with the full password list. For instance the WiFi password for my apartment, or for my mother’s house.

But it’s a bit more complicated than that. There are a number of situations where an account is not just a user. Or rather, you may want to allow multiple users (people) to access the same account. Take for instance my energy provider’s dashboard. Or the phone provider. Or the online grocery shopping…

All of these expect a single (billing) account, but they are more likely to be shared within a household than used by a single individual. A few services do have a concept of a shared account, but most don’t, and that makes less and less sense as the world progresses to such an everything-connected level.

It might be easy to figure out from the way I’ve been expressing this above, but rather than leaving “clues”, let me state clearly something that can obviously be taken as public knowledge: I got to thinking about this because I have (finally, someone might say) found a soulmate. And while we don’t yet live together, I’m starting to see the rough corners of all this. We have not yet gotten to “What’s the Netflix password, again?”, but I did end up changing the password to the account for the Los Angeles transport card to give her access, after first setting it up with LastPass (we were visiting, and I added both of our TAP cards to the same account).

As I made clear earlier, part of this was a (minor) problem with my mother, too. But significantly less so: she never cared to have access to the power provider, phone company, and so on. Just as long as she had a copy of the invoices from time to time (which I solved by having a mailing list, which only the two of us subscribe to, as the contact address for all the services I use or used for the household in Italy).

Service providers, take note: integrating with Google Drive or Dropbox so that invoices get automatically added to a shared folder would be a lovely feature to have. And not just for households: I would love it if it were easier to have a copy of my invoices automatically added to, and indexed by, Google Drive.

But now, with a partner, it’s different. As the word implies, it’s a partnership, an equal standing. Once we move in together, we’ll share the expenses, and that means sharing access to the accounts. Which means I don’t want to be the only one holding the passwords. So I need a password manager that not only allows me to share the passwords easily, but also allows her to use them easily — which will likely translate into being able to read them off the phone and type them into a work computer’s incognito window (because she likely won’t be allowed to install a password manager on a work computer).

Which is why I’m looking for a new password manager: LastPass is actually fairly great when it comes to sharing passwords with other accounts, but it’s effectively useless when it comes to “typeable” passwords. Its “Make pronounceable” option is okay for making a password easier to spell out, but I don’t want to be stuck with an eight-letter password just to be able to type it easily, when I could just as easily use a three-word combination that is significantly stronger.

And while I could just use xkcdpass on my laptop to generate those shared passwords (which is what I did with my mother’s router), that does not really scale (it still keeps me as the gatekeeper), and it does not make the security usable for my SO. And it wouldn’t be fair to keep the password hygiene to myself.
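For reference, generating such a passphrase with xkcdpass looks roughly like this (a sketch; flags and wordlists vary between versions, and the output below is obviously just an example):

$ xkcdpass --numwords 3
collage unplug doorstep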

Similarly, any solution that involves running personal infrastructure (servers, cron, git, whatever) is not an option: not only am I increasingly not relying on it myself (I even gave up on running my own blog’s webapp!), but most of my family is not even slightly interested in figuring out how to do that. And I don’t blame them in the least; they have enough of their own things to care about.

If you have any suggestions for a new password manager, please do let me know. I think I may try 1Password next, if nothing else because I think Troy Hunt’s opinion is worth something, and if he backed 1Password, there has to be a reason.

September 15, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)

With Qt5 gaining support for high-DPI displays, and applications starting to exercise that support, it’s easy for applications to suddenly become unusable on some screens. For example, my old Samsung TV reported itself as a 7″ screen. While this used not to really matter when everything was forced to a resolution of 96 DPI, the high-DPI applications started scaling themselves to occupy most of my screen, with elements becoming really huge (and ugly, apparently due to some poor scaling).

It turns out that it is really hard to find a solution for this. Most of the guides and tips are focused either on proprietary drivers or on getting custom resolutions. The DisplaySize specification in xorg.conf apparently did not change anything either. Finally, I was able to resolve the issue by overriding the EDID data for my screen. This guide explains how I did it.

Step 1: dump EDID data

Firstly, you need to get the EDID data from your monitor. Supposedly the read-edid tool could be used for this purpose, but it did not work for me. With only a little more effort, you can get it e.g. from xrandr:

$ xrandr --verbose
[...]
HDMI-0 connected primary 1920x1080+0+0 (0x57) normal (normal left inverted right x axis y axis) 708mm x 398mm
[...]
  EDID:
    00ffffffffffff004c2dfb0400000000
    2f120103804728780aee91a3544c9926
    0f5054bdef80714f8100814081809500
    950fb300a940023a801871382d40582c
    4500c48e2100001e662150b051001b30
    40703600c48e2100001e000000fd0018
    4b1a5117000a2020202020200000000a
    0053414d53554e470a20202020200143
    020323f14b901f041305140312202122
    2309070783010000e2000f67030c0010
    00b82d011d007251d01e206e285500c4
    8e2100001e011d00bc52d01e20b82855
    40c48e2100001e011d8018711c162058
    2c2500c48e2100009e011d80d0721c16
    20102c2580c48e2100009e0000000000
    00000000000000000000000000000029
[...]

If you have multiple displays connected, make sure to use the EDID for the one you’re overriding. Copy the hexdump and convert it into a binary blob. You can do this by passing it through xxd -p -r (shipped with vim).
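For example, assuming the hex lines were saved to a file called edid.hex, the conversion could look roughly like this (xxd -r -p ignores whitespace, so the indentation does not matter):

$ xxd -p -r edid.hex > edid.bin
$ wc -c edid.bin
256 edid.bin

A base EDID block plus one extension block should come out as 256 bytes, which is a quick sanity check.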

Step 2: fix screen dimensions

Once you have the EDID blob ready, you need to update the screen dimensions inside it. Initially, I did this with a hex editor, which involved finding all the occurrences, updating them (and manually encoding the values into the weird split integers) and correcting the checksums. Then I wrote edid-fixdim so you wouldn’t have to repeat that experience.

First, use the --get option to verify that your EDID is supported correctly:

$ edid-fixdim -g edid.bin
EDID structure: 71 cm x 40 cm
Detailed timing desc: 708 mm x 398 mm
Detailed timing desc: 708 mm x 398 mm
CEA EDID found
Detailed timing desc: 708 mm x 398 mm
Detailed timing desc: 708 mm x 398 mm
Detailed timing desc: 708 mm x 398 mm
Detailed timing desc: 708 mm x 398 mm

So your EDID consists of the basic EDID structure, followed by one extension block. The screen dimensions are stored in 7 different blocks you’d have to update, and are covered by two checksums. The tool takes care of updating all of that for you, so just pass the correct dimensions to --set:

$ edid-fixdim -s 1600x900 edid.bin
EDID structure updated to 160 cm x 90 cm
Detailed timing desc updated to 1600 mm x 900 mm
Detailed timing desc updated to 1600 mm x 900 mm
CEA EDID found
Detailed timing desc updated to 1600 mm x 900 mm
Detailed timing desc updated to 1600 mm x 900 mm
Detailed timing desc updated to 1600 mm x 900 mm
Detailed timing desc updated to 1600 mm x 900 mm

Afterwards, you can use --get again to verify that the changes were made correctly.

Step 3: overriding EDID data

Now it’s just a matter of putting the override in motion. First, make sure to enable CONFIG_DRM_LOAD_EDID_FIRMWARE in your kernel:

Device Drivers  --->
  Graphics support  --->
    Direct Rendering Manager (XFree86 4.1.0 and higher DRI support)  --->
      [*] Allow to specify an EDID data set instead of probing for it

Then, determine the correct connector name. You can find it in the dmesg output:

$ dmesg | grep -C 1 Connector
[   15.192088] [drm] ib test on ring 5 succeeded
[   15.193461] [drm] Radeon Display Connectors
[   15.193524] [drm] Connector 0:
[   15.193580] [drm]   HDMI-A-1
--
[   15.193800] [drm]     DFP1: INTERNAL_UNIPHY1
[   15.193857] [drm] Connector 1:
[   15.193911] [drm]   DVI-I-1
--
[   15.194210] [drm]     CRT1: INTERNAL_KLDSCP_DAC1
[   15.194267] [drm] Connector 2:
[   15.194322] [drm]   VGA-1

Copy the new EDID blob into a location of your choice inside /lib/firmware:

$ mkdir /lib/firmware/edid
$ cp edid.bin /lib/firmware/edid/samsung.bin

Finally, add the override to your kernel command-line:

drm.edid_firmware=HDMI-A-1:edid/samsung.bin
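How you make that persistent depends on your bootloader; with GRUB2, one possible way (a sketch, assuming the usual /etc/default/grub plus grub-mkconfig setup) is:

# in /etc/default/grub
GRUB_CMDLINE_LINUX="drm.edid_firmware=HDMI-A-1:edid/samsung.bin"

# then regenerate the configuration
grub-mkconfig -o /boot/grub/grub.cfg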

If everything went fine, xrandr should report the correct screen dimensions after the next reboot, and dmesg should report that the EDID override has been loaded:

$ dmesg | grep EDID
[   15.549063] [drm] Got external EDID base block and 1 extension from "edid/samsung.bin" for connector "HDMI-A-1"

If it didn't, check dmesg for error messages.

September 09, 2018
Sven Vermeulen a.k.a. swift (homepage, bugs)
cvechecker 3.9 released (September 09, 2018, 11:04 UTC)

Thanks to updates from Vignesh Jayaraman, Anton Hillebrand and Rolf Eike Beer, a new release of cvechecker has been made available.

This new release (v3.9) is a bugfix release.

September 08, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)
SIP & STUN .. (September 08, 2018, 07:26 UTC)

Note to self .. it is not very useful when one leaves a (public) STUN server activated in a SIP client after changing it from using the VoIP Server's IP to the (internal) DNS .. leads to working signalling, but no audio ^^ - Took me a few days to figure out what had happened (including capturing stuff with Wireshark, ..)

September 07, 2018
Gentoo congratulates our GSoC participants (September 07, 2018, 00:00 UTC)

GSoC logo

Gentoo would like to congratulate Gibix and JSteward for finishing and passing Google’s Summer of Code for the 2018 calendar year. Gibix contributed by enhancing Rust (programming language) support within Gentoo. JSteward contributed by making a full Gentoo GNU/Linux distribution, managed by Portage, run on devices which use the original Android-customized kernel.

The final reports of their projects can be reviewed on their personal blogs:

September 04, 2018
Domen Kožar a.k.a. domen (homepage, bugs)
Recent Cachix downtime (September 04, 2018, 09:00 UTC)

Cachix - Nix binary cache as a service was down:

  • On Aug 22nd from 16:55 until 18:55 UTC (120 minutes)
  • On Aug 23rd from 20:01 until 20:09 UTC (8 minutes)

On the 22nd there was no action from my side; the service recovered on its own. I did have monitoring configured and I received email alerts, but I had not noticed them.

I have spent most of the 23rd gathering data and evidence on what went wrong. Just before monitoring stopped receiving data at 16:58 UTC, white-box system monitoring revealed:

  • Outgoing bandwidth skyrocketed to 23MB/s
  • Resident memory went through the roof to ~90%

On the 23rd I immediately saw that the service was down and rebooted the machine.

I have spent a significant amount of time trying to determine if a specific request caused this, but it seems likely that it was just an overload, although I have not proved this theory.

Countermeasures taken

a) The server side is implemented in GHC Haskell, so I have enabled -O2. Although the GHC wiki page on performance says it is indistinguishable from -O1, in the last week I’ve seen an approximately 10% reduction in resident memory and, most importantly, fewer memory spikes. Again, no hard evidence; time will tell.

b) Most importantly, production now runs with the GHCRTS='-M2G' flag, limiting the overall heap to 2G of memory, so we are not depending on the Linux OOM killer to handle out-of-memory situations (see the sketch after this list). It is not entirely clear to me why the machine was unresponsive for two hours, since the OOM killer should have kicked in, but during that period not a single monitoring datapoint was sent.

c) I have configured EKG to send GC stats to Datadog, so if it happens again, that should provide better insight into what is going on with memory consumption.
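To illustrate (b): GHCRTS is just an environment variable read by the GHC runtime system, so the limit can be applied without recompiling. A sketch, with a made-up binary name and assuming the executable was built to accept RTS options (e.g. with -rtsopts):

# cap the GHC heap of this process at 2 GiB
GHCRTS='-M2G' ./cachix-server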

Countermeasures to be taken

1) Use a service like Pagerduty to be alerted immediately on the phone

2) Upgrade the Datadog agent to version 6, which allows more precise per-process monitoring

So far I am quite happy with how Haskell works in production. I have taken the Well-Typed training on GHC performance, and if this turns out to be a space leak, I am confident that I will find it.

The only thing that saddens me, coming from Python, is that GHC has poor profiling options for long-running programs. Compiling with profiling enabled significantly slows the program down. There is unmerged work on making the GHC eventlog useful for such cases, but the state of that work is unclear.

Looking forward

So there it is, the first operational issue with Cachix. Despite this issue, I am happy to have made choices that allow me to respond quickly to the needs of the Nix community, yet still allow me to further improve and stabilize the code with confidence as the product matures.

Speaking of maturing the product, I will share another announcement soon!

August 24, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)

I have recently worked on enabling 2-step authentication via SSH on the Gentoo developer machine. I selected google-authenticator-libpam amongst the different available implementations as it seemed the best maintained and had all the necessary features, including a friendly tool for users to configure it. However, its design has a weakness: it stores the secret unprotected in the user’s home directory.

This means that if an attacker manages to gain even temporary access to the filesystem with the user’s privileges — through a malicious process, a vulnerability or simply because someone left the computer unattended for a minute — he can trivially read the secret and therefore clone the token source without leaving a trace. That would completely defeat the purpose of the second step, and the user may not even notice until the attacker makes real use of the stolen secret.

In order to protect against this, I’ve created google-authenticator-wrappers (as upstream decided to ignore the problem). This package provides a rather trivial setuid wrapper that manages a write-only, authentication-protected secret store for the PAM module. Additionally, it comes with a test program (so you can test the OTP setup without jumping through hoops or risking losing access) and friendly wrappers for the default setup, as used on Gentoo Infra.

The recommended setup (as utilized by the sys-auth/google-authenticator-wrappers package) is to use a dedicated user for the password store. In this scenario, users are unable to read their secrets, and all secret operations (including authentication via the PAM module) are done using an unprivileged user. Furthermore, any operation regarding the configuration (either updating it or removing the second step) requires regular PAM authentication (e.g. typing your own password).

This is consistent with e.g. how shadow operates (users can’t read their passwords, nor update them without authenticating first), how most sites using 2-factor authentication operate (again, users can’t read their secrets) and follows the RFC 6238 recommendation (that keys […] SHOULD be protected against unauthorized access and usage). It solves the aforementioned issue by preventing user-privileged processes from reading the secrets and recovery codes. Furthermore, it prevents the attacker with this particular level of access from disabling 2-step authentication, changing the secret or even weakening the configuration.

August 17, 2018
Luca Barbato a.k.a. lu_zero (homepage, bugs)
Gentoo on Integricloud (August 17, 2018, 22:44 UTC)

Integricloud gave me access to their infrastructure to track some issues on ppc64 and ppc64le.

Since some of the issues are related to the compilers, I obviously installed Gentoo on it and in the process I started to fix some issues with catalyst to get a working install media, but that’s for another blogpost.

Today I’m just giving a walk-through on how to get a ppc64le (and ppc64 soon) VM up and running.

Preparation

Read this and get your install media available to your instance.

Install Media

I’m using the Gentoo installcd I’m currently refining.

Booting

You have to append console=hvc0 to your boot command; the boot process might figure it out for you on newer install media (I still have to send patches to update livecd-tools).

Network configuration

You have to set up the network manually.
You can use ifconfig and route, or ip, as you like; refer to your instance setup for the parameters.

ifconfig enp0s0 ${ip}/16
route add -net default gw ${gw}
echo "nameserver 8.8.8.8" > /etc/resolv.conf

ip a add ${ip}/16 dev enp0s0
ip l set enp0s0 up
ip r add default via ${gw}
echo "nameserver 8.8.8.8" > /etc/resolv.conf

Disk Setup

OpenFirmware seems to like gpt much better:

parted /dev/sda mklabel gpt

You may use fdisk to create:
– a PowerPC PReP boot partition of 8M
– a root partition with the remaining space

Device     Start      End  Sectors Size Type
/dev/sda1   2048    18431    16384   8M PowerPC PReP boot
/dev/sda2  18432 33554654 33536223  16G Linux filesystem
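If you would rather stay with parted for the whole disk setup, a rough equivalent of the layout above (a sketch; adjust the sizes to your disk) is:

parted -s /dev/sda mkpart prep 1MiB 9MiB
parted -s /dev/sda set 1 prep on
parted -s /dev/sda mkpart root 9MiB 100%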

I’m using btrfs, with zstd compression for /usr/portage and /usr/src/.

mkfs.btrfs /dev/sda2
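For the zstd part, one possible approach (a sketch; it assumes a kernel and btrfs-progs recent enough to know about zstd) is to mount with compression enabled, or to set the property per directory once the trees exist:

mount -o compress=zstd /dev/sda2 /mnt/gentoo
# or, later, only for the trees mentioned above:
btrfs property set /mnt/gentoo/usr/portage compression zstd
btrfs property set /mnt/gentoo/usr/src compression zstd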

Initial setup

It is pretty much the usual.

mount /dev/sda2 /mnt/gentoo
cd /mnt/gentoo
wget https://dev.gentoo.org/~mattst88/ppc-stages/stage3-ppc64le-20180810.tar.xz
tar -xpf stage3-ppc64le-20180810.tar.xz
mount -o bind /dev dev
mount -t devpts devpts dev/pts
mount -t proc proc proc
mount -t sysfs sys sys
cp /etc/resolv.conf etc
chroot .

You just have to emerge grub and gentoo-sources; I diverge from the defconfig by making btrfs built-in.

My /etc/portage/make.conf:

CFLAGS="-O3 -mcpu=power9 -pipe"
# WARNING: Changing your CHOST is not something that should be done lightly.
# Please consult https://wiki.gentoo.org/wiki/Changing_the_CHOST_variable before changing.
CHOST="powerpc64le-unknown-linux-gnu"

# NOTE: This stage was built with the bindist Use flag enabled
PORTDIR="/usr/portage"
DISTDIR="/usr/portage/distfiles"
PKGDIR="/usr/portage/packages"

USE="ibm altivec vsx"

# This sets the language of build output to English.
# Please keep this setting intact when reporting bugs.
LC_MESSAGES=C
ACCEPT_KEYWORDS=~ppc64

MAKEOPTS="-j4 -l6"
EMERGE_DEFAULT_OPTS="--jobs 10 --load-average 6 "

My minimal set of packages needed before booting:

emerge grub gentoo-sources vim btrfs-progs openssh

NOTE: You will want to emerge openssh again, making sure bindist is not in its USE flags.
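A sketch of what that could look like, reusing the USE line from the make.conf above (the exact package list may vary):

# in /etc/portage/make.conf
USE="ibm altivec vsx -bindist"

# then rebuild the affected packages
emerge -1 openssl openssh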

Kernel & Bootloader

cd /usr/src/linux
make defconfig
make menuconfig # I want btrfs builtin so I can avoid a initrd
make -j 10 all && make install && make modules_install
grub-install /dev/sda1
grub-mkconfig -o /boot/grub/grub.cfg

NOTE: make sure you pass /dev/sda1 (the PReP partition), otherwise grub will happily assume OpenFirmware knows about btrfs and just point it at your directory.
That’s unfortunately not the case.

Networking

I’m using netifrc with the old eth0 naming convention.

touch /etc/udev/rules.d/80-net-name-slot.rules
ln -sf /etc/init.d/net.{lo,eth0}
echo -e "config_eth0=\"${ip}/16\"\nroutes_eth0="default via ${gw}\"\ndns_servers_eth0=\"8.8.8.8\"" > /etc/conf.d/net

Password and SSH

Even though the mticlient is quite nice, you will rather want to use ssh as much as you can.

passwd 
rc-update add sshd default

Finishing touches

Right now sysvinit does not add the hvc0 console as it should, due to a profile quirk. For now check /etc/inittab and, if needed, add it:

echo 'hvc0:2345:respawn:/sbin/agetty -L 9600 hvc0' >> /etc/inittab

Add your user and add your ssh key and you are ready to use your new system!

August 15, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)
new* helpers can read from stdin (August 15, 2018, 09:21 UTC)

Did you know that new* helpers can read from stdin? Well, now you know! So instead of writing to a temporary file you can install your inline text straight to the destination:

src_install() {
  # old code
  cat <<-EOF >"${T}"/mywrapper || die
    #!/bin/sh
    exec do-something --with-some-argument
  EOF
  dobin "${T}"/mywrapper

  # replacement
  newbin - mywrapper <<-EOF
    #!/bin/sh
    exec do-something --with-some-argument
  EOF
}

August 13, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)

The recent efforts on improving the security of different areas of Gentoo have brought up some arguments. Some time ago, one of the developers wondered whether he would withstand physical violence if an attacker used it in order to compromise Gentoo. A few days later, another developer suggested that an attacker could pay Gentoo developers to compromise the distribution. Is this a real threat to Gentoo? Are we all doomed?

Before I answer this question, let me make an important presumption. Gentoo is a community-driven open source project. As such, it has certain inherent weaknesses and there is no way around them without changing what Gentoo fundamentally is. Those weaknesses are common to all projects of the same nature.

Gentoo could indeed be compromised if developers are subject to the threat of violence to themselves or their families. As for money, I don’t want to insult anyone and I don’t think it really matters. The fact is, Gentoo is vulnerable to any adversary resourceful enough, and there are certainly both easier and cheaper ways than the two mentioned. For example, the adversary could get a new developer recruited, or simply trick one of the existing developers into compromising the distribution. It just takes one developer out of ~150.

As I said, there is no way around that without making major changes to the organizational structure of Gentoo, and those changes would probably do more harm to Gentoo than good. We can just admit that we can’t fully protect Gentoo from a focused attack by a resourceful adversary, and all we can do is limit the potential damage, detect it quickly and counteract it as best we can. In reality, however, random probes and script-kiddie attacks that focus on trivial technical vulnerabilities are more likely, and that’s what the security efforts end up focusing on.

There seems to be some recurring confusion among Gentoo developers regarding the topic of OpenPGP key expiration dates. Some developers seem to believe them to be some kind of security measure — and start arguing about their weaknesses. Furthermore, some people seem to think of them as a rotation mechanism, and believe that they are expected to generate new keys. The truth is, the expiration date is neither of those.

The key expiration date can be updated at any time (both lengthened or shortened), including past the previous expiration date. This is a feature, not a bug. In fact, you are expected to update your expiration dates periodically. You certainly should not rotate your primary key unless really necessary, as switching to a new key usually involves a lot of hassle.
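For example, with a reasonably recent GnuPG (2.1.22 or newer; <fingerprint> standing in for your primary key’s fingerprint), the update is a one-liner:

# push the primary key's expiration date one year into the future
gpg --quick-set-expire <fingerprint> 1y

Older GnuPG versions can do the same interactively via gpg --edit-key <fingerprint> and the expire command.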

If an attacker manages to compromise your primary key, he can easily update the expiration date as well (even if it expires first). Therefore, expiration date does not really provide any added protection here. Revocation is the only way of dealing with compromised keys.

Expiration dates really serve two purposes: naturally eliminating unused keys, and enforcing periodical checks on the primary key. By requiring developers to periodically update their expiration dates, we also implicitly force them to check whether their primary secret key (which we recommend storing offline, in a secure place) is still present and working. Now, if it turns out that the developer can neither update the expiration date nor revoke the key (because the key, its backups and the revocation certificate are all lost, damaged, or the developer goes MIA), the key will eventually expire instead of lingering around as a ‘ghost’.

Even then, developers argue that we have LDAP and retirement procedures to deal with that. However, OpenPGP keys go beyond Gentoo and beyond Gentoo Infrastructure. We want to encourage good practices that will also affect our users and other people with whom developers are communicating, and who have no reason to know about internal Gentoo key management.

August 12, 2018

Pwnies logo

Congratulations to security researcher and Gentoo developer Hanno Böck and his co-authors Juraj Somorovsky and Craig Young for winning one of this year’s coveted Pwnie awards!

The award is for their work on the Return Of Bleichenbacher’s Oracle Threat or ROBOT vulnerability, which at the time of discovery affected such illustrious sites as Facebook and Paypal. Technical details can be found in the full paper published at the Cryptology ePrint Archive.

FroSCon logo

As last year, there will again be a Gentoo booth at the upcoming FrOSCon “Free and Open Source Conference” in St. Augustin near Bonn! Visitors can meet Gentoo developers to ask any question, get Gentoo swag, and prepare, configure, and compile their own Gentoo buttons.

The conference is 25th and 26th of August 2018, and there is no entry fee. See you there!

August 09, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)
Inlining path_exists (August 09, 2018, 15:01 UTC)

The path_exists function in eutils was meant as a simple hack to check for the existence of files matching a wildcard. However, it was kind of ugly and never became widely used. At this very moment, it is used correctly in three packages, semi-correctly in one package and totally misused in two packages. Therefore, I think it’s time to replace it with something nicer.

The replacement snippet is rather trivial (from the original consumer, eselect-opengl):

local shopt_saved=$(shopt -p nullglob)
shopt -s nullglob
local opengl_dirs=( "${EROOT%/}"/usr/lib*/opengl )
${shopt_saved}

if [[ -n ${opengl_dirs[@]} ]]; then
	# ...
fi

By using nullglob, you disable the old POSIX default of leaving the wildcard unexpanded when it does not match anything. Instead, you simply get either an empty array or a list of matched files/directories. If your code requires at least one match, you check whether the array is empty; if it handles empty argument lists just fine (e.g. for loops), you can avoid any conditionals. As a side effect, you get the expanded match in an array, so you don’t have to repeat the wildcard multiple times.
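For the for-loop case, a minimal sketch (reusing the wildcard from the snippet above) could look like:

local shopt_saved=$(shopt -p nullglob)
shopt -s nullglob
local d
for d in "${EROOT%/}"/usr/lib*/opengl; do
	# with nullglob the loop body is simply skipped when nothing matches
	einfo "found ${d}"
done
${shopt_saved}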

Also note the use of shopt directly instead of estack.eclass, which is broken and does not restore options correctly. You can read more on option handling in Mangling shell options in ebuilds.

August 04, 2018
Sergei Trofimovich a.k.a. slyfox (homepage, bugs)
ptrace() and accidental boot fix on ia64 (August 04, 2018, 00:00 UTC)

This story is another dive into Linux kernel internals. It started as a strace hangup on ia64 and ended up being an unusual case of gcc generating garbage code for the Linux kernel (which is not perfectly valid C either). I’ll try to cover a few ptrace() system call corners on x86_64 and ia64 for comparison.

Intro

I updated elilo and the kernel on my ia64 machine recently.

Kernel boot times shrunk from 10 minutes (kernel 3.14.14) down to 2 minutes (kernel 4.9.72). The 3.14.14 kernel had a large 8-minute pause when the early console was not accessible. Every time this pause happened I thought I had bricked the machine. And now the delays are gone \o/

One new thing broke (so far): every time I ran strace it hung without printing any output. Mike Frysinger pointed out that the strace hangup was likely related to gdb problems on ia64 reported before by Émeric Maschino.

And he was right!

Reproducing

Using a ski image I booted the fresh kernel to make sure the bug was still there:

# strace ls
<no response, hangup>

Yay! ski was able to reproduce it: no need to torture the physical machine while debugging. The next step was to find where strace got stuck. As strace and gdb were broken I had to resort to printf() debugging.

Before doing that I tried strace’s -d option to enable debug mode, where it prints everything it expects from the tracee process:

root@ia64 / # strace -d ls
strace: ptrace_setoptions = 0x51
strace: new tcb for pid 52, active tcbs:1
strace: [wait(0x80137f) = 52] WIFSTOPPED,sig=SIGSTOP,EVENT_STOP (128)
strace: pid 52 has TCB_STARTUP, initializing it
strace: [wait(0x80057f) = 52] WIFSTOPPED,sig=SIGTRAP,EVENT_STOP (128)
strace: [wait(0x00127f) = 52] WIFSTOPPED,sig=SIGCONT
strace: [wait(0x00857f) = 52] WIFSTOPPED,sig=133
????

Cryptic output. I tried to compare this output against a correctly working x86_64 system to understand what went wrong:

amd64 $ strace -d ls
strace: ptrace_setoptions = 0x51
strace: new tcb for pid 29343, active tcbs:1
strace: [wait(0x80137f) = 29343] WIFSTOPPED,sig=SIGSTOP,EVENT_STOP (128)
strace: pid 29343 has TCB_STARTUP, initializing it
strace: [wait(0x80057f) = 29343] WIFSTOPPED,sig=SIGTRAP,EVENT_STOP (128)
strace: [wait(0x00127f) = 29343] WIFSTOPPED,sig=SIGCONT
strace: [wait(0x00857f) = 29343] WIFSTOPPED,sig=133
execve("/bin/ls", ["ls"], 0x60000fffffa4f1f8 /* 36 vars */strace: [wait(0x04057f) = 29343] WIFSTOPPED,sig=SIGTRAP,EVENT_EXEC (4)
strace: [wait(0x00857f) = 29343] WIFSTOPPED,sig=133
...

Up to the execve call both logs are identical. Still no clue.

I spent some time looking at the ptrace state machine in the kernel and gave up trying to understand what was wrong. I then asked the strace maintainer what could be wrong and got an almost immediate response from Dmitry V. Levin: strace did not show the actual error.

After a source code tweak he pointed at a ptrace() syscall failure returning -EIO:

$ ./strace -d /
./strace: ptrace_setoptions = 0x51
./strace: new tcb for pid 11080, active tcbs:1
./strace: [wait(0x80137f) = 11080] WIFSTOPPED,sig=SIGSTOP,EVENT_STOP (128)
./strace: pid 11080 has TCB_STARTUP, initializing it
./strace: [wait(0x80057f) = 11080] WIFSTOPPED,sig=SIGTRAP,EVENT_STOP (128)
./strace: [wait(0x00127f) = 11080] WIFSTOPPED,sig=SIGCONT
./strace: [wait(0x00857f) = 11080] WIFSTOPPED,sig=133
./strace: get_regs: get_regs_error: Input/output error
????
...
"Looks like ptrace(PTRACE_GETREGS) always fails with EIO on this new kernel."

Now I got a more specific signal: ptrace(PTRACE_GETREGS,…) syscall failed.

Into the kernel

I felt I had finally found the smoking gun: getting the registers of a WIFSTOPPED tracee task should never fail. All the registers must already be stored somewhere in memory.

Otherwise how would the kernel be able to resume executing the tracee task when needed?

Before diving into ia64 land let’s look into x86_64 ptrace(PTRACE_GETREGS, …) implementation.

x86_64 ptrace(PTRACE_GETREGS)

To find a <foo> syscall implementation in the kernel we can search for a sys_<foo>() function definition. The lazy way to find the definition is to interrogate the built kernel with gdb:

$ gdb --quiet ./vmlinux
(gdb) list sys_ptrace
1105
1106    #ifndef arch_ptrace_attach
1107    #define arch_ptrace_attach(child)       do { } while (0)
1108    #endif
1109
1110    SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
1111                    unsigned long, data)
1112    {
1113            struct task_struct *child;
1114            long ret;

The SYSCALL_DEFINE4(ptrace, …) macro defines the actual sys_ptrace(), which does a few sanity checks and dispatches to arch_ptrace():

SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
                unsigned long, data)
{
    // simplified a bit
    struct task_struct *child;
    long ret;

    child = ptrace_get_task_struct(pid);
    ret = arch_ptrace(child, request, addr, data);
    return ret;
}

The x86_64 implementation does a copy_regset_to_user() call and takes only a few lines of code to fetch the registers:

long arch_ptrace(struct task_struct *child, long request,
                 unsigned long addr, unsigned long data) {
    // ...
    case PTRACE_GETREGS:    /* Get all gp regs from the child. */
        return copy_regset_to_user(child,
                                   task_user_regset_view(current),
                                   REGSET_GENERAL,
                                   0, sizeof(struct user_regs_struct),
                                   datap);

Let’s look at it in detail to get an idea of where the registers are normally stored.

static inline int copy_regset_to_user(struct task_struct *target,
                                      const struct user_regset_view *view,
                                      unsigned int setno,
                                      unsigned int offset, unsigned int size,
                                      void __user *data)
{
    const struct user_regset *regset = &view->regsets[setno];

    if (!regset->get)
        return -EOPNOTSUPP;
    if (!access_ok(VERIFY_WRITE, data, size))
        return -EFAULT;
    return regset->get(target, regset, offset, size, NULL, data);
}

Here copy_regset_to_user() is just a dispatcher to the view argument. Moving on:

const struct user_regset_view *task_user_regset_view(struct task_struct *task)
{
    // simplified #ifdef-ery
    if (!user_64bit_mode(task_pt_regs(task)))
        return &user_x86_32_view;
    return &user_x86_64_view;
}
// ...
static const struct user_regset_view user_x86_64_view = {
    .name = "x86_64", .e_machine = EM_X86_64,
    .regsets = x86_64_regsets, .n = ARRAY_SIZE(x86_64_regsets)
};
// ...
static struct user_regset x86_64_regsets[] __ro_after_init = {
    [REGSET_GENERAL] = {
        .core_note_type = NT_PRSTATUS,
        .n = sizeof(struct user_regs_struct) / sizeof(long),
        .size = sizeof(long), .align = sizeof(long),
        .get = genregs_get, .set = genregs_set
    },
    // ...

A bit of boilerplate to tie genregs_get() and genregs_set() to the 64-bit (or 32-bit) caller. Let’s look at the 64-bit variant of genregs_get(), as it’s the one used in our PTRACE_GETREGS case:

static int genregs_get(struct task_struct *target,
                       const struct user_regset *regset,
                       unsigned int pos, unsigned int count,
                       void *kbuf, void __user *ubuf)
{
    if (kbuf) {
        unsigned long *k = kbuf;
        while (count >= sizeof(*k)) {
            *k++ = getreg(target, pos);
            count -= sizeof(*k);
            pos += sizeof(*k);
        }
    } else {
        unsigned long __user *u = ubuf;
        while (count >= sizeof(*u)) {
            if (__put_user(getreg(target, pos), u++))
                return -EFAULT;
            count -= sizeof(*u);
            pos += sizeof(*u);
        }
    }
    return 0;
}
// ...
static unsigned long getreg(struct task_struct *task, unsigned long offset)
{
    // ... simplified
    return *pt_regs_access(task_pt_regs(task), offset);
}

static unsigned long *pt_regs_access(struct pt_regs *regs, unsigned long regno)
{
    BUILD_BUG_ON(offsetof(struct pt_regs, bx) != 0);
    return &regs->bx + (regno >> 2);
}
// ..
#define task_pt_regs(task) \
({ \
    unsigned long __ptr = (unsigned long)task_stack_page(task); \
    __ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING; \
    ((struct pt_regs *)__ptr) - 1; \
})

static inline void *task_stack_page(const struct task_struct *task)
{
    return task->stack;
}

From the task_pt_regs() definition we see that the actual register contents are stored on the task’s kernel stack. And genregs_get() copies the register contents one by one in a while() loop.

How do a task’s registers get stored on the task’s kernel stack? There are a few paths to get there. The most frequent one is perhaps interrupt handling, when the task is descheduled from the CPU and moved to the scheduler wait queue.

ENTRY(interrupt_entry) is the entry point for interrupt handling:

ENTRY(interrupt_entry)
UNWIND_HINT_FUNC
ASM_CLAC
cld
testb $3, CS-ORIG_RAX+8(%rsp)
jz 1f
SWAPGS
/*
* Switch to the thread stack. The IRET frame and orig_ax are
* on the stack, as well as the return address. RDI..R12 are
* not (yet) on the stack and space has not (yet) been
* allocated for them.
*/
pushq %rdi
/* Need to switch before accessing the thread stack. */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
movq %rsp, %rdi
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
/*
* We have RDI, return address, and orig_ax on the stack on
* top of the IRET frame. That means offset=24
*/
UNWIND_HINT_IRET_REGS base=%rdi offset=24
pushq 7*8(%rdi) /* regs->ss */
pushq 6*8(%rdi) /* regs->rsp */
pushq 5*8(%rdi) /* regs->eflags */
pushq 4*8(%rdi) /* regs->cs */
pushq 3*8(%rdi) /* regs->ip */
pushq 2*8(%rdi) /* regs->orig_ax */
pushq 8(%rdi) /* return address */
UNWIND_HINT_FUNC
movq (%rdi), %rdi
1:
PUSH_AND_CLEAR_REGS save_ret=1
ENCODE_FRAME_POINTER 8
testb $3, CS+8(%rsp)
jz 1f
/*
* IRQ from user mode.
*
* We need to tell lockdep that IRQs are off. We can't do this until
* we fix gsbase, and we should do it before enter_from_user_mode
* (which can take locks). Since TRACE_IRQS_OFF is idempotent,
* the simplest way to handle it is to just call it twice if
* we enter from user mode. There's no reason to optimize this since
* TRACE_IRQS_OFF is a no-op if lockdep is off.
*/
TRACE_IRQS_OFF
CALL_enter_from_user_mode
1:
ENTER_IRQ_STACK old_rsp=%rdi save_ret=1
/* We entered an interrupt context - irqs are off: */
TRACE_IRQS_OFF
ret
END(interrupt_entry)
; ...
.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0
/*
* Push registers and sanitize registers of values that a
* speculation attack might otherwise want to exploit. The
* lower registers are likely clobbered well before they
* could be put to use in a speculative execution gadget.
* Interleave XOR with PUSH for better uop scheduling:
*/
.if \save_ret
pushq %rsi /* pt_regs->si */
movq 8(%rsp), %rsi /* temporarily store the return address in %rsi */
movq %rdi, 8(%rsp) /* pt_regs->di (overwriting original return address) */
.else
pushq %rdi /* pt_regs->di */
pushq %rsi /* pt_regs->si */
.endif
pushq \rdx /* pt_regs->dx */
xorl %edx, %edx /* nospec dx */
pushq %rcx /* pt_regs->cx */
xorl %ecx, %ecx /* nospec cx */
pushq \rax /* pt_regs->ax */
pushq %r8 /* pt_regs->r8 */
xorl %r8d, %r8d /* nospec r8 */
pushq %r9 /* pt_regs->r9 */
xorl %r9d, %r9d /* nospec r9 */
pushq %r10 /* pt_regs->r10 */
xorl %r10d, %r10d /* nospec r10 */
pushq %r11 /* pt_regs->r11 */
xorl %r11d, %r11d /* nospec r11*/
pushq %rbx /* pt_regs->rbx */
xorl %ebx, %ebx /* nospec rbx*/
pushq %rbp /* pt_regs->rbp */
xorl %ebp, %ebp /* nospec rbp*/
pushq %r12 /* pt_regs->r12 */
xorl %r12d, %r12d /* nospec r12*/
pushq %r13 /* pt_regs->r13 */
xorl %r13d, %r13d /* nospec r13*/
pushq %r14 /* pt_regs->r14 */
xorl %r14d, %r14d /* nospec r14*/
pushq %r15 /* pt_regs->r15 */
xorl %r15d, %r15d /* nospec r15*/
UNWIND_HINT_REGS
.if \save_ret
pushq %rsi /* return address on top of stack */
.endif
.endm

The interesting effects of interrupt_entry are:

  • registers are backed up by the PUSH_AND_CLEAR_REGS macro
  • the memory area used for the backup is PER_CPU_VAR(cpu_current_top_of_stack) (the task’s kernel stack)

To recap: ptrace(PTRACE_GETREGS, …) does an element-wise copy (using __put_user()) of each general register, located in a single struct pt_regs on the task’s kernel stack, to the tracer’s userspace.

Now let’s look at how ia64 does the same.

ia64 ptrace(PTRACE_GETREGS)

“Can’t be much more complicated than on x86_64” was my thought. Haha.

I started searching for the -EIO failure in the kernel and sprinkling printk() statements over the ptrace() handling code.

ia64 begins with the same call path as x86_64: sys_ptrace() dispatches to arch_ptrace(), which for PTRACE_GETREGS ends up in ptrace_getregs().

ptrace_getregs() is supposed to copy the in-memory context back to the caller’s userspace. So where did it return EIO?

Quiz: while you are skimming through the ptrace_getregs() code and comments right below, try to guess which EIO exit path is taken in our case. I’ve marked the cases with [N] numbers.

static long
ptrace_getregs (struct task_struct *child, struct pt_all_user_regs __user *ppr)
{
// ...
// [1] check if we can write back to userspace
if (!access_ok(VERIFY_WRITE, ppr, sizeof(struct pt_all_user_regs)))
return -EIO;
// [2] get pointer to register context (ok)
pt = task_pt_regs(child);
// [3] and tracee kernel stack (unexpected!)
sw = (struct switch_stack *) (child->thread.ksp + 16);
// [4] Try to unwind tracee's call chain (even more unexpected!)
unw_init_from_blocked_task(&info, child);
if (unw_unwind_to_user(&info) < 0) {
return -EIO;
}
// [5] validate alignment of target userspace buffer
if (((unsigned long) ppr & 0x7) != 0) {
dprintk("ptrace:unaligned register address %p\n", ppr);
return -EIO;
}
// [6] fetch special registers into local variables
if (access_uarea(child, PT_CR_IPSR, &psr, 0) < 0
|| access_uarea(child, PT_AR_EC, &ec, 0) < 0
|| access_uarea(child, PT_AR_LC, &lc, 0) < 0
|| access_uarea(child, PT_AR_RNAT, &rnat, 0) < 0
|| access_uarea(child, PT_AR_BSP, &bsp, 0) < 0
|| access_uarea(child, PT_CFM, &cfm, 0)
|| access_uarea(child, PT_NAT_BITS, &nat_bits, 0))
return -EIO;
/* control regs */
// [7] Finally start populating reguster contents into userspace:
retval |= __put_user(pt->cr_iip, &ppr->cr_iip);
retval |= __put_user(psr, &ppr->cr_ipsr);
/* app regs */
// [8] a few application registers
retval |= __put_user(pt->ar_pfs, &ppr->ar[PT_AUR_PFS]);
retval |= __put_user(pt->ar_rsc, &ppr->ar[PT_AUR_RSC]);
retval |= __put_user(pt->ar_bspstore, &ppr->ar[PT_AUR_BSPSTORE]);
retval |= __put_user(pt->ar_unat, &ppr->ar[PT_AUR_UNAT]);
retval |= __put_user(pt->ar_ccv, &ppr->ar[PT_AUR_CCV]);
retval |= __put_user(pt->ar_fpsr, &ppr->ar[PT_AUR_FPSR]);
retval |= __put_user(ec, &ppr->ar[PT_AUR_EC]);
retval |= __put_user(lc, &ppr->ar[PT_AUR_LC]);
retval |= __put_user(rnat, &ppr->ar[PT_AUR_RNAT]);
retval |= __put_user(bsp, &ppr->ar[PT_AUR_BSP]);
retval |= __put_user(cfm, &ppr->cfm);
/* gr1-gr3 */
// [9] normal (general) registers
retval |= __copy_to_user(&ppr->gr[1], &pt->r1, sizeof(long));
retval |= __copy_to_user(&ppr->gr[2], &pt->r2, sizeof(long) *2);
/* gr4-gr7 */
// [10] more normal (general) registers!
for (i = 4; i < 8; i++) {
if (unw_access_gr(&info, i, &val, &nat, 0) < 0)
return -EIO;
retval |= __put_user(val, &ppr->gr[i]);
}
/* gr8-gr11 */
// [11] even more normal (general) registers!!
retval |= __copy_to_user(&ppr->gr[8], &pt->r8, sizeof(long) * 4);
/* gr12-gr15 */
// [11] you've got the idea
retval |= __copy_to_user(&ppr->gr[12], &pt->r12, sizeof(long) * 2);
retval |= __copy_to_user(&ppr->gr[14], &pt->r14, sizeof(long));
retval |= __copy_to_user(&ppr->gr[15], &pt->r15, sizeof(long));
/* gr16-gr31 */
// [12] even more of those
retval |= __copy_to_user(&ppr->gr[16], &pt->r16, sizeof(long) * 16);
/* b0 */
// [13] branch register b0
retval |= __put_user(pt->b0, &ppr->br[0]);
/* b1-b5 */
// [13] more branch registers
for (i = 1; i < 6; i++) {
if (unw_access_br(&info, i, &val, 0) < 0)
return -EIO;
__put_user(val, &ppr->br[i]);
}
/* b6-b7 */
// [14] even more branch registers
retval |= __put_user(pt->b6, &ppr->br[6]);
retval |= __put_user(pt->b7, &ppr->br[7]);
/* fr2-fr5 */
// [15] floating point registers
for (i = 2; i < 6; i++) {
if (unw_get_fr(&info, i, &fpval) < 0)
return -EIO;
retval |= __copy_to_user(&ppr->fr[i], &fpval, sizeof (fpval));
}
/* fr6-fr11 */
// [16] more floating point registers
retval |= __copy_to_user(&ppr->fr[6], &pt->f6,
sizeof(struct ia64_fpreg) * 6);
/* fp scratch regs(12-15) */
// [17] more floating point registers
retval |= __copy_to_user(&ppr->fr[12], &sw->f12,
sizeof(struct ia64_fpreg) * 4);
/* fr16-fr31 */
// [18] even more floating point registers
for (i = 16; i < 32; i++) {
if (unw_get_fr(&info, i, &fpval) < 0)
return -EIO;
retval |= __copy_to_user(&ppr->fr[i], &fpval, sizeof (fpval));
}
/* fph */
// [19] rest of floating point registers
ia64_flush_fph(child);
retval |= __copy_to_user(&ppr->fr[32], &child->thread.fph,
sizeof(ppr->fr[32]) * 96);
/* preds */
// [20] predicate registers
retval |= __put_user(pt->pr, &ppr->pr);
/* nat bits */
// [20] NaT status registers
retval |= __put_user(nat_bits, &ppr->nat);
ret = retval ? -EIO : 0;
return ret;
}

It’s a huge function. Fear not! It has two main parts:

  • extraction of register values using unw_unwind_to_user()
  • copying extracted values to caller’s userspace using __put_user() and __copy_to_user() helpers.

Those two are analogous to x86_64’s copy_regset_to_user() implementation.

Quiz answer: surprisingly, it’s case [4]: EIO popped up due to a failure in the unw_unwind_to_user() call. Or not so surprisingly, given that it’s The Function that fetches register values from somewhere.

Let’s check where the register contents are hiding on ia64. Here is the unw_unwind_to_user() definition:

int
unw_unwind_to_user (struct unw_frame_info *info)
{
    unsigned long ip, sp, pr = info->pr;

    do {
        unw_get_sp(info, &sp);
        if ((long)((unsigned long)info->task + IA64_STK_OFFSET - sp)
            < IA64_PT_REGS_SIZE) {
            UNW_DPRINT(0, "unwind.%s: ran off the top of the kernel stack\n",
                       __func__);
            break;
        }
        if (unw_is_intr_frame(info) &&
            (pr & (1UL << PRED_USER_STACK)))
            return 0;
        if (unw_get_pr (info, &pr) < 0) {
            unw_get_rp(info, &ip);
            UNW_DPRINT(0, "unwind.%s: failed to read "
                       "predicate register (ip=0x%lx)\n",
                       __func__, ip);
            return -1;
        }
    } while (unw_unwind(info) >= 0);
    unw_get_ip(info, &ip);
    UNW_DPRINT(0, "unwind.%s: failed to unwind to user-level (ip=0x%lx)\n",
               __func__, ip);
    return -1;
}
EXPORT_SYMBOL(unw_unwind_to_user);

The code above is more complicated than on x86_64. How is it supposed to work?

For efficiency reasons, the syscall interface (and even the interrupt handling interface) on ia64 looks a lot more like a normal function call. This means that Linux does not store all general registers in a separate struct pt_regs backup area for each task switch.

Let’s peek at the interrupt handling entry for completeness.

ia64 uses an interrupt entry point to enter the kernel at ENTRY(interrupt):

ENTRY(interrupt)
/* interrupt handler has become too big to fit this area. */
br.sptk.many __interrupt
END(interrupt)
// ...
ENTRY(__interrupt)
DBG_FAULT(12)
mov r31=pr // prepare to save predicates
;;
SAVE_MIN_WITH_COVER // uses r31; defines r2 and r3
SSM_PSR_IC_AND_DEFAULT_BITS_AND_SRLZ_I(r3, r14)
// ensure everybody knows psr.ic is back on
adds r3=8,r2 // set up second base pointer for SAVE_REST
;;
SAVE_REST
;;
MCA_RECOVER_RANGE(interrupt)
alloc r14=ar.pfs,0,0,2,0 // must be first in an insn group
MOV_FROM_IVR(out0, r8) // pass cr.ivr as first arg
add out1=16,sp // pass pointer to pt_regs as second arg
;;
srlz.d // make sure we see the effect of cr.ivr
movl r14=ia64_leave_kernel
;;
mov rp=r14
br.call.sptk.many b6=ia64_handle_irq
END(__interrupt)

The code above handles interrupts as follows:

  • SAVE_MIN_WITH_COVER sets up the kernel stack (r12), gp (r1) and so on
  • SAVE_REST stores the rest of the registers, r2 to r31, but leaves r32 to r127 to be managed by the RSE (register stack engine) like a normal function call would
  • it hands off control to C code in ia64_handle_irq

All the above means that in order to get register r32 or similar we would need to perform kernel stack unwinding down to the userspace boundary and read the register values from the RSE memory area (the backing store).

Into the rabbit hole

Back to our unwinder failure.

Our case is not very complicated, as the tracee is stopped at the system call boundary and there is not too much to unwind. How would one know where the user boundary starts? Linux looks at the return instruction pointer in every stack frame and checks whether the return address still points into kernel address space.

The unwinding failure seemingly happens somewhere in the depths of unw_unwind(). From there find_save_locs(info) is called; find_save_locs() lazily builds and then runs an unwind script. run_script() is a small bytecode interpreter with 11 instruction types.

If the above does not make sense to you it’s fine. It did not make sense to me either.

To get more information from the unwinder I enabled its debugging output by adding #define UNW_DEBUG:

--- a/arch/ia64/kernel/unwind.c
+++ b/arch/ia64/kernel/unwind.c
@@ -56,4 +56,6 @@
#define UNW_STATS 0 /* WARNING: this disabled interrupts for long time-spans!! */
+#define UNW_DEBUG 1
+
#ifdef UNW_DEBUG
static unsigned int unw_debug_level = UNW_DEBUG;

I ran strace again:

ia64 # strace -v -d ls
strace: ptrace_setoptions = 0x51
unwind.build_script: no unwind info for ip=0xa00000010001c1a0 (prev ip=0x0)
unwind.run_script: no state->pt, dst=18, val=136
unwind.unw_unwind: failed to locate return link (ip=0xa00000010001c1a0)!
unwind.unw_unwind_to_user: failed to unwind to user-level (ip=0xa00000010001c1a0)

build_script() couldn’t resolve the current ip=0xa00000010001c1a0 address. Why? No idea! I added a debug print around the place where I expected a match:

--- a/arch/ia64/kernel/unwind.c
+++ b/arch/ia64/kernel/unwind.c
@@ -1562,6 +1564,8 @@ build_script (struct unw_frame_info *info)
prev = NULL;
for (table = unw.tables; table; table = table->next) {
+ UNW_DPRINT(0, "unwind.%s: looking up ip=%#lx in [start=%#lx,end=%#lx)\n",
+ __func__, ip, table->start, table->end);
if (ip >= table->start && ip < table->end) {
/*
* Leave the kernel unwind table at the very front,

I ran strace again:

ia64 # strace -v -d ls
strace: ptrace_setoptions = 0x51
unwind.build_script: looking up ip=0xa00000010001c1a0 in [start=0xa000000100009240,end=0xa000000100000000)
unwind.build_script: looking up ip=0xa00000010001c1a0 in [start=0xa000000000040720,end=0xa000000000040ad0)
unwind.build_script: no unwind info for ip=0xa00000010001c1a0 (prev ip=0x0)

Can you spot the problem? Look at this range: [start=0xa000000100009240,end=0xa000000100000000). Its end is less than its start. This renders the ip >= table->start && ip < table->end condition always false. How could that happen?

It means that ptrace() itself is not at fault here, but is a victim of an already corrupted table->end value.

Going deeper

To find the source of the table->end corruption I checked whether the table was populated correctly. That is done by a simple function, init_unwind_table():

static void
init_unwind_table (struct unw_table *table, const char *name, unsigned long segment_base,
                   unsigned long gp, const void *table_start, const void *table_end)
{
    const struct unw_table_entry *start = table_start, *end = table_end;

    table->name = name;
    table->segment_base = segment_base;
    table->gp = gp;
    table->start = segment_base + start[0].start_offset;
    table->end = segment_base + end[-1].end_offset;
    table->array = start;
    table->length = end - start;
}

Table construction happens in only a few places:

void __init
unw_init (void)
{
    extern char __gp[];
    extern char __start_unwind[], __end_unwind[];
    ...
    // Kernel's own unwind table
    init_unwind_table(&unw.kernel_table, "kernel", KERNEL_START, (unsigned long) __gp,
                      __start_unwind, __end_unwind);
}
// ...
void *
unw_add_unwind_table (const char *name, unsigned long segment_base, unsigned long gp,
                      const void *table_start, const void *table_end)
{
    // ...
    init_unwind_table(table, name, segment_base, gp, table_start, table_end);
}
// ...
static int __init
create_gate_table (void)
{
    // ...
    unw_add_unwind_table("linux-gate.so", segbase, 0, start, end);
}
// ...
static void
register_unwind_table (struct module *mod)
{
    // ...
    mod->arch.core_unw_table = unw_add_unwind_table(mod->name, 0, mod->arch.gp,
                                                    core, core + num_core);
    mod->arch.init_unw_table = unw_add_unwind_table(mod->name, 0, mod->arch.gp,
                                                    init, init + num_init);
}

Here we see unwind tables created for:

  • one table for the kernel itself
  • one table for linux-gate.so (the equivalent of linux-vdso.so.1 on x86_64)
  • one table for each kernel module

Arrays are hard

Nothing complicated, right? Yet gcc fails to generate correct code for the end[-1].end_offset expression! It happens to be a rare corner case:

Both __start_unwind and __end_unwind are defined in the linker script as external symbols:

# somewhere in arch/ia64/kernel/vmlinux.lds.S
# ...
SECTIONS {
    # ...
    .IA_64.unwind : AT(ADDR(.IA_64.unwind) - LOAD_OFFSET) {
            __start_unwind = .;
            *(.IA_64.unwind*)
            __end_unwind = .;
    } :code :unwind
    # ...

Here is how C code defines __end_unwind:

extern char __end_unwind[];

If we manually inline all the above into unw_init we will get the following:

void __init
unw_init (void)
{
    extern char __end_unwind[];
    ...
    table->end = segment_base + ((unw_table_entry *)__end_unwind)[-1].end_offset;
}

If __end_unwind[] were an array defined in C, then the negative index -1 would cause undefined behaviour.

On the practical side it’s just pointer arithmetic. Is there anything special about subtracting a few bytes from an arbitrary address and then dereferencing it?

Let’s check what kind of assembly gcc actually generates.

Compiler mysteries

Still reading? Great! You got to the most exciting part of this article!

Let’s look at some simpler code first, and then grow it to be closer to our initial example.

Let’s start with a global array and a negative index:

extern long __some_table[];
long end(void) { return __some_table[-1]; }

Compilation result (I’ll strip irrelevant bits and annotations):

; ia64-unknown-linux-gnu-gcc-8.2.0 -O2 -S a.c
.text
.global end#
.proc end#
end:
addl r14 = @ltoffx(__some_table#), r1
;;
ld8.mov r14 = [r14], __some_table#
;;
adds r14 = -8, r14
;;
ld8 r8 = [r14]
br.ret.sptk.many b0
.endp end#

Here two things happen:

  • the __some_table address is read from the GOT (r1 is roughly the GOT register) by performing an ld8.mov (a form of 8-byte load) into r14.
  • the final value is loaded from address r14 - 8 using ld8 (also an 8-byte load).

Simple!

We can simplify the example by avoiding the GOT indirection. The typical way to do that is to use the __attribute__((visibility("hidden"))) hint:

extern long __some_table[] __attribute__((visibility("hidden")));
long end(void) { return __some_table[-1]; }

Assembly code:

; ia64-unknown-linux-gnu-gcc-8.2.0 -O2 -S a.c
.text
.global end#
.proc end#
end:
movl r14 = @gprel(__some_table#)
;;
add r14 = r1, r14
;;
adds r14 = -8, r14
;;
ld8 r8 = [r14]
br.ret.sptk.many b0

Here movl r14 = @gprel(__some_table#) is a link-time 64-bit constant: the offset of the __some_table array from the r1 value. Only a single 8-byte load happens, at address @gprel(__some_table#) + r1 - 8.

Also straightforward.

Now let’s change the alignment of our table from long (8 bytes on ia64) to char (1 byte):

extern char __some_table[] __attribute__((visibility("hidden")));
long end(void) { return ((long*)__some_table)[-1]; }
; ia64-unknown-linux-gnu-gcc-8.2.0 -O2 -S a.c
.text
.global end#
.proc end#
end:
movl r14 = @gprel(__some_table#)
;;
add r14 = r1, r14
;;
adds r19 = -7, r14
adds r16 = -8, r14
adds r18 = -6, r14
adds r17 = -5, r14
adds r21 = -4, r14
adds r15 = -3, r14
;;
ld1 r19 = [r19]
adds r20 = -2, r14
adds r14 = -1, r14
ld1 r16 = [r16]
;;
ld1 r18 = [r18]
shl r19 = r19, 8
ld1 r17 = [r17]
;;
or r19 = r16, r19
shl r18 = r18, 16
ld1 r16 = [r21]
ld1 r15 = [r15]
shl r17 = r17, 24
;;
or r18 = r19, r18
shl r16 = r16, 32
ld1 r8 = [r20]
ld1 r19 = [r14]
shl r15 = r15, 40
;;
or r17 = r18, r17
shl r14 = r8, 48
shl r8 = r19, 56
;;
or r16 = r17, r16
;;
or r15 = r16, r15
;;
.mmi
or r14 = r15, r14
;;
or r8 = r14, r8
br.ret.sptk.many b0
.endp end#

This is quite a blowup in code size! Here, instead of one 8-byte ld8 load, the compiler generated eight 1-byte ld1 loads and assembles the value with the help of shifts and ors.

Note how each individual byte gets its own register to keep the address and the result of the load.

Here is the subset of the above instructions that handles byte offset -5:

; point r14 at __some_table:
movl r14 = @gprel(__some_table#)
add r14 = r1, r14
;
; read one byte and shift it
; into destination byte position:
;
adds r17 = -5, r14
ld1 r17 = [r17]
shl r17 = r17, 24
or r16 = r17, r16

This code, while ugly and inefficient, is still correct.

Now let’s wrap our 8-byte value in a struct to make the example closer to the original unwinder’s table registration code:

extern char __some_table[] __attribute__((visibility("hidden")));
struct s { long v; };
long end(void) { return ((struct s *)__some_table)[-1].v; }

Quiz time: do you think the generated code will be exactly the same as in the previous example, or somehow different?

; ia64-unknown-linux-gnu-gcc-8.2.0 -O2 -S a.c
.text
.global end#
.proc end#
end:
movl r14 = @gprel(__some_table#)
movl r16 = 0x1ffffffffffffff9
;;
add r14 = r1, r14
movl r15 = 0x1ffffffffffffff8
movl r17 = 0x1ffffffffffffffa
;;
add r15 = r14, r15
add r17 = r14, r17
add r16 = r14, r16
;;
ld1 r8 = [r15]
ld1 r16 = [r16]
;;
ld1 r15 = [r17]
movl r17 = 0x1ffffffffffffffb
shl r16 = r16, 8
;;
add r17 = r14, r17
or r16 = r8, r16
shl r15 = r15, 16
;;
ld1 r8 = [r17]
movl r17 = 0x1ffffffffffffffc
or r15 = r16, r15
;;
add r17 = r14, r17
shl r8 = r8, 24
;;
ld1 r16 = [r17]
movl r17 = 0x1ffffffffffffffd
or r8 = r15, r8
;;
add r17 = r14, r17
shl r16 = r16, 32
;;
ld1 r15 = [r17]
movl r17 = 0x1ffffffffffffffe
or r16 = r8, r16
;;
add r17 = r14, r17
shl r15 = r15, 40
;;
ld1 r8 = [r17]
movl r17 = 0x1fffffffffffffff
or r15 = r16, r15
;;
add r14 = r14, r17
shl r8 = r8, 48
;;
ld1 r16 = [r14]
or r15 = r15, r8
;;
shl r8 = r16, 56
;;
or r8 = r15, r8
br.ret.sptk.many b0
.endp end#

The code is different from the previous one! Seemingly not by much, but there is one suspicious detail: the offsets are now very large. Let’s look at our -5 example again:

; point r14 at __some_table:
movl r14 = @gprel(__some_table#)
add r14 = r1, r14
;
; read one byte and shift it
; into destination byte position:
;
movl r17 = 0x1ffffffffffffffb
add r17 = r14, r17
ld1 r8 = [r17]
shl r8 = r8, 24
or r8 = r15, r8
; ...

The offset 0x1ffffffffffffffb (2305843009213693947) used here is incorrect. It should have been 0xfffffffffffffffb (-5).

We have encountered (arguably) a compiler bug, known as PR84184. Upstream says struct handling is different enough from direct array dereferences to trick gcc into generating incorrect byte offsets.
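
Numerically, the bogus constant is exactly the correct offset truncated to 61 bits. A quick check (my own observation, not something stated in the bug report):

#include <assert.h>

int main(void)
{
    /* -5 as a 64-bit two's complement value with the top three bits cleared */
    assert(((unsigned long long)-5 & ((1ULL << 61) - 1)) == 0x1ffffffffffffffbULL);
    return 0;
}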

One day I’ll take a closer look at it to understand its mechanics.

Let’s explore one more example: what if we add a bigger alignment to __some_table without changing its type?

extern char __some_table[] __attribute__((visibility("hidden"))) __attribute((aligned(8)));
struct s { long v; };
long end(void) { return ((struct s *)__some_table)[-1].v; }
; ia64-unknown-linux-gnu-gcc-8.2.0 -O2 -S a.c
.text
.global end#
.proc end#
end:
movl r14 = @gprel(__some_table#)
;;
add r14 = r1, r14
;;
adds r14 = -8, r14
;;
ld8 r8 = [r14]
br.ret.sptk.many b0

Exactly as in our original clean and fast example: a single aligned load at offset -8.

Now we have a simple workaround!

What if we pass our array in a register instead of using a global reference (effectively un-inlining the array address)?

struct s { long v; };
long end(char * __some_table) { return ((struct s *)__some_table)[-1].v; }
; ia64-unknown-linux-gnu-gcc-8.2.0 -O2 -S a.c
.text
.global end#
.proc end#
end:
adds r32 = -8, r32
;;
ld8 r8 = [r32]
br.ret.sptk.many b0

Also works! Note how the compiler promotes the alignment from 1 to 8 after the type cast.

In this case a few things happen at the same time to trigger bad code generation:

  • gcc infers that char __end_unwind[] is an array literal with alignment 1
  • gcc inlines __end_unwind into init_unwind_table and demotes alignment from 8 (const struct unw_table_entry) to 1 (extern char [])
  • gcc assumes that __end_unwind can’t have negative subscript and generates invalid (and inefficient) code

Workarounds (aka hacks) time!

We can work around the corner-case conditions above in a few different ways:
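
One such way, sketched here purely on the basis of the aligned-attribute experiment above (the actual kernel change may look different), is to tell gcc about the real alignment of the linker-provided symbol:

/* hypothetical sketch: declare the linker-script symbol with the alignment
   it really has, so gcc goes back to emitting a single aligned ld8 */
extern char __end_unwind[] __attribute__((aligned(8)));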

The fix is still not perfect as a negative subscript is still used. But at least the load is aligned.

Note that void __init unw_init() is called early in the kernel startup sequence, even before the console is initialized.

This code generation bug causes either a garbage read from some memory location or a kernel crash when trying to access unmapped memory.

That is the mechanics of the strace breakage.

Parting words

  • Task switch on x86_64 and on ia64 is fun :)
  • On x86_64 the implementation of ptrace(PTRACE_GETREGS, …) is very straightforward: almost a memcpy from a predefined location.
  • On ia64 ptrace(PTRACE_GETREGS, …) requires many moving parts:
    • call stack unwinder for kernel (involving linker scripts to define __end_unwind and __start_unwind)
    • a bytecode generator and a bytecode interpreter to speed up unwinding for every ptrace() call
  • Unaligned loads of register-sized values are a tricky and fragile business

Have fun!

Posted on August 4, 2018

August 03, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)
Verifying repo/gentoo.git with gverify (August 03, 2018, 12:04 UTC)

Git commit signatures are recursive by design — that is, each signature covers not only the commit in question but also indirectly all past commits, via tree and parent commit hashes. This makes user-side commit verification much simpler, as the user needs only to verify the signature on the most recent commit; with the assumption that the developer making it has verified the earlier commit and so on. Sadly, this is usually not the case at the moment.

Most of the Gentoo developers do not really verify the base upon which they are making their commits. While they might verify the commits when pulling before starting to work on their changes, it is rather unlikely that they verify the correctness when they repeatedly need to rebase before pushing. Usually this does not cause problems as Gentoo Infrastructure is verifying the commit signatures before accepting the push. Nevertheless, the recent attack on our GitHub mirrors made me realize that if a smart attacker was able to inject a single malicious commit without valid signature, then a Gentoo developer would most likely make a signed commit on top of it without even noticing the problem.

In this article, I would like to shortly present my quick solution to this problem — app-portage/gverify. gverify is a trivial reimplementation of gkeys in <200 lines of code. It uses the gkeys seed data (yes, this means it relies on manual updates) combined with autogenerated developer keyrings to provide strict verification of commits. Unlike gkeys, it works out-of-the-box without root privileges and automatically updates the keys on use.

The package installs a gv-install tool that installs two hooks on your repo/gentoo.git working copy. Those are post-merge and pre-rebase hooks that verify the tip of upstream master branch, respectively every time merge on master is finished, and every time a rebase is about to be started. This covers the two main cases — git pull and git pull --rebase. The former causes a verbose error after the update, the latter prevents a rebase from proceeding.
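
Conceptually, the check such a hook performs boils down to something like this (my own sketch, not gverify's actual code, which also keeps the developer keyring up to date):

#!/bin/sh
# refuse to continue if the tip of upstream master does not carry a valid signature
git verify-commit refs/remotes/origin/master || {
    echo "upstream master is not signed with a trusted key" >&2
    exit 1
}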

While this is far from perfect, it seems a reasonably good solution given the limitations of the available git hooks. Most importantly, it should prevent the git pull --rebase -S && git push --sign loop from silently accepting a malicious commit. Currently the hook verifies only the top upstream commit; however, in the future I want to implement incremental verification of all new commits.

July 30, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)
FreeStyle Libre and first responders (July 30, 2018, 18:04 UTC)

Over on Twitter, a friend asked me a question related to the FreeStyle Libre, since he knew that I’m a user. I provided some “soundbite-shaped” answers on the thread but since I got a few more confused replies afterwards, I thought I would try to make the answer a bit more complete:

Let’s start with a long list of caveats here: I’m not a doctor, I’m not a paramedic, I do not work for or with Abbott, and I don’t speak for my employer. All the opinions that follow are substantiated only by my personal experiences and expertise, which is to say, I’m a user of the Libre system and I happen to be a former firmware engineer (in non-medical fields) and have a hobby of reverse engineering glucometer communication protocols. I will also point out that I have explicitly not looked deeply into the NFC part of the communication protocol, because (as I’ll explain in a minute), that crosses the line of what I feel comfortable releasing to the public.

Let me start with the immediate question that Ciarán asks in the tweet. No, the communication between the sensor and the reader device (or phone app) is not authenticated or protected by a challenge/response pair, as far as I know. From what I’ve been told (yes I’m talking through hearsay here, but give me a moment), the sensor will provide the response no matter who is asking. But the problem is what that response represents.

Unlike your average test strip based glucometer, the sensor does not record actual blood glucose numbers. Instead it reports a timeseries of raw values from different sensors. Pierre Vandevenne looked at the full response and shed some light onto the various other values provided by the sensor.

How that data is interpreted by the reader (or app) depends on its calibration, which happens in the first 60 minutes of operation of the sensor. Because of this, the official tools (reader and app) only allow you to scan a sensor with the tool that started it — special concessions are made for the app: a sensor started by a reader device can also be “tied” to the app, as long as you scan it with the app during the first hour of operation. It does not work the other way, so if you initialize with the app, you can’t use the reader.

While I cannot be certain that the reader/app doesn’t provide data to the sensor to allow you to do this kind of dual-initialization, my guess is that they don’t: the launch of the app was not tied with any change to the sensors, nor with warnings that only sensors coming from a certain lot and later models would work. Also, the app is “aware” of sensors primed by the reader, but not vice-versa, which suggests the reader’s firmware just wouldn’t allow you to scan an already primed sensor.

Here is one tidbit of information I’ll go back to later on. To use the app, you need to sign up for an account, and all the data from the sensor is uploaded to FreeStyle’s servers. The calibration data appears to be among the information shared on the account, which allows you to move the app you use to a new phone without waiting to replace the sensor. This is very important, because you don’t want to throw away your sensor if you break your phone.

The calibration data is then used together with non-disclosed algorithms (also called “curves” in various blogs) to produce the blood glucose equivalent value shown to the user. One important note here is that the reader and the app do not always agree on the value. While I cannot tell for sure what’s going on, my guess is that, as the reader’s firmware is not modifiable, the app contains a newer version of the algorithms, and maybe a newer reader device would agree with the app. As I have decided not to focus on reversing the firmware of the reader, I have no answer there.

Can you get answers from the sensor without the calibration data? As I’m not sure what that data is, I can’t give a definite answer, but I will note that there are a number of unofficial apps out there that purport to do exactly that. These are the same apps that I have, personally, a big problem with, as they provide zero guarantee that their results are at all precise or consistent, and they scare the crap out of me if you plan on making your life and health depend on them. Would the paramedics be able to use one of those apps to provide vague readings off a sensor? Possibly. But let me continue.

The original tweet by Eoghan asks Abbott if it would be possible for paramedics to have a special app to be able to read the sensor. And here is where things get complicated. Because yes, Abbott could provide such an app, as long as the sensor was initialized or calibration-scanned by the app within the calibration hour: their servers have the calibration data, which is needed to move the app between phones without losing data and without waiting for a new sensor.

But even admitting that there is no technical showstopper to such an app, there are many more ethical and legal concerns about it. There’s no way that the calibration data, and even the immediate value, wouldn’t be considered Sensitive Personal Data. This means that for Abbott to be able to share it with paramedics, they would have to have a sharing agreement in place, with all the requirements that the GDPR imposes on them (for good reason).

Adding to this discussion, there’s the question of whether it would actually be valuable to paramedics to have this kind of information. Since I have zero training in the field, I can’t answer for sure, but I would be cautious about trusting the reading of the sensor, particularly if paramedics had to be involved.

The first warning comes from Abbott themselves, who recommend using blood-based test strips to confirm blood sugar readings during rapid glucose changes (in both directions). Since I’m neither trained in chemistry nor medicine, I don’t know why that is the case, but I have read tidbits suggesting it has to do with the fact that the sensor reads values from interstitial fluid, rather than plasma, and the algorithms are meant to correlate the two values. Interstitial fluid measurements can lag behind the plasma ones, and thus while the extrapolation can be correct for a smooth change, it might be off (very much so) when they change suddenly.

And as a personal tale, I have experienced the Libre not reporting any data, and then reporting very off values, after spending a couple of hours in very cold environment (in Pittsburgh, at -14°C). Again, see Vandevenne’s blog for what’s going on there with temperatures and thermal compensation.

All in all, I think I would put more trust in a single fingerprick to get a normal test-strip result, both because it works universally, whether you have a sensor or not, and because its limitations are much better understood both by users and professionals. And using test strips doesn’t carry so many ethical and legal implications.

July 24, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

So I have a Django DurationField in my model and needed to format it as HH:mm. Unfortunately Django doesn't seem to support that out of the box. After considering template tags or writing my own filter, I decided to go for a very simple alternative and just defined a method for this in my model:

    timeslot_duration = models.DurationField(null=False,
                                             blank=False,
                                             default='00:05:00',
                                             verbose_name=_('timeslot_duration'),
                                             help_text=_('[DD] [HH:[MM:]]ss[.uuuuuu] format')
                                             )

    def timeslot_duration_HHmm(self):
        # format the duration as hours:minutes (hours are not wrapped at 24)
        sec = self.timeslot_duration.total_seconds()
        return '%02d:%02d' % (int(sec // 3600), int((sec % 3600) // 60))

that way I can do whatever I want format-wise to get exactly what I need. Not sure if this is recommended practice, or maybe frowned upon, but it works just fine.
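
For a quick sanity check of the formatting logic outside of Django (my own example, run in a Python shell):

>>> from datetime import timedelta
>>> sec = timedelta(hours=2, minutes=30).total_seconds()
>>> '%02d:%02d' % (int(sec // 3600), int((sec % 3600) // 60))
'02:30'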

and in my template then just use {{ <model>.timeslot_duration_HHmm }} instead of {{ <model>.timeslot_duration }}.

July 19, 2018
Alexys Jacob a.k.a. ultrabug (homepage, bugs)

This quick article is a wrap up for reference on how to connect to ScyllaDB using Spark 2 when authentication and SSL are enforced for the clients on the Scylla cluster.

We encountered multiple problems, even more so since we distribute our workload using a YARN cluster, so our worker nodes need to have everything required to connect properly to Scylla.

We found very little help online so I hope it will serve anyone facing similar issues (that’s also why I copy/pasted them here).

The authentication part is easy by itself and was not the source of our problems; SSL on the client side was.

Environment

  • (py)spark: 2.1.0.cloudera2
  • spark-cassandra-connector: datastax:spark-cassandra-connector: 2.0.1-s_2.11
  • python: 3.5.5
  • java: 1.8.0_144
  • scylladb: 2.1.5

SSL cipher setup

The Datastax spark cassandra driver uses the TLS_RSA_WITH_AES_256_CBC_SHA cipher by default, which the JVM does not support out of the box. This raises the following error when connecting to Scylla:

18/07/18 13:13:41 WARN channel.ChannelInitializer: Failed to initialize a channel. Closing: [id: 0x8d6f78a7]
java.lang.IllegalArgumentException: Cannot support TLS_RSA_WITH_AES_256_CBC_SHA with currently installed providers

According to the ssl documentation we have two ciphers available:

  1. TLS_RSA_WITH_AES_256_CBC_SHA
  2. TLS_RSA_WITH_AES_128_CBC_SHA

We can get rid of the error by lowering the cipher to TLS_RSA_WITH_AES_128_CBC_SHA using the following configuration:

.config("spark.cassandra.connection.ssl.enabledAlgorithms", "TLS_RSA_WITH_AES_128_CBC_SHA")\

However, this is not really a good solution; instead we’d be inclined to use the TLS_RSA_WITH_AES_256_CBC_SHA version. For this we need to follow Datastax’s procedure.

Then we need to deploy the JCE security jars on all our client nodes; if using YARN like us, this means that you have to deploy these jars to all your NodeManager nodes.

For example by hand:

# unzip jce_policy-8.zip
# cp UnlimitedJCEPolicyJDK8/*.jar /opt/oracle-jdk-bin-1.8.0.144/jre/lib/security/

Java trust store

When connecting, the clients need to be able to validate the Scylla cluster’s self-signed CA. This is done by setting up a trustStore JKS file and providing it to the spark connector configuration (note that you should protect this file with a password).

keyStore vs trustStore

In an SSL handshake, the purpose of the trustStore is to verify credentials, while the purpose of the keyStore is to provide them. The keyStore in Java stores the private key and the certificates corresponding to the public keys, and is required if you are an SSL server or if SSL requires client authentication. The trustStore stores certificates from third parties or your own self-signed certificates; your application identifies and validates them using this trustStore.

The spark-cassandra-connector documentation has two options to handle keyStore and trustStore.

When we did not use the trustStore option, we would get some obscure error when connecting to Scylla:

com.datastax.driver.core.exceptions.TransportException: [node/1.1.1.1:9042] Channel has been closed

When enabling DEBUG logging, we got a clearer error which indicated a failure to validate the SSL certificate provided by the Scylla server node:

Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

setting up the trustStore JKS

You need to have the self-signed CA public certificate file, then issue the following command:

# keytool -importcert -file /usr/local/share/ca-certificates/MY_SELF_SIGNED_CA.crt -keystore COMPANY_TRUSTSTORE.jks -noprompt
Enter keystore password:  
Re-enter new password: 
Certificate was added to keystore

using the trustStore

Now you need to configure spark to use the trustStore like this:

.config("spark.cassandra.connection.ssl.trustStore.password", "PASSWORD")\
.config("spark.cassandra.connection.ssl.trustStore.path", "COMPANY_TRUSTSTORE.jks")\

Spark SSL configuration example

This wraps up the SSL connection configuration used for spark.

This example uses pyspark2 and reads a table in Scylla from a YARN cluster:

$ pyspark2 --packages datastax:spark-cassandra-connector:2.0.1-s_2.11 --files COMPANY_TRUSTSTORE.jks

>>> spark = SparkSession.builder.appName("scylla_app")\
.config("spark.cassandra.auth.password", "test")\
.config("spark.cassandra.auth.username", "test")\
.config("spark.cassandra.connection.host", "node1,node2,node3")\
.config("spark.cassandra.connection.ssl.clientAuth.enabled", True)\
.config("spark.cassandra.connection.ssl.enabled", True)\
.config("spark.cassandra.connection.ssl.trustStore.password", "PASSWORD")\
.config("spark.cassandra.connection.ssl.trustStore.path", "COMPANY_TRUSTSTORE.jks")\
.config("spark.cassandra.input.split.size_in_mb", 1)\
.config("spark.yarn.queue", "scylla_queue").getOrCreate()

>>> df = spark.read.format("org.apache.spark.sql.cassandra").options(table="my_table", keyspace="test").load()
>>> df.show()

July 15, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

When playing with the thought of adding images to my books DB I thought: I need random names, and would like to scale them ..

So I looked around a bit and found django-stdimage. I was pretty happy with what it could do, but the uuid4 names themselves seemed a bit... not what I wanted. So I came up with adding the object's pk to the filename as well.

There were some nice ways already to generate filenames, but none exactly what I wanted.

Here is my own class UploadToClassNameDirPKUUID:

from uuid import uuid4

from stdimage.utils import UploadToClassNameDir


class UploadToClassNameDirPKUUID(UploadToClassNameDir):
    def __call__(self, instance, filename):
        # slightly modified from the UploadToUUID class from stdimage.utils
        if instance.pk:
            self.kwargs.update({
                'name': '{}-{}'.format(instance.pk, uuid4().hex),
                })
        else:
            # no pk found so just get uuid4.hex
            self.kwargs.update({
                'name': uuid4().hex
                })
        return super().__call__(instance, filename)

Basically the same as UploadToClassNameDirUUID, but with instance.pk added at the front of the filename - this is purely for convenience, so I have the 2 pictures for my book (front & back) identifiable in the directory without looking both up in my DB. One could maybe argue it would "expose" the pk, but first, in this case I do not really care as the app is not public, and second: anyone who can access my django-admin (which is what I use for data entry,..) would see the pk anyway, so whatever ;)

July 14, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

Since I wanted to make an inventory app (that i will likely post the source of - as FOSS of course - at some point when I am done) I wanted to have a model for languages with their ISO 639-1 code.

Now the model itself is of course easy, but where to get the data to populate it? I was certainly not going to do that manually. After a bit of searching and talking to people on IRC I dug around the Django I18N / L10N code a bit and found something I could use: django.conf.locale.LANG_INFO. While this is without a doubt used internally by Django, I thought it would be awesome to just use it as a base for my data.

The next point was how to get the data into my DB without too much effort, but reproducibly. The first thing that came to mind was to write my own migration and populate it from there. Not something I particularly liked, since I have a tendency to wipe my migrations and start from scratch during development, and I was sure I'd delete just that one too many.

The other - and in my opinion better - option I found was more flexible as to when it is run and also beautifully simple: just write my own custom management command to do the data import for me. Using the Django documentation on custom management commands as a base I got this working very quickly. Enough rambling, here's the code:

first the model (since it was in the source data I added name_local because it is probably useful sometimes):

class Language(models.Model):
    '''
    List of languages by iso code (2 letter only because country code
    is not needed.
    This should be populated by getting data from django.conf.locale.LANG_INFO
    '''
    name = models.CharField(max_length=256,
                            null=False,
                            blank=False,
                            verbose_name=_('Language name')
                            )
    name_local = models.CharField(max_length=256,
                                  null=False,
                                  blank=True,
                                  default='',
                                  verbose_name=_('Language name (in that language)'))
    isocode = models.CharField(max_length=2,
                               null=False,
                               blank=False,
                               unique=True,
                               verbose_name=_('ISO 639-1 Language code'),
                               help_text=_('2 character language code without country')
                               )
    sorting = models.PositiveIntegerField(blank=False,
                                          null=False,
                                          default=0,
                                          verbose_name=_('sorting order'),
                                          help_text=_('increase to show at top of the list')
                                          )

    def __str__(self):
        return '%s (%s)' % (self.name, self.name_local)

    class Meta:
        verbose_name = _('language')
        verbose_name_plural = _('languages')
        ordering = ('-sorting', 'name', 'isocode', )

(of course with gettext support, but if you don't need that just remove the _(...) ;)

Edit 2018-07-15: for usability reasons I added a sorting field so that commonly used languages can be shown at the top of the list.

then create the <project>/management/commands directory (don't forget the __init__.py files so Python treats them as packages) and in it a file importlanguages.py

from django.core.management.base import BaseCommand, CommandError
from dcollect.models import Language
from django.conf.locale import LANG_INFO


class Command(BaseCommand):
    help = 'Imports language codes and names from django.conf.locale.LANG_INFO'

    def add_arguments(self, parser):
        pass

    def handle(self, *args, **options):
        cnt = 0
        for lang in LANG_INFO:
            if len(lang) == 2:
                #we only care about the 2 letter iso codes
                #self.stdout.write(lang + ' ' + LANG_INFO[lang]['name'] + ' ' + LANG_INFO[lang]['name_local'])
                try:
                    l = Language(isocode=lang,
                                 name=LANG_INFO[lang]['name'],
                                 name_local=LANG_INFO[lang]['name_local'])
                    l.save()
                    cnt += 1
                except Exception as e:
                    self.stdout.write('Error adding language %s: %s' % (lang, e))
        self.stdout.write('Added %d languages to dcollect' % cnt)

That was way easier than expected. I initially was going to just populate 2 or 3 languages manually and leave the rest for later, but it was so simple that I just got it out of the way.

All that needs to be done now to import languages is python manage.py importlanguages - and the real nice part: no new dependencies added ;)

A little follow-up to my post about setting up tryton on Gentoo:

If you run postgresql on a different server you need to deal with setting up the permissions on the postgresql side.

What I was not aware of at that time is that trytond (not trytond-admin, it seems) also requires access to the template1 database.

For some reason trytond silently failed to start. The only log messages I saw were INFO-level ones about connecting to template1:

[Sat Jul 14 13:17:44 2018] INFO:trytond.backend.postgresql.database:connect to "template1"
[Sat Jul 14 13:17:44 2018] INFO:werkzeug:192.168.0.151 - - [14/Jul/2018 13:17:44] "POST / HTTP/1.1" 200 -
[Sat Jul 14 13:17:44 2018] INFO:werkzeug:192.168.0.151 - - [14/Jul/2018 13:17:44] "POST / HTTP/1.1" 200 -

so you need to set that up in your pg_hba.conf too. To give an example:

# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    trytond         trytond         192.168.0.X/0          scram-sha-256
host    template1       trytond         192.168.0.X/0          scram-sha-256

As you can see I give the trytond user access to the trytond database (and in the next line to template1 too). For some reason trytond-admin does not require this, but it would be nice if trytond logged something about not getting access to template1.

Of course it needs to be set up to accept connections on the tcp/ip port you set up in postgresql.conf.

Sergei Trofimovich a.k.a. slyfox (homepage, bugs)
tracking down mysterious memory corruption (July 14, 2018, 00:00 UTC)


I’ve bought my current desktop machine around 2011 (7 years ago) and mostly had no problems with it, save for one exception: occasionally (once every 2-3 months) firefox, liferea or gcc would mysteriously crash.

Bad PTE

dmesg reports would claim that page table entries refer to already freed physical memory:

Apr 24 03:59:17 sf kernel: BUG: Bad page map in process cc1  pte:200000000 pmd:2f9d0d067
Apr 24 03:59:17 sf kernel: addr:00000000711a7136 vm_flags:00000875 anon_vma:          (null) mapping:000000003882992c index:101a
Apr 24 03:59:17 sf kernel: file:cc1 fault:filemap_fault mmap:btrfs_file_mmap readpage:btrfs_readpage
Apr 24 03:59:18 sf kernel: CPU: 1 PID: 14834 Comm: cc1 Tainted: G         C        4.17.0-rc1-00215-g5e7c7806111a #65
Apr 24 03:59:18 sf kernel: Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M-D3H, BIOS F4 02/16/2012
Apr 24 03:59:18 sf kernel: Call Trace:
Apr 24 03:59:18 sf kernel:  dump_stack+0x46/0x5b
Apr 24 03:59:18 sf kernel:  print_bad_pte+0x193/0x230
Apr 24 03:59:18 sf kernel:  ? page_remove_rmap+0x216/0x330
Apr 24 03:59:18 sf kernel:  unmap_page_range+0x3f7/0x920
Apr 24 03:59:18 sf kernel:  unmap_vmas+0x47/0xa0
Apr 24 03:59:18 sf kernel:  exit_mmap+0x86/0x170
Apr 24 03:59:18 sf kernel:  mmput+0x64/0x120
Apr 24 03:59:18 sf kernel:  do_exit+0x2a9/0xb90
Apr 24 03:59:18 sf kernel:  ? syscall_trace_enter+0x16d/0x2c0
Apr 24 03:59:18 sf kernel:  do_group_exit+0x2e/0xa0
Apr 24 03:59:18 sf kernel:  __x64_sys_exit_group+0xf/0x10
Apr 24 03:59:18 sf kernel:  do_syscall_64+0x4a/0xe0
Apr 24 03:59:18 sf kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 24 03:59:18 sf kernel: RIP: 0033:0x7f7a039dcb96
Apr 24 03:59:18 sf kernel: RSP: 002b:00007fffdfa09d08 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
Apr 24 03:59:18 sf kernel: RAX: ffffffffffffffda RBX: 00007f7a03ccc740 RCX: 00007f7a039dcb96
Apr 24 03:59:18 sf kernel: RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
Apr 24 03:59:18 sf kernel: RBP: 0000000000000000 R08: 00000000000000e7 R09: fffffffffffffe70
Apr 24 03:59:18 sf kernel: R10: 0000000000000008 R11: 0000000000000246 R12: 00007f7a03ccc740
Apr 24 03:59:18 sf kernel: R13: 0000000000000038 R14: 00007f7a03cd5608 R15: 0000000000000000
Apr 24 03:59:18 sf kernel: Disabling lock debugging due to kernel taint
Apr 24 03:59:18 sf kernel: BUG: Bad rss-counter state mm:000000004fac8a77 idx:2 val:-1

It’s not something that is easy to debug or reproduce.

Transparent Hugepages were a new thing at that time and I was using them systemwide via the CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y kernel option.

After those crashes I decided to switch back to CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y only. Crashes became rarer: once every 5-6 months.

Enabling more debugging facilities in the kernel did not change anything and I moved on.

A few years later I set up nightly builds on this machine to build and test packages in an automatic way. Things were running smoothly except for a few memory-hungry tests that crashed once in a while: firefox, rust and webkit builds every other night hit internal compiler errors in gcc.

Crashes were very hard to isolate or reproduce: every time SIGSEGV happened on a new source file being compiled. I tried to run the same failed gcc command in a loop for hours to try to reproduce the crash but never succeeded. It is usually a strong sign of flaky hardware. At that point I tried memtest86+-5.01 and memtester tools to validate RAM chips. Tools claimed RAM to be fine. My conclusion was that crashes are the result of an obscure software problem causing memory corruption (probably in the kernel). I had no idea how to debug that and kept on using this system. For day-to-day use it was perfectly stable.

A new clue

[years later]

Last year I joined Gentoo’s toolchain@ project and started caring a bit more about glibc and gcc. dilfridge@ did a fantastic job on making glibc testsuite work on amd64 (and also many other things not directly related to this post).

One day I made a major change in how CFLAGS are handled in glibc ebuild and broke a few users with CFLAGS=-mno-sse4.2. That day I ran glibc testsuite to check if I made things worse. There was only one test failing: string/test-memmove.

Of all the obscure things that glibc checks for, only one simple memmove() test refused to work!

The failure occurred only on the 32-bit version of glibc and looked like this:

$ elf/ld.so --inhibit-cache --library-path . string/test-memmove
simple_memmove  __memmove_ssse3_rep     __memmove_ssse3 __memmove_sse2_unaligned        __memmove_ia32
string/test-memmove: Wrong result in function __memmove_sse2_unaligned dst "0x70000084" src "0x70000000" offset "43297733"

This command runs string/test-memmove binary using ./libc.so.6 and elf/ld.so as a loader.

The good thing is that I was somewhat able to reproduce the failure: every few runs the error popped up. The test was not failing deterministically. Every time the test failed it was always __memmove_sse2_unaligned, but the offset was different.

Here is the test source code. The test basically runs memmove() and checks that all memory was moved as expected. Originally the test was written to check how memmove() handles memory ranges that span the signed/unsigned address boundary around address 0x80000000. Hence the unusual mmap(addr=0x70000000, size=0x20000000) as a way to allocate memory.

Now the fun thing: the error disappeared as soon as I rebooted the machine. And came back one day later (after the usual nightly tests run). To explore the breakage and make a fix I had to find a faster way to reproduce the failure.

At that point the fastest way to make the test fail again was to run a firefox build first. It took “only” 40 minutes to get the machine into a state where I could reproduce the failure.

Once in that state I started shrinking down __memmove_sse2_unaligned implementation to check where exactly data gets transferred incorrectly. 600 lines of straightforward code is not that much.

; check if the copied block is smaller than cache size
167 cmp __x86_shared_cache_size_half, %edi
...
170 jae L(mm_large_page_loop_backward)
...
173 L(mm_main_loop_backward): ; small block, normal instruction
175 prefetcht0 -128(%eax)
...
; load 128 bits from source buffer
177 movdqu -64(%eax), %xmm0
...
; store 128 bits to destination buffer
181 movaps %xmm0, -64(%ecx)
...
244 L(mm_large_page_loop_backward):
...
; load 128 bits from source buffer
245 movdqu -64(%eax), %xmm0
...
; store 128 bits to destination avoiding cache
249 movntdq %xmm0, -64(%ecx)

Note: memcpy()’s behaviour depends on the CPU cache size. When the block of copied memory is small (less than the CPU cache size, 8MB in my case) memcpy() does not do anything special. Otherwise memcpy() tries to avoid cache pollution and uses the non-temporal variant of the store instruction: movntdq instead of the usual movaps.

While I was poking at this code I found a reliable workaround to make memcpy() never fail on my machine: change movntdq to movdqa:

--- a/sysdeps/i386/i686/multiarch/memcpy-sse2-unaligned.S
+++ b/sysdeps/i386/i686/multiarch/memcpy-sse2-unaligned.S
@@ -26,0 +27 @@
+#define movntdq movdqa /* broken CPU? */

I was pondering whether I should patch binutils locally to avoid the movntdq instruction entirely, but eventually discarded the idea and focused on finding the broken component instead. Who knows what else could be in there.

I was so close!

A minimal reproducer

I attempted to craft a testcase that does not depend on glibc’s memcpy() and got this:

#include <stddef.h>    /* size_t */
#include <emmintrin.h> /* movdqu, sfence, movntdq */

static void memmove_si128u (__m128i_u * dest, __m128i_u const *src, size_t items)
{
    /* copy right-to-left, 16 bytes at a time */
    dest += items - 1;
    src  += items - 1;
    _mm_sfence();
    for (; items != 0; items -= 1, dest -= 1, src -= 1)
    {
        __m128i xmm0 = _mm_loadu_si128(src); // movdqu
        if (0)
        {
            // this would work:
            _mm_storeu_si128(dest, xmm0); // movdqu
        }
        else
        {
            // this causes single bit memory corruption
            _mm_stream_si128(dest, xmm0); // movntdq
        }
    }
    _mm_sfence();
}

This code assumes quite a few things from the caller:

  • dest > src as copying happens right-to-left
  • dest has to be 16-byte aligned
  • block size must be a multiple of 16 bytes.
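
To exercise the function under those constraints, a minimal driver could look like this (my own sketch, compiled in the same file as the function above and with the same flags; my actual approach instead walked over freshly allocated 128MB chunks, as described below):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t size = 128u << 20;                 /* 128MB, a multiple of 16 */
    unsigned char *src  = aligned_alloc(16, size);
    unsigned char *dest = aligned_alloc(16, size);
    if (!src || !dest) return 1;

    for (size_t i = 0; i < size; i++)         /* fill with a known pattern */
        src[i] = (unsigned char)(i ^ (i >> 8));

    memmove_si128u((__m128i_u *)dest, (__m128i_u const *)src, size / 16);

    for (size_t i = 0; i < size; i++)         /* verify every byte */
        if (dest[i] != src[i])
            printf("corruption at offset %zu: %02x != %02x\n",
                   i, dest[i], src[i]);
    return 0;
}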

Here is what C code compiles to with -O2 -m32 -msse2:

(gdb) disassemble memmove_si128u
Dump of assembler code for function memmove_si128u(__m128i_u*, __m128i_u const*, size_t):
0x000008f0 <+0>: push %ebx
0x000008f1 <+1>: lea 0xfffffff(%ecx),%ebx
0x000008f7 <+7>: shl $0x4,%ebx
0x000008fa <+10>: add %ebx,%eax
0x000008fc <+12>: add %ebx,%edx
0x000008fe <+14>: sfence
0x00000901 <+17>: test %ecx,%ecx
0x00000903 <+19>: je 0x923 <memmove_si128u(__m128i_u*, __m128i_u const*, size_t)+51>
0x00000905 <+21>: shl $0x4,%ecx
0x00000908 <+24>: mov %eax,%ebx
0x0000090a <+26>: sub %ecx,%ebx
0x0000090c <+28>: mov %ebx,%ecx
0x0000090e <+30>: xchg %ax,%ax
0x00000910 <+32>: movdqu (%edx),%xmm0
0x00000914 <+36>: sub $0x10,%eax
0x00000917 <+39>: sub $0x10,%edx
0x0000091a <+42>: movntdq %xmm0,0x10(%eax)
0x0000091f <+47>: cmp %eax,%ecx
0x00000921 <+49>: jne 0x910 <memmove_si128u(__m128i_u*, __m128i_u const*, size_t)+32>
0x00000923 <+51>: sfence
0x00000926 <+54>: pop %ebx
0x00000927 <+55>: ret

And with -O2 -m64 -mavx2:

(gdb) disassemble memmove_si128u
Dump of assembler code for function memmove_si128u(__m128i_u*, __m128i_u const*, size_t):
0x0000000000000ae0 <+0>: sfence
0x0000000000000ae3 <+3>: mov %rdx,%rax
0x0000000000000ae6 <+6>: shl $0x4,%rax
0x0000000000000aea <+10>: sub $0x10,%rax
0x0000000000000aee <+14>: add %rax,%rdi
0x0000000000000af1 <+17>: add %rax,%rsi
0x0000000000000af4 <+20>: test %rdx,%rdx
0x0000000000000af7 <+23>: je 0xb1e <memmove_si128u(__m128i_u*, __m128i_u const*, size_t)+62>
0x0000000000000af9 <+25>: shl $0x4,%rdx
0x0000000000000afd <+29>: mov %rdi,%rax
0x0000000000000b00 <+32>: sub %rdx,%rax
0x0000000000000b03 <+35>: nopl 0x0(%rax,%rax,1)
0x0000000000000b08 <+40>: vmovdqu (%rsi),%xmm0
0x0000000000000b0c <+44>: sub $0x10,%rdi
0x0000000000000b10 <+48>: sub $0x10,%rsi
0x0000000000000b14 <+52>: vmovntdq %xmm0,0x10(%rdi)
0x0000000000000b19 <+57>: cmp %rdi,%rax
0x0000000000000b1c <+60>: jne 0xb08 <memmove_si128u(__m128i_u*, __m128i_u const*, size_t)+40>
0x0000000000000b1e <+62>: sfence
0x0000000000000b21 <+65>: retq

Surprisingly (or not so surprisingly) both -m32/-m64 tests started failing on my machine.

It was always the second bit of a 128-bit value that was corrupted.

On 128MB blocks this test usually caused one incorrect bit to be copied once in a few runs. I tried to run exactly the same test on other hardware I have access to. None of it failed.

I started to suspect the kernel of corrupting the SSE CPU context on context switch. But why would only the non-temporal instruction be affected? And why only a single bit and not a full 128-bit chunk? Could it be that the kernel forgot to issue mfence on a context switch and all in-flight non-temporal stores wrote garbage? That would be a sad race condition. But the single bit flip did not line up with it.

It sounded more like the kernel was arbitrarily flipping one bit in userspace. But why only when movntdq was involved?

I suspected a CPU bug, upgraded the CPU firmware and switched the machine from BIOS-compatible mode to native UEFI hoping to fix it. Nope. Nothing changed. The same failure persisted: single bit corruption after a heavy load on the machine.

I started thinking on how to speed my test up to avoid firefox compilation as a trigger.

Back to square one

My suspect was bad RAM again. I modified my test to cover all RAM by allocating 128MB chunks at a time and running memmove() on the newly allocated memory, touching all available pages. The test would either find bad memory or OOM-fail.

And bingo! It took only 30 seconds to reproduce the failure. The test usually started reporting the first problem when it got to 17GB of RAM usage.

I have 4x8GB DDR3-DIMMs. I started brute-forcing various configurations of DIMM order on motherboard slots:

A      B      A      B
DIMM-1 -      -      -      : works
DIMM-2 -      -      -      : works
DIMM-3 -      -      -      : works
DIMM-4 -      -      -      : works
DIMM-1 -      DIMM-3 -      : fails (dual channel mode)
DIMM-1 DIMM-3 -      -      : works (single channel mode)
-      DIMM-2 -      DIMM-4 : works (dual channel mode)
DIMM-3 -      DIMM-1 -      : fails (dual channel mode)
-      DIMM-3 -      DIMM-1 : fails (dual channel mode)
-      DIMM-1 -      DIMM-3 : fails (dual channel mode)
-      DIMM-2 -      DIMM-3 : fails (dual channel mode)

And many other combinations of DIMM-3 with others.

It was obvious DIMM-3 did not like teamwork. I booted from a livecd to double-check that it was not my kernel causing all of this. The error was still there.

I bought and plugged in a new pair of RAM modules in place of DIMM-1 and DIMM-3, and have had no mysterious failures since!

Time to flip CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y back on :)

Speculations and open questions

It seems that dual-channel mode and cache coherency have something to do with it. A few thoughts:

  1. Single DDR3-DIMM can perform only 64-bit wide loads and stores.
  2. In dual-channel mode two 64-bit wide stores can happen at a time and require presence of two DIMMs.
  3. movntdq stores directly into RAM possibly evicting existing value from cache. That can cause further writeback to RAM to free dirty cache line.
  4. movdqa stores to cache. But eventually cache pressure will also trigger a store back to RAM in chunks of the Last Level Cache line size (64 bytes = 512 bits for me). Why do we not see corruption happening in this case?

It feels like there should not be much difference between non-temporal and normal instructions in terms of the size of data being written at a time over the memory bus. What likely changes is the access sequence of physical addresses under the two workloads. But I don’t know how to look into it in detail.

Mystery!

Parting words

  • This crash took me 7 years to figure out :)
  • Fix didn’t require a single line of code :)
  • Bad RAM happens. Even if memtest86+-5.01 disagrees.
  • As I was running memtest86+ in qemu I found a bunch of unrelated bugs in the tianocore implementation of UEFI and in the memtest86+ gentoo ebuild: the hybrid ISO is not recognized as an ISO at all, and memtest86+ crashes at startup for a yet unknown reason (likely needs to be fixed against a newer toolchain).
  • non-temporal instructions are a thing and have their own memory I/O engine.
  • C-level wrappers around SSE and AVX instructions are easy to use!

Have fun!


July 11, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

Just cuz it took me about 10 minutes to figure out why my deployed Django site was just giving me Bad Request (400):

After turning DEBUG on in settings.py I found out the reason: I had forgotten to set the permissions, so that my webserver-user had access to the files needed.

Simple to fix, but can be a pain to find out what happened to a perfectly working site.

July 09, 2018
Andreas K. Hüttel a.k.a. dilfridge (homepage, bugs)

Today's news is that we have submitted a manuscript for publication, describing Lab::Measurement and with it our approach towards fast, flexible, and platform-independent measuring with Perl! The manuscript mainly focuses on the new, Moose-based class hierarchy. We have uploaded it to arXiv as well; here is the (for now) full bibliographic information of the preprint:

 "Lab::Measurement - a portable and extensible framework for controlling lab equipment and conducting measurements"
S. Reinhardt, C. Butschkow, S. Geissler, A. Dirnaichner, F. Olbrich, C. Lane, D. Schröer, and A. K. Hüttel
submitted for publication; arXiv:1804.03321 (PDF, BibTeX entry)
If you're using Lab::Measurement in your lab, and this results in some nice publication, then we'd be very grateful for a citation of our work - for now the preprint, and later hopefully the accepted version.

July 07, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)
Facebook, desktop apps, and photography (July 07, 2018, 18:04 UTC)

This is an interesting topic, particularly because I had not heard anything about it up to now, despite having many semi-pro and amateur photographer friends (I’m a wannabe). It appears that starting August 1st, Facebook will stop allowing desktop applications to upload photos to albums.

Since I have been uploading all of my Facebook albums through Lightroom, that’s quite a big deal for me. On Jeffrey Friedl’s website, there’s this note:

Warning: this plugin will likely cease to work as of August 1, 2018, because Facebook is revoking photo-upload privileges for all non-browser desktop apps like this.

As of June 2018, Adobe and I are in discussions with Facebook to see whether something might be worked out, but success is uncertain.

This is now less than a month before the deadline, and it appears there’s no update for this. Is it Facebook trying to convince people to just share all their photos as they were shot? Is it Adobe not paying attention trying to get people on their extremely-expensive Adobe CC Cloud products? (I have over 1TB of pictures shot, I can’t use their online service, it would cost me so much more in storage!) I don’t really know, but it clearly seems to be the case that my workflow is being deprecated.

Leaving aside the consideration of the impact of this on me alone, I would expect that most of the pro- and semi-pro-photographers would want to be able to upload their pictures without having to manually drag them with Facebook’s flaky interface. And it feels strange that Facebook wants to stop “owning” those photos altogether.

But there’s a bigger impact in my opinion, which should worry privacy-conscious users (as long as they don’t subscribe to the fantasy ideal of people giving up on sharing pictures): this move erodes the strict access controls on picture publishing that defined social media up to now, for any of the users who have been relying on offline photo editing.

In my case, the vast majority of the pictures I take are actually landscapes, flowers, animals, or in general not private events. There’s the odd conference or con I bring my camera to (or should I say used to bring it to), or a birthday party or other celebration. Right now, I have been uploading all the non-people pictures as public (and copied to Flickr), and everything that involves people as friends-only (and only rarely uploaded to Flickr with “only me” access). Once the changes go into effect, I lose the ability to make simple access control decisions.

Indeed, if I was to upload the content to Flickr and use friends-only limited access, very few people would be able to see any of the pictures: Flickr has lost all of its pretension to be a social media platform once Yahoo stopped being relevant. And I doubt that the acquisition of SmugMug will change that part, as it would be just a matter of duplicating a social graph that Facebook already has. So I’m fairly sure a very common solution to that is going to be to make the photos public, and maybe the account not discoverable. After all who might be mining the Web for unlisted accounts of vulnerable people? (That’s sarcasm if it wasn’t clear.)

In my case it’s just going to be a matter of not bringing my camera to private events anymore. Not the end of the world, since I’m already not particularly good at portrait photography, and not my particular area of interest. But I do think that there’s going to be quite a bit of problems in the future.

And if you think this is not going to be a big deal at all, because most parties have pictures uploaded by people directly on their mobile phones… I disagree. Weddings, christenings, cons, sport matches, all these events usually have their share of professional photographers, and all these events need to have a way to share the output with not only the people who hired them, but also the friends of those, like the invitees at a wedding.

And I expect that for many professionals, it’s going to be a matter of finding a new service to upload the data to. Mark my words, as I expect we’ll find that there will be, in the future, leaks of wedding pictures used to dox notable people. And those will be due to insecure, or badly-secured, photo sharing websites, meant to replace Facebook after this change in terms.

July 06, 2018
Alexys Jacob a.k.a. ultrabug (homepage, bugs)
A botspot story (July 06, 2018, 14:50 UTC)

I felt like sharing a recent story that allowed us to identify a bot in a haystack thanks to Scylla.

 

The scenario

While working on loading up 2B+ rows into Scylla from Hive (using Spark), we noticed a strange behaviour in the performance of one of our nodes:

 

So we started wondering why that server in blue was having those peaks of load and was clearly diverging from the two others… As we obviously expect the three nodes to behave the same, there were two options on the table:

  1. hardware problem on the node
  2. bad data distribution (bad schema design? consistent hash problem?)

We shared this with our pals from ScyllaDB and started working on finding out what was going on.

The investigation

Hardware?

A hardware problem was pretty quickly ruled out: nothing showed up in the monitoring or in the kernel logs. I/O queues and throughput were good:

Data distribution?

Avi Kivity (ScyllaDB’s CTO) quickly got the feeling that something was wrong with the data distribution and that we could be facing a hotspot situation. He quickly nailed it down to shard 44 thanks to the scylla-grafana-monitoring platform.

Data is distributed between shards that are stored on nodes (consistent hash ring). This distribution is done by hashing the primary key of your data which dictates the shard it belongs to (and thus the node(s) where the shard is stored).

If one of your keys is over represented in your original data set, then the shard it belongs to can be overly populated and the related node overloaded. This is called a hotspot situation.

tracing queries

The first step was to trace queries in Scylla to try to get deeper into the hotspot analysis. So we enabled tracing using the following formula to get about 1 trace per second in the system_traces namespace.

tracing probability = 1 / expected requests per second throughput

In our case, we were doing between 90K req/s and 150K req/s so we settled for 100K req/s to be safe and enabled tracing on our nodes like this:

# nodetool settraceprobability 0.00001

Turns out tracing didn’t help very much in our case because the traces do not include the query parameters in Scylla 2.1; this is becoming available in the soon-to-be-released 2.2 version.

NOTE: traces expire on the tables, so make sure you TRUNCATE the events and sessions tables while iterating. Else you will have to wait for the next gc_grace_period (10 days by default) before they are actually removed. If you do not do that and generate millions of traces like we did, querying the mentioned tables will likely time out because of the “tombstoned” rows, even if there is no trace inside any more.
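
For reference, clearing them between iterations is just a matter of truncating the standard tracing tables:

cqlsh> TRUNCATE system_traces.events;
cqlsh> TRUNCATE system_traces.sessions;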

looking at cfhistograms

Glauber Costa was also helping on the case and got us looking at the cfhistograms of the tables we were pushing data to. That proved to be clearly highlighting a hotspot problem:
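
The numbers below come from nodetool (keyspace and table names are placeholders):

# nodetool cfhistograms <keyspace> <table>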

histograms
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                             (micros)          (micros)           (bytes)                  
50%             0,00              6,00              0,00               258                 2
75%             0,00              6,00              0,00               535                 5
95%             0,00              8,00              0,00              1916                24
98%             0,00             11,72              0,00              3311                50
99%             0,00             28,46              0,00              5722                72
Min             0,00              2,00              0,00               104                 0
Max             0,00          45359,00              0,00          14530764            182785

What this basically means is that the 99th percentile of our partitions is small (under 6KB) while the biggest is 14MB! That’s a huge difference and clearly shows that we have a hotspot on a partition somewhere.

So now we know for sure that we have an over represented key in our data set, but what key is it and why?

The culprit

So we looked at the cardinality of our data set keys which are SHA256 hashes and found out that indeed we had one with more than 1M occurrences while the second highest one was around 100K!…

Now that we had the main culprit hash, we turned to our data streaming pipeline to figure out what kind of event was generating the data associated with the given SHA256 hash… and surprise! It was a client’s quality assurance bot that was constantly browsing their own website with legitimate behaviour and identity credentials associated with it.

So we modified our pipeline to detect this bot and discard its events so that it stops polluting our databases with fake data. Then we cleaned up the million events’ worth of mess and traces we had stored about the bot.

The aftermath

Finally, we cleared out the data in Scylla and tried again from scratch. Needless to say, the curves got way better and are exactly what we should expect from a well balanced cluster:

Thanks a lot to the ScyllaDB team for their thorough help and high spirited support!

I’ll quote them to conclude this quick blog post:

Thomas Raschbacher a.k.a. lordvan (homepage, bugs)
Django (2.0) ForeignKey -> ManyToManyField (July 06, 2018, 12:07 UTC)

So I just thought my small URL Todo django app (which I will post the source of online soon - when I have some time to write up a README,.. too) could benefit from each urlTodo being able to be part of more than one category.

Now I already had some data in it so I was unsure. On IRC I was advised it might or might not work with automatic migrations. So I tested it.

Turns out it works ok, but it does lose the assignment of the category (which was not an issue since I did not have that much data in it yet).

so all I had to do was change

category = models.ForeignKey(Category,
                             on_delete=models.PROTECT,
                             related_name='urltodos',
                             null=False,
                             verbose_name=_('category'))

to

category = models.ManyToManyField(Category,
                                  related_name='urltodos',
                                  verbose_name=_('category'))

and run ./manage.py makemigrations and ./manage.py migrate, and then assign the categories that got lost again.
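
One thing worth remembering after such a switch (my own note, where urltodo and some_category are assumed model instances): the field is now accessed through a related manager rather than directly, e.g.:

urltodo.category.add(some_category)   # assign a category
urltodo.category.all()                # all categories of this urlTodo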

Again I am impressed how easily this could be done with django!

For non-German speakers: this article is about how to use variables and formulas to suppress features and related ones in sheet metal parts.

Up front: I am aware of the concept of part families, but for sheet metal parts that is unfortunately still somewhat suboptimal, since part copies are created instead of sheet metal parts, which can already lead to problems for parts with corner treatments.

Since I often need (very) similar sheet metal parts (blanks, angles, U-profiles, boxes), I came up with something to simplify the whole thing:

The starting point is a rectangular tab with one flange on each side and 4 x close corner:

Box

Create variables for all dimensions:

These variables are then assigned to the features (as you can see, V420 and V422 are the length and width of the tab):

Then a suppression variable is added for each of the 4 flanges:

Then the suppression variables simply get the corresponding formulas assigned:

And last but not least, the corner treatments are suppressed where necessary:

Now, simply by changing L, B and S1-S4, you can quickly create anything from a plain blank, through an angle or a U-profile, up to a complete box, without having to think long and/or ending up with failed features, since the corner treatments are automatically suppressed as well as soon as one of the two adjacent flanges has length 0 and is therefore suppressed.

July 05, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)
Nexctloud & sharing options .. (July 05, 2018, 09:48 UTC)

So I set the sharing options on my Nextcloud to "Restrict users to only share with users in their groups" and disabled user autocompletion since I have a lot of users (for a private install anyway) by now and don't want everyone to be able to see the others easily.

So now, why could I not share with my new group ...

Turns out that the "Restrict users to only share with users in their groups" sharing option also applies to the admin user itself (I found that out because I forgot to add myself to a new group ;))

--------------

For various reasons I enabled the sharing setting "Restrict users to only share with users in their groups" on my Nextcloud installation (I already have quite a lot of users - for a private installation - and don't want people to be able to see/find the others that easily).

The problem: all of a sudden I could no longer share with the group ...

After some experimenting I noticed that I had not added myself to the new group.. => Conclusion: the setting "Restrict users to only share with users in their groups" also applies to admin users ;)

July 03, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

Turns out I should read things like the Gentoo Wiki upgrade guide *first* to avoid issues ..

After installing php 7.1 to replace the (quite old) php 5.6 I did check php.ini ,.. but forgot to check for compiled modules and setting PHP_TARGETS .. then wondered why I just got Internal Server Error messages..

Thanks to the php team for writing this nice guide to remind people like me of what to do:

https://wiki.gentoo.org/wiki/PHP/Upgrading_to_PHP_7.1

As always the Gentoo Wiki is a great source of information, and I like to use it as a reminder of the things needed when installing/upgrading,.. ;)

July 02, 2018
Luca Barbato a.k.a. lu_zero (homepage, bugs)
Altivec and VSX in Rust (part 1) (July 02, 2018, 11:05 UTC)

I’m involved in implementing the Altivec and VSX support in Rust’s stdsimd.

Supporting all the instructions in this language is a HUGE endeavor, since for each instruction at least 2 tests have to be written, and making functions type-generic gets you to the point of having a few pages of implementation (that luckily desugar to the single right instruction and nothing else).

Since I’m doing this mainly for my multimedia needs I have a short list of instructions I find handy to get some code written immediately and today I’ll talk a bit about some of them.

This post is inspired by what Luc did for neon, but I’m using rust instead.

If other people find it useful, I’ll try to write down the remaining instructions.

Permutations

Most if not all SIMD ISAs have one or more instructions to shuffle vector elements within a vector or between two.

It is quite common to use those instructions to implement matrix transposes, but that isn’t their only use.

In my toolbox I put vec_perm and vec_xxpermdi since, even if the portable stdsimd provides some shuffle support, it is quite unwieldy compared to the Altivec native offering.

vec_perm: Vector Permute

Since its first iteration, Altivec has had a quite amazing instruction called vec_perm or vperm:

    // Illustrative pseudocode for the vperm semantics: each byte of c
    // selects a byte from the concatenation of a and b.
    fn vec_perm(a: i8x16, b: i8x16, c: i8x16) -> i8x16 {
        let mut d = i8x16::splat(0);
        for i in 0..16 {
            let idx = (c[i] & 0xf) as usize;
            d[i] = if (c[i] & 0x10) == 0 {
                a[idx]
            } else {
                b[idx]
            };
        }
        d
    }

It is important to notice that the displacement map c is a vector and not a constant. That gives you quite a bit of flexibility in a number of situations.

This instruction is the building block you can use to implement a great deal of common patterns, including some that are also covered by stand-alone instructions, e.g.:
– packing/unpacking across lanes as long as you do not have to saturate: vec_pack, vec_unpackh/vec_unpackl
– interleave/merge two vectors: vec_mergel, vec_mergeh
– shift N bytes in a vector from another: vec_sld

It is important to keep this in mind, since you can always fuse two permutations into one: vec_perm itself is pretty fast, and replacing two or more instructions with a single permute can get you a pretty neat speed boost.
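
To illustrate why this works, here is a minimal scalar sketch (plain byte arrays instead of the real SIMD types, and only the single-source case where every index selects from a): applying mask p1 and then mask p2 gives the same result as applying, once, the mask obtained by permuting p1 with p2. This is just an illustration of the idea, not stdsimd code.

    // Scalar model of a single-source byte permute: out[i] = src[mask[i]].
    fn permute(src: [u8; 16], mask: [u8; 16]) -> [u8; 16] {
        let mut out = [0u8; 16];
        for i in 0..16 {
            out[i] = src[(mask[i] & 0xf) as usize];
        }
        out
    }

    fn main() {
        let src: [u8; 16] = core::array::from_fn(|i| i as u8 * 3);
        let p1: [u8; 16] = core::array::from_fn(|i| 15 - i as u8); // reverse
        let p2: [u8; 16] = core::array::from_fn(|i| (i as u8 + 4) & 0xf); // rotate by 4

        // Two permutes in sequence...
        let two_steps = permute(permute(src, p1), p2);
        // ...match one permute with the fused mask.
        let fused_mask = permute(p1, p2);
        assert_eq!(two_steps, permute(src, fused_mask));
    }

The same reasoning carries over to the two-source vec_perm, which is what makes it worth looking for adjacent permutes to merge.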

vec_xxpermdi: Vector Permute Doubleword Immediate

Among a good deal of improvements, VSX introduced a number of instructions that work on vectors of 64bit elements; among those there is a permute instruction, and I found myself using it a lot.

    #[rustc_args_required_const(2)]
    fn vec_xxpermdi(a: i64x2, b: i64x2, c: u8) -> i64x2 {
        match c & 0b11 {
            0b00 => i64x2::new(a[0], b[0]),
            0b01 => i64x2::new(a[1], b[0]),
            0b10 => i64x2::new(a[0], b[1]),
            0b11 => i64x2::new(a[1], b[1]),
            _ => unreachable!(),
        }
    }

This instruction is surely less flexible than the previous permute but it does not require an additional load.

When working on video codecs it is quite common to deal with blocks of pixels that go from 4×4 up to 64×64; before vec_xxpermdi the common pattern was:

    #[inline(always)]
    fn store8(dst: &mut [u8x16], i: usize, v: u8x16) {
        // TAKE_THE_FIRST_8 is a permute mask (not shown) selecting the first
        // 8 bytes of v and the remaining bytes from the destination data.
        let data = dst[i];
        dst[i] = vec_perm(v, data, TAKE_THE_FIRST_8);
    }

That implies loading the mask as often as needed, as well as the destination.

Using vec_xxpermdi avoids the mask load, and that usually leads to a quite significant speedup when the actual function is tiny.
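
For illustration, here is how that store could look with vec_xxpermdi, again in a scalar model where a vector is just two 64bit lanes and lane 0 holds the first 8 bytes (the real element order depends on the machine’s endianness, so treat this as a sketch of the idea rather than drop-in code):

    // Scalar model of a 16-byte vector as two 64bit lanes.
    #[derive(Clone, Copy, PartialEq, Debug)]
    struct V([u64; 2]);

    // Model of vec_xxpermdi following the table above:
    // bit 0 of c picks the doubleword taken from a,
    // bit 1 of c picks the doubleword taken from b.
    fn xxpermdi(a: V, b: V, c: u8) -> V {
        V([a.0[(c & 1) as usize], b.0[((c >> 1) & 1) as usize]])
    }

    // store8: overwrite the first 8 bytes of dst[i] with the first 8 bytes
    // of v, keeping the rest of dst[i]. That is (a[0], b[1]), i.e. c = 0b10
    // in the table above, and no mask vector has to be loaded.
    fn store8(dst: &mut [V], i: usize, v: V) {
        let data = dst[i];
        dst[i] = xxpermdi(v, data, 0b10);
    }

    fn main() {
        let mut dst = vec![V([0, 0]); 4];
        store8(&mut dst, 2, V([0x1122334455667788, 0xaaaaaaaaaaaaaaaa]));
        assert_eq!(dst[2], V([0x1122334455667788, 0]));
    }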

Mixed Arithmetics

By mixed arithmetics I mean all the instructions that perform multiple vector arithmetic operations in a single step.

The original Altivec has the following operations available for the integer types:
vec_madds
vec_mladd
vec_mradds
vec_msum
vec_msums
vec_sum2s
vec_sum4s
vec_sums

And the following two for the float type:
vec_madd
vec_nmsub

All of them are quite useful and they will all find their way in stdsimd pretty soon.

Today I’m describing vec_sums, vec_msums and vec_madds.

They are quite representative and the other instructions are similar in spirit:
– vec_madds, vec_mladd and vec_mradds all compute a lane-wise product, take either the high-order or the low-order part of it, and add a third vector, returning a vector of the same element size.
– vec_sums, vec_sum2s and vec_sum4s all combine an in-vector sum operation with a sum with another vector.
– vec_msum and vec_msums both compute a sum of products; the intermediates are added together and then added to a wider-element vector.

If there is enough interest and time I can extend this post to cover all of them; for today we’ll go with this approximation.

vec_sums: Vector Sum Saturated

Usually SIMD instructions work with two (or 3) vectors and execute the same operation for each vector element.
Sometimes you want to just do operations within a single vector, and vec_sums is one of the few instructions that let you do that:

    fn vec_sums(a: i32x4, b: i32x4) -> i32x4 {
        let mut d = i32x4::new(0, 0, 0, 0);

        d[3] = b[3].saturating_add(a[0]).saturating_add(a[1]).saturating_add(a[2]).saturating_add(a[3]);

        d
    }

It returns in the last element of the vector the sum of the vector elements of a and the last element of b.
It is pretty handy when you need to compute an average or similar operations.

It works only with 32bit signed element vectors.
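
As a small sketch of the “average” use case, sticking to the scalar model of the pseudocode above (plain arrays standing in for i32x4): once the partial sums of a block have been accumulated into the four lanes, one vec_sums plus a division gives the mean.

    // Scalar model of vec_sums: the last lane is the saturating sum of all
    // lanes of a plus the last lane of b.
    fn vec_sums_model(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
        let mut d = [0i32; 4];
        d[3] = b[3]
            .saturating_add(a[0])
            .saturating_add(a[1])
            .saturating_add(a[2])
            .saturating_add(a[3]);
        d
    }

    fn main() {
        // Four partial sums of a 4x4 block of pixel values.
        let acc = [10, 20, 30, 40];
        let total = vec_sums_model(acc, [0; 4])[3];
        assert_eq!(total / 16, 6); // integer mean over the 16 pixels
    }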

vec_msums: Vector Multiply Sum Saturated

This instruction sums each 32bit element of the third vector with the two products of the respective 16bit elements of the first two vectors that overlap that element.

It does quite a bit:

    fn vmsumshs(a: i16x8, b: i16x8, c: i32x4) -> i32x4 {
        let mut d = i32x4::splat(0);
        for i in 0..4 {
            let idx = 2 * i;
            let m0 = a[idx] as i32 * b[idx] as i32;
            let m1 = a[idx + 1] as i32 * b[idx + 1] as i32;
            d[i] = c[i].saturating_add(m0).saturating_add(m1);
        }
        d
    }

    fn vmsumuhs(a: u16x8, b: u16x8, c: u32x4) -> u32x4 {
        let mut d = u32x4::splat(0);
        for i in 0..4 {
            let idx = 2 * i;
            let m0 = a[idx] as u32 * b[idx] as u32;
            let m1 = a[idx + 1] as u32 * b[idx + 1] as u32;
            d[i] = c[i].saturating_add(m0).saturating_add(m1);
        }
        d
    }

    ...

    fn vec_msums<T, U>(a: T, b: T, c: U) -> U
    where T: sealed::VectorMultiplySumSaturate<U> {
        a.msums(b, c)
    }

It works only with 16bit elements, signed or unsigned. In order to support that in Rust we have to use some creative traits.
It is quite neat if you have to implement some filters.
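
To give an idea of what that looks like, here is a minimal sketch of the sealed-trait pattern using plain arrays in place of the real vector types. The trait and method names follow the snippet above, but this is only an assumption about the shape: the actual stdsimd implementation may differ, and would dispatch to the intrinsics rather than scalar loops.

    mod sealed {
        // The trait lives in a private module: outside code can call the
        // generic function below, but cannot implement the trait itself.
        pub trait VectorMultiplySumSaturate<U> {
            fn msums(self, b: Self, c: U) -> U;
        }

        // Signed variant, standing in for vmsumshs.
        impl VectorMultiplySumSaturate<[i32; 4]> for [i16; 8] {
            fn msums(self, b: Self, c: [i32; 4]) -> [i32; 4] {
                let mut d = [0i32; 4];
                for i in 0..4 {
                    let idx = 2 * i;
                    let m0 = self[idx] as i32 * b[idx] as i32;
                    let m1 = self[idx + 1] as i32 * b[idx + 1] as i32;
                    d[i] = c[i].saturating_add(m0).saturating_add(m1);
                }
                d
            }
        }

        // Unsigned variant, standing in for vmsumuhs.
        impl VectorMultiplySumSaturate<[u32; 4]> for [u16; 8] {
            fn msums(self, b: Self, c: [u32; 4]) -> [u32; 4] {
                let mut d = [0u32; 4];
                for i in 0..4 {
                    let idx = 2 * i;
                    let m0 = self[idx] as u32 * b[idx] as u32;
                    let m1 = self[idx + 1] as u32 * b[idx + 1] as u32;
                    d[i] = c[i].saturating_add(m0).saturating_add(m1);
                }
                d
            }
        }
    }

    // One type-generic entry point that dispatches to the right impl.
    fn vec_msums<T, U>(a: T, b: T, c: U) -> U
    where
        T: sealed::VectorMultiplySumSaturate<U>,
    {
        a.msums(b, c)
    }

    fn main() {
        let a = [1i16; 8];
        let b = [2i16; 8];
        let c = [0i32; 4];
        assert_eq!(vec_msums(a, b, c), [4i32; 4]);
    }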

vec_madds: Vector Multiply Add Saturated

    fn vec_madds(a: i16x8, b: i16x8, c: i16x8) -> i16x8 {
        let mut d = i16x8::splat(0);
        for i in 0..8 {
            let v = (a[i] as i32 * b[i] as i32) >> 15;
            d[i] = (v as i16).saturating_add(c[i]);
        }
        d
    }

It takes the high-order 17 bits of the lane-wise product of the first two vectors and adds them to a third one.

Coming next

Raptor Engineering kindly gave me access to a Power 9 system through their Integricloud hosting.

We could run some extensive benchmarks, and we found some peculiar behaviour with the C compilers available on the machine, which got me, Luc and Alexandra a little puzzled.

Next time I’ll try to collect in a little more organic way what I randomly put on my twitter as I noticed it.

June 29, 2018
My comments on the Gentoo Github hack (June 29, 2018, 16:00 UTC)

Several news outlets are reporting on the takeover of the Gentoo GitHub organization that was announced recently. Today 28 June at approximately 20:20 UTC unknown individuals have gained control of the Github Gentoo organization, and modified the content of repositories as well as pages there. We are still working to determine the exact extent and … Continue reading "My comments on the Gentoo Github hack"

June 28, 2018

2018-07-04 14:00 UTC

We believe this incident is now resolved. Please see the incident report for details about the incident, its impact, and resolution.

2018-06-29 15:15 UTC

The community raised questions about the provenance of Gentoo packages. Gentoo development is performed on hardware run by the Gentoo Infrastructure team (not github). The Gentoo hardware was unaffected by this incident. Users using the default Gentoo mirroring infrastructure should not be affected.

If you are still concerned about provenance or are unsure what solution you are using, please consult https://wiki.gentoo.org/wiki/Project:Portage/Repository_Verification. This will instruct you on how to verify your repository.

2018-06-29 06:45 UTC

The gentoo GitHub organization remains temporarily locked down by GitHub support, pending fixes to pull-request content.

For ongoing status, please see the Gentoo infra-status incident page.

For later followup, please see the Gentoo Wiki page for GitHub 2018-06-28. An incident post-mortem will follow on the wiki.

June 23, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)

This post starts in the strangest of places. The other night, my mother was complaining about how few free Italian books are available on the Kindle Store.

Turns out, a friend of the family, who also has a Kindle, has been enjoying reading free English older books on hers. As my mother does not speak or read English, she’d been complaining that the same is not possible in Italian.

The books she’s referring to are older books, the copyright of which expired, and that are available on Project Gutenberg. Indeed, the selection of Italian books on that site is fairly limited, and it is something that I have been saddened about before.

What has Project Gutenberg to do with Kindle? Well, Amazon appears to collect books from Project Gutenberg, convert them to Kindle’s native format, and “sell” them on the Kindle Store. I say “sell” because for the most part, these are available at $0.00, and are thus available for free.

While there is no reference to Project Gutenberg on their store pages, there’s usually a note on the book:

This book was converted from its physical edition to the digital format by a community of volunteers. You may find it for free on the web. Purchase of the Kindle edition includes wireless delivery.

Another important point is that (again, for the most part), the original language editions are also available! This is how I started reading Jules Verne’s Le Tour du monde en quatre-vingts jours while trying to brush up my French to workable levels.

Having these works available on the Kindle Store, free of both direct cost and delivery charge, is in my opinion a great step to distribute knowledge and culture. As my nephews (blood-related and otherwise) start reaching reading age, I’m sure that what I will give them as presents is going to be Kindle readers, because between having access to this wide range of free books, and the embedded touch-on dictionary, they feel like something I’d have thoroughly enjoyed using when I was a kid myself.

Unfortunately, this is not all roses. The Kindle Store still georestricts some books, so from my Kindle Store (which is set in the US), I cannot download Ludovico Ariosto’s Orlando Furioso in Italian (though I can download the translation for free, or buy for $0.99 a non-Project Gutenberg version of the original Italian text). And of course there is the problem of coverage for the various languages.

Italian, as I said, appears to be a pretty bad one when it comes to coverage. If I look at Luigi Pirandello’s books there are only seven entries, one of which is in English, and another one being a duplicate. Compare this with the actual list of his works and you can see that it’s very lacking. And since Pirandello died in 1936, his works are already in the public domain.

Since I have not actually been active with Project Gutenberg, I only have second hand knowledge of why this type of problem happens. One of the things I remember having been told about this is that most of the books you buy in Italian stores are either annotated editions, or updated for modern Italian, which causes their copyright to be extended to the death of the editor, annotator or translator.

This lack of access to Italian literature is a big bother, and quite a bit of a showstopper to giving a Kindle to my Italian “nephews”. I really wish I could find a way to fix the problem, whether it is by technical or political means.

On the political side, one could expect that, with the focus on culture of the previous Italian government, and the focus of the current government on the free-as-in-beer options, it would be easy to convince them to release all of the Italian literature that is in the public domain for free. Unfortunately, I wouldn’t even know where to start to ask them to do that.

On the technical side, maybe it is well due time that I spend a significant amount of time on my now seven-year-old project of extracting a copy of the data from the data files of Zanichelli’s Italian literature software (likely developed at least in part with public funds).

The software was developed for Windows 3.1 and can’t be run on any modern computer. I should probably send the ISOs of it to the Internet Archive, and they may be able to keep it running there on DosBox with a real copy of Windows 3.1, since Wine appears to not support the 16-bit OLE interfaces that the software depends on.

If you wonder what would be a neat thing for Microsoft to release as open-source, I would probably suggest the whole Windows 3.1 source code would be a starting point. If nothing else, with the right license it would be possible to replace the half-complete 16-bit DLLs of Wine with official, or nearly-official copies.

I guess it’s time to learn more about Windows 3.1 in my “copious spare time” (h/t Charles Stross), and start digging into this. Maybe Ryan’s 2ine might help, as OS/2 and Windows 3.1 are closer than the latter is to modern Windows.

June 16, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)

The recent GitHub craze got a number of Free Software fundamentalists to hurry away from GitHub towards other hosting solutions.

Whether it was GitLab (a fairly natural choice given the nature of the two services), BitBucket, or SourceForge (which is trying to rebuild a reputation as a Free Software friendly hosting company), there are a number of options of new SaaS providers.

At the same time, a number of projects have been boasting (and maybe a bit too smugly, in my opinion) that they self-host their own GitLab or similar software, and suggested other projects to do the same to be “really free”.

A lot of the discourse appears to be missing nuance on the compromises involved in using SaaS hosting providers, self-hosting for communities, and self-hosting for single projects, and so I thought I would gather my thoughts around this in one single post.

First of all, you probably remember my thoughts on self-hosting in general. Any solution that involves self-hosting will require a significant amount of ongoing work. You need to make sure your services keep working, and keep safe and secure. Particularly for FLOSS source code hosting, it’s of primary importance that the integrity and safety of the source code is maintained.

As I already said in the previous post, this style of hosting works well for projects that have a community, in which one or more dedicated people can look after the services. And in particular for bigger communities, such as KDE, GNOME, FreeDesktop, and so on, this is a very effective way to keep stewardship of code and community.

But for one-person projects, such as unpaper or glucometerutils, self-hosting would be quite bad. Even for xine with a single person maintaining just site+bugzilla it got fairly bad. I’m trying to convince the remaining active maintainers to migrate this to VideoLAN, which is now probably the biggest Free Software multimedia project and community.

This is not a new problem. Indeed, before people rushed to GitHub (or Gitorious), they rushed to other services that provided similar integrated environments. When I became a FLOSS developer, the biggest of them was SourceForge — which, as I noted earlier, was recently bought by a company trying to rebuild its reputation after a significant loss of trust. These environments don’t only include SCM services, but also issue (bug) trackers, contact email and so on and so forth.

Using one of these services is always a compromise: not only they require an account on each service to be able to interact with them, but they also have a level of lock-in, simply because of the nature of URLs. Indeed, as I wrote last year, just going through my old blog posts to identify those referencing dead links had reminded me of just how many project hosting services shut down, sometimes dragging along (Berlios) and sometimes abruptly (RubyForge).

This is a problem that does not only involve services provided by for-profit companies. Sunsite, RubyForge and Berlios didn’t really have companies behind them, and that last one is probably one of the closest things to a Free Software co-operative that I’ve seen outside of FSF and friends.

There is of course Savannah, FSF’s own Forge-lookalike system. Unfortunately for one reason or another it has always lagged behind the featureset (particularly around security) of other project management SaaS. My personal guess is that it is due to the political nature of hosting any project over on FSF’s infrastructure, even outside of the GNU project.

So what we need would be a politically-neutral, project-agnostic hosting platform that is a co-operative effort. Unfortunately, I don’t see that happening any time soon. The main problem is that project hosting is expensive, whether you use dedicated servers or cloud providers. And it takes full-time people working as system administrators to keep it running smoothly and securely. You need professionals, too — or you may end up like lkml.org being down when its one maintainer goes on vacation and something happens.

While there are projects that receive enough donations that they would be able to sustain these costs (see KDE, GNOME, VideoLAN), I’d be skeptical that there would be an unfocused co-operative that would be able to take care of this. Particularly if it does not restrict creation of new projects and repositories, as that requires particular attention to abuse, and to make good guidelines of which content is welcome and which one isn’t.

If you think that that’s an easy task, consider that even SourceForge, with their review process, that used to take a significant amount of time, managed to let joke projects use their service and run on their credentials.

A few years ago, I would have said that SFLC, SFC and SPI would be the right actors to set up something like this. Nowadays? Given their infighting I don’t expect them to be of much use.

June 14, 2018
Andreas K. Hüttel a.k.a. dilfridge (homepage, bugs)

Today's great news is that our manuscript "Nanomechanical characterization of the Kondo charge dynamics in a carbon nanotube" has been accepted for publication by Physical Review Letters.

The Kondo effect is a many-body phenomenon at low temperature that results from a quantum state degeneracy, as, e.g., the one of spin states in the absence of a magnetic field. In its simplest case, it makes a quantum dot, in our case a carbon nanotube with some trapped electrons on it, behave very differently for an even and an odd number of electrons. At an even number of trapped electrons, no current can flow through the nanotube, since temperature and applied bias voltage are too low to charge it with one more elementary charge; this phenomenon is called Coulomb blockade. Strikingly, at odd electron number, when two degenerate quantum states in the nanotube are available, Coulomb blockade seems not to matter, and a large current can flow. Theory explains this by assuming that a localized electron couples to electrons in the contacts, forming a combined, delocalized singlet quantum state.
What carries the Kondo-enhanced current, and how does the electric charge now accumulate in the carbon nanotube? We use the vibration of the macromolecule to measure this. As in the case of, e.g., a guitar string, the resonance frequency of a nanotube changes when you pull on it; in the case of the carbon nanotube this is sensitive enough to resolve fractions of the force caused by a single elementary charge. From the vibration frequency, as a function of the electrostatic potential, we calculate the average number of electrons on the nanotube, and can then compare the odd and even number cases.
A surprising result of our evaluation is that the charge trapped on the nanotube behaves the same way in the even and odd occupation case, even though the current through it is completely different. Sequential tunneling of electrons can model the charge accumulation, and with it the mechanical behaviour. The large Kondo current is carried by virtual occupation of the nanotube alone, i.e., electrons tunneling on and immediately off again so they do not contribute to the charge on it.

"Nanomechanical Characterization of the Kondo Charge Dynamics in a Carbon Nanotube"
K. J. G. Götz, D. R. Schmid, F. J. Schupp, P. L. Stiller, Ch. Strunk, and A. K. Hüttel
Physical Review Letters 120, 246802 (2018); arXiv:1802.00522 (PDF, HTML, supplementary information)

Luca Barbato a.k.a. lu_zero (homepage, bugs)
Video Compression Bounty Hunters (June 14, 2018, 18:43 UTC)

In this post, we (Luca Barbato and Luc Trudeau) joined forces to talk about the awesome work we’ve been doing on Altivec/VSX optimizations for the libvpx library; you can read it here or on Luc’s Medium.

Both of us were in Brussels for FOSDEM 2018; Luca presented his work on rust-av and Luc was there to hack on rav1e – an experimental AV1 video encoder in Rust.

Luca joined the rav1e team and helped give hints about how to effectively leverage Rust. Together, we worked on AV1 intra prediction code, among other things.

Luc Trudeau: I was finishing up my work on Chroma from Luma in AV1, and wanted to stay involved in royalty free open source video codecs. When Luca talked to me about libvpx bounties on Bountysource, I was immediately intrigued.

Luca Barbato: Luc just finished implementing the Neon version of his CfL work and I wondered how that code could work using VSX. I prepared some of the machinery that was missing in libaom and Luc tried his hand on Altivec. We still had some pending libvpx work sponsored by IBM and I asked him if he wanted to join in.

What’s libvpx?

For those less familiar, libvpx is the official Google implementation of the VP9 video format. VP9 is most notably used by YouTube and Netflix. VP9 playback is available on some browsers including Chrome, Edge and Firefox, and also on Android devices, covering 75.31% of the global user base.

Ref: caniuse.com VP9 support in browsers.

Why use VP9, when the de facto video format is H.264/AVC?

Because VP9 is royalty free and the bandwidth savings are substantial when compared to H.264 when playback is available (an estimated 3.3B devices support VP9). In other words, having VP9 as a secondary codec can pay for itself in bandwidth savings by not having to send H.264 to most users.

Ref: Netflix VP9 compression analysis.

Why care about libvpx on Power?

Dynamic adaptive streaming formats like HLS and MPEG DASH have completely changed the game of streaming video over the internet. Streaming hardware and custom multimedia servers are being replaced by web servers.

From the servers’ perspective streaming video is akin to serving small video files; lots of small video files! To cover all clients and most network conditions a considerable amount of video files must be encoded, stored and distributed.

Things are changing fast and while the total cost of ownership of video content for previous generation video formats, like H.264, was mostly made up of bandwidth and hosting, encoding costs are growing with more complex video formats like HEVC and VP9.

This complexity is reported to have grown exponentially with the upcoming AV1 video format. A video format, built on the libVPX code base, by the Alliance for Open Media, of which IBM is a founding member.

Ref: Facebook’s AV1 complexity analysis

At the same time, IBM and its partners in the OpenPower Foundation are releasing some very impressive hardware with the new Power9 processor line up. Big Iron Power9 systems, like the Talos II from Raptor Computing Systems and the collaboration between Google and Rackspace on Zaius/Barreleye servers, are ideal solutions to tackle the growing complexity of video format encoding.

However, these awesome machines are currently at a disadvantage when encoding video. Without the platform-specific optimizations that their competitors enjoy, the Power9 architecture can’t be fully utilized. This is clearly illustrated in the x264 benchmark released in a recent Phoronix article.

Ref: Phoronix x264 server benchmark.

Thanks to the optimization bounties sponsored by IBM, we are hard at work bridging the gap in libvpx.

Optimization bounties?

Just like bug bounty programs, optimizations make for great bounties. Companies that see benefit in platform-specific optimizations for video codecs can sponsor our bounties on the Bountysource platform.

Multiple companies can sponsor the same bounty, thus sharing the cost of more important bounties. Furthermore, bounties are a minimal-risk investment for sponsors, as they are only paid out when the work is completed (and peer reviewed by libvpx maintainers).

Not only is the Bountysource platform a win for companies that directly benefit from the bounties they are sponsoring, it’s also a win for developers (like us) who can get paid to work on free and open source projects that we are passionate about. Optimization bounties are a source of sustainability in the free and open source software ecosystem.

How do you choose bounties?

Since we’re a small team of bounty hunters (Luca Barbato, Alexandra Hájková, Rafael de Lucena Valle and Luc Trudeau), we need to play it smart and maximize the impact of our work. We’ve identified two common use cases related to streaming on the Power architecture: YouTube-like encodes and real time (a.k.a. low latency) encodes.

By profiling libvpx under these conditions, we can determine the key functions to optimize. The following charts show the percentage of time spent in the top 20 functions of the libvpx encoder (without Altivec/VSX optimisations) on a Power8 system, for both YouTube-like and real time settings.

It’s interesting to see that the top 20 functions make up about 80% of the encoding time. That’s similar in spirit to the Pareto principle, in that we don’t have to optimize the whole encoder to make the Power architecture competitive for video encoding.

We see a similar distribution between YouTube-like encoding settings and real time video encoding. In other words, optimization bounties for libvpx benefit both Video on Demand (VOD) and live broadcast services.

We add bounties on the Bountysource platform around common themed functions like: convolution, sum of absolute differences (SAD), variance, etc. Companies interested in libvpx optimization can go and fund these bounties.

What’s the impact of this project so far?

So far, we delivered multiple libvpx bounties including:

  • Convolution
  • Sum of absolute differences (SAD)
  • Quantization
  • Inverse transforms
  • Intra prediction
  • etc.

To see the benefit of our work, we compiled the latest version of libVPX with and without VSX optimizations and ran it on a Power8 machine. Note that the C compiled versions can produce Altivec/VSX code via auto vectorization. The results, in frames per minutes, are shown below for both YouTube-like encoding and Real time encoding.

Our current VSX optimizations give approximately a 40% and 30% boost in encoding speed for YouTube-like and real time encoding respectively. Encoding speed increases in the range of 10 to 14 frames per minute can considerably reduce cloud encoding costs for Power architecture users.

In the context of real time encoding, the time saved by the platform optimization can be put to good use to improve compression efficiency. Concretely, a real time encoder must still encode at real time speed, but speeding up the encoder allows operators to increase the number of coding tools, resulting in better quality for the viewers and bandwidth savings for operators.

What’s next?

We’re energized by the impact that our small team of bounty hunters is having on libvpx performance for the Power architecture and we wanted to share it in this blog post. We look forward to getting even more performance from libvpx on the Power architecture. Expect considerable performance improvement for the Power architecture in the next libvpx release (1.8).

As IBM targets its Power9 line of systems at heavy cloud computations, it seems natural to also aim all that power at tackling the growing costs of AV1 encodes. This won’t happen without platform specific optimizations and the time to start is now; as the AV1 format is being finalized, everyone is still in the early phases of optimization. We are currently working with our sponsors to set up AV1 bounties, so stay tuned for an upcoming post.

June 09, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)

I have written before about the CRM I wrote for a pizzeria, and I am happy to see that even FSFE has started looking into Free Software for SMEs. I have also noted the need for teams to develop healthy projects. Today I want to give an example of why I think these things are not as easy as most people expect them to be, and how many different moving parts need to align to make Free Software for SMEs work.

As I’m no longer self-employed, and I have no intention of going back to being an MSP in my lifetime, what I’m writing here is more of a set of “homework pointers” in case a community of SME-targeted Free Software projects were to be formed.

I decided to focus my thoughts on the needs of a brick and mortar store (or high street store if you prefer), mostly because it has a subset of the requirements that I could think of, compared to a restaurant like the pizza place I actually worked with.

These notes are also probably a lot more scattered and incomplete than I would like, because I have only worked retail for a short while, between high school and the two miserable weeks of university, nearly fifteen years ago, in a bookstore to be precise.

For most of the people who have not worked retail, it might seem like the most important piece of software/hardware for a store is the till, because that is what they interact with most of the time. While the till systems (also called POS) are fairly important, as those are in direct contact with the customer, they are only the tip of the iceberg.

But let’s start with the POS: whether you plan on integrating them directly with a credit card terminal or not, right now there are a number of integrated hardware/software solutions for these, which include a touchscreen to input the receipt components and a (usually thermal) printer for the receipts to be printed on, while sometimes allowing the client to be emailed the receipt instead. As far as I know, there’s no Free Software system for this. I do see an increasing number of Clover tills in Europe, and Square in the United States (but these are not the only ones).

The till software is more complicated than one would think, because in addition to the effects that the customers can see (select line items, print receipt, eventually take payment), it has to be able to keep track of the cash flow, whether it is in the form of actual cash, or in the form of card payments. Knowing the cash flow is a requisite for any business, as without that information you cannot plan your budgets.

In bigger operations, this would feed into a dedicated ERP system, which would often include an inventory management software — because you need to know how much stock you have and how fast it is moving, to know when to order new stock.

There is also the need to handle invoices, which usually don’t get printed by the till (you don’t want an invoice printed on thermal paper, particularly in countries like Italy, where you’re meant to keep the original of an invoice for over ten years).

And then there is the filing of payable invoices and, well, their payment. This is part of the accounting procedures, and I know of very few systems that allow integration with a bank to the point of automating this part. PSD2 is meant to require financial institutions to provide APIs to make this possible, at least in Europe, but that has barely been implemented yet, and we’ll have to see what the solution will be.

Different industries have different expected standards, too. When I worked in the bookstore, there was a standard piece of software that was used to consult the online stock of books from various depots, which was required to handle orders of books for people looking for something that was not in the store. While Amazon and other online services have for the most part removed the need for many to custom order books in a store, I still know a few people who do so, simply to make sure the bookstore stays open. And I assume that very similar, yet different, software and systems exist for most other fields of endeavour, such as computer components, watches, and shoes.

Depending on the size of the store, the number of employees, and in general the hours of operation, there may also be a need for roster management software, so that the different workers have fair (and legal) shifts, while still being able to manage days off. I don’t know how well solutions like Workday work for small businesses, but in general I feel this is likely going to be one area in which Free Software won’t make an easy dent: following all the possible legal frameworks to actually be compliant with the law is the kind of work that requires a full-time staff of people, and unless something changes drastically, I don’t expect any FLOSS project to keep up with that.

You can say that this post is not giving any answer and is just adding more questions. And that’s the case, actually. I don’t have the time or energy of working on this myself, and my job does not involve working with retailers, or even developing user-focused software. I wanted to write this as a starting point of a project if someone is interested in doing so.

In particular, I think that this would be prime territory for a multi-disciplinary university project, starting from asking questions to store owners of their need, and understanding the whole user journey. Which seems to be something that FSFE is now looking into fostering, which I’m very happy about.

Please, help the answer to the question “Can you run a brick and mortar store on Free Software?” be Yes!

June 04, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)

I was not planning on posting on the blog until next week, trying to stick on a weekly schedule, but today’s announcement of Microsoft acquiring GitHub is forcing my hand a bit.

So, Microsoft is acquiring GitHub, and a number of Open Source developers are losing their minds, in all possible ways. A significant proportion of the comments on this that I have seen on my social media sound like doomsday, as if this spells the end of GitHub, because Microsoft is going to ruin it all for them.

Myself, I think that if it spells the end of anything, it is the end of the one-stop shop to work on any project out there, not because of anything Microsoft did or is going to do, but because a number of developers are now leaving the platform in protest (protest of what? One company buying another?)

Most likely, it’ll be the fundamentalists that will move their projects away from GitHub. And depending on what they decide to do with their projects, it might not even show on anybody’s radar. A lot of people are pushing for GitLab, which is both an open-core self-hosted platform, and a PaaS offering.

That is not bad. Self-hosted GitLab instances already exist for VideoLAN and GNOME. Big, strong communities are in my opinion in the perfect position to dedicate people to support core infrastructure to make open source software development easier. In particular because it’s easier for a community of dozens, if not hundreds of people, to find dedicated people to work on it. For one-person projects, that’s overhead, distracting, and destructive as well, as fragmenting into micro-instances will cause pain to fork projects — and at the same time, allowing any user who just registered to fork the code in any instance is prone to abuse and a recipe for disaster…

But this is all going to be a topic for another time. Let me try to go back to my personal opinions on the matter (to be perfectly clear that these are not the opinions of my employer and yadda yadda).

As of today, what we know is that Microsoft acquired GitHub, and they are putting Nat Friedman of Xamarin fame (the company that stood behind the Mono project after Novell) in charge of it. This choice makes me particularly optimistic about the future, because Nat’s a good guy and I have the utmost respect for him.

This means I have no intention to move any of my public repositories away from GitHub, except if doing so would bring a substantial advantage. For instance, if there was a strong community built around medical devices software, I would consider moving glucometerutils. But this is not the case right now.

And because I still root most of my projects around my own domain, if I did move them, the canonical URLs would still be valid. This is a scheme I devised after getting tired of fixing up where unieject ended up.

Microsoft has not done anything wrong with GitHub yet. I will give them the benefit of the doubt, and not rush out of the door. It would and will be different if they were to change their policies.

Rob’s point is valid, and it would be a disgrace if various governments were to push Microsoft into a corner, requiring it to purge content that the smaller, independent GitHub would have left alone. But unless that happens, we’re debating hypotheticals at the same level as “If I was elected supreme leader of Italy”.

So, as of today, 2018-06-04, I have no intention of moving any of my repositories to other services. I’ll also use a link to this blog with no accompanying comment to anyone who will suggest I should do so without any benefit for my projects.

June 03, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)
The importance of teams, and teamwork (June 03, 2018, 21:04 UTC)

Today, on Twitter, I received a reply with a phrase that, on its own and without connecting back to the original topic of the thread, I found emblematic of the dread I feel when working as a developer, particularly in many open source communities nowadays.

Most things don’t work the way I think they work. That’s why I’m a programmer, so I can make them work the way I think they should work.

I’m not going to link back to the tweet, or name the author of the phrase. This is not about them in particular, and more about the feeling expressed in this phrase, which I would have agreed with many years ago, but now feels so much off key.

What I feel now is that programmers don’t make things work the way they think they should. And this is not intended as a nod to the various jokes about how bad programming actually is, given APIs and constraints. This is about something that becomes clear when you spend your time trying to change the world, or make a living alone (by running your own company): everybody needs help, in the form of a team.

A lone programmer may be able to write a whole operating system (cough Emacs), but that does not make it a success in and of itself. If you plan on changing the world, and possibly changing it for the better, you need a team that includes not only programmers, but experts in quite a lot of different things.

Whether it is a Free Software project, or a commercial product, if you want to have users, you need to know what they want — and a programmer is not always the most suitable person to go through user stories. Hands up all of us who have, at one point or another, facepalmed at an acquaintance taking a screenshot of a web page to paste it into Word, and tried to teach them how to print the page to PDF. While changing workflows so that they make sense may sound the easiest solution to most tech people, that’s not what people who are trying to just do their job care about. Particularly not if you’re trying to sell them (literally or figuratively) a new product.

And similarly to what users want to do, you need to know what the users need to do. While effectively all of Free Software comes with no warranty attached, even for it (and most definitely for commercial products), it’s important to consider the legal framework the software has to be used in. Except for the more anarchist of the developers out there, I don’t think anyone would feel particularly interested in breaching laws for the sake of breaching them, for instance by providing a ledger product that allows “black book accounting” as an encrypted parallel file. Or, to reprise my recent example, to provide a software solution that does not comply with GDPR.

This is not just about pure software products. You may remember, from last year, the teardown of Juicero. In this case the problems appeared to stem from the lack of control over the BOM. While electronics is by far not my speciality, I have heard more expert friends and colleagues cringe at seeing the specs of projects that tried to actually become mainstream, with a BOM easily twice as expensive as the minimum.

An aside here, before someone starts shouting about that. Minimising the BOM for an electronic project may not always be the main target. If it’s a DIY project, making it easier to assemble could be an objective, so choosing more bulky, more expensive parts might be warranted. Similarly, if it’s being done for prototyping, using more expensive but widely available components is generally a win too. I have worked on devices that used multi-GB SSDs for a firmware less than 64MB — but asking for on-board flash for the firmware would have cost more than the extremely overprovisioned SSDs.

And in my opinion, if you want to have your own company, and are in for the long run (i.e. not with startup mentality of getting VC capital and get acquired before even shipping), you definitely need someone to follow up the business plan and the accounting.

So no, I don’t think that any one programmer, or a group of sole programmers, can change the world. There’s a lot more than writing code, to build software. And a lot more than building software, to change society.

Consider this the reason why I will plonk-file any recruitment email that is looking for “rockstars” or “ninjas”. Not that I’m looking for a new gig as I type this, but I would at least give thought if someone was looking for a software mechanic (h/t @sysadmin1138).

June 01, 2018
Domen Kožar a.k.a. domen (homepage, bugs)

In the last 6 years working with Nix and mostly in last two years full-time, I've noticed a few patterns.

These are mostly direct or indirect result of not having a "good enough" infrastructure to support how much Nix has grown (1600+ contributors, 1500 pull requests per month).

Without further ado, I am announcing https://cachix.org - Binary Cache as a Service that is ready to be used after two months of work.

What problem(s) does cachix solve?

The main motivation is to save you time and compute resources waiting for your packages to build. By using a shared cache of already built packages, you'll only have to build your project once.

This should also speed up CI builds, as Nix can make use of granular caching of each package, rather than caching the whole build.

Another one (which I personally consider even more important) is decentralization of work produced by Nix developers. Up until today, most devs pushed their software updates into the nixpkgs repository, which has the global binary cache at https://cache.nixos.org.

But as the community grew, fitting different ideologies into one global namespace became impossible. I consider the nixpkgs community to be mature, but sometimes a clash of ideologies with rational backing occurs. Some want packages to be featureful by default, some prefer them to be minimalist. Some might prefer lots of configuration knobs available (for example cross-compilation support or musl/glibc swapping), some might prefer the build system to do just one thing, as it’s easier to maintain.

These are not right or wrong opinions, but rather a specific view of use cases that software might or might not cover.

There are also many projects that don’t fit into nixpkgs because their releases are too frequent, they are not available under a permissive license, they are simpler to manage with complete control, or their maintainers simply disagree with the requirements that nixpkgs developers impose on contributors.

And that's fine. What we've learned in the past is not to fight these ideas, but allow them to co-exist in different domains.

If you're interested:

Domen (domen@enlambda.com)

May 28, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)

I grew up as a huge fan of comic books. Not only Italian Disney comics, which are something in and of themselves, but also US comics from Marvel. You could say that I grew up on Spider-Man and Duck Avenger. Unfortunately, actually holding physical comic books nowadays is getting harder, simply because I’m travelling all the time, and I also try to keep as little physical media as I can, given the space constraints of my apartment.

Digital comics are, thus, a big incentive for me to keep reading. And in particular, a few years ago I started buying my comics from Comixology, which was later bought by Amazon. The reason why I chose this particular service over others is that it allowed me to buy, and read, through a single service, the comics from Marvel, Dark Horse, Viz and a number of independent publishers. All of this sounded good to me.

I have not been reading a lot over the past few years, but as I moved to London, I found that the tube rides have the perfect span of time to catch up on the latest Spider-Man or finish up those Dresden Files graphic novels. So at some point last year I decided to get myself a second tablet, one that is easier to bring on the Tube than my neat but massive Samsung Tab A.

While Comixology is available for the Fire Tablet (being an Amazon property), I settled for the Lenovo Tab 4 8 Plus (what a mouthful!), which is a pretty neat “stock” Android tablet. Indeed, Lenovo’s customization of the tablet is fairly limited, and besides some clearly broken settings in the base firmware (which insisted on setting up Hangouts as the SMS app, despite the tablet not having a baseband), it works quite neatly, and it has a positively long-lasting battery.

The only real problem with that device is that it has very limited storage. It’s advertised as a 16GB device, but the truth is that only about half of it is available to the user. And that’s only after effectively uninstalling most of the pre-installed apps, most of which are thankfully not marked as system apps (which means you can fully uninstall them, instead of just keeping them disabled). Indeed, the more firmware updates, the fewer apps that are marked as system apps it seems — in my tablet the only three apps currently disabled are the File Manager, Gmail and Hangouts (this is a reading device, not a communication device). I can (and should) probably disable Maps, Calendar, and Photos as well, but that’s going to be for later.

Thankfully, this is not a big problem nowadays, as Android 6 introduced adoptable storage, which allows you to use an additional SD card for storage, transparently for both the system and the apps. It can be a bit slow depending on the card and the usage you make of the device, but as a reading device it works just great.

You were able to move apps to the SD card in older Android versions too, but in those cases you would end up with non-encrypted apps that would still store their data on the device’s main storage. For those cases, a number of applications, including for instance Audible (also an Amazon offering) allow you to select an external storage device to store their data files.

When I bought the tablet, SD card and installed Comixology on it, I didn’t know much about this part of Android, to be honest. Indeed, I only checked whether Comixology allowed storing the comics on the SD card, and since I found that was the case, I was all happy. I had adopted the SD card, though, without realizing what that actually meant, and that was the first problem. Because then the documentation from Comixology didn’t actually match my experience: the setting to choose the SD card for storage didn’t appear, and I contacted tech support, who kept asking me questions about the device and what I was trying to do, but provided me no solution.

Instead, I noticed that everything was alright: as I adopted the SD card before installing the app, it got automatically installed on it, and it started using the card itself for storage, which allowed me to download as many comicbooks as I wanted, and not bother me at all.

That worked until some time earlier this year, when I couldn’t update the app anymore. It kept failing with a strange Play Store error. So I decided to uninstall and reinstall it… at which point I had no way to move it back to the SD card! They disabled the option to allow the application to be moved in their manifest, and that’s why the Play Store was unable to update it.

Over a month ago I contacted Comixology tech support, telling them what was going on, assuming that this was an oversight. Instead I kept getting stubborn responses that moving the app to the SD card didn’t move the comics (wrong), or insinuating I was using a rooted device (also wrong). I still haven’t managed to get them to reintroduce the movable app, even though the Kindle app, also from Amazon, moves to the SD card just fine. Ironically, you can read comics bought on Kindle Store with the Comixology app but, for some reason, not vice-versa. If I could just use the Kindle app I wouldn’t even bother with installing the Comixology app.

Now I cancelled my Comixology Unlimited subscription, cancelled my subscription to new issues of Spider-Man, Bleach, and a few other series, and am pondering what’s the best solution to my problems. I could use non-adopted storage for the tablet if I effectively dedicate it to Comixology — unfortunately in that case I won’t be able to download Google Play Books or Kindle content to the SD card as they don’t support the external storage mode. I could just read a few issues at a time, using the ~7GB storage space that I have available on the internal storage, but that’s also fairly annoying. More likely I’ll start buying the comics from another service that has a better understanding of the Android ecosystem.

Of course the issue remains that I have a lot of content on Comixology, and just a very limited subset of comics are DRM-free. This is not, strictly speaking, Comixology’s fault: the publishers are the ones deciding whether to DRM their content or not. But it definitely shows an issue that many publishers don’t seem to grasp: in the face of technical problems like this, the consumer would have better “protection” if they had just pirated the comics!

For the moment, I can only hope that someone reading this post happens to work for, or know someone working for, Comixology or Amazon (in the product teams — I know a number of people in the Amazon production environment, but I know they are far away from the people who would be able to fix this), and they can update the Comixology app to be able to work with modern Android, so that I can resume reading all my comics easily.

Or if Amazon feels like it, I’d be okay with them giving me a Fire tablet to use in place of the Lenovo. Though I somewhat doubt that’s something they would be happy to do.

May 25, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)
The story of Gentoo management (May 25, 2018, 15:43 UTC)

I have recently made a tabular summary of (probably) all Council members and Trustees in the history of Gentoo. I think that this table provides a very succinct way of expressing the changes within management of Gentoo. While it can’t express the complete history of Gentoo, it can serve as a useful tool of reference.

What questions can it answer? For example, it provides an easy way to see how many terms individuals have served, or how long Trustee terms were. You can clearly see who served both on the Council and on the Board and when those two bodies had common members. Most notably, it collects a fair amount of hard-to-find data in a single table.

Can you trust it? I’ve put an effort to make the developer lists correct but given the bad quality of data (see below), I can’t guarantee complete correctness. The Trustee term dates are approximate at best, and oriented around elections rather than actual term (which is hard to find). Finally, I’ve merged a few short-time changes such as empty seats between resignation and appointing a replacement, as expressing them one by one made little sense and would cause the tables to grow even longer.

This article aims to be the text counterpart to the table. I would like to tell the history of the presented management bodies, explain the sources that I’ve used to get the data and the problems that I’ve found while working on it.

As you might suspect, the further back I had to go, the harder it was to find good data. The problems included the limited scope of our archives and some apparent secrecy of decision-making processes in the early days (judging by some cross-posts, the traffic on the -core mailing list was significant, and it was not archived before late 2004). Both due to lack of data, and due to specific interest in developer self-government, this article starts in mid-2003.

Continue reading

May 20, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)

There seems to be some serious confusion around the way directories are installed in Gentoo. In this post, I would like to shortly explain the differences between different methods of creating directories in ebuilds, and instruct how to handle the issues related to installing empty directories and volatile locations.

Empty directories are not guaranteed to be installed

First things first. The standards are pretty clear here:

Behaviour upon encountering an empty directory is undefined. Ebuilds must not attempt to install an empty directory.

PMS 13.2.2 Empty directories (EAPI 7 version)

What does that mean in practice? It means that if an empty directory is found in the installation image, it may or may not be installed. Or it may be installed, and incidentally removed later (that’s the historical Portage behavior!). In any case, you can’t rely on either behavior. If you really need a directory to exist once the package is installed, you need to make it non-empty (see: keepdir below). If you really need a directory not to exist, you need to rmdir it from the image.

That said, this behavior does make sense. It guarantees that the Gentoo installation is secured against empty directory pruning tools.

*into

The *into family of functions is used to control install destination for other ebuild helpers. By design, either they or the respective helpers create the install directories as necessary. In other words, you do not need to call dodir when using *into.
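
For illustration, a minimal sketch using insinto/doins (the path and file name are made up):

src_install() {
    # doins creates /usr/share/foo in the image as needed,
    # so no explicit dodir call is required
    insinto /usr/share/foo
    doins data/foo.dat
}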

dodir

dodir is not really special in any way. It is just a convenient wrapper for install -d that prepends ${ED} to the path. It creates an empty directory the same way the upstream build system would have created it, and if the directory is left empty, it is not guaranteed to be preserved.

So when do you use it? You use it when you need to create a directory that will not be created otherwise and that will become non-empty at the end of the build process. Example use cases are working around broken build systems (that fail due to non-existing directories but do not create them), and creating directories when you want to manually write to a file there.

src_install() {
    # build system is broken and fails
    # if ${D}/usr/bin does not exist
    dodir /usr/bin
    default

    dodir /etc/foo
    sed -e "s:@libdir@:$(get_libdir):" \
        "${FILESDIR}"/foo.conf.in \
        > "${ED}"/etc/foo/foo.conf || die
}

keepdir

keepdir is the function specifically meant for installing empty directories. It creates the directory, and a keep-file inside it. The directory becomes non-empty, and therefore guaranteed to be installed and preserved. When using keepdir, you do not call dodir as well.

Note that actually preserving the empty directories is not always necessary. Sometimes packages are perfectly capable of recreating the directories themselves. However, make sure to verify that the permissions are correct afterwards.

src_install() {
    default

    # install empty directory
    keepdir /var/lib/foo
}

Volatile locations

The keepdir method works fine for persistent locations. However, it will not work correctly in directories such as /run, which is volatile, or /var/cache, which may be wiped by the user. On Gentoo, this also includes /var/run (which the OpenRC maintainers unilaterally decided to turn into a /run symlink), and /var/lock.

Since the package manager does not handle recreating those directories e.g. after a reboot, something else needs to. There are three common approaches to it, most preferred first:

  1. The application creates all necessary directories at startup.
  2. A tmpfiles.d file is installed to create the directories at boot.
  3. The init script creates the directories before starting the service (checkpath).

The preferred approach is for applications to create those directories themselves. However, not all applications do that, and not all actually can. For example, applications that are running unprivileged generally can’t create those directories.

The second approach is to install a tmpfiles.d file to create (and maintain) the directory. Those files work for both systemd and OpenRC users (via opentmpfiles) out of the box. The directories are (re-)created at boot, and optionally cleaned up periodically. The ebuild should also use tmpfiles.eclass to trigger directory creation after installing the package.
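
As a rough sketch of how that can look in an ebuild (the service name foo and the config contents are made up; the dotmpfiles and tmpfiles_process helpers are provided by tmpfiles.eclass):

inherit tmpfiles

src_install() {
    default

    # foo.conf could contain e.g.:  d /run/foo 0755 foo foo -
    dotmpfiles "${FILESDIR}"/foo.conf
}

pkg_postinst() {
    # create the directory immediately, without waiting for a reboot
    tmpfiles_process foo.conf
}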

The third approach is to make the init script create the directory. This was the traditional way but nowadays it is generally discouraged as it causes duplication between different init systems, and the directories are not created when the application is started directly by the user.
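
For completeness, a rough sketch of the init script approach (the service name, owner and mode are made up):

# excerpt from a hypothetical /etc/init.d/foo OpenRC script
start_pre() {
    # recreate the runtime directory before the service starts
    checkpath --directory --owner foo:foo --mode 0750 /run/foo
}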

Summary

To summarize:

  1. when you install files via *into, installation directories are automatically created for you;
  2. when you need to create a directory into which files are installed in some way other than via ebuild helpers, use dodir;
  3. when you need to install an empty directory in a non-volatile location (and the application can’t just create it on start), use keepdir;
  4. when you need to install a directory in a volatile location (and the application can’t just create it on start), use tmpfiles.d.

May 13, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)
A short history of Gentoo copyright (May 13, 2018, 19:04 UTC)

As part of the recent effort to form a new copyright policy for Gentoo, research into its historical status has been conducted. We’ve tried to establish all the key events regarding the topic, as well as the reasoning behind the existing policy. I would like to briefly note the history based on the evidence discovered by Robin H. Johnson, Ulrich Müller and myself.

Continue reading

May 10, 2018
Sebastian Pipping a.k.a. sping (homepage, bugs)
Bash: Command output to array of lines (May 10, 2018, 12:49 UTC)

We had a case at work where the multi-line output of a command should be turned into an array of lines. Here's one way to do it. Two Bash features play a part in this approach:

  • $'....' syntax (a dollar right in front of a single-tick literal) activates interpolation of C-like escape sequences (see below)
  • Bash variable IFS — the internal field separator affecting the way Bash applies word splitting — is temporarily changed from default spaces-tabs-and-newlines to just newlines so that we get one array entry per line

Let me demo that:

# f() { echo $'one\ntwo  spaces' ; }

# f
one
two  spaces

# IFS=$'\n' lines=( $(f) )

# echo ${#lines[@]}
2

# echo "${lines[0]}"
one

# echo "${lines[1]}"
two  spaces
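
A minimal alternative sketch, assuming Bash 4 or newer, is mapfile (a.k.a. readarray), which avoids touching IFS:

# mapfile -t lines < <(f)

# echo "${lines[1]}"
two  spaces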

May 04, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)

I’m an open source developer because I think that open source makes for safer, better software for the whole community of users. I also think that, by making more software available to a wider audience, we improve quality, safety and security for every user out there, and as such I will always push for more, and more open, software. This is why I support the Public Money, Public Code campaign by the FSFE for opening up the software developed explicitly for public administrations.

But there is one space that I have found quite lacking when it comes to open source: business-oriented software. The first obvious thing is the lack of good accounting software, as Jonathan has written about extensively, but there is more. When I was consulting as a roaming sysadmin (or, with a more buzzwordy, marketing-friendly term, a Managed Services Provider — MSP), a number of my customers relied heavily on nearly off-the-shelf software to actually run their business. And in at least a couple of cases, they commissioned custom-tailored software from me for that.

In a lot of cases, there isn’t really a good reason not to open-source this software: while it is required to run certain businesses, it is clearly not enough to run them. And yet there are very few examples of such software in the open, including from me: my customers didn’t really like the idea of releasing the software to others, even after I offered a discount on the development price.

I want to show the details of an example of one such custom software, something that, to give a name to it, would be a CRM (Customer Relationship Manager), that I built for a pizzeria in Italy. I won’t be opening the source code for it (though I wish I could do so), and I won’t be showing screenshots or provide the name of the actual place, instead referring to it as Pizza Planet.

This CRM (although the name sounds more professional than what it really was) was custom-designed to suit the work environment of the pizzeria — that is to say, I did whatever they asked me, even when it disagreed with my sense of aesthetics and engineering. The basic idea was very simple: when a customer called, they wanted to know who the customer was even before picking up the phone — effectively inspecting the caller ID, and connecting it with the easiest database editing facility I could write, so that they could give it a name and a freeform text box to write down addresses, notes, and preferences.

The reason why they called me to write this is that they originally bought a hardware PBX (for a single room pizzeria!) just so that a laptop could connect to it and use the Address Book functionality of the vendor. Except this functionality kept crashing, and after many weeks of back-and-forth with the headquarters in Japan, the integrator could not figure out how to get it to work.

As the pizzeria was wired with ISDN (legacy technology, heh) to be able to take at least two calls at the same time, the solution I came up with was to build a simple “industrial” PC with an ISDN line card and Asterisk, get them a standard SIP phone, and write the “CRM” so that it would initiate a SIP connection to the same Asterisk server (but never answer it). Once an inbound call arrived, it would look up whether there was an entry in a simple storage layer for the phone number and display it in a very large font, to be easily readable from around the kitchen.

As things moved and changed, a second pizzeria was opened and it required a similar setup. Except that, since ISDN is legacy technology, the provider was going to charge through the nose for connecting a new line. So we set up a VoIP account instead, and rather than a PC on site running the software, Asterisk ran on a server (in close proximity to the VoIP provider). And since at that point we were no longer bound by an ISDN line’s limit on concurrent calls, the scope of the project expanded.

First of all, up to four calls could be queued, “your call is very important to us”-style. We briefly discussed allowing callers to reserve a spot and be called back, but at the time calls to mobile phones were still expensive enough that they wanted to avoid that. Instead, callers would get a simple message telling them to wait in line to reach the pizzeria. The CRM started showing the length of the queue (in a very clunky way), although it never showed the “next call” like the customer wanted (the relationship between the customer and the VoIP provider went south, and all of us ended up withdrawing from the engagement).

Another feature we ended up implementing was opening hours: when a call arrived outside of the advertised opening hours, an announcement would play (recorded by a paid friend, who used to act in theatre, and thus had good pronunciation).

I’m fairly sure that none of this would actually comply with the new GDPR requirements. At the very least, the customers should be advised that their data (phone number, address) will be saved.

But why am I talking about this in the context of Open Source software? Well, while a lot of the components used in this setup were open source, or even Free Software, it still requires a lot of integration to become usable. There’s no “turnkey pizzeria setup” — you can build up the system from components, but you need not just an integrator, you need a full developer (or development team) to make sure all the components fit together.

I honestly wish I had open-sourced more of this. If I were to design this again right now, I would probably make sure that there was a direct, real-time API between Asterisk and a Web-based CRM. It would definitely make it easier to secure the data for GDPR compliance. But there is more to it than just that: having an actual integrated, isolated system where you can make configuration changes gives the user (customer) the ability to set things up without having to know how the configuration files are structured.

Setting up Asterisk took me a week or two of reading through documentation and books on the topic, and a significant amount of experimentation with a VoIP number and a battery of testing SIM cards at home. To make the recordings work I had to fight with converting the files to G.729 beforehand, or the playback would use a significant amount of CPU.

But these are not unknown needs. There are plenty of restaurants (which don’t have to be pizza places) out there that probably need something like this. And indeed services such as Deliveroo appear to now provide a similar all-in-one solution… which is good for restaurants in cities big enough to sustain Deliveroo, but probably not great for smaller restaurants in smaller cities, which would probably not have much of a chance of hiring developers to build such a system themselves.

So, rambling aside, I really wish we had more ready-to-install Open Source solutions for businesses (restaurants, hotels, … — I would like to add banks to that, but I know regulatory compliance is hard). I think these would have a very good social impact on all those towns and cities that don’t have the critical mass of tech influence to come with their own collection of mobile apps, for instance.

If you’re the kind of person who complains that startups only appear to want to solve problems in San Francisco, maybe think of what problems you can solve in and around your town or city.

April 21, 2018
Rafael G. Martins a.k.a. rafaelmartins (homepage, bugs)
Updates (April 21, 2018, 14:35 UTC)

Since I haven't written anything here for almost 2 years, I think it is time for some short updates:

  • I left RedHat and moved to Berlin, Germany, in March 2017.
  • The series of posts about balde was stopped. The first post got a lot of Hacker News attention, and I will come back to it as soon as I can implement the required changes in the framework. Not going to happen very soon, though.
  • I've been spending most of my free time with flight simulation. You can expect a few related posts soon.
  • I left the Gentoo GSoC administration this year.
  • blogc is the only project that is currently getting some frequent attention from me, as I use it for most of my websites. Check it out! ;-)

That's all for now.

April 18, 2018
Zack Medico a.k.a. zmedico (homepage, bugs)

In portage-2.3.30, portage’s python API provides an asyncio event loop policy via a DefaultEventLoopPolicy class. For example, here’s a little program that uses portage’s DefaultEventLoopPolicy to do the same thing as emerge --regen, using an async_iter_completed function to implement the --jobs and --load-average options:

#!/usr/bin/env python

from __future__ import print_function

import argparse
import functools
import multiprocessing
import operator

import portage
from portage.util.futures.iter_completed import (
    async_iter_completed,
)
from portage.util.futures.unix_events import (
    DefaultEventLoopPolicy,
)


def handle_result(cpv, future):
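    # Print one package's metadata once its async_aux_get future completes.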
    metadata = dict(zip(portage.auxdbkeys, future.result()))
    print(cpv)
    for k, v in sorted(metadata.items(),
        key=operator.itemgetter(0)):
        if v:
            print('\t{}: {}'.format(k, v))
    print()


def future_generator(repo_location, loop=None):
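    # Yield async_aux_get futures for every ebuild version in the repository.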

    portdb = portage.portdb

    for cp in portdb.cp_all(trees=[repo_location]):
        for cpv in portdb.cp_list(cp, mytree=repo_location):
            future = portdb.async_aux_get(
                cpv,
                portage.auxdbkeys,
                mytree=repo_location,
                loop=loop,
            )

            future.add_done_callback(
                functools.partial(handle_result, cpv))

            yield future


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--repo',
        action='store',
        default='gentoo',
    )
    parser.add_argument(
        '--jobs',
        action='store',
        type=int,
        default=multiprocessing.cpu_count(),
    )
    parser.add_argument(
        '--load-average',
        action='store',
        type=float,
        default=multiprocessing.cpu_count(),
    )
    args = parser.parse_args()

    try:
        repo_location = portage.settings.repositories.\
            get_location_for_name(args.repo)
    except KeyError:
        parser.error('unknown repo: {}\navailable repos: {}'.\
            format(args.repo, ' '.join(sorted(
            repo.name for repo in
            portage.settings.repositories))))

    policy = DefaultEventLoopPolicy()
    loop = policy.get_event_loop()

    try:
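        # Process futures in batches; async_iter_completed limits
        # concurrency according to --jobs and --load-average.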
        for future_done_set in async_iter_completed(
            future_generator(repo_location, loop=loop),
            max_jobs=args.jobs,
            max_load=args.load_average,
            loop=loop):
            loop.run_until_complete(future_done_set)
    finally:
        loop.close()



if __name__ == '__main__':
    main()