Welcome to Gentoo Universe, an aggregation of weblog articles on all topics written by Gentoo developers. For a more refined aggregation of Gentoo-related topics only, you might be interested in Planet Gentoo.

Disclaimer:
Views expressed in the content published here do not necessarily represent the views of Gentoo Linux or the Gentoo Foundation.
   
December 06, 2018
Alexys Jacob a.k.a. ultrabug (homepage, bugs)
Scylla Summit 2018 write-up (December 06, 2018, 22:53 UTC)

It’s been almost one month since I had the chance to attend and speak at Scylla Summit 2018 so I’m relieved to finally publish a short write-up on the key things I wanted to share about this wonderful event!

Make Scylla boring

This statement by Glauber Costa sums up what looked to me like the main driver of the engineering efforts put into Scylla lately: making it work so consistently well on any kind of workload that it’s boring to operate 🙂

I will follow up on this statement to highlight the things I heard and (hopefully) understood during the summit. I hope you’ll find it insightful.

Reduced operational efforts

The thread-per-core and queues design still has a lot of potential left to be leveraged.

The recent addition of RPC streaming capabilities to seastar allows a drastic reduction in the time it takes the cluster to grow or shrink (data rebalancing / resynchronization).

Incremental compaction is also very promising, as compaction is one of the most expensive background processes in the database’s design.

I was happy to hear that scylla-manager will soon be made available and free to use with basic features, while more advanced ones (like backup/restore) will be reserved for the enterprise version.
I also noticed that the current version does not support storing its configuration on SSL-enabled clusters, so I asked Michał directly about it and I’m glad that it will ship in version 1.3.1.

Performant multi-tenancy

Why choose between real-time OLTP & analytics OLAP workloads?

The goal here is to be able to run both on the same cluster by giving users the ability to assign “SLA” shares to ROLES. That’s basically like pools on Hadoop, but at a much finer grain, since it will create dedicated queues that are weighted by their share.

Having one queue per usage, with full accounting, will make it possible to limit resources efficiently and give users a say on their latency SLAs.

But Scylla also has a lot to do in the background to run smoothly. So while this design pattern was already applied to tame compactions, a lot of work has also been done on automatic flow control and back pressure.

For instance, Materialized Views are updated asynchronously, which means that while we can interact with and put a lot of pressure on the table they are based on (called the main table), we could overwhelm the background work that’s needed to keep the MVs’ view tables in sync. To mitigate this, a smart back pressure approach was developed that will throttle clients to make sure that Scylla can manage to do everything at the best performance the hardware allows!

I was happy to hear that work on tiered storage is also planned to better optimize disk space costs for certain workloads.

Last but not least, columnar storage optimized for time series and analytics workloads is also something the developers are looking at.

Latency is expensive

If you care about latency, you might be happy to hear that a new polling API (named IOCB_CMD_POLL) has been contributed by Christoph Hellwig and Avi Kivity to the 4.19 Linux kernel; it avoids context switches on I/O by using a shared ring between kernel and userspace. Scylla will use it by default if the kernel supports it.

The iotune utility has been upgraded since 2.3 to generate an enhanced I/O configuration.

Also, persistent (disk backed) in-memory tables are getting ready and are very promising for latency sensitive workloads!

A word on drivers

ScyllaDB has been relying on the Datastax drivers since the start. While this is a good thing for the whole community, it’s important to note that the shard-per-CPU data placement that Scylla uses is neither known to nor leveraged by the current drivers.

Discussions took place, and it seems that Datastax will not allow the protocol to evolve so that drivers could discover whether the connected cluster is shard aware and then use this information to choose the read/write path more cleverly.

So for now ScyllaDB has forked and is developing its own shard-aware drivers for Java and Go (no Python yet… I was disappointed).

Kubernetes & containers

The ScyllaDB folks of course couldn’t avoid the Kubernetes frenzy, so Moreno Garcia gave a lot of feedback and tips on how to operate Scylla on Docker with minimal performance degradation.

Kubernetes was designed for stateless applications, not stateful ones, and Docker does some automatic magic that has a rather big performance impact on Scylla. You will basically have to play with affinities to dedicate one Scylla instance to one server, with a “retain” reclaim policy.

Remember that the official Scylla Docker image runs with dev-mode enabled by default, which turns off all performance checks on start. So start by disabling that, and look at all the tips and literature that Moreno has put online!

Scylla 3.0

A lot has been written about it already, so I will just be brief on the things that are important to understand from my point of view.

  • Materialized Views do back fill the whole data set
    • this job is done by the view building process
    • you can watch its progress in the system_distributed.view_build_status table
  • Secondary Indexes are Materialized Views under the hood
    • it’s like a reverse pointer to the primary key of the main table
    • so if you read the whole row by selecting on the indexed column, two reads are issued under the hood: one on the index’s MV view table to get the primary key and one on the main table to get the rest of the columns
    • so if your workload is mostly interested in the whole row, you’re better off creating a complete MV to read from than using an SI (see the sketch after this list)
    • this is even more true if you plan to do range scans, as this double query could lead you to read from multiple nodes instead of one
  • Range scan is way more performant
    • ALLOW FILTERING finally allows a great flexibility by providing server-side filtering!
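To make the SI vs. MV trade-off above a bit more concrete, here is a minimal sketch using the DataStax Python driver; the keyspace, table and column names (and the contact point) are made-up placeholders, not anything shown at the summit:

# Contrast a Secondary Index with a dedicated Materialized View.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("demo")

# Secondary Index: reading the whole row by email costs two lookups under the
# hood (index view table -> primary key, then the main table for the columns).
session.execute("CREATE TABLE IF NOT EXISTS users (id uuid PRIMARY KEY, email text, name text)")
session.execute("CREATE INDEX IF NOT EXISTS ON users (email)")
rows = session.execute("SELECT * FROM users WHERE email = %s", ["someone@example.com"])

# Materialized View keyed on email: a single read path, better when the
# workload mostly wants the whole row.
session.execute("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS users_by_email AS
        SELECT * FROM users
        WHERE email IS NOT NULL AND id IS NOT NULL
        PRIMARY KEY (email, id)
""")
rows = session.execute("SELECT * FROM users_by_email WHERE email = %s", ["someone@example.com"])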

Random notes

Support for LWT (lightweight transactions) will rely on a future implementation of the Raft consensus algorithm inside Scylla. This work will also benefit Materialized Views consistency. Duarte Nunes will be the one working on this and I envy him very much!

Support for search workloads is high in the ScyllaDB devs priorities so we should definitely hear about it in the coming months.

Support for “mc” sstables (new generation format) is done and will reduce storage requirements thanks to metadata / data compression. Migration will be transparent because Scylla can read previous formats as well so it will upgrade your sstables as it compacts them.

ScyllaDB developers have not yet settled on how to best implement CDC. I hope they do so soon because it is crucial to their ability to integrate well with Kafka!

Materialized Views, Secondary Indexes and filtering will benefit from the work on partition key and index intersections to avoid server-side filtering on the coordinator. That’s an important optimization to come!

Last but not least, I had the pleasure of talking with Takuya Asada, who is the packager of Scylla for RedHat/CentOS & Debian/Ubuntu. We discussed Gentoo Linux packaging requirements as well as the recent and promising work on a relocatable package. We will collaborate more closely in the future!

November 25, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)
Portability of tar features (November 25, 2018, 14:26 UTC)

The tar format is one of the oldest archive formats in use. It comes as no surprise that it is ugly — built as layers of hacks on older format versions to overcome their limitations. However, given the POSIX standardization in the late 80s and the popularity of GNU tar, you would expect the interoperability problems to be mostly resolved nowadays.

This article is directly inspired by my proof-of-concept work on a new binary package format for Gentoo. My original proposal used the volume label to provide a user- and file(1)-friendly way of distinguishing our binary packages. While it is a GNU tar extension, it falls within the POSIX ustar implementation-defined file format, and you would expect non-compliant implementations to extract it as a regular file. What I did not anticipate is that some implementations reject the whole archive instead.

This naturally raised more questions on how portable various tar formats actually are. To verify that, I have decided to analyze the standards for possible incompatibility dangers and build a suite of test inputs that could be used to check how various implementations cope with that. This article describes those points and provides test results for a number of implementations.

Please note that this article is focused merely on read-wise format compatibility. In other words, it establishes how tar files should be written in order to achieve the best probability that they will be read correctly afterwards. It does not investigate what formats the listed tools can write and whether they can correctly create archives using specific features.

Continue reading

November 16, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

So I recently had a problem where PostgreSQL would run out of max concurrent connections .. and I wasn't sure what caused it.

So to find out what the problem was, I wanted to know which connections were open. After a short search I found the pg_stat_activity view.

Of course most info in there is not needed for my case (it has database id, name, pid, usename, application_name, client_addr, state, ...)

but for me this was all I needed:

postgres=# select count(*), datname,state,pid from pg_stat_activity group by datname, state, pid order by datname;
 count |  datname   |        state        |  pid
-------+------------+---------------------+-------
     1 | dbmail     | idle                | 30092
     1 | dbmail     | idle                | 30095
..

or, shorter, just the connections by state and DB:

postgres=# select count(*), datname,state from pg_stat_activity group by datname, state order by datname;
 count | datname  |        state
-------+----------+---------------------
    15 | dbmail   | idle
..

Of course one could go into more detail, but this made me realize that I could limit some processes that used a lot of connections but are not under heavy load. Really simple once you know where to look - as usual :)
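The same check can of course be scripted; below is a minimal Python sketch using psycopg2 that runs the shorter query above (the connection parameters are placeholders for your own setup):

# Count connections per database and state from pg_stat_activity.
import psycopg2

conn = psycopg2.connect(dbname="postgres", user="postgres", host="localhost")
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT count(*), datname, state
        FROM pg_stat_activity
        GROUP BY datname, state
        ORDER BY datname
    """)
    for count, datname, state in cur.fetchall():
        print("{:5d}  {}  {}".format(count, datname, state))
conn.close()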

November 13, 2018
Luca Barbato a.k.a. lu_zero (homepage, bugs)

Over the past year I have contributed to an AV1 encoder written in Rust.

Here is a small tutorial about what is available right now. There is still lots to do, but I think we could enjoy more user feedback (and possibly also some help).

Setting up

Install the rust toolchain

If you do not have rust installed, it is quite simple to get a full environment using rustup

$ curl https://sh.rustup.rs -sSf | sh
# Answer the questions asked and make sure you source the `.profile` file created.
$ source ~/.profile

Install cmake, perl and nasm

rav1e uses libaom for testing, and on x86/x86_64 some components have SIMD variants written directly using nasm.

You may follow the instructions, or just install:
nasm (version 2.13 or better)
perl (any recent perl5)
cmake (any recent version)

Once you have those dependencies in place, you are set.

Building rav1e

We use cargo, so the process is straightforward:

## Pull in the customized libaom if you want to run all the tests
$ git submodule update --init

## Build everything
$ cargo build --release

## Test to make sure everything works as intended
$ cargo test --features decode_test --release

## Install rav1e
$ cargo install

Using rav1e

Right now rav1e has a quite simple interface:

rav1e 0.1.0
AV1 video encoder

USAGE:
    rav1e [OPTIONS] <INPUT> --output <OUTPUT>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -I, --keyint <keyint>              Keyframe interval [default: 30]
    -l, --limit <limit>                Maximum number of frames to encode [default: 0]
        --low_latency <low_latency>    low latency mode. true or false [default: true]
    -o, --output <output>              Compressed AV1 in IVF video output
        --quantizer <quantizer>        Quantizer (0-255) [default: 100]
    -r 
    -s, --speed <speed>                Speed level (0(slow)-10(fast)) [default: 3]
        --tune <tune>                  Quality tuning (Will enforce partition sizes >= 8x8) [default: psnr]  [possible
                                       values: Psnr, Psychovisual]

ARGS:
    <INPUT>    Uncompressed YUV4MPEG2 video input

It accepts y4m raw source and produces ivf files.

You can configure the encoder by setting the speed and quantizer levels.

The low_latency flag can be turned off to run some additional analysis over a set of frames and have additional quality gains.

Crav1e

While ave and gst-rs will use the rav1e crate directly, there are a number of projects, such as HandBrake or VLC, that would be much happier consuming a C API.

Thanks to the staticlib target and cbindgen, it is quite easy to produce a C-ABI library and its matching header.

Setup

crav1e is built using cargo, so nothing special is needed right now besides nasm if you are building it on x86/x86_64.

Build the library

This step is completely straightforward; you can build it as release:

$ cargo build --release

or as debug

$ cargo build

It will produce a target/release/librav1e.a or a target/debug/librav1e.a.
The C header will be in include/rav1e.h.

Try the example code

I provided a quite minimal sample case.

cc -Wall c-examples/simple_encoding.c -Ltarget/release/ -lrav1e -Iinclude/ -o c-examples/simple_encoding
./c-examples/simple_encoding

If it builds and runs correctly you are set.

Manually copy the .a and the .h

Currently cargo install does not work for our purposes, but it will change in the future.

$ cp target/release/librav1e.a /usr/local/lib
$ cp include/rav1e.h /usr/local/include/

Missing pieces

Right now crav1e works well enough, but there are a few shortcomings I’m trying to address.

Shared library support

The cdylib target does exist and produces a nearly usable library, but there are some issues with soname support. I’m trying to address them upstream, but it might take some time.

Meanwhile some people suggest using patchelf or similar tools to fix the library after the fact.

Install target

cargo is generally awesome, but sadly its support for installing arbitrary files to arbitrary paths is limited; luckily there are people proposing solutions.

pkg-config file generation

I do not consider a library proper if a .pc file is not provided with it.

Right now there are means to extract the information needed to build a pkg-config file, but there isn’t a simple way to do it.

$ cargo rustc -- --print native-static-libs

Provides what is needed for Libs.private; ideally the .pc file should be created as part of the install step, since you need to know the prefix, libdir and includedir paths.

Coming next

Probably the next blog post will be about my efforts to make cargo able to produce proper cdylib or something quite different.

PS: If somebody feels like helping me with Matroska in AV1, that would be great 🙂

November 12, 2018
Hanno Böck a.k.a. hanno (homepage, bugs)

HackerOne is currently one of the most popular bug bounty program platforms. While the usual providers of bug bounty programs are companies, a while ago I noted that some people were running bug bounty programs on HackerOne for their private projects without payouts. It made me curious, so I decided to start one with some of my private web pages in scope.

The HackerOne process requires programs to be private at first, starting with a limited number of invites. Soon after I started the program the first reports came in. Not surprisingly I got plenty of false positives, which I tried to limit by documenting the scope better in the program description. I also got plenty of web security scanner payloads via my contact form. But more to my surprise I also got a number of very high quality reports.

S9Y

This blog and two other sites in scope use Serendipity (also called S9Y), a blog software written in PHP. Through the bug bounty program I got reports for an Open Redirect, an XSS in the start page, an XSS in the back end, an SQL injection in the back end and another SQL injection in the freetag plugin. All of those were legitimate vulnerabilities in Serendipity and some of them quite severe. I forwarded the reports to the Serendipity developers.

Fixes are available by now; the first round of fixes was released with Serendipity 2.1.3 and another issue got fixed in 2.1.4. The freetag plugin was updated to version 2.69. If you use Serendipity please make sure you run the latest versions.

I'm not always happy with the way the bug bounty platforms work, yet it seems they have attracted an active community of security researchers who are also willing to occasionally look at projects without financial reward. While it's questionable when large corporations run bug bounty programs without rewards, I think that it's totally fine for private projects and volunteer-run free and open source projects.

The conclusion I take from this is that likely more projects should try to make use of the bug bounty community. Essentially Serendipity got a free security audit and is more secure now. It got this through the indirection of my personal bug bounty program, but of course this could also work directly. Free software projects could start their own bug bounty programs, and when it comes to web applications they should ideally have a live installation of their own product in scope.

In case you find some security issue with my web pages I welcome reports. And special thanks to Brian Carpenter (Geeknik), Julio Cesar and oreamnos for making my blog more secure.

November 10, 2018
Alexys Jacob a.k.a. ultrabug (homepage, bugs)
py3status v3.14 (November 10, 2018, 21:08 UTC)

I’m happy to announce this release as it contains some very interesting developments in the project. This release was focused on core changes.

IMPORTANT notice

There are now two optional dependencies to py3status:

  • gevent
    • will monkey patch the code to make it concurrent
    • the main benefit is to use an asynchronous loop instead of threads
  • pyudev
    • will enable a udev monitor if a module asks for it (only xrandr so far)
    • the benefit is described below

To install them all using pip, simply do:

pip install py3status[all]

Modules can now react/refresh on udev events

When pyudev is available, py3status will allow modules to subscribe and react to udev events!

The xrandr module uses this feature by default, which allows the module to refresh instantly when you plug in or unplug a secondary monitor. This also allows py3status to stop running the xrandr command in the background, which saves a lot of CPU!

Highlights

  • py3status core uses black formatter
  • fix default i3status.conf detection
    • add ~/.config/i3 as a default config directory, closes #1548
    • add .config/i3/py3status in default user modules include directories
  • add markup (pango) support for modules (#1408), by @MikaYuoadas
  • py3: notify_user module name in the title (#1556), by @lasers
  • print module information to stdout instead of stderr (#1565), by @robertnf
  • battery_level module: default to using sys instead of acpi (#1562), by @eddie-dunn
  • imap module: fix output formatting issue (#1559), by @girst

Thank you contributors!

  • eddie-dunn
  • girst
  • MikaYuoadas
  • robertnf
  • lasers
  • maximbaz
  • tobes

October 31, 2018
Arun Raghavan a.k.a. ford_prefect (homepage, bugs)
Update from the PipeWire hackfest (October 31, 2018, 15:49 UTC)

As the third and final day of the PipeWire hackfest draws to a close, I thought I’d summarise some of my thoughts on the goings-on and the future.

Thanks

Before I get into the details, I want to send out a big thank you to:

  • Christian Schaller for all the hard work of organising the event and Wim Taymans for the work on PipeWire so far (and in the future)
  • The GNOME Foundation, for sponsoring the event as a whole
  • Qualcomm, who are funding my presence at the event
  • Collabora, for sponsoring dinner on Monday
  • Everybody who attended and participated, for their time and thoughtful comments

Background

For those of you who are not familiar with it, PipeWire (previously Pinos, previously PulseVideo) was Wim’s effort at providing secure, multi-program access to video devices (like webcams, or the desktop for screen capture). As he went down that rabbit hole, he wrote SPA, a lightweight general-purpose framework for representing a streaming graph, and this led to the idea of expanding the project to include support for low latency audio.

The Linux userspace audio story has, for the longest time, consisted of two top-level components: PulseAudio which handles consumer audio (power efficiency, wide range of arbitrary hardware), and JACK which deals with pro audio (low latency, high performance). Consolidating this into a good out-of-the-box experience for all use-cases has been a long-standing goal for myself and others in the community that I have spoken to.

An Opportunity

From a PulseAudio perspective, it has been hard to achieve the 1-to-few millisecond latency numbers that would be absolutely necessary for professional audio use-cases. A lot of work has gone into improving this situation, most recently with David Henningsson’s shared-ringbuffer channels that made client/server communication more efficient.

At the same time, application sandboxing frameworks such as Flatpak have added security requirements that were not accounted for when PulseAudio was written. Examples include choosing which devices an application has access to (or can even know of) or which applications can act as control entities (set routing etc., enable/disable devices). Some work has gone into this — Ahmed Darwish did some key work to get memfd support into PulseAudio, and Wim has prototyped an access-control mechanism module to enable a Flatpak portal for sound.

All this said, there are still fundamental limitations in architectural decisions in PulseAudio that would require significant plumbing to address. With Wim’s work on PipeWire and his extensive background with GStreamer and PulseAudio itself, I think we have an opportunity to revisit some of those decisions with the benefit of a decade’s worth of learning deploying PulseAudio in various domains starting from desktops/laptops to phones, cars, robots, home audio, telephony systems and a lot more.

Key Ideas

There are some core ideas of PipeWire that I am quite excited about.

The first of these is the graph. Like JACK, the entities that participate in the data flow are represented by PipeWire as nodes in a graph, and routing between nodes is very flexible — you can route applications to playback devices and capture devices to applications, but you can also route applications to other applications, and this is notionally the same thing.

The second idea is a bit more radical — PipeWire itself only “runs” the graph. The actual connections between nodes are created and managed by a “session manager”. This allows us to completely separate the data flow from policy, which means we could write completely separate policy for desktop use cases vs. specific embedded use cases. I’m particularly excited to see this be scriptable in a higher-level language, which is something Bastien has already started work on!

A powerful idea in PulseAudio was rewinding — the ability to send out huge buffers to the device, but the flexibility to rewind that data when things changed (a new stream got added, or the stream moved, or the volume changed). While this is great for power saving, it is a significant amount of complexity in the code. In addition, with some filters in the data path, rewinding can break the algorithm by introducing non-linearity. PipeWire doesn’t support rewinds, and we will need to find a good way to manage latencies to account for low power use cases. One example is that we could have the session manager bump up the device latency when we know latency doesn’t matter (Android does this when the screen is off).

There are a bunch of other things that are in the process of being fleshed out, like being able to represent the hardware as a graph as well, to have a clearer idea of what is going on within a node. More updates as these things are more concrete.

The Way Forward

There is a good summary by Christian about our discussion about what is missing and how we can go about trying to make a smooth transition for PulseAudio users. There is, of course, a lot to do, and my ideal outcome is that we one day flip a switch and nobody knows that we have done so.

In practice, we’ll need to figure out how to make this transition seamless for most people, while folks with custom setups will need to be given a long runway and clear documentation to know what to do. It’s way too early to talk about this in more specifics, however.

Configuration

One key thing that PulseAudio does right (I know there are people who disagree!) is having a custom configuration that automagically works on a lot of Intel HDA-based systems. We’ve been wondering how to deal with this in PipeWire, and the path we think makes sense is to transition to ALSA UCM configuration. This is not as flexible as we need it to be, but I’d like to extend it for that purpose if possible. This would ideally also help consolidate the various methods of configuration being used by the various Linux userspaces.

To that end, I’ve started trying to get a UCM setup on my desktop that PulseAudio can use, and be functionally equivalent to what we do with our existing configuration. There are missing bits and bobs, and I’m currently focusing on the ones related to hardware volume control. I’ll write about this in the future as the effort expands out to other hardware.

Onwards and upwards

The transition to PipeWire is unlikely to be quick or completely-painless or free of contention. For those who are worried about the future, know that any switch is still a long way away. In the mean time, however, constructive feedback and comments are welcome.

October 18, 2018
Andreas K. Hüttel a.k.a. dilfridge (homepage, bugs)

We're happy to announce that our article "Lab::Measurement — a portable and extensible framework for controlling lab equipment and conducting measurements", describing our measurement software package Lab::Measurement, has been published in Computer Physics Communications.

Lab::Measurement is a collection of object-oriented Perl 5 modules for controlling lab instruments, performing measurements, and recording and plotting the resultant data. Its operating system independent driver stack makes it possible to use nearly identical measurement scripts both on Linux and Windows. Foreground operation with live plotting and background operation for, e.g., process control are supported. For more details, please read our article, visit the Lab::Measurement homepage, or visit Lab::Measurement on CPAN!

"Lab::Measurement - a portable and extensible framework for controlling lab equipment and conducting measurements"
S. Reinhardt, C. Butschkow, S. Geissler, A. Dirnaichner, F. Olbrich, C. Lane, D. Schröer, and A. K. Hüttel
Comp. Phys. Comm. 234, 216 (2019); arXiv:1804.03321 (PDF)

October 14, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)
tryton -- ipython, proteus (October 14, 2018, 09:22 UTC)

So after being told on IRC that you can use (i)python and proteus to poke around a running tryton instance (thanks for that hint btw), I tried it and had some "fun" right away:
from proteus import config,Model
pcfg = config.set_trytond(database='trytond', config_file='/etc/tryon/trytond.conf')

gave me this:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/lib64/python3.5/site-packages/trytond/backend/__init__.py in get(prop)
     31                 ep, = pkg_resources.iter_entry_points(
---> 32                     'trytond.backend', db_type)
     33             except ValueError:

ValueError: not enough values to unpack (expected 1, got 0)

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
<ipython-input-2-300353cf02f5> in <module>()
----> 1 pcfg = config.set_trytond(database='trytond', config_file='/etc/tryon/trytond.conf')

/usr/lib64/python3.5/site-packages/proteus/config.py in set_trytond(database, user, config_file)
    281         config_file=None):
    282     'Set trytond package as backend'
--> 283     _CONFIG.current = TrytondConfig(database, user, config_file=config_file)
    284     return _CONFIG.current
    285

/usr/lib64/python3.5/site-packages/proteus/config.py in __init__(self, database, user, config_file)
    232         self.config_file = config_file
    233
--> 234         Pool.start()
    235         self.pool = Pool(database_name)
    236         self.pool.init()

/usr/lib64/python3.5/site-packages/trytond/pool.py in start(cls)
    100             for classes in Pool.classes.values():
    101                 classes.clear()
--> 102             register_classes()
    103             cls._started = True
    104

/usr/lib64/python3.5/site-packages/trytond/modules/__init__.py in register_classes()
    339     Import modules to register the classes in the Pool
    340     '''
--> 341     import trytond.ir
    342     trytond.ir.register()
    343     import trytond.res

/usr/lib64/python3.5/site-packages/trytond/ir/__init__.py in <module>()
      2 # this repository contains the full copyright notices and license terms.
      3 from ..pool import Pool
----> 4 from .configuration import *
      5 from .translation import *
      6 from .sequence import *

/usr/lib64/python3.5/site-packages/trytond/ir/configuration.py in <module>()
      1 # This file is part of Tryton.  The COPYRIGHT file at the top level of
      2 # this repository contains the full copyright notices and license terms.
----> 3 from ..model import ModelSQL, ModelSingleton, fields
      4 from ..cache import Cache
      5 from ..config import config

/usr/lib64/python3.5/site-packages/trytond/model/__init__.py in <module>()
      1 # This file is part of Tryton.  The COPYRIGHT file at the top level of
      2 # this repository contains the full copyright notices and license terms.
----> 3 from .model import Model
      4 from .modelview import ModelView
      5 from .modelstorage import ModelStorage, EvalEnvironment

/usr/lib64/python3.5/site-packages/trytond/model/model.py in <module>()
      6 from functools import total_ordering
      7
----> 8 from trytond.model import fields
      9 from trytond.error import WarningErrorMixin
     10 from trytond.pool import Pool, PoolBase

/usr/lib64/python3.5/site-packages/trytond/model/fields/__init__.py in <module>()
      2 # this repository contains the full copyright notices and license terms.
      3
----> 4 from .field import *
      5 from .boolean import *
      6 from .integer import *

/usr/lib64/python3.5/site-packages/trytond/model/fields/field.py in <module>()
     18 from ...rpc import RPC
     19
---> 20 Database = backend.get('Database')
     21
     22

/usr/lib64/python3.5/site-packages/trytond/backend/__init__.py in get(prop)
     32                     'trytond.backend', db_type)
     33             except ValueError:
---> 34                 raise exception
     35             mod_path = os.path.join(ep.dist.location,
     36                 *ep.module_name.split('.')[:-1])

/usr/lib64/python3.5/site-packages/trytond/backend/__init__.py in get(prop)
     24     if modname not in sys.modules:
     25         try:
---> 26             __import__(modname)
     27         except ImportError as exception:
     28             if not pkg_resources:

ImportError: No module named 'trytond.backend.'

Took me a while to figure out I just had a typo in the config file path. Since that cost me some time I thought I'd put it on here so that maybe someone else who makes the same mistake doesn't waste as much time on it as me ;) -- and thanks to the always helpful people on IRC #tryton@freenode
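For reference, with the path corrected (/etc/tryton/ instead of /etc/tryon/) a minimal working proteus session looks roughly like this; the database name matches the one above, while the party.party model is just an example and assumes the party module is activated:

from proteus import config, Model

pcfg = config.set_trytond(database='trytond',
                          config_file='/etc/tryton/trytond.conf')
Party = Model.get('party.party')
print(len(Party.find([])))  # quick sanity check that the connection works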

October 04, 2018

ANSSI, the National Cybersecurity Agency of France, has released the sources of CLIP OS, which aims to build a hardened, multi-level operating system based on the Linux kernel and a lot of free and open source software. We are happy to hear that it is based on Gentoo Hardened!

September 28, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)
Tryton Module Development (September 28, 2018, 12:03 UTC)

So I've finally gotten around to really starting Tryton module development to customize it to what we need.

I plan to put stuff that is useful as examples or maybe directly as-is on my github: https://github.com/LordVan/tryton-modules

On a side note this is trytond-4.8.4 running on python 3.5 at the moment.

The first module just (re-)adds the description field to the sale lines in the sale module (entry). This by itself is vaguely useful for me, but mostly served to figure out how this works. I have to say, once figured out, it is really easy - the hardest part was to get the XML right for someone who is not familiar with the structure. I'd like to thank the people who helped me on IRC ( #tryton@freenode )

The next step will be to add some custom fields to this and to products (a rough sketch of what that could look like is below).

To add this module you can follow the steps in the documentation: Tryton by example
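As for those custom fields, here is a minimal sketch of how a field could be added to sale lines in a trytond 4.8 module; the module and field names are made up, and the matching view XML and tryton.cfg are omitted:

# mymodule/sale.py -- hypothetical example of extending sale.line
from trytond.model import fields
from trytond.pool import PoolMeta


class SaleLine(metaclass=PoolMeta):
    __name__ = 'sale.line'

    # a custom free-text field on every sale line
    internal_note = fields.Text('Internal note')

# mymodule/__init__.py would then register the class:
# from trytond.pool import Pool
# from .sale import SaleLine
#
# def register():
#     Pool.register(SaleLine, module='mymodule', type_='model')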

Alexys Jacob a.k.a. ultrabug (homepage, bugs)
py3status v3.13 (September 28, 2018, 11:56 UTC)

I am once again lagging behind on the release blog posts, but this one is an important one.

I’m proud to announce that our long time contributor @lasers has become an official collaborator of the py3status project!

Dear @lasers, your amazing energy and overwhelming ideas have served our little community for a while. I’m sure we’ll have a great way forward as we learn to work together with @tobes 🙂 Thank you again very much for everything you do!

This release is as much dedicated to you as it is yours 🙂

IMPORTANT notice

After this release, py3status coding style CI will enforce the ‘black‘ formatter style.

Highlights

Needless to say, the changelog is huge. As usual, here is a very condensed view:

  • documentation updates, especially on the formatter (thanks @L0ric0)
  • py3 storage: use $XDG_CACHE_HOME or ~/.cache
  • formatter: multiple variable and feature fixes and enhancements
  • better config parser
  • new modules: lm_sensors, loadavg, mail, nvidia_smi, sql, timewarrior, wanda_the_fish

Thank you contributors!

  • lasers
  • tobes
  • maximbaz
  • cyrinux
  • Lorenz Steinert @L0ric0
  • wojtex
  • horgix
  • su8
  • Maikel Punie

September 27, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)
New copyright policy explained (September 27, 2018, 06:47 UTC)

At its 2018-09-15 meeting, the Trustees have given the final stamp of approval to the new Gentoo copyright policy outlined in GLEP 76. This policy is the result of work that has been slowly progressing since 2005, and that picked up considerable speed by the end of 2017. It is a major step forward from the status quo that has been used since the forming of the Gentoo Foundation, and that was mostly inherited from the earlier Gentoo Technologies.

The policy aims to cover all copyright-related aspects, bringing Gentoo in line with the practices used in many other large open source projects. Most notably, it introduces a concept of Gentoo Certificate of Origin that requires all contributors to confirm that they are entitled to submit their contributions to Gentoo, and corrects the copyright attribution policy to be viable under more jurisdictions.

This article aims to briefly reiterate the most important points of the new copyright policy, and to provide a detailed guide on following it in Q&A form.

Continue reading

September 15, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)

With Qt5 gaining support for high-DPI displays, and applications starting to exercise that support, it’s easy for applications to suddenly become unusable with some screens. For example, my old Samsung TV reported itself as a 7″ screen. While this did not really matter as long as websites forced the resolution of 96 DPI, the high-DPI applications started scaling themselves to occupy most of my screen, with elements becoming really huge (and ugly, apparently due to some poor scaling).

It turns out that it is really hard to find a solution for this. Most of the guides and tips are focused either on proprietary drivers or on getting custom resolutions. The DisplaySize specification in xorg.conf apparently did not change anything either. Finally, I was able to resolve the issue by overriding the EDID data for my screen. This guide explains how I did it.

Step 1: dump EDID data

Firstly, you need to get the EDID data from your monitor. Supposedly the read-edid tool could be used for this purpose, but it did not work for me. With only a little bit more effort, you can get it e.g. from xrandr:

$ xrandr --verbose
[...]
HDMI-0 connected primary 1920x1080+0+0 (0x57) normal (normal left inverted right x axis y axis) 708mm x 398mm
[...]
  EDID:
    00ffffffffffff004c2dfb0400000000
    2f120103804728780aee91a3544c9926
    0f5054bdef80714f8100814081809500
    950fb300a940023a801871382d40582c
    4500c48e2100001e662150b051001b30
    40703600c48e2100001e000000fd0018
    4b1a5117000a2020202020200000000a
    0053414d53554e470a20202020200143
    020323f14b901f041305140312202122
    2309070783010000e2000f67030c0010
    00b82d011d007251d01e206e285500c4
    8e2100001e011d00bc52d01e20b82855
    40c48e2100001e011d8018711c162058
    2c2500c48e2100009e011d80d0721c16
    20102c2580c48e2100009e0000000000
    00000000000000000000000000000029
[...]

If you have multiple displays connected, make sure to use the EDID for the one you’re overriding. Copy the hexdump and convert it to a binary blob. You can do this by passing it through xxd -p -r (installed by vim).
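If you want to double-check the blob before editing it, the physical screen size in centimeters is stored at byte offsets 21 and 22 of the basic EDID block; a tiny Python sketch (assuming you saved the blob as edid.bin) to peek at it:

# Print the screen size stored in the basic EDID block (bytes 21/22, in cm).
with open("edid.bin", "rb") as f:
    edid = f.read()

assert edid[:8] == b"\x00\xff\xff\xff\xff\xff\xff\x00", "not an EDID blob?"
print("EDID reports %d cm x %d cm" % (edid[21], edid[22]))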

Step 2: fix screen dimensions

Once you have the EDID blob ready, you need to update the screen dimensions inside it. Initially, I did it using a hex editor, which involved finding all the occurrences, updating them (and manually encoding them into the weird split integers) and correcting the checksums. Then I wrote edid-fixdim so you wouldn’t have to repeat that experience.

First, use --get option to verify that your EDID is supported correctly:

$ edid-fixdim -g edid.bin
EDID structure: 71 cm x 40 cm
Detailed timing desc: 708 mm x 398 mm
Detailed timing desc: 708 mm x 398 mm
CEA EDID found
Detailed timing desc: 708 mm x 398 mm
Detailed timing desc: 708 mm x 398 mm
Detailed timing desc: 708 mm x 398 mm
Detailed timing desc: 708 mm x 398 mm

So your EDID consists of basic EDID structure, followed by one extension block. The screen dimensions are stored in 7 different blocks you’d have to update, and referenced in two checksums. The tool will take care of updating it all for you, so just pass the correct dimensions to --set:

$ edid-fixdim -s 1600x900 edid.bin
EDID structure updated to 160 cm x 90 cm
Detailed timing desc updated to 1600 mm x 900 mm
Detailed timing desc updated to 1600 mm x 900 mm
CEA EDID found
Detailed timing desc updated to 1600 mm x 900 mm
Detailed timing desc updated to 1600 mm x 900 mm
Detailed timing desc updated to 1600 mm x 900 mm
Detailed timing desc updated to 1600 mm x 900 mm

Afterwards, you can use --get again to verify that the changes were made correctly.

Step 3: overriding EDID data

Now it’s just a matter of putting the override in motion. First, make sure to enable CONFIG_DRM_LOAD_EDID_FIRMWARE in your kernel:

Device Drivers  --->
  Graphics support  --->
    Direct Rendering Manager (XFree86 4.1.0 and higher DRI support)  --->
      [*] Allow to specify an EDID data set instead of probing for it

Then, determine the correct connector name. You can find it in dmesg output:

$ dmesg | grep -C 1 Connector
[   15.192088] [drm] ib test on ring 5 succeeded
[   15.193461] [drm] Radeon Display Connectors
[   15.193524] [drm] Connector 0:
[   15.193580] [drm]   HDMI-A-1
--
[   15.193800] [drm]     DFP1: INTERNAL_UNIPHY1
[   15.193857] [drm] Connector 1:
[   15.193911] [drm]   DVI-I-1
--
[   15.194210] [drm]     CRT1: INTERNAL_KLDSCP_DAC1
[   15.194267] [drm] Connector 2:
[   15.194322] [drm]   VGA-1

Copy the new EDID blob into location of your choice inside /lib/firmware:

$ mkdir /lib/firmware/edid
$ cp edid.bin /lib/firmware/edid/samsung.bin

Finally, add the override to your kernel command-line:

drm.edid_firmware=HDMI-A-1:edid/samsung.bin

If everything went fine, xrandr should report correct screen dimensions after next reboot, and dmesg should report that EDID override has been loaded:

$ dmesg | grep EDID
[   15.549063] [drm] Got external EDID base block and 1 extension from "edid/samsung.bin" for connector "HDMI-A-1"

If it didn't, check dmesg for error messages.

September 09, 2018
Sven Vermeulen a.k.a. swift (homepage, bugs)
cvechecker 3.9 released (September 09, 2018, 11:04 UTC)

Thanks to updates from Vignesh Jayaraman, Anton Hillebrand and Rolf Eike Beer, a new release of cvechecker is now made available.

This new release (v3.9) is a bugfix release.

September 08, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)
SIP & STUN .. (September 08, 2018, 07:26 UTC)

Note to self .. it is not very useful when one leaves a (public) STUN server activated in a SIP client after changing it from using the VoIP server's IP to the (internal) DNS name .. it leads to working signalling, but no audio ^^ - Took me a few days to figure out what had happened (including capturing traffic with Wireshark, ..)

September 07, 2018
Gentoo congratulates our GSoC participants (September 07, 2018, 00:00 UTC)

Gentoo would like to congratulate Gibix and JSteward for finishing and passing Google’s Summer of Code for the 2018 calendar year. Gibix contributed by enhancing Rust (programming language) support within Gentoo. JSteward contributed by making a full Gentoo GNU/Linux distribution, managed by Portage, run on devices which use the original Android-customized kernel.

The final reports of their projects can be reviewed on their personal blogs:

August 24, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)

I have recently worked on enabling 2-step authentication via SSH on the Gentoo developer machine. I have selected google-authenticator-libpam from among the different available implementations as it seemed the best maintained and had all the necessary features, including a friendly tool for users to configure it. However, its design has a weakness: it stores the secret unprotected in the user’s home directory.

This means that if an attacker manages to gain at least temporary access to the filesystem with user’s privileges — through a malicious process, vulnerability or simply because someone left the computer unattended for a minute — he can trivially read the secret and therefore clone the token source without leaving a trace. It would completely defeat the purpose of the second step, and the user may not even notice until the attacker makes real use of the stolen secret.

In order to protect against this, I’ve created google-authenticator-wrappers (as upstream decided to ignore the problem). This package provides a rather trivial setuid wrapper that manages a write-only, authentication-protected secret store for the PAM module. Additionally, it comes with a test program (so you can test the OTP setup without jumping through the hoops or risking losing access) and friendly wrappers for the default setup, as used on Gentoo Infra.

The recommended setup (as utilized by the sys-auth/google-authenticator-wrappers package) is to use a dedicated user for the password store. In this scenario, users are unable to read their secrets, and all secret operations (including authentication via the PAM module) are done using an unprivileged user. Furthermore, any operation regarding the configuration (either updating it or removing the second step) requires regular PAM authentication (e.g. typing your own password).

This is consistent with e.g. how shadow operates (users can’t read their passwords, nor update them without authenticating first), how most sites using 2-factor authentication operate (again, users can’t read their secrets) and follows the RFC 6238 recommendation (that keys […] SHOULD be protected against unauthorized access and usage). It solves the aforementioned issue by preventing user-privileged processes from reading the secrets and recovery codes. Furthermore, it prevents the attacker with this particular level of access from disabling 2-step authentication, changing the secret or even weakening the configuration.

August 17, 2018
Luca Barbato a.k.a. lu_zero (homepage, bugs)
Gentoo on Integricloud (August 17, 2018, 22:44 UTC)

Integricloud gave me access to their infrastructure to track some issues on ppc64 and ppc64le.

Since some of the issues are related to the compilers, I obviously installed Gentoo on it and in the process I started to fix some issues with catalyst to get a working install media, but that’s for another blogpost.

Today I’m just giving a walk-through on how to get a ppc64le (and ppc64 soon) VM up and running.

Preparation

Read this and get your install media available to your instance.

Install Media

I’m using the Gentoo installcd I’m currently refining.

Booting

You have to append console=hvc0 to your boot command; the boot process might figure it out for you on newer install media (I still have to send patches to update livecd-tools).

Network configuration

You have to manually setup the network.
You can use ifconfig and route or ip as you like, refer to your instance setup for the parameters.

ifconfig enp0s0 ${ip}/16
route add -net default gw ${gw}
echo "nameserver 8.8.8.8" > /etc/resolv.conf
ip a add ${ip}/16 dev enp0s0
ip l set enp0s0 up
ip r add default via ${gw}
echo "nameserver 8.8.8.8" > /etc/resolv.conf

Disk Setup

OpenFirmware seems to like gpt much better:

parted /dev/sda mklabel gpt

You may use fdisk to create:
– a PowerPC PrEP boot partition of 8M
– root partition with the remaining space

Device     Start      End  Sectors Size Type
/dev/sda1   2048    18431    16384   8M PowerPC PReP boot
/dev/sda2  18432 33554654 33536223  16G Linux filesystem

I’m using btrfs and zstd-compress /usr/portage and /usr/src/.

mkfs.btrfs /dev/sda2

Initial setup

It is pretty much the usual.

mount /dev/sda2 /mnt/gentoo
cd /mnt/gentoo
wget https://dev.gentoo.org/~mattst88/ppc-stages/stage3-ppc64le-20180810.tar.xz
tar -xpf stage3-ppc64le-20180810.tar.xz
mount -o bind /dev dev
mount -t devpts devpts dev/pts
mount -t proc proc proc
mount -t sysfs sys sys
cp /etc/resolv.conf etc
chroot .

You just have to emerge grub and gentoo-sources, I diverge from the defconfig by making btrfs builtin.

My /etc/portage/make.conf:

CFLAGS="-O3 -mcpu=power9 -pipe"
# WARNING: Changing your CHOST is not something that should be done lightly.
# Please consult https://wiki.gentoo.org/wiki/Changing_the_CHOST_variable before changing.
CHOST="powerpc64le-unknown-linux-gnu"

# NOTE: This stage was built with the bindist Use flag enabled
PORTDIR="/usr/portage"
DISTDIR="/usr/portage/distfiles"
PKGDIR="/usr/portage/packages"

USE="ibm altivec vsx"

# This sets the language of build output to English.
# Please keep this setting intact when reporting bugs.
LC_MESSAGES=C
ACCEPT_KEYWORDS=~ppc64

MAKEOPTS="-j4 -l6"
EMERGE_DEFAULT_OPTS="--jobs 10 --load-average 6 "

My minimal set of packages I need before booting:

emerge grub gentoo-sources vim btrfs-progs openssh

NOTE: You want to emerge openssh again and make sure bindist is not in your USE.

Kernel & Bootloader

cd /usr/src/linux
make defconfig
make menuconfig # I want btrfs builtin so I can avoid a initrd
make -j 10 all && make install && make modules_install
grub-install /dev/sda1
grub-mkconfig -o /boot/grub/grub.cfg

NOTE: make sure you pass /dev/sda1 otherwise grub will happily assume OpenFirmware knows about btrfs and just point it to your directory.
That’s not the case unfortunately.

Networking

I’m using netifrc and I’m using the eth0-naming-convention.

touch /etc/udev/rules.d/80-net-name-slot.rules
ln -sf /etc/init.d/net.{lo,eth0}
echo -e "config_eth0=\"${ip}/16\"\nroutes_eth0="default via ${gw}\"\ndns_servers_eth0=\"8.8.8.8\"" > /etc/conf.d/net

Password and SSH

Even if the mticlient is quite nice, you would rather use ssh as much as you can.

passwd 
rc-update add sshd default

Finishing touches

Right now sysvinit does not add the hvc0 console as it should, due to a profile quirk; for now check /etc/inittab and if needed add:

echo 'hvc0:2345:respawn:/sbin/agetty -L 9600 hvc0' >> /etc/inittab

Add your user and add your ssh key and you are ready to use your new system!

August 15, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)
new* helpers can read from stdin (August 15, 2018, 09:21 UTC)

Did you know that new* helpers can read from stdin? Well, now you know! So instead of writing to a temporary file you can install your inline text straight to the destination:

src_install() {
  # old code
  cat <<-EOF >"${T}"/mywrapper || die
    #!/bin/sh
    exec do-something --with-some-argument
  EOF
  dobin "${T}"/mywrapper

  # replacement
  newbin - mywrapper <<-EOF
    #!/bin/sh
    exec do-something --with-some-argument
  EOF
}

August 13, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)

The recent efforts on improving the security of different areas of Gentoo have brought up some arguments. Some time ago one of the developers considered whether he would withstand physical violence if an attacker used it in order to compromise Gentoo. A few days later another developer suggested that an attacker could pay Gentoo developers to compromise the distribution. Is this a real threat to Gentoo? Are we all doomed?

Before I answer this question, let me make an important presumption. Gentoo is a community-driven open source project. As such, it has certain inherent weaknesses and there is no way around them without changing what Gentoo fundamentally is. Those weaknesses are common to all projects of the same nature.

Gentoo could indeed be compromised if developers are subject to the threat of violence to themselves or their families. As for money, I don’t want to insult anyone and I don’t think it really matters. The fact is, Gentoo is vulnerable to any adversary resourceful enough, and there are certainly both easier and cheaper ways than the two mentioned. For example, the adversary could get a new developer recruited, or simply trick one of the existing developers into compromising the distribution. It just takes one developer out of ~150.

As I said, there is no way around that without making major changes to the organizational structure of Gentoo. Those changes would probably do more harm to Gentoo than good. We can just admit that we can’t fully protect Gentoo from focused attack of a resourceful adversary, and all we can do is to limit the potential damage, detect it quickly and counteract the best we can. However, in reality random probes and script kiddie attacks that focus on trivial technical vulnerabilities are more likely, and that’s what the security efforts end up focusing on.

There seems to be some recurring confusion among Gentoo developers regarding the topic of OpenPGP key expiration dates. Some developers seem to believe them to be some kind of security measure — and start arguing about their weaknesses. Furthermore, some people seem to think of them as a rotation mechanism, and believe that they are expected to generate new keys. The truth is, the expiration date is neither of those.

The key expiration date can be updated at any time (both lengthened or shortened), including past the previous expiration date. This is a feature, not a bug. In fact, you are expected to update your expiration dates periodically. You certainly should not rotate your primary key unless really necessary, as switching to a new key usually involves a lot of hassle.
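For example, extending the expiration of your primary key by another year is a single command; a minimal sketch, assuming GnuPG 2.1.22 or newer (which provides --quick-set-expire) and with the fingerprint being a placeholder:

# Extend the primary key's expiration date by one year.
import subprocess

KEY_FPR = "0123456789ABCDEF0123456789ABCDEF01234567"  # replace with your key's fingerprint
subprocess.run(["gpg", "--quick-set-expire", KEY_FPR, "1y"], check=True)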

If an attacker manages to compromise your primary key, he can easily update the expiration date as well (even if it expires first). Therefore, expiration date does not really provide any added protection here. Revocation is the only way of dealing with compromised keys.

Expiration dates really serve two purposes: naturally eliminating unused keys, and enforcing periodical checks on the primary key. By requiring the developers to periodically update their expiration dates, we also implicitly force them to check whether their primary secret key (which we recommend storing offline, in a secure place) is still present and working. Now, if it turns out that the developer can neither update the expiration date nor revoke the key (because the key, its backups and the revocation certificate are all lost, damaged, or the developer goes MIA), the key will eventually expire and stop being a ‘ghost’.

Even then, developers argue that we have LDAP and retirement procedures to deal with that. However, OpenPGP keys go beyond Gentoo and beyond Gentoo Infrastructure. We want to encourage good practices that will also affect our users and other people with whom developers are communicating, and who have no reason to know about internal Gentoo key management.

August 12, 2018

Pwnies logo

Congratulations to security researcher and Gentoo developer Hanno Böck and his co-authors Juraj Somorovsky and Craig Young for winning one of this year’s coveted Pwnie awards!

The award is for their work on the Return Of Bleichenbacher’s Oracle Threat or ROBOT vulnerability, which at the time of discovery affected such illustrious sites as Facebook and Paypal. Technical details can be found in the full paper published at the Cryptology ePrint Archive.

FroSCon logo

As last year, there will be a Gentoo booth again at the upcoming FrOSCon “Free and Open Source Conference” in St. Augustin near Bonn! Visitors can meet Gentoo developers to ask any question, get Gentoo swag, and prepare, configure, and compile their own Gentoo buttons.

The conference is 25th and 26th of August 2018, and there is no entry fee. See you there!

August 09, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)
Inlining path_exists (August 09, 2018, 15:01 UTC)

The path_exists function in eutils was meant as a simple hack to check for existence of files matching a wildcard. However, it was kinda ugly and never became used widely. At this very moment, it is used correctly in three packages, semi-correctly in one package and totally misused in two packages. Therefore, I think it’s time to replace it with something nicer.

The replacement snippet is rather trivial (from the original consumer, eselect-opengl):

local shopt_saved=$(shopt -p nullglob)
shopt -s nullglob
local opengl_dirs=( "${EROOT%/}"/usr/lib*/opengl )
${shopt_saved}

if [[ -n ${opengl_dirs[@]} ]]; then
	# ...
fi

Through using nullglob, you disable the old POSIX default of leaving the wildcard unexpanded when it does not match anything. Instead, you either simply get an empty array or a list of matched files/directories. If your code requires at least one match, you check for the array being empty; if it handles empty argument lists just fine (e.g. for loops), you can avoid any conditionals. As a side effect, you get the expanded match in an array, so you don’t have to repeat the wildcard multiple times.

Also note using shopt directly instead of estack.eclass that is broken and does not restore options correctly. You can read more on option handling in Mangling shell options in ebuilds.

August 04, 2018
Sergei Trofimovich a.k.a. slyfox (homepage, bugs)
ptrace() and accidental boot fix on ia64 (August 04, 2018, 00:00 UTC)

This story is another dive into linux kernel internals. It has started as a strace hangup on ia64 and ended up being an unusual case of gcc generating garbage code for linux kernel (not perfectly valid C either). I’ll try to cover a few ptrace() system call corners on x86_64 and ia64 for comparison.

Intro

I updated elilo and kernel on ia64 machine recently.

Kernel boot times shrank from 10 minutes (kernel 3.14.14) down to 2 minutes (kernel 4.9.72). The 3.14.14 kernel had a large 8-minute pause when the early console was not accessible. Every time this pause happened I thought I had bricked the machine. And now the delays are gone \o/

One new thing broke (so far): every time I ran strace it was hanging without any output printed. Mike Frysinger pointed out strace hangup likely related to gdb problems on ia64 reported before by Émeric Maschino.

And he was right!

Reproducing

Using ski image I booted fresh kernel to make sure the bug was still there:

# strace ls
<no response, hangup>

Yay! ski was able to reproduce it: no need to torture the physical machine while debugging. The next step was to find where strace got stuck. As strace and gdb are broken I had to resort to printf() debugging.

Before doing that I tried strace’s -d option to enable debug mode where it prints everything it expects from tracee process:

root@ia64 / # strace -d ls
strace: ptrace_setoptions = 0x51
strace: new tcb for pid 52, active tcbs:1
strace: [wait(0x80137f) = 52] WIFSTOPPED,sig=SIGSTOP,EVENT_STOP (128)
strace: pid 52 has TCB_STARTUP, initializing it
strace: [wait(0x80057f) = 52] WIFSTOPPED,sig=SIGTRAP,EVENT_STOP (128)
strace: [wait(0x00127f) = 52] WIFSTOPPED,sig=SIGCONT
strace: [wait(0x00857f) = 52] WIFSTOPPED,sig=133
????

Cryptic output. I tried to compare this output against correctly working x86_64 system to understand what went wrong:

amd64 $ strace -d ls
strace: ptrace_setoptions = 0x51
strace: new tcb for pid 29343, active tcbs:1
strace: [wait(0x80137f) = 29343] WIFSTOPPED,sig=SIGSTOP,EVENT_STOP (128)
strace: pid 29343 has TCB_STARTUP, initializing it
strace: [wait(0x80057f) = 29343] WIFSTOPPED,sig=SIGTRAP,EVENT_STOP (128)
strace: [wait(0x00127f) = 29343] WIFSTOPPED,sig=SIGCONT
strace: [wait(0x00857f) = 29343] WIFSTOPPED,sig=133
execve("/bin/ls", ["ls"], 0x60000fffffa4f1f8 /* 36 vars */strace: [wait(0x04057f) = 29343] WIFSTOPPED,sig=SIGTRAP,EVENT_EXEC (4)
strace: [wait(0x00857f) = 29343] WIFSTOPPED,sig=133
...

Up to execve call both logs are identical. Still no clue.

I spent some time looking at the ptrace state machine in the kernel and gave up trying to understand what was wrong. I then asked the strace maintainer what could be wrong and got an almost immediate response from Dmitry V. Levin: strace did not show the actual error.

After a source code tweak he pointed at a ptrace() syscall failure returning -EIO:

$ ./strace -d /
./strace: ptrace_setoptions = 0x51
./strace: new tcb for pid 11080, active tcbs:1
./strace: [wait(0x80137f) = 11080] WIFSTOPPED,sig=SIGSTOP,EVENT_STOP (128)
./strace: pid 11080 has TCB_STARTUP, initializing it
./strace: [wait(0x80057f) = 11080] WIFSTOPPED,sig=SIGTRAP,EVENT_STOP (128)
./strace: [wait(0x00127f) = 11080] WIFSTOPPED,sig=SIGCONT
./strace: [wait(0x00857f) = 11080] WIFSTOPPED,sig=133
./strace: get_regs: get_regs_error: Input/output error
????
...
"Looks like ptrace(PTRACE_GETREGS) always fails with EIO on this new kernel."

Now I got a more specific signal: ptrace(PTRACE_GETREGS,…) syscall failed.

Into the kernel

I felt I had finally found the smoking gun: getting the registers of a WIFSTOPPED tracee task should never fail. All registers must already be stored somewhere in memory.

Otherwise how would the kernel be able to resume executing the tracee task when needed?
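For reference, here is roughly the call that fails. This is a minimal standalone tracer sketch, not strace itself, and it assumes the x86_64 register layout (on ia64 PTRACE_GETREGS uses a different structure, as we will see below):

#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* child: become a tracee and stop so the parent can inspect us */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);
        _exit(0);
    }

    int status;
    waitpid(pid, &status, 0);              /* wait for the SIGSTOP */

    struct user_regs_struct regs;
    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == -1) {
        perror("PTRACE_GETREGS");          /* this is the EIO we are chasing */
        return 1;
    }
    printf("tracee rip = %#llx\n", (unsigned long long)regs.rip);

    ptrace(PTRACE_CONT, pid, NULL, NULL);  /* let the child exit */
    return 0;
}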

Before diving into ia64 land let’s look into x86_64 ptrace(PTRACE_GETREGS, …) implementation.

x86_64 ptrace(PTRACE_GETREGS)

To find a <foo> syscall implementation in kernel we can search for sys_<foo>() function definition. The lazy way to find a definition is to interrogate built kernel with gdb:

$ gdb --quiet ./vmlinux
(gdb) list sys_ptrace
1105
1106    #ifndef arch_ptrace_attach
1107    #define arch_ptrace_attach(child)       do { } while (0)
1108    #endif
1109
1110    SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
1111                    unsigned long, data)
1112    {
1113            struct task_struct *child;
1114            long ret;

SYSCALL_DEFINE4(ptrace, …) macro defines actual sys_ptrace() which does a few sanity checks and dispatches to arch_ptrace():

SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
		unsigned long, data)
{
	// simplified a bit
	struct task_struct *child;
	long ret;
	child = ptrace_get_task_struct(pid);
	ret = arch_ptrace(child, request, addr, data);
	return ret;
}

The x86_64 implementation calls copy_regset_to_user() and takes only a few lines of code to fetch the registers:

long arch_ptrace(struct task_struct *child, long request,
		 unsigned long addr, unsigned long data) {
	// ...
	case PTRACE_GETREGS:	/* Get all gp regs from the child. */
		return copy_regset_to_user(child,
					   task_user_regset_view(current),
					   REGSET_GENERAL,
					   0, sizeof(struct user_regs_struct),
					   datap);

Let’s look at it in detail to get the idea where registers are normally stored.

static inline int copy_regset_to_user(struct task_struct *target,
				      const struct user_regset_view *view,
				      unsigned int setno,
				      unsigned int offset, unsigned int size,
				      void __user *data)
{
	const struct user_regset *regset = &view->regsets[setno];

	if (!regset->get)
		return -EOPNOTSUPP;
	if (!access_ok(VERIFY_WRITE, data, size))
		return -EFAULT;
	return regset->get(target, regset, offset, size, NULL, data);
}

Here copy_regset_to_user() is just a dispatcher to view argument. Moving on:

const struct user_regset_view *task_user_regset_view(struct task_struct *task)
{
// simplified #ifdef-ery
if (!user_64bit_mode(task_pt_regs(task)))
return &user_x86_32_view;
return &user_x86_64_view;
}
// ...
static const struct user_regset_view user_x86_64_view = {
.name = "x86_64", .e_machine = EM_X86_64,
.regsets = x86_64_regsets, .n = ARRAY_SIZE(x86_64_regsets)
};
// ...
static struct user_regset x86_64_regsets[] __ro_after_init = {
[REGSET_GENERAL] = {
.core_note_type = NT_PRSTATUS,
.n = sizeof(struct user_regs_struct) / sizeof(long),
.size = sizeof(long), .align = sizeof(long),
.get = genregs_get, .set = genregs_set
},
// ...

A bit of boilerplate to tie genregs_get() and genregs_set() to a 64-bit (or 32-bit) caller. Let’s look at the 64-bit variant of genregs_get(), as it’s the one used in our PTRACE_GETREGS case:

static int genregs_get(struct task_struct *target,
		       const struct user_regset *regset,
		       unsigned int pos, unsigned int count,
		       void *kbuf, void __user *ubuf)
{
	if (kbuf) {
		unsigned long *k = kbuf;
		while (count >= sizeof(*k)) {
			*k++ = getreg(target, pos);
			count -= sizeof(*k);
			pos += sizeof(*k);
		}
	} else {
		unsigned long __user *u = ubuf;
		while (count >= sizeof(*u)) {
			if (__put_user(getreg(target, pos), u++))
				return -EFAULT;
			count -= sizeof(*u);
			pos += sizeof(*u);
		}
	}
	return 0;
}
// ...
static unsigned long getreg(struct task_struct *task, unsigned long offset)
{
	// ... simplified
	return *pt_regs_access(task_pt_regs(task), offset);
}
static unsigned long *pt_regs_access(struct pt_regs *regs, unsigned long regno)
{
	BUILD_BUG_ON(offsetof(struct pt_regs, bx) != 0);
	return &regs->bx + (regno >> 2);
}
// ..
#define task_pt_regs(task) \
({ \
	unsigned long __ptr = (unsigned long)task_stack_page(task); \
	__ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING; \
	((struct pt_regs *)__ptr) - 1; \
})
static inline void *task_stack_page(const struct task_struct *task)
{
	return task->stack;
}

From the task_pt_regs() definition we see that the actual register contents are stored on the task’s kernel stack. And genregs_get() copies the register contents one by one in a while() loop.

How do a task’s registers get stored to the task’s kernel stack? There are a few paths to get there. The most frequent is perhaps interrupt handling, when the task is descheduled from the CPU and moved to the scheduler wait queue.

ENTRY(interrupt_entry) is the entry point for interrupt handling.

ENTRY(interrupt_entry)
UNWIND_HINT_FUNC
ASM_CLAC
cld
testb $3, CS-ORIG_RAX+8(%rsp)
jz 1f
SWAPGS
/*
* Switch to the thread stack. The IRET frame and orig_ax are
* on the stack, as well as the return address. RDI..R12 are
* not (yet) on the stack and space has not (yet) been
* allocated for them.
*/
pushq %rdi
/* Need to switch before accessing the thread stack. */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
movq %rsp, %rdi
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
/*
* We have RDI, return address, and orig_ax on the stack on
* top of the IRET frame. That means offset=24
*/
UNWIND_HINT_IRET_REGS base=%rdi offset=24
pushq 7*8(%rdi) /* regs->ss */
pushq 6*8(%rdi) /* regs->rsp */
pushq 5*8(%rdi) /* regs->eflags */
pushq 4*8(%rdi) /* regs->cs */
pushq 3*8(%rdi) /* regs->ip */
pushq 2*8(%rdi) /* regs->orig_ax */
pushq 8(%rdi) /* return address */
UNWIND_HINT_FUNC
movq (%rdi), %rdi
1:
PUSH_AND_CLEAR_REGS save_ret=1
ENCODE_FRAME_POINTER 8
testb $3, CS+8(%rsp)
jz 1f
/*
* IRQ from user mode.
*
* We need to tell lockdep that IRQs are off. We can't do this until
* we fix gsbase, and we should do it before enter_from_user_mode
* (which can take locks). Since TRACE_IRQS_OFF is idempotent,
* the simplest way to handle it is to just call it twice if
* we enter from user mode. There's no reason to optimize this since
* TRACE_IRQS_OFF is a no-op if lockdep is off.
*/
TRACE_IRQS_OFF
CALL_enter_from_user_mode
1:
ENTER_IRQ_STACK old_rsp=%rdi save_ret=1
/* We entered an interrupt context - irqs are off: */
TRACE_IRQS_OFF
ret
END(interrupt_entry)
; ...
.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0
/*
* Push registers and sanitize registers of values that a
* speculation attack might otherwise want to exploit. The
* lower registers are likely clobbered well before they
* could be put to use in a speculative execution gadget.
* Interleave XOR with PUSH for better uop scheduling:
*/
.if \save_ret
pushq %rsi /* pt_regs->si */
movq 8(%rsp), %rsi /* temporarily store the return address in %rsi */
movq %rdi, 8(%rsp) /* pt_regs->di (overwriting original return address) */
.else
pushq %rdi /* pt_regs->di */
pushq %rsi /* pt_regs->si */
.endif
pushq \rdx /* pt_regs->dx */
xorl %edx, %edx /* nospec dx */
pushq %rcx /* pt_regs->cx */
xorl %ecx, %ecx /* nospec cx */
pushq \rax /* pt_regs->ax */
pushq %r8 /* pt_regs->r8 */
xorl %r8d, %r8d /* nospec r8 */
pushq %r9 /* pt_regs->r9 */
xorl %r9d, %r9d /* nospec r9 */
pushq %r10 /* pt_regs->r10 */
xorl %r10d, %r10d /* nospec r10 */
pushq %r11 /* pt_regs->r11 */
xorl %r11d, %r11d /* nospec r11*/
pushq %rbx /* pt_regs->rbx */
xorl %ebx, %ebx /* nospec rbx*/
pushq %rbp /* pt_regs->rbp */
xorl %ebp, %ebp /* nospec rbp*/
pushq %r12 /* pt_regs->r12 */
xorl %r12d, %r12d /* nospec r12*/
pushq %r13 /* pt_regs->r13 */
xorl %r13d, %r13d /* nospec r13*/
pushq %r14 /* pt_regs->r14 */
xorl %r14d, %r14d /* nospec r14*/
pushq %r15 /* pt_regs->r15 */
xorl %r15d, %r15d /* nospec r15*/
UNWIND_HINT_REGS
.if \save_ret
pushq %rsi /* return address on top of stack */
.endif
.endm

The interesting effects of interrupt_entry are:

  • registers are backed up by the PUSH_AND_CLEAR_REGS macro
  • the memory area used for the backup is PER_CPU_VAR(cpu_current_top_of_stack) (the task’s kernel stack)

To recap: ptrace(PTRACE_GETREGS, …) does an element-wise copy (using __put_user()) of each general register from a single struct pt_regs in the task’s kernel stack to the tracer’s userspace.

Now let’s look at how ia64 does the same.

ia64 ptrace(PTRACE_GETREGS)

“Can’t be much more complicated than on x86_64” was my thought. Haha.

I started searching for -EIO failure in kernel and sprinkling printk() statements in ptrace() handling code.

ia64 begins with the same call path as x86_64: sys_ptrace() does its sanity checks and dispatches to arch_ptrace(), which for PTRACE_GETREGS ends up in ptrace_getregs().

Again, ptrace_getregs() is supposed to copy the in-memory context back to the caller’s userspace. Where did it return EIO?

Quiz: while you are skimming through the ptrace_getregs() code and comments right below, try to guess which EIO exit path is taken in our case. I’ve marked the cases with [N] numbers.

static long
ptrace_getregs (struct task_struct *child, struct pt_all_user_regs __user *ppr)
{
// ...
// [1] check if we can write back to userspace
if (!access_ok(VERIFY_WRITE, ppr, sizeof(struct pt_all_user_regs)))
return -EIO;
// [2] get pointer to register context (ok)
pt = task_pt_regs(child);
// [3] and tracee kernel stack (unexpected!)
sw = (struct switch_stack *) (child->thread.ksp + 16);
// [4] Try to unwind tracee's call chain (even more unexpected!)
unw_init_from_blocked_task(&info, child);
if (unw_unwind_to_user(&info) < 0) {
return -EIO;
}
// [5] validate alignment of target userspace buffer
if (((unsigned long) ppr & 0x7) != 0) {
dprintk("ptrace:unaligned register address %p\n", ppr);
return -EIO;
}
// [6] fetch special registers into local variables
if (access_uarea(child, PT_CR_IPSR, &psr, 0) < 0
|| access_uarea(child, PT_AR_EC, &ec, 0) < 0
|| access_uarea(child, PT_AR_LC, &lc, 0) < 0
|| access_uarea(child, PT_AR_RNAT, &rnat, 0) < 0
|| access_uarea(child, PT_AR_BSP, &bsp, 0) < 0
|| access_uarea(child, PT_CFM, &cfm, 0)
|| access_uarea(child, PT_NAT_BITS, &nat_bits, 0))
return -EIO;
/* control regs */
// [7] Finally start populating register contents into userspace:
retval |= __put_user(pt->cr_iip, &ppr->cr_iip);
retval |= __put_user(psr, &ppr->cr_ipsr);
/* app regs */
// [8] a few application registers
retval |= __put_user(pt->ar_pfs, &ppr->ar[PT_AUR_PFS]);
retval |= __put_user(pt->ar_rsc, &ppr->ar[PT_AUR_RSC]);
retval |= __put_user(pt->ar_bspstore, &ppr->ar[PT_AUR_BSPSTORE]);
retval |= __put_user(pt->ar_unat, &ppr->ar[PT_AUR_UNAT]);
retval |= __put_user(pt->ar_ccv, &ppr->ar[PT_AUR_CCV]);
retval |= __put_user(pt->ar_fpsr, &ppr->ar[PT_AUR_FPSR]);
retval |= __put_user(ec, &ppr->ar[PT_AUR_EC]);
retval |= __put_user(lc, &ppr->ar[PT_AUR_LC]);
retval |= __put_user(rnat, &ppr->ar[PT_AUR_RNAT]);
retval |= __put_user(bsp, &ppr->ar[PT_AUR_BSP]);
retval |= __put_user(cfm, &ppr->cfm);
/* gr1-gr3 */
// [9] normal (general) registers
retval |= __copy_to_user(&ppr->gr[1], &pt->r1, sizeof(long));
retval |= __copy_to_user(&ppr->gr[2], &pt->r2, sizeof(long) *2);
/* gr4-gr7 */
// [10] more normal (general) registers!
for (i = 4; i < 8; i++) {
if (unw_access_gr(&info, i, &val, &nat, 0) < 0)
return -EIO;
retval |= __put_user(val, &ppr->gr[i]);
}
/* gr8-gr11 */
// [11] even more normal (general) registers!!
retval |= __copy_to_user(&ppr->gr[8], &pt->r8, sizeof(long) * 4);
/* gr12-gr15 */
// [11] you've got the idea
retval |= __copy_to_user(&ppr->gr[12], &pt->r12, sizeof(long) * 2);
retval |= __copy_to_user(&ppr->gr[14], &pt->r14, sizeof(long));
retval |= __copy_to_user(&ppr->gr[15], &pt->r15, sizeof(long));
/* gr16-gr31 */
// [12] even more of those
retval |= __copy_to_user(&ppr->gr[16], &pt->r16, sizeof(long) * 16);
/* b0 */
// [13] branch register b0
retval |= __put_user(pt->b0, &ppr->br[0]);
/* b1-b5 */
// [13] more branch registers
for (i = 1; i < 6; i++) {
if (unw_access_br(&info, i, &val, 0) < 0)
return -EIO;
__put_user(val, &ppr->br[i]);
}
/* b6-b7 */
// [14] even more branch registers
retval |= __put_user(pt->b6, &ppr->br[6]);
retval |= __put_user(pt->b7, &ppr->br[7]);
/* fr2-fr5 */
// [15] floating point registers
for (i = 2; i < 6; i++) {
if (unw_get_fr(&info, i, &fpval) < 0)
return -EIO;
retval |= __copy_to_user(&ppr->fr[i], &fpval, sizeof (fpval));
}
/* fr6-fr11 */
// [16] more floating point registers
retval |= __copy_to_user(&ppr->fr[6], &pt->f6,
sizeof(struct ia64_fpreg) * 6);
/* fp scratch regs(12-15) */
// [17] more floating point registers
retval |= __copy_to_user(&ppr->fr[12], &sw->f12,
sizeof(struct ia64_fpreg) * 4);
/* fr16-fr31 */
// [18] even more floating point registers
for (i = 16; i < 32; i++) {
if (unw_get_fr(&info, i, &fpval) < 0)
return -EIO;
retval |= __copy_to_user(&ppr->fr[i], &fpval, sizeof (fpval));
}
/* fph */
// [19] rest of floating point registers
ia64_flush_fph(child);
retval |= __copy_to_user(&ppr->fr[32], &child->thread.fph,
sizeof(ppr->fr[32]) * 96);
/* preds */
// [20] predicate registers
retval |= __put_user(pt->pr, &ppr->pr);
/* nat bits */
// [20] NaT status registers
retval |= __put_user(nat_bits, &ppr->nat);
ret = retval ? -EIO : 0;
return ret;
}

It’s a huge function. Be afraid not! It has two main parts:

  • extraction of register values using unw_unwind_to_user()
  • copying extracted values to caller’s userspace using __put_user() and __copy_to_user() helpers.

Those two are analogous to x86_64’s copy_regset_to_user() implementation.

Quiz answer: surprisingly, it’s case [4]: EIO popped up due to a failure in the unw_unwind_to_user() call. Or not so surprisingly, given that it’s The Function that fetches register values from somewhere.

Let’s check where register contents are hiding on ia64. Here goes unw_unwind_to_user() definition:

int
unw_unwind_to_user (struct unw_frame_info *info)
{
	unsigned long ip, sp, pr = info->pr;
	do {
		unw_get_sp(info, &sp);
		if ((long)((unsigned long)info->task + IA64_STK_OFFSET - sp)
		    < IA64_PT_REGS_SIZE) {
			UNW_DPRINT(0, "unwind.%s: ran off the top of the kernel stack\n",
				   __func__);
			break;
		}
		if (unw_is_intr_frame(info) &&
		    (pr & (1UL << PRED_USER_STACK)))
			return 0;
		if (unw_get_pr (info, &pr) < 0) {
			unw_get_rp(info, &ip);
			UNW_DPRINT(0, "unwind.%s: failed to read "
				   "predicate register (ip=0x%lx)\n",
				   __func__, ip);
			return -1;
		}
	} while (unw_unwind(info) >= 0);
	unw_get_ip(info, &ip);
	UNW_DPRINT(0, "unwind.%s: failed to unwind to user-level (ip=0x%lx)\n",
		   __func__, ip);
	return -1;
}
EXPORT_SYMBOL(unw_unwind_to_user);

The code above is more complicated than on x86_64. How is it supposed to work?

For efficiency reasons the syscall interface (and even the interrupt handling interface) on ia64 looks a lot more like a normal function call. This means that linux does not store all general registers into a separate struct pt_regs backup area on each task switch.

Let’s peek at the interrupt handling entry for completeness.

ia64 uses an interrupt entrypoint to enter the kernel at ENTRY(interrupt):

ENTRY(interrupt)
/* interrupt handler has become too big to fit this area. */
br.sptk.many __interrupt
END(interrupt)
// ...
ENTRY(__interrupt)
DBG_FAULT(12)
mov r31=pr // prepare to save predicates
;;
SAVE_MIN_WITH_COVER // uses r31; defines r2 and r3
SSM_PSR_IC_AND_DEFAULT_BITS_AND_SRLZ_I(r3, r14)
// ensure everybody knows psr.ic is back on
adds r3=8,r2 // set up second base pointer for SAVE_REST
;;
SAVE_REST
;;
MCA_RECOVER_RANGE(interrupt)
alloc r14=ar.pfs,0,0,2,0 // must be first in an insn group
MOV_FROM_IVR(out0, r8) // pass cr.ivr as first arg
add out1=16,sp // pass pointer to pt_regs as second arg
;;
srlz.d // make sure we see the effect of cr.ivr
movl r14=ia64_leave_kernel
;;
mov rp=r14
br.call.sptk.many b6=ia64_handle_irq
END(__interrupt)

The code above handles interrupts as follows:

  • SAVE_MIN_WITH_COVER sets up the kernel stack (r12), gp (r1) and so on
  • SAVE_REST stores the rest of the registers r2 to r31 but leaves r32 to r127 to be managed by the RSE (register stack engine), as a normal function call would.
  • Control is handed off to C code in ia64_handle_irq.

All the above means that in order to get register r32 or similar we need to perform kernel stack unwinding down to the userspace boundary and read the register values from the RSE memory area (the backing store).

Into the rabbit hole

Back to our unwinder failure.

Our case is not very complicated as the tracee is stopped at the system call boundary and there is not much to unwind. How would one know where the user boundary starts? linux looks at the return instruction pointer in every stack frame and checks whether it still points to kernel address space.

The unwinding failure seemingly happens in the depths of unw_unwind(info). From there find_save_locs(info) is called. find_save_locs() lazily builds or runs an unwind script. run_script() is a small bytecode interpreter of 11 instruction types.

If the above does not make sense to you it’s fine. It did not make sense to me either.

To get more information I enabled the unwinder’s debugging output by adding #define UNW_DEBUG:

--- a/arch/ia64/kernel/unwind.c
+++ b/arch/ia64/kernel/unwind.c
@@ -56,4 +56,6 @@
#define UNW_STATS 0 /* WARNING: this disabled interrupts for long time-spans!! */
+#define UNW_DEBUG 1
+
#ifdef UNW_DEBUG
static unsigned int unw_debug_level = UNW_DEBUG;

I ran strace again:

ia64 # strace -v -d ls
strace: ptrace_setoptions = 0x51
unwind.build_script: no unwind info for ip=0xa00000010001c1a0 (prev ip=0x0)
unwind.run_script: no state->pt, dst=18, val=136
unwind.unw_unwind: failed to locate return link (ip=0xa00000010001c1a0)!
unwind.unw_unwind_to_user: failed to unwind to user-level (ip=0xa00000010001c1a0)

build_script() couldn’t resolve the current ip=0xa00000010001c1a0 address. Why? No idea! I added a debug print around the place where I expected a match:

--- a/arch/ia64/kernel/unwind.c
+++ b/arch/ia64/kernel/unwind.c
@@ -1562,6 +1564,8 @@ build_script (struct unw_frame_info *info)
prev = NULL;
for (table = unw.tables; table; table = table->next) {
+ UNW_DPRINT(0, "unwind.%s: looking up ip=%#lx in [start=%#lx,end=%#lx)\n",
+ __func__, ip, table->start, table->end);
if (ip >= table->start && ip < table->end) {
/*
* Leave the kernel unwind table at the very front,

I ran strace again:

ia64 # strace -v -d ls
strace: ptrace_setoptions = 0x51
unwind.build_script: looking up ip=0xa00000010001c1a0 in [start=0xa000000100009240,end=0xa000000100000000)
unwind.build_script: looking up ip=0xa00000010001c1a0 in [start=0xa000000000040720,end=0xa000000000040ad0)
unwind.build_script: no unwind info for ip=0xa00000010001c1a0 (prev ip=0x0)

Can you spot the problem? Look at this range: [start=0xa000000100009240,end=0xa000000100000000). Its end is less than its start. This renders the ip >= table->start && ip < table->end condition always false. How could that happen?

It means that ptrace() itself is not at fault here but a victim of an already corrupted table->end value.

Going deeper

To find the table->end corruption I checked whether the table was populated correctly. That is done by a simple function, init_unwind_table():

static void
init_unwind_table (struct unw_table *table, const char *name, unsigned long segment_base,
		   unsigned long gp, const void *table_start, const void *table_end)
{
	const struct unw_table_entry *start = table_start, *end = table_end;
	table->name = name;
	table->segment_base = segment_base;
	table->gp = gp;
	table->start = segment_base + start[0].start_offset;
	table->end = segment_base + end[-1].end_offset;
	table->array = start;
	table->length = end - start;
}

Table construction happens in only a few places:

void __init
unw_init (void)
{
extern char __gp[];
extern char __start_unwind[], __end_unwind[];
...
// Kernel's own unwind table
init_unwind_table(&unw.kernel_table, "kernel", KERNEL_START, (unsigned long) __gp,
__start_unwind, __end_unwind);
}
// ...
void *
unw_add_unwind_table (const char *name, unsigned long segment_base, unsigned long gp,
const void *table_start, const void *table_end)
{
// ...
init_unwind_table(table, name, segment_base, gp, table_start, table_end);
}
// ...
static int __init
create_gate_table (void)
{
// ...
unw_add_unwind_table("linux-gate.so", segbase, 0, start, end);
}
// ...
static void
register_unwind_table (struct module *mod)
{
// ...
mod->arch.core_unw_table = unw_add_unwind_table(mod->name, 0, mod->arch.gp,
core, core + num_core);
mod->arch.init_unw_table = unw_add_unwind_table(mod->name, 0, mod->arch.gp,
init, init + num_init);
}

Here we see unwind tables created for:

  • one table for the kernel itself
  • one table for linux-gate.so (the equivalent of linux-vdso.so.1 on x86_64)
  • one table for each kernel module

Arrays are hard

Nothing complicated, right? Actually gcc fails to generate correct code for end[-1].end_offset expression! It happens to be a rare corner case:

Both __start_unwind and __end_unwind are defined in linker script as external symbols:

# somewhere in arch/ia64/kernel/vmlinux.lds.S
# ...
SECTIONS {
    # ...
    .IA_64.unwind : AT(ADDR(.IA_64.unwind) - LOAD_OFFSET) {
            __start_unwind = .;
            *(.IA_64.unwind*)
            __end_unwind = .;
    } :code :unwind
    # ...

Here is how C code defines __end_unwind:

extern char __end_unwind[];

If we manually inline all the above into unw_init we will get the following:

void __init
unw_init (void)
{
extern char __end_unwind[];
...
table->end = segment_base + ((struct unw_table_entry *)__end_unwind)[-1].end_offset;
}

If __end_unwind[] were an array defined in C then the negative index -1 would cause undefined behaviour.

On the practical side it’s just pointer arithmetic. Is there anything special about subtracting a few bytes from an arbitrary address and then dereferencing it?

Let’s check what kind of assembly gcc actually generates.

Compiler mysteries

Still reading? Great! You got to the most exciting part of this article!

Let’s look at simpler code first and then grow it to be closer to our initial example.

Let’s start with a global array accessed with a negative index:

extern long __some_table[];
long end(void) { return __some_table[-1]; }

Compilation result (I’ll strip irrelevant bits and annotations):

; ia64-unknown-linux-gnu-gcc-8.2.0 -O2 -S a.c
.text
.global end#
.proc end#
end:
addl r14 = @ltoffx(__some_table#), r1
;;
ld8.mov r14 = [r14], __some_table#
;;
adds r14 = -8, r14
;;
ld8 r8 = [r14]
br.ret.sptk.many b0
.endp end#

Here two things happen:

  • the __some_table address is read from the GOT (r1 is roughly the GOT register) by performing an ld8.mov (a form of 8-byte load) into r14.
  • the final value is loaded from address r14 - 8 using ld8 (also an 8-byte load).

Simple!

We can simplify the example by avoiding the GOT indirection. The typical way to do it is to use the __attribute__((visibility("hidden"))) hint:

extern long __some_table[] __attribute__((visibility("hidden")));
long end(void) { return __some_table[-1]; }

Assembly code:

; ia64-unknown-linux-gnu-gcc-8.2.0 -O2 -S a.c
.text
.global end#
.proc end#
end:
movl r14 = @gprel(__some_table#)
;;
add r14 = r1, r14
;;
adds r14 = -8, r14
;;
ld8 r8 = [r14]
br.ret.sptk.many b0

Here movl r14 = @gprel(__some_table#) is a link-time 64-bit constant: an offset of __some_table array from r1 value. Only a single 8-byte load happens at address @gprel(__some_table#) + r1 - 8.

Also straightforward.

Now let’s change the alignment of our table from long (8 bytes on ia64) to char (1 byte):

extern char __some_table[] __attribute__((visibility("hidden")));
long end(void) { return ((long*)__some_table)[-1]; }
; ia64-unknown-linux-gnu-gcc-8.2.0 -O2 -S a.c
.text
.global end#
.proc end#
end:
movl r14 = @gprel(__some_table#)
;;
add r14 = r1, r14
;;
adds r19 = -7, r14
adds r16 = -8, r14
adds r18 = -6, r14
adds r17 = -5, r14
adds r21 = -4, r14
adds r15 = -3, r14
;;
ld1 r19 = [r19]
adds r20 = -2, r14
adds r14 = -1, r14
ld1 r16 = [r16]
;;
ld1 r18 = [r18]
shl r19 = r19, 8
ld1 r17 = [r17]
;;
or r19 = r16, r19
shl r18 = r18, 16
ld1 r16 = [r21]
ld1 r15 = [r15]
shl r17 = r17, 24
;;
or r18 = r19, r18
shl r16 = r16, 32
ld1 r8 = [r20]
ld1 r19 = [r14]
shl r15 = r15, 40
;;
or r17 = r18, r17
shl r14 = r8, 48
shl r8 = r19, 56
;;
or r16 = r17, r16
;;
or r15 = r16, r15
;;
.mmi
or r14 = r15, r14
;;
or r8 = r14, r8
br.ret.sptk.many b0
.endp end#

This is quite a blowup in code size! Instead of one 8-byte ld8 load the compiler generated eight 1-byte ld1 loads and assembles a valid value with the help of shifts and ors.

Note how each individual byte gets its own register to keep the address and the result of the load.

Here is the subset of the above instructions that handles byte offset -5:

; point r14 at __some_table:
movl r14 = @gprel(__some_table#)
add r14 = r1, r14
;
; read one byte and shift it
; into destination byte position:
;
adds r17 = -5, r14
ld1 r17 = [r17]
shl r17 = r17, 24
or r16 = r17, r16

This code, while ugly and inefficient, is still correct.

Now let’s wrap our 8-byte value in a struct to make example closer to original unwinder’s table registration code:

extern char __some_table[] __attribute__((visibility("hidden")));
struct s { long v; };
long end(void) { return ((struct s *)__some_table)[-1].v; }

Quiz time: do you think generated code will be exactly the same as in previous example or somehow different?

; ia64-unknown-linux-gnu-gcc-8.2.0 -O2 -S a.c
.text
.global end#
.proc end#
end:
movl r14 = @gprel(__some_table#)
movl r16 = 0x1ffffffffffffff9
;;
add r14 = r1, r14
movl r15 = 0x1ffffffffffffff8
movl r17 = 0x1ffffffffffffffa
;;
add r15 = r14, r15
add r17 = r14, r17
add r16 = r14, r16
;;
ld1 r8 = [r15]
ld1 r16 = [r16]
;;
ld1 r15 = [r17]
movl r17 = 0x1ffffffffffffffb
shl r16 = r16, 8
;;
add r17 = r14, r17
or r16 = r8, r16
shl r15 = r15, 16
;;
ld1 r8 = [r17]
movl r17 = 0x1ffffffffffffffc
or r15 = r16, r15
;;
add r17 = r14, r17
shl r8 = r8, 24
;;
ld1 r16 = [r17]
movl r17 = 0x1ffffffffffffffd
or r8 = r15, r8
;;
add r17 = r14, r17
shl r16 = r16, 32
;;
ld1 r15 = [r17]
movl r17 = 0x1ffffffffffffffe
or r16 = r8, r16
;;
add r17 = r14, r17
shl r15 = r15, 40
;;
ld1 r8 = [r17]
movl r17 = 0x1fffffffffffffff
or r15 = r16, r15
;;
add r14 = r14, r17
shl r8 = r8, 48
;;
ld1 r16 = [r14]
or r15 = r15, r8
;;
shl r8 = r16, 56
;;
or r8 = r15, r8
br.ret.sptk.many b0
.endp end#

The code is different from the previous one! Seemingly not by much, but there is one suspicious detail: the offsets are now very large. Let’s look at our -5 example again:

; point r14 at __some_table:
movl r14 = @gprel(__some_table#)
add r14 = r1, r14
;
; read one byte and shift it
; into destination byte position:
;
movl r17 = 0x1ffffffffffffffb
add r17 = r14, r17
ld1 r8 = [r17]
shl r8 = r8, 24
or r8 = r15, r8
; ...

The offset 0x1ffffffffffffffb (2305843009213693947) used here is incorrect. It should have been 0xfffffffffffffffb (-5).

We encounter (arguably) a compiler bug known as PR84184. Upstream says struct handling is different enough from direct array dereferences to trick gcc into generating incorrect byte offsets.

One day I’ll take a closer look at it to understand mechanics.

Let’s explore one more example: what if we add a bigger alignment to __some_table without changing its type?

extern char __some_table[] __attribute__((visibility("hidden"))) __attribute((aligned(8)));
struct s { long v; };
long end(void) { return ((struct s *)__some_table)[-1].v; }
; ia64-unknown-linux-gnu-gcc-8.2.0 -O2 -S a.c
.text
.global end#
.proc end#
end:
movl r14 = @gprel(__some_table#)
;;
add r14 = r1, r14
;;
adds r14 = -8, r14
;;
ld8 r8 = [r14]
br.ret.sptk.many b0

Exactly like our original clean and fast example: a single aligned load at offset -8.

Now we have a simple workaround!

What if we pass our array in a register instead of using a global reference? (effectively uninlining array address)

struct s { long v; };
long end(char * __some_table) { return ((struct s *)__some_table)[-1].v; }
; ia64-unknown-linux-gnu-gcc-8.2.0 -O2 -S a.c
.text
.global end#
.proc end#
end:
adds r32 = -8, r32
;;
ld8 r8 = [r32]
br.ret.sptk.many b0

This also works! Note how the compiler promotes the alignment from 1 to 8 after the type cast.

In this case a few things happen at the same time to trigger bad code generation:

  • gcc infers that char __end_unwind[] is an array with alignment 1
  • gcc inlines __end_unwind into init_unwind_table and demotes the alignment from 8 (const struct unw_table_entry) to 1 (extern char [])
  • gcc assumes that __end_unwind can’t have a negative subscript and generates invalid (and inefficient) code

Workarounds (aka hacks) time!

We can work around the corner-case conditions above in a few different ways:
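One of them, sketched below, simply mirrors the alignment experiment above: promise gcc the alignment that the unwind data behind the linker-script symbols actually has. Treat this as an illustration of the idea rather than the exact patch:

/* illustration only: the real kernel declarations may differ */
extern char __start_unwind[] __attribute__((aligned(8)));
extern char __end_unwind[] __attribute__((aligned(8))); /* end[-1] becomes a single aligned ld8 again */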

The fix is still not perfect, as a negative subscript is still used. But at least the load is aligned.

Note that void __init unw_init() is called early in the kernel startup sequence, even before the console is initialized.

This code generation bug causes either a garbage read from some memory location or a kernel crash when trying to access unmapped memory.

That is the mechanics behind the strace breakage.

Parting words

  • Task switch on x86_64 and on ia64 is fun :)
  • On x86_64 the implementation of ptrace(PTRACE_GETREGS, …) is very straightforward: almost a memcpy from a predefined location.
  • On ia64 ptrace(PTRACE_GETREGS, …) requires many moving parts:
    • a call stack unwinder for the kernel (involving linker scripts to define __end_unwind and __start_unwind)
    • a bytecode generator and bytecode interpreter to speed up unwinding for every ptrace() call
  • Unaligned loads of register-sized values are a tricky and fragile business

Have fun!


August 03, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)
Verifying repo/gentoo.git with gverify (August 03, 2018, 12:04 UTC)

Git commit signatures are recursive by design — that is, each signature covers not only the commit in question but also indirectly all past commits, via tree and parent commit hashes. This makes user-side commit verification much simpler, as the user needs only to verify the signature on the most recent commit; with the assumption that the developer making it has verified the earlier commit and so on. Sadly, this is usually not the case at the moment.

Most of the Gentoo developers do not really verify the base upon which they are making their commits. While they might verify the commits when pulling before starting to work on their changes, it is rather unlikely that they verify the correctness when they repeatedly need to rebase before pushing. Usually this does not cause problems as Gentoo Infrastructure is verifying the commit signatures before accepting the push. Nevertheless, the recent attack on our GitHub mirrors made me realize that if a smart attacker was able to inject a single malicious commit without valid signature, then a Gentoo developer would most likely make a signed commit on top of it without even noticing the problem.

In this article, I would like to shortly present my quick solution to this problem — app-portage/gverify. gverify is a trivial reimplementation of gkeys in <200 lines of code. It uses the gkeys seed data (yes, this means it relies on manual updates) combined with autogenerated developer keyrings to provide strict verification of commits. Unlike gkeys, it works out-of-the-box without root privileges and automatically updates the keys on use.

The package installs a gv-install tool that installs two hooks on your repo/gentoo.git working copy. Those are post-merge and pre-rebase hooks that verify the tip of the upstream master branch, respectively every time a merge on master is finished and every time a rebase is about to be started. This covers the two main cases — git pull and git pull --rebase. The former causes a verbose error after the update, the latter prevents a rebase from proceeding.

While this is far from perfect, it seems a reasonably good solution given the limitations of the available git hooks. Most importantly, it should prevent the git pull --rebase -S && git push --sign loop from silently accepting a malicious commit. Currently the hook verifies the top upstream commit only; however, in the future I want to implement incremental verification of all new commits.

July 24, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

So I have a Django DurationField in my model and needed to format it as HH:mm .. unfortunately django doesn't seem to support that out of the box .. after considering templatetags or writing my own filter I decided to go for a very simple alternative and just defined a method for this in my model:

    timeslot_duration = models.DurationField(null=False,
                                             blank=False,
                                             default='00:05:00',
                                             verbose_name=_('timeslot_duration'),
                                             help_text=_('[DD] [HH:[MM:]]ss[.uuuuuu] format')
                                             )

    def timeslot_duration_HHmm(self):
        sec = self.timeslot_duration.total_seconds()
        return '%02d:%02d' % (int((sec/3600)%3600), int((sec/60)%60))

that way I can do whatever I want format-wise to get exactly what I need. Not sure if this is recommended practice, or maybe frowned upon, but it works just fine.

and in my template then just use {{ <model>.timeslot_duration_HHmm }} instead of {{ <model>.timeslot_duration }}.
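As a quick sanity check of the formatting logic outside Django, here is a standalone snippet (the example duration is arbitrary):

from datetime import timedelta

# same arithmetic as timeslot_duration_HHmm() above
sec = timedelta(hours=1, minutes=35).total_seconds()
print('%02d:%02d' % (int((sec/3600)%3600), int((sec/60)%60)))  # prints 01:35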

July 19, 2018
Alexys Jacob a.k.a. ultrabug (homepage, bugs)

This quick article is a wrap up for reference on how to connect to ScyllaDB using Spark 2 when authentication and SSL are enforced for the clients on the Scylla cluster.

We encountered multiple problems, even more so since we distribute our workload using a YARN cluster, so our worker nodes need to have everything required to connect properly to Scylla.

We found very little help online, so I hope this will serve anyone facing similar issues (that’s also why I copy/pasted the error messages here).

The authentication part is easy by itself and was not the source of our problems; SSL on the client side was.

Environment

  • (py)spark: 2.1.0.cloudera2
  • spark-cassandra-connector: datastax:spark-cassandra-connector: 2.0.1-s_2.11
  • python: 3.5.5
  • java: 1.8.0_144
  • scylladb: 2.1.5

SSL cipher setup

The Datastax spark cassandra driver uses the TLS_RSA_WITH_AES_256_CBC_SHA cipher by default, which the JVM does not support out of the box. This raises the following error when connecting to Scylla:

18/07/18 13:13:41 WARN channel.ChannelInitializer: Failed to initialize a channel. Closing: [id: 0x8d6f78a7]
java.lang.IllegalArgumentException: Cannot support TLS_RSA_WITH_AES_256_CBC_SHA with currently installed providers

According to the ssl documentation we have two ciphers available:

  1. TLS_RSA_WITH_AES_256_CBC_SHA
  2. TLS_RSA_WITH_AES_128_CBC_SHA

We can get rid of the error by lowering the cipher to TLS_RSA_WITH_AES_128_CBC_SHA using the following configuration:

.config("spark.cassandra.connection.ssl.enabledAlgorithms", "TLS_RSA_WITH_AES_128_CBC_SHA")\

However, this is not really a good solution; instead we’d be inclined to use the TLS_RSA_WITH_AES_256_CBC_SHA version. For this we need to follow this Datastax procedure.

Then we need to deploy the JCE security jars on all our client nodes. If you are using YARN like us, this means that you have to deploy these jars to all your NodeManager nodes.

For example by hand:

# unzip jce_policy-8.zip
# cp UnlimitedJCEPolicyJDK8/*.jar /opt/oracle-jdk-bin-1.8.0.144/jre/lib/security/

Java trust store

When connecting, the clients need to be able to validate the Scylla cluster’s self-signed CA. This is done by setting up a trustStore JKS file and providing it to the spark connector configuration (note that you should protect this file with a password).

keyStore vs trustStore

In an SSL handshake the purpose of the trustStore is to verify credentials, while the purpose of the keyStore is to provide credentials. The keyStore in Java stores the private key and the certificates corresponding to the public keys, and is required if you are an SSL server or if SSL requires client authentication. The trustStore stores certificates from third parties or your own self-signed certificates; your application identifies and validates them using this trustStore.

The spark-cassandra-connector documentation has two options to handle keyStore and trustStore.

When we did not use the trustStore option, we would get some obscure error when connecting to Scylla:

com.datastax.driver.core.exceptions.TransportException: [node/1.1.1.1:9042] Channel has been closed

When enabling DEBUG logging we got a clearer error, which indicated a failure in validating the SSL certificate provided by the Scylla server node:

Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

setting up the trustStore JKS

You need to have the self-signed CA public certificate file, then issue the following command:

# keytool -importcert -file /usr/local/share/ca-certificates/MY_SELF_SIGNED_CA.crt -keystore COMPANY_TRUSTSTORE.jks -noprompt
Enter keystore password:  
Re-enter new password: 
Certificate was added to keystore

using the trustStore

Now you need to configure spark to use the trustStore like this:

.config("spark.cassandra.connection.ssl.trustStore.password", "PASSWORD")\
.config("spark.cassandra.connection.ssl.trustStore.path", "COMPANY_TRUSTSTORE.jks")\

Spark SSL configuration example

This wraps up the SSL connection configuration used for spark.

This example uses pyspark2 and reads a table in Scylla from a YARN cluster:

$ pyspark2 --packages datastax:spark-cassandra-connector:2.0.1-s_2.11 --files COMPANY_TRUSTSTORE.jks

>>> spark = SparkSession.builder.appName("scylla_app")\
.config("spark.cassandra.auth.password", "test")\
.config("spark.cassandra.auth.username", "test")\
.config("spark.cassandra.connection.host", "node1,node2,node3")\
.config("spark.cassandra.connection.ssl.clientAuth.enabled", True)\
.config("spark.cassandra.connection.ssl.enabled", True)\
.config("spark.cassandra.connection.ssl.trustStore.password", "PASSWORD")\
.config("spark.cassandra.connection.ssl.trustStore.path", "COMPANY_TRUSTSTORE.jks")\
.config("spark.cassandra.input.split.size_in_mb", 1)\
.config("spark.yarn.queue", "scylla_queue").getOrCreate()

>>> df = spark.read.format("org.apache.spark.sql.cassandra").options(table="my_table", keyspace="test").load()
>>> df.show()

July 15, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

When playing with the thought of adding images to my books DB I thought: I need random names, and would like to scale them ..

So I looked a bit and found django-stdimage. I was pretty happy with what it could do, but the uuid4 names themselves seemed a bit .. not what I wanted .. So I came up with adding the object's pk to the filename as well.

There were already some nice ways to generate filenames, but none did exactly what I wanted.

Here is my own class UploadToClassNameDirPKUUID:

from uuid import uuid4

# assuming the base class lives in stdimage.utils like the other UploadTo* helpers mentioned below
from stdimage.utils import UploadToClassNameDir


class UploadToClassNameDirPKUUID(UploadToClassNameDir):
    def __call__(self, instance, filename):
        # slightly modified from the UploadToUUId class from stdimage.utils
        if instance.pk:
            self.kwargs.update({
                'name': '{}-{}'.format(instance.pk, uuid4().hex),
                })
        else:
            # no pk found so just get uuid4.hex
            self.kwargs.update({
                'name': uuid4().hex
                })
        return super().__call__(instance, filename)

Basically the same as UploadToClassNameDirUUID, but with instance.pk added at the front of the filename - this is purely a convenience for me so that I have the 2 pictures for my book (front & back) identifiable in the directory without looking both up in my DB. One could maybe argue it would "expose" the pk, but first, in this case I do not really care as the app is not public, and second, anyone who can access my django-admin (which is what I use for data entry, ..) would see the pk anyway, so whatever ;)
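For completeness, this is roughly how such a callable gets wired into a model field. A sketch only: the model and field names are made up, and I'm assuming django-stdimage's StdImageField here:

from django.db import models
from stdimage.models import StdImageField

# UploadToClassNameDirPKUUID is the class defined above
class Book(models.Model):
    # hypothetical field; the interesting part is upload_to=
    cover_front = StdImageField(upload_to=UploadToClassNameDirPKUUID(),
                                variations={'thumbnail': (100, 150)},
                                blank=True)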

July 14, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

Since I wanted to make an inventory app (that I will likely post the source of - as FOSS of course - at some point when I am done) I wanted to have a model for languages with their ISO 639-1 codes.

Now the model itself is of course easy, but where to get the data to populate it? I was certainly not going to do that manually. After a bit of searching and talking to people on IRC I dug a bit around the django I18N / L10N code and found something I could use: django.conf.locale.LANG_INFO. While this is without a doubt used internally by django, I thought it would be awesome to just use that as a base for my data.
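To get an idea of what that data looks like, it can be inspected in a Django shell (output shown roughly, from memory, so treat it as illustrative):

from django.conf.locale import LANG_INFO

LANG_INFO['de']
# roughly: {'bidi': False, 'code': 'de', 'name': 'German', 'name_local': 'Deutsch'}

# the management command below only keeps the two-letter codes:
sorted(code for code in LANG_INFO if len(code) == 2)[:5]
# e.g. ['af', 'ar', 'az', 'be', 'bg']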

The next point was how to get the data into my DB without too much effort, but in a reproducible way. The first thing that came to mind was to write my own migration and populate it from there. Not something I particularly liked, since I have a tendency to wipe my migrations and start from scratch during development and I was sure I'd delete just that one too many.

The other - and in my opinion better - option I found was more flexible as to when it is run and also beautifully simple: just write my own custom management command to do the data import for me. Using the Django documentation on custom management commands as a base I got this working very quickly. Enough rambling .. here's the code:

First the model (since it was in the source data I added name_local because it is probably useful sometimes):

class Language(models.Model):
    '''
    List of languages by iso code (2 letter only because country code
    is not needed.
    This should be popluated by getting data from django.conf.locale.LANG_INFO
    '''
    name = models.CharField(max_length=256,
                            null=False,
                            blank=False,
                            verbose_name=_('Language name')
                            )
    name_local = models.CharField(max_length=256,
                                  null=False,
                                  blank=True,
                                  default='',
                                  verbose_name=_('Language name (in that language)'))
    isocode = models.CharField(max_length=2,
                               null=False,
                               blank=False,
                               unique=True,
                               verbose_name=_('ISO 639-1 Language code'),
                               help_text=_('2 character language code without country')
                               )
    sorting = models.PositiveIntegerField(blank=False,
                                          null=False,
                                          default=0,
                                          verbose_name=_('sorting order'),
                                          help_text=_('increase to show at top of the list')
                                          )

    def __str__(self):
        return '%s (%s)' % (self.name, self.name_local)

    class Meta:
        verbose_name = _('language')
        verbose_name_plural = _('languages')
        ordering = ('-sorting', 'name', 'isocode', )

(of course with gettext support, but if you don't need that just remove the _(...) ;)

Edit 2018-07-15: for usability reasons I added a sorting field so that commonly used languages can be shown at the top of the list.

Then create the <project>/management/commands directory and in it a file importlanguages.py:

from django.core.management.base import BaseCommand, CommandError
from dcollect.models import Language
from django.conf.locale import LANG_INFO


class Command(BaseCommand):
    help = 'Imports language codes and names from django.conf.locale.LANG_INFO'

    def add_arguments(self, parser):
        pass

    def handle(self, *args, **options):
        cnt = 0
        for lang in LANG_INFO:
            if len(lang) == 2:
                #we only care about the 2 letter iso codes
                #self.stdout.write(lang + ' ' + LANG_INFO[lang]['name'] + ' ' + LANG_INFO[lang]['name_local'])
                try:
                    l = Language(isocode=lang,
                                 name=LANG_INFO[lang]['name'],
                                 name_local=LANG_INFO[lang]['name_local'])
                    l.save()
                    cnt += 1
                except Exception as e:
                    self.stdout.write('Error adding language %s' % lang)
        self.stdout.write('Added %d languages to dcollect' % cnt)

That was way easier than expected .. I initially was going to just populate 2 or 3 languages manually and leave the rest for later, but it was so simple that I just got it out of the way.

All that needs to be done now to import languages is python manage.py importlanguages - and the really nice part: no new dependencies added ;)

A little follow-up to my post about setting up tryton on Gentoo:

If you run postgresql on a different server you need to deal with setting up the permissions on the postgresql side.

What I was not aware of at that time is that trytond (not trytond-admin it seems) requires access to the template1 database too.

For some reason trytond silently failed to start. The only log messages I saw were of level INFO, about connecting to template1:

[Sat Jul 14 13:17:44 2018] INFO:trytond.backend.postgresql.database:connect to "template1"
[Sat Jul 14 13:17:44 2018] INFO:werkzeug:192.168.0.151 - - [14/Jul/2018 13:17:44] "POST / HTTP/1.1" 200 -
[Sat Jul 14 13:17:44 2018] INFO:werkzeug:192.168.0.151 - - [14/Jul/2018 13:17:44] "POST / HTTP/1.1" 200 -

so you need to set that up in your pg_hba.conf too. To give an example:

# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    trytond         trytond         192.168.0.X/0          scram-sha-256
host    template1       trytond         192.168.0.X/0          scram-sha-256

As you can see I give the trytond user access to the trytond database (and in the next line to template1 too). For some reason trytond-admin does not require this, but it would be nice if trytond logged an error about not getting access to template1.

Of course postgresql also needs to be set up to accept connections on the TCP/IP port you configured in postgresql.conf.

Sergei Trofimovich a.k.a. slyfox (homepage, bugs)
tracking down mysterious memory corruption (July 14, 2018, 00:00 UTC)

I bought my current desktop machine around 2011 (7 years ago) and mostly had no problems with it, save one exception: occasionally (once every 2-3 months) firefox, liferea or gcc would mysteriously crash.

Bad PTE

dmesg reports would claim that page table entries refer to already freed physical memory:

Apr 24 03:59:17 sf kernel: BUG: Bad page map in process cc1  pte:200000000 pmd:2f9d0d067
Apr 24 03:59:17 sf kernel: addr:00000000711a7136 vm_flags:00000875 anon_vma:          (null) mapping:000000003882992c index:101a
Apr 24 03:59:17 sf kernel: file:cc1 fault:filemap_fault mmap:btrfs_file_mmap readpage:btrfs_readpage
Apr 24 03:59:18 sf kernel: CPU: 1 PID: 14834 Comm: cc1 Tainted: G         C        4.17.0-rc1-00215-g5e7c7806111a #65
Apr 24 03:59:18 sf kernel: Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M-D3H, BIOS F4 02/16/2012
Apr 24 03:59:18 sf kernel: Call Trace:
Apr 24 03:59:18 sf kernel:  dump_stack+0x46/0x5b
Apr 24 03:59:18 sf kernel:  print_bad_pte+0x193/0x230
Apr 24 03:59:18 sf kernel:  ? page_remove_rmap+0x216/0x330
Apr 24 03:59:18 sf kernel:  unmap_page_range+0x3f7/0x920
Apr 24 03:59:18 sf kernel:  unmap_vmas+0x47/0xa0
Apr 24 03:59:18 sf kernel:  exit_mmap+0x86/0x170
Apr 24 03:59:18 sf kernel:  mmput+0x64/0x120
Apr 24 03:59:18 sf kernel:  do_exit+0x2a9/0xb90
Apr 24 03:59:18 sf kernel:  ? syscall_trace_enter+0x16d/0x2c0
Apr 24 03:59:18 sf kernel:  do_group_exit+0x2e/0xa0
Apr 24 03:59:18 sf kernel:  __x64_sys_exit_group+0xf/0x10
Apr 24 03:59:18 sf kernel:  do_syscall_64+0x4a/0xe0
Apr 24 03:59:18 sf kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 24 03:59:18 sf kernel: RIP: 0033:0x7f7a039dcb96
Apr 24 03:59:18 sf kernel: RSP: 002b:00007fffdfa09d08 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
Apr 24 03:59:18 sf kernel: RAX: ffffffffffffffda RBX: 00007f7a03ccc740 RCX: 00007f7a039dcb96
Apr 24 03:59:18 sf kernel: RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
Apr 24 03:59:18 sf kernel: RBP: 0000000000000000 R08: 00000000000000e7 R09: fffffffffffffe70
Apr 24 03:59:18 sf kernel: R10: 0000000000000008 R11: 0000000000000246 R12: 00007f7a03ccc740
Apr 24 03:59:18 sf kernel: R13: 0000000000000038 R14: 00007f7a03cd5608 R15: 0000000000000000
Apr 24 03:59:18 sf kernel: Disabling lock debugging due to kernel taint
Apr 24 03:59:18 sf kernel: BUG: Bad rss-counter state mm:000000004fac8a77 idx:2 val:-1

It’s not something that is easy to debug or reproduce.

Transparent Hugepages were a new thing at that time and I was using them system-wide via the CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y kernel option.

After those crashes I decided to switch back to CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y only. Crashes became rarer: once every 5-6 months.

Enabling more debugging facilities in the kernel did not change anything and I moved on.

A few years later I set up nightly builds on this machine to build and test packages automatically. Things were running smoothly except for a few memory-hungry tests that crashed once in a while: firefox, rust and webkit builds hit internal compiler errors in gcc every other night.

Crashes were very hard to isolate or reproduce: every time the SIGSEGV happened on a different source file being compiled. I tried to run the same failed gcc command in a loop for hours to reproduce the crash but never succeeded. That is usually a strong sign of flaky hardware. At that point I tried the memtest86+-5.01 and memtester tools to validate the RAM chips. Both tools claimed the RAM to be fine. My conclusion was that the crashes were the result of an obscure software problem causing memory corruption (probably in the kernel). I had no idea how to debug that and kept on using the system. For day-to-day use it was perfectly stable.

A new clue

[years later]

Last year I joined Gentoo’s toolchain@ project and started caring a bit more about glibc and gcc. dilfridge@ did a fantastic job on making glibc testsuite work on amd64 (and also many other things not directly related to this post).

One day I made a major change in how CFLAGS are handled in the glibc ebuild and broke a few users with CFLAGS=-mno-sse4.2. That day I ran the glibc testsuite to check if I had made things worse. There was only one test failing: string/test-memmove.

Of all the obscure things that glibc checks for only one simple memmove() test refused to work!

The failure occurred only on the 32-bit version of glibc and looked like this:

$ elf/ld.so --inhibit-cache --library-path . string/test-memmove
simple_memmove  __memmove_ssse3_rep     __memmove_ssse3 __memmove_sse2_unaligned        __memmove_ia32
string/test-memmove: Wrong result in function __memmove_sse2_unaligned dst "0x70000084" src "0x70000000" offset "43297733"

This command runs string/test-memmove binary using ./libc.so.6 and elf/ld.so as a loader.

The good thing is that I was somewhat able to reproduce the failure: every few runs the error popped up. The test was not failing deterministically. Every time the test failed it was always __memmove_sse2_unaligned, but the offset was different.

Here is the test source code. The test basically runs memmove() and checks that all memory was moved as expected. Originally the test was written to check how memmove() handles memory ranges that span the signed/unsigned address boundary around address 0x80000000. Hence the unusual mmap(addr=0x70000000, size=0x20000000) as a way to allocate memory.

Now the fun thing: the error disappeared as soon as I rebooted the machine. And it came back one day later (after the usual nightly test run). To explore the breakage and make a fix I had to find a faster way to reproduce the failure.

At that point the fastest way to make the test fail again was to run a firefox build first. It took “only” 40 minutes to get the machine into a state where I could reproduce the failure.

Once in that state I started shrinking down the __memmove_sse2_unaligned implementation to check where exactly data got transferred incorrectly. 600 lines of straightforward code is not that much.

; check if the copied block is smaller than cache size
167 cmp __x86_shared_cache_size_half, %edi
...
170 jae L(mm_large_page_loop_backward)
...
173 L(mm_main_loop_backward): ; small block, normal instruction
175 prefetcht0 -128(%eax)
...
; load 128 bits from source buffer
177 movdqu -64(%eax), %xmm0
...
; store 128 bits to destination buffer
181 movaps %xmm0, -64(%ecx)
...
244 L(mm_large_page_loop_backward):
...
; load 128 bits from source buffer
245 movdqu -64(%eax), %xmm0
...
; store 128 bits to destination avoiding cache
249 movntdq %xmm0, -64(%ecx)

Note: memcpy()’s behaviour depends on CPU cache size. When the block of copied memory is small (less than CPU cache size, 8MB in my case) memcpy() does not do anything special. Otherwise memcpy() tries to avoid cache pollution and uses non-temporal variant of store instruction: movntdq instead of usual movaps.

While I was poking at this code I found a reliable workaround to make memcpy() never fail on my machine: change movntdq to movdqa:

--- a/sysdeps/i386/i686/multiarch/memcpy-sse2-unaligned.S
+++ b/sysdeps/i386/i686/multiarch/memcpy-sse2-unaligned.S
@@ -26,0 +27 @@
+#define movntdq movdqa /* broken CPU? */

I was pondering whether I should patch binutils locally to avoid the movntdq instruction entirely, but eventually discarded the idea and focused on finding the broken component instead. Who knows what else could be affected.

I was so close!

A minimal reproducer

I attempted to craft a testcase that does not depend on glibc’s memcpy() and got this:

#include <stddef.h>    /* size_t */
#include <emmintrin.h> /* movdqu, sfence, movntdq */

static void memmove_si128u (__m128i_u * dest, __m128i_u const *src, size_t items)
{
    dest += items - 1;
    src  += items - 1;
    _mm_sfence();
    for (; items != 0; items-=1, dest-=1, src-=1)
    {
        __m128i xmm0 = _mm_loadu_si128(src); // movdqu
        if (0)
        {
            // this would work:
            _mm_storeu_si128(dest, xmm0); // movdqu
        }
        else
        {
            // this causes single bit memory corruption
            _mm_stream_si128(dest, xmm0); // movntdq
        }
    }
    _mm_sfence();
}

This code assumes quite a few things from the caller (a minimal driver honoring these constraints is sketched after the list):

  • dest > src as copying happens right-to-left
  • dest has to be 16-byte aligned
  • block size must be a multiple of 16 bytes.
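
A minimal driver honoring those constraints could look like this (my own sketch, not part of the original test; the buffer size and fill pattern are arbitrary, and it assumes the memmove_si128u() above lives in the same file):

#include <emmintrin.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* 8M 16-byte items = 128MB per buffer */
    const size_t items = 8u * 1024 * 1024;
    const size_t bytes = items * sizeof(__m128i);

    /* aligned_alloc provides the 16-byte destination alignment the copy requires */
    unsigned char *src = aligned_alloc(16, bytes);
    unsigned char *dst = aligned_alloc(16, bytes);
    if (!src || !dst) { perror("aligned_alloc"); return 1; }

    /* non-constant fill pattern so a flipped bit is visible */
    for (size_t i = 0; i < bytes; i++)
        src[i] = (unsigned char)(i ^ (i >> 8));

    /* with disjoint buffers the dest > src ordering constraint does not matter */
    memmove_si128u((__m128i_u *)dst, (__m128i_u const *)src, items);

    for (size_t i = 0; i < bytes; i++)
        if (dst[i] != src[i])
            printf("mismatch at byte %zu: got %02x, expected %02x\n",
                   i, dst[i], src[i]);
    return 0;
}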

Here is what the C code compiles to with -O2 -m32 -msse2:

(gdb) disassemble memmove_si128u
Dump of assembler code for function memmove_si128u(__m128i_u*, __m128i_u const*, size_t):
0x000008f0 <+0>: push %ebx
0x000008f1 <+1>: lea 0xfffffff(%ecx),%ebx
0x000008f7 <+7>: shl $0x4,%ebx
0x000008fa <+10>: add %ebx,%eax
0x000008fc <+12>: add %ebx,%edx
0x000008fe <+14>: sfence
0x00000901 <+17>: test %ecx,%ecx
0x00000903 <+19>: je 0x923 <memmove_si128u(__m128i_u*, __m128i_u const*, size_t)+51>
0x00000905 <+21>: shl $0x4,%ecx
0x00000908 <+24>: mov %eax,%ebx
0x0000090a <+26>: sub %ecx,%ebx
0x0000090c <+28>: mov %ebx,%ecx
0x0000090e <+30>: xchg %ax,%ax
0x00000910 <+32>: movdqu (%edx),%xmm0
0x00000914 <+36>: sub $0x10,%eax
0x00000917 <+39>: sub $0x10,%edx
0x0000091a <+42>: movntdq %xmm0,0x10(%eax)
0x0000091f <+47>: cmp %eax,%ecx
0x00000921 <+49>: jne 0x910 <memmove_si128u(__m128i_u*, __m128i_u const*, size_t)+32>
0x00000923 <+51>: sfence
0x00000926 <+54>: pop %ebx
0x00000927 <+55>: ret

And with -O2 -m64 -mavx2:

(gdb) disassemble memmove_si128u
Dump of assembler code for function memmove_si128u(__m128i_u*, __m128i_u const*, size_t):
0x0000000000000ae0 <+0>: sfence
0x0000000000000ae3 <+3>: mov %rdx,%rax
0x0000000000000ae6 <+6>: shl $0x4,%rax
0x0000000000000aea <+10>: sub $0x10,%rax
0x0000000000000aee <+14>: add %rax,%rdi
0x0000000000000af1 <+17>: add %rax,%rsi
0x0000000000000af4 <+20>: test %rdx,%rdx
0x0000000000000af7 <+23>: je 0xb1e <memmove_si128u(__m128i_u*, __m128i_u const*, size_t)+62>
0x0000000000000af9 <+25>: shl $0x4,%rdx
0x0000000000000afd <+29>: mov %rdi,%rax
0x0000000000000b00 <+32>: sub %rdx,%rax
0x0000000000000b03 <+35>: nopl 0x0(%rax,%rax,1)
0x0000000000000b08 <+40>: vmovdqu (%rsi),%xmm0
0x0000000000000b0c <+44>: sub $0x10,%rdi
0x0000000000000b10 <+48>: sub $0x10,%rsi
0x0000000000000b14 <+52>: vmovntdq %xmm0,0x10(%rdi)
0x0000000000000b19 <+57>: cmp %rdi,%rax
0x0000000000000b1c <+60>: jne 0xb08 <memmove_si128u(__m128i_u*, __m128i_u const*, size_t)+40>
0x0000000000000b1e <+62>: sfence
0x0000000000000b21 <+65>: retq

Surprisingly (or not so surprisingly) both -m32/-m64 tests started failing on my machine.

It was always the second bit of a 128-bit value that was corrupted.

On 128MB blocks this test usually caused one incorrect bit to be copied once in a few runs. I tried to run exactly the same test on other hardware I have access to. None of it failed.

I started to suspect the kernel of corrupting the SSE CPU context on context switch. But why would only the non-temporal instruction be affected? And why only a single bit and not a full 128-bit chunk? Could it be that the kernel forgot to issue mfence on context switch and all in-flight non-temporal stores wrote garbage? That would be a sad race condition. But a single bit flip did not line up with that.

It sounded more like the kernel would arbitrarily flip one bit in userspace. But why only when movntdq was involved?

I suspected a CPU bug, upgraded the CPU firmware, and switched the machine from BIOS-compatible mode to native UEFI hoping to fix it. Nope. Nothing changed. The same failure persisted: single bit corruption after a heavy load on the machine.

I started thinking about how to speed my test up and avoid the firefox compilation as a trigger.

Back to square one

My suspect was bad RAM again. I modified my test to walk all RAM by allocating 128MB chunks at a time and running memmove() over the newly allocated memory, so that it would eventually cover all available pages. The test would either find bad memory or OOM-fail.
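
Roughly, such a RAM-walking test could look like the sketch below (my reconstruction, not the actual test code; it uses plain memmove() for brevity where the real test exercised the non-temporal copy above):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK (128u * 1024 * 1024) /* 128MB per iteration */

int main(void)
{
    for (;;) {
        unsigned char *p = aligned_alloc(16, CHUNK);
        if (!p) { puts("OOM reached, no bad memory found"); return 0; }

        memset(p, 0xA5, CHUNK / 2);           /* fill the lower half */
        memmove(p + CHUNK / 2, p, CHUNK / 2); /* copy it into the upper half */

        if (memcmp(p, p + CHUNK / 2, CHUNK / 2) != 0)
            puts("mismatch: likely bad RAM");

        /* never free(p): the point is to eventually touch every physical page */
    }
}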

And bingo! It took only 30 seconds to reproduce the failure. The test usually started reporting the first problem when it got to 17GB of RAM usage.

I have 4x8GB DDR3-DIMMs. I started brute-forcing various configurations of DIMM order across the motherboard slots:

A      B      A      B
DIMM-1 -      -      -      : works
DIMM-2 -      -      -      : works
DIMM-3 -      -      -      : works
DIMM-4 -      -      -      : works
DIMM-1 -      DIMM-3 -      : fails (dual channel mode)
DIMM-1 DIMM-3 -      -      : works (single channel mode)
-      DIMM-2 -      DIMM-4 : works (dual channel mode)
DIMM-3 -      DIMM-1 -      : fails (dual channel mode)
-      DIMM-3 -      DIMM-1 : fails (dual channel mode)
-      DIMM-1 -      DIMM-3 : fails (dual channel mode)
-      DIMM-2 -      DIMM-3 : fails (dual channel mode)

And many other combinations of DIMM-3 with others.

It was obvious that DIMM-3 did not like teamwork. I booted from a livecd to double-check it was not my kernel causing all of this. The error was still there.

I bought and plugged in a new pair of RAM modules in place of DIMM-1 and DIMM-3. And I have had no mysterious failures since!

Time to flip CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y back on :)

Speculations and open questions

It seems that dual-channel mode and cache coherency have something to do with it. A few thoughts:

  1. Single DDR3-DIMM can perform only 64-bit wide loads and stores.
  2. In dual-channel mode two 64-bit wide stores can happen at a time and require presence of two DIMMs.
  3. movntdq stores directly into RAM, possibly evicting the existing value from the cache. That can cause a further writeback to RAM to free the dirty cache line.
  4. movdqa stores to the cache. But eventually cache pressure will also trigger stores back to RAM, in chunks of the Last Level Cache line size (64 bytes = 512 bits for me). Why do we not see corruption happening in this case?

It feels like there should not be much difference between non-temporal and normal instructions in terms of the amount of data being written at a time over the memory bus. What likely changes is the sequence of physical addresses accessed under the two workloads. But I don’t know how to look into that in more detail.

Mystery!

Parting words

  • This crash took me 7 years to figure out :)
  • Fix didn’t require a single line of code :)
  • Bad RAM happens. Even if memtest86+-5.01 disagrees.
  • As I was running memtest86+ in qemu I found a bunch of unrelated bugs in the tianocore implementation of UEFI and in the memtest86+ gentoo ebuild: the hybrid ISO is not recognized as an ISO at all, and memtest86+ crashes at startup for a yet unknown reason (likely needs to be fixed against a newer toolchain).
  • non-temporal instructions are a thing and have their own memory I/O engine.
  • C-level wrappers around SSE and AVX instructions are easy to use!

Have fun!

Posted on July 14, 2018

July 11, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

Just cuz it took me about 10 minutes to figure out why my deployed Django site was just giving me Bad Request (400)

After turning DEBUG on in settings.py I found the reason: I had forgotten to set the permissions so that my webserver user had access to the files it needed.

Simple to fix, but it can be a pain to find out what happened to a perfectly working site.

July 09, 2018
Andreas K. Hüttel a.k.a. dilfridge (homepage, bugs)

Today's news is that we have submitted a manuscript for publication, describing Lab::Measurement and with it our approach towards fast, flexible, and platform-independent measuring with Perl! The manuscript mainly focuses on the new, Moose-based class hierarchy. We have uploaded it to arXiv as well; here is the (for now) full bibliographic information of the preprint:

 "Lab::Measurement - a portable and extensible framework for controlling lab equipment and conducting measurements"
S. Reinhardt, C. Butschkow, S. Geissler, A. Dirnaichner, F. Olbrich, C. Lane, D. Schröer, and A. K. Hüttel
submitted for publication; arXiv:1804.03321 (PDF, BibTeX entry)
If you're using Lab::Measurement in your lab, and this results in some nice publication, then we'd be very grateful for a citation of our work - for now the preprint, and later hopefully the accepted version.

July 06, 2018
Alexys Jacob a.k.a. ultrabug (homepage, bugs)
A botspot story (July 06, 2018, 14:50 UTC)

I felt like sharing a recent story that allowed us to identify a bot in a haystack thanks to Scylla.

The scenario

While working on loading 2B+ rows into Scylla from Hive (using Spark), we noticed a strange behaviour in the performance of one of our nodes:

So we started wondering why the server in blue was showing those peaks of load and was clearly diverging from the other two… As we obviously expect the three nodes to behave the same, there were two options on the table:

  1. hardware problem on the node
  2. bad data distribution (bad schema design? consistent hash problem?)

We shared this with our pals from ScyllaDB and started working on finding out what was going on.

The investigation

Hardware?

A hardware problem was pretty quickly ruled out: nothing showed up in the monitoring or in the kernel logs, and I/O queues and throughput looked good:

Data distribution?

Avi Kivity (ScyllaDB’s CTO) quickly got the feeling that something was wrong with the data distribution and that we could be facing a hotspot situation. He nailed it down to shard 44 thanks to the scylla-grafana-monitoring platform.

Data is distributed between shards that are stored on nodes (consistent hash ring). This distribution is done by hashing the primary key of your data which dictates the shard it belongs to (and thus the node(s) where the shard is stored).

If one of your keys is over-represented in your original data set, then the shard it belongs to can become overly populated and the related node overloaded. This is called a hotspot situation.
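
A toy simulation makes the effect easy to picture (illustration only, not Scylla code; the shard count, the stand-in hash function and the 10% share of the hot key are arbitrary choices of mine):

#include <stdio.h>
#include <string.h>

#define SHARDS 14

/* toy FNV-1a hash standing in for the real partitioner */
static unsigned hash_key(const char *key)
{
    unsigned h = 2166136261u;
    for (; *key; key++) { h ^= (unsigned char)*key; h *= 16777619u; }
    return h;
}

int main(void)
{
    unsigned long per_shard[SHARDS] = {0};
    char key[32];

    for (unsigned long i = 0; i < 1000000; i++) {
        if (i % 10 == 0)
            strcpy(key, "hot-key-sha256"); /* the over-represented key */
        else
            snprintf(key, sizeof key, "user-%lu", i);
        per_shard[hash_key(key) % SHARDS]++;
    }

    /* one shard ends up with ~100K extra rows on top of its fair share */
    for (int s = 0; s < SHARDS; s++)
        printf("shard %2d: %lu rows\n", s, per_shard[s]);
    return 0;
}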

tracing queries

The first step was to trace queries in Scylla to try to get deeper into the hotspot analysis. So we enabled tracing using the following formula to get about 1 trace per second in the system_traces namespace.

tracing probability = 1 / expected requests per second throughput

In our case, we were doing between 90K req/s and 150K req/s so we settled for 100K req/s to be safe and enabled tracing on our nodes like this:

# nodetool settraceprobability 0.00001

It turns out tracing didn’t help very much in our case because the traces do not include the query parameters in Scylla 2.1; that becomes available in the soon-to-be-released 2.2 version.

NOTE: traces expire on the tables, so make sure you TRUNCATE the events and sessions tables while iterating. Otherwise you will have to wait for the next gc_grace_period (10 days by default) before they are actually removed. If you do not do that and generate millions of traces like we did, querying those tables will likely time out because of the “tombstoned” rows, even if no traces are left inside any more.

looking at cfhistograms

Glauber Costa was also helping on the case and got us looking at the cfhistograms of the tables we were pushing data to. That clearly highlighted a hotspot problem:

histograms
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                             (micros)          (micros)           (bytes)                  
50%             0,00              6,00              0,00               258                 2
75%             0,00              6,00              0,00               535                 5
95%             0,00              8,00              0,00              1916                24
98%             0,00             11,72              0,00              3311                50
99%             0,00             28,46              0,00              5722                72
Min             0,00              2,00              0,00               104                 0
Max             0,00          45359,00              0,00          14530764            182785

What this basically means is that the 99th percentile of our partitions is small (around 5KB) while the biggest is 14MB! That’s a huge difference and clearly shows that we have a hotspot on a partition somewhere.

So now we know for sure that we have an over-represented key in our data set, but which key is it and why?

The culprit

So we looked at the cardinality of our data set keys, which are SHA256 hashes, and found out that indeed we had one with more than 1M occurrences while the second highest one was around 100K!…

Now that we had the main culprit hash, we turned to our data streaming pipeline to figure out what kind of event was generating the data associated with the given SHA256 hash… and surprise! It was a client’s quality assurance bot that was constantly browsing their own website, with legitimate behaviour and identity credentials associated with it.

So we modified our pipeline to detect this bot and discard its events so that it stops polluting our databases with fake data. Then we cleaned up the millions of events’ worth of mess and traces we had stored about the bot.

The aftermath

Finally, we cleared out the data in Scylla and tried again from scratch. Needless to say, the curves got way better and are exactly what we would expect from a well-balanced cluster:

Thanks a lot to the ScyllaDB team for their thorough help and high spirited support!

I’ll quote them to conclude this quick blog post:

Thomas Raschbacher a.k.a. lordvan (homepage, bugs)
Django (2.0) ForeignKey -> ManyToManyField (July 06, 2018, 12:07 UTC)

So I just thought my small URL Todo django app (which I will post the source of online soon, when I have some time to write up a README etc. too) could benefit from each urlTodo being able to be part of more than one category.

Now I already had some data in it, so I was unsure. On IRC I was advised it might or might not work with automatic migrations. So I tested it.

Turns out it works ok, but it does lose the assignment of the category (which was not an issue since I did not have that much data in it yet).

so all I had to do was change

category = models.ForeignKey(Category,
                             on_delete=models.PROTECT,
                             related_name='urltodos',
                             null=False,
                             verbose_name=_('category'))

to

category = models.ManyToManyField(Category,
                                  related_name='urltodos',
                                  verbose_name=_('category'))

and run ./manage.py makemigrations and ./manage.py migrate and then assign the categories that got lost again.

Again I am impressed by how easily this could be done with django!

July 02, 2018
Luca Barbato a.k.a. lu_zero (homepage, bugs)
Altivec and VSX in Rust (part 1) (July 02, 2018, 11:05 UTC)

I’m involved in implementing the Altivec and VSX support on rust stdsimd.

Supporting all the instructions in this language is a HUGE endeavor, since for each instruction at least 2 tests have to be written, and making the functions type-generic gets you to the point of having a few pages of implementation (that luckily desugar to the single right instruction and nothing else).

Since I’m doing this mainly for my multimedia needs I have a short list of instructions I find handy to get some code written immediately and today I’ll talk a bit about some of them.

This post is inspired by what Luc did for neon, but I’m using rust instead.

If other people find it useful, I’ll try to write down the remaining instructions.

Permutations

Most if not all SIMD ISAs have one or more instructions to shuffle vector elements within a vector or between two.

It is quite common to use those instructions to implement matrix transposes, but that isn’t their only use.

In my toolbox I put vec_perm and vec_xxpermdi since even if the portable stdsimd provides some shuffle support it is quite unwieldy compared to the Altivec native offering.

vec_perm: Vector Permute

Since its first iteration Altivec has had a quite amazing instruction called vec_perm or vperm:

    fn vec_perm(a: i8x16, b: i8x16, c: i8x16) -> i8x16 {
        let mut d = i8x16::splat(0);
        for i in 0..16 {
            let idx = (c[i] & 0xf) as usize;
            d[i] = if (c[i] & 0x10) == 0 {
                a[idx]
            } else {
                b[idx]
            };
        }
        d
    }

It is important to notice that the displacement map c is a vector and not a constant. That gives you quite a bit of flexibility in a number of situations.

This instruction is the building block you can use to implement a great deal of common patterns, including some that are also covered by stand-alone instructions, e.g.:
– packing/unpacking across lanes as long as you do not have to saturate: vec_pack, vec_unpackh/vec_unpackl
– interleave/merge two vectors: vec_mergel, vec_mergeh
– shift N bytes in a vector from another: vec_sld

It is important to keep this in mind since you can always take two permutations and make one: vec_perm itself is pretty fast, and replacing two or more instructions with a single permute can get you a pretty neat speed boost.

vec_xxpermdi: Vector Permute Doubleword Immediate

Among a good deal of improvements, VSX introduced a number of instructions that work on vectors of 64-bit elements; among those there is a permute instruction, and I found myself using it a lot.

    #[rustc_args_required_const(2)]
    fn vec_xxpermdi(a: i64x2, b: i64x2, c: u8) -> i64x2 {
        match c & 0b11 {
            0b00 => i64x2::new(a[0], b[0]),
            0b01 => i64x2::new(a[1], b[0]),
            0b10 => i64x2::new(a[0], b[1]),
            0b11 => i64x2::new(a[1], b[1]),
            _ => unreachable!(),
        }
    }

This instruction is surely less flexible than the previous permute but it does not require an additional load.

When working on video codecs it is quite common to deal with blocks of pixels that go from 4×4 up to 64×64; before vec_xxpermdi the common pattern was:

    #[inline(always)]
    fn store8(dst: &mut [u8x16], v: &[u8x16]) {
        let data = dst[i];
        dst[i] = vec_perm(v, data, TAKE_THE_FIRST_8);
    }

That implies loading the mask as often as needed, as well as the destination.

Using vec_xxpermdi avoids the mask load, and that usually leads to quite a significant speedup when the actual function is tiny.

Mixed Arithmetics

By mixed arithmetics I mean all the instructions that perform multiple vector arithmetic operations in a single step.

The original altivec has the following operations available for the integer types:
vec_madds
vec_mladd
vec_mradds
vec_msum
vec_msums
vec_sum2s
vec_sum4s
vec_sums

And the following two for the float type:
vec_madd
vec_nmsub

All of them are quite useful and they will all find their way in stdsimd pretty soon.

I’m describing today vec_sums, vec_msums and vec_madds.

They are quite representative, and the other instructions are similar in spirit:
– vec_madds, vec_mladd and vec_mradds all compute a lane-wise product, take either the high-order or the low-order part of it, and add a third vector, returning a vector of the same element size.
– vec_sums, vec_sum2s and vec_sum4s all combine an in-vector sum operation with a sum with another vector.
– vec_msum and vec_msums both compute a sum of products; the intermediates are added together and then added to a wider-element vector.

If there is enough interest and time I can extend this post to cover all of them, for today we’ll go with this approximation.

vec_sums: Vector Sum Saturated

Usually SIMD instructions work with two (or three) vectors and execute the same operation for each vector element. Sometimes you just want to do operations within a single vector, and vec_sums is one of the few instructions that lets you do that:

    fn vec_sums(a: i32x4, b: i32x4) -> i32x4 {
        let mut d = i32x4::new(0, 0, 0, 0);

        d[3] = b[3].saturating_add(a[0]).saturating_add(a[1]).saturating_add(a[2]).saturating_add(a[3]);

        d
    }

It returns in the last element of the vector the sum of the vector elements of a and the last element of b.
It is pretty handy when you need to compute an average or similar operations.

It works only with 32bit signed element vectors.

vec_msums: Vector Multiply Sum Saturated

This instruction sums each 32-bit element of the third vector with the two products of the corresponding pairs of 16-bit elements from the first two vectors that overlap that element.

It does quite a bit:

    fn vmsumshs(a: i16x8, b: i16x8, c: i32x4) -> i32x4 {
        let mut d = i32x4::new(0, 0, 0, 0);
        for i in 0..4 {
            let idx = 2 * i;
            let m0 = a[idx] as i32 * b[idx] as i32;
            let m1 = a[idx + 1] as i32 * b[idx + 1] as i32;
            d[i] = c[i].saturating_add(m0).saturating_add(m1);
        }
        d
    }

    fn vmsumuhs(a: u16x8, b: u16x8, c: u32x4) -> u32x4 {
        let mut d = u32x4::new(0, 0, 0, 0);
        for i in 0..4 {
            let idx = 2 * i;
            let m0 = a[idx] as u32 * b[idx] as u32;
            let m1 = a[idx + 1] as u32 * b[idx + 1] as u32;
            d[i] = c[i].saturating_add(m0).saturating_add(m1);
        }
        d
    }

    ...

    fn vec_msums<T, U>(a: T, b: T, c: U) -> U
    where T: sealed::VectorMultiplySumSaturate<U> {
        a.msums(b, c)
    }

It works only with 16-bit elements, signed or unsigned. In order to support that in rust we have to use a somewhat creative trait. It is quite neat if you have to implement some filters.
It is quite neat if you have to implement some filters.

vec_madds: Vector Multiply Add Saturated

    fn vec_madds(a: i16x8, b: i16x8, c: i16x8) -> i16x8 {
        let mut d = i16x8::new(0, 0, 0, 0, 0, 0, 0, 0);
        for i in 0..8 {
            let v = (a[i] as i32 * b[i] as i32) >> 15;
            d[i] = (v as i16).saturating_add(c[i]);
        }
        d
    }

It takes the high-order 17 bits of the lane-wise product of the first two vectors and adds them to a third one.

Coming next

Raptor Engineering kindly gave me access to a Power 9 through their Integricloud hosting.

We could run some extensive benchmarks and we found some peculiar behaviour with the C compilers available on the machine, which got me, Luc and Alexandra a little puzzled.

Next time I’ll try to collect, in a little more organic way, what I randomly posted on my twitter as I noticed it.

June 29, 2018
My comments on the Gentoo Github hack (June 29, 2018, 16:00 UTC)

Several news outlets are reporting on the takeover of the Gentoo GitHub organization that was announced recently. Today 28 June at approximately 20:20 UTC unknown individuals have gained control of the Github Gentoo organization, and modified the content of repositories as well as pages there. We are still working to determine the exact extent and … Continue reading "My comments on the Gentoo Github hack"

June 28, 2018

2018-07-04 14:00 UTC

We believe this incident is now resolved. Please see the incident report for details about the incident, its impact, and resolution.

2018-06-29 15:15 UTC

The community raised questions about the provenance of Gentoo packages. Gentoo development is performed on hardware run by the Gentoo Infrastructure team (not github). The Gentoo hardware was unaffected by this incident. Users using the default Gentoo mirroring infrastructure should not be affected.

If you are still concerned about provenance or are unsure what solution you are using, please consult https://wiki.gentoo.org/wiki/Project:Portage/Repository_Verification. This will instruct you on how to verify your repository.

2018-06-29 06:45 UTC

The gentoo GitHub organization remains temporarily locked down by GitHub support, pending fixes to pull-request content.

For ongoing status, please see the Gentoo infra-status incident page.

For later followup, please see the Gentoo Wiki page for GitHub 2018-06-28. An incident post-mortem will follow on the wiki.

June 14, 2018
Andreas K. Hüttel a.k.a. dilfridge (homepage, bugs)

Today's great news is that our manuscript "Nanomechanical characterization of the Kondo charge dynamics in a carbon nanotube" has been accepted for publication by Physical Review Letters.

The Kondo effect is a many-body phenomenon at low temperature that results from a quantum state degeneracy, as, e.g., the one of spin states in absence of a magnetic field. In its simplest case, it makes a quantum dot, in our case a carbon nanotube with some trapped electrons on it, behave very different for an even and an odd number of electrons. At an even number of trapped electrons, no current can flow through the nanotube, since temperature and applied bias voltage are too low to charge it with one more elementary charge; this phenomenon is called Coulomb blockade. Strikingly, at odd electron number, when two degenerate quantum states in the nanotube are available, Coulomb blockade seems not to matter, and a large current can flow. Theory explains this by assuming that a localized electron couples to electrons in the contacts, forming a combined, delocalized singlet quantum state.
What carries the Kondo-enhanced current, and how does the electric charge now accumulate in the carbon nanotube? We use the vibration of the macromolecule to measure this. As also in the case of, e.g., a guitar string, the resonance frequency of a nanotube changes when you pull on it; in the case of the carbon nanotube this is sensitive enough to resolve fractions of the force caused by a single elementary charge. From the vibration frequency, as function of the electrostatic potential, we calculate the average number of electrons on the nanotube, and can then compare the odd and even number cases.
A surprising result of our evaluation is that the charge trapped on the nanotube behaves the same way in the even and odd occupation case, even though the current through it is completely different. Sequential tunneling of electrons can model the charge accumulation, and with it the mechanical behaviour. The large Kondo current is carried by virtual occupation of the nanotube alone, i.e., electrons tunneling on and immediately off again so they do not contribute to the charge on it.

"Nanomechanical Characterization of the Kondo Charge Dynamics in a Carbon Nanotube"
K. J. G. Götz, D. R. Schmid, F. J. Schupp, P. L. Stiller, Ch. Strunk, and A. K. Hüttel
Physical Review Letters 120, 246802 (2018); arXiv:1802.00522 (PDF, HTML, supplementary information)

Luca Barbato a.k.a. lu_zero (homepage, bugs)
Video Compression Bounty Hunters (June 14, 2018, 18:43 UTC)

In this post, we (Luca Barbato and Luc Trudeau) joined forces to talk about the awesome work we’ve been doing on Altivec/VSX optimizations for the libvpx library, you can read it here or on Luc’s medium.

Both of us were in Brussels for FOSDEM 2018: Luca presented his work on rust-av and Luc was there to hack on rav1e – an experimental AV1 video encoder in Rust.

Luca joined the rav1e team and helped give hints about how to effectively leverage rust. Together, we worked on the AV1 intra prediction code, among other things.

Luc Trudeau: I was finishing up my work on Chroma from Luma in AV1, and wanted to stay involved in royalty free open source video codecs. When Luca talked to me about libvpx bounties on Bountysource, I was immediately intrigued.

Luca Barbato: Luc just finished implementing the Neon version of his CfL work and I wondered how that code could work using VSX. I prepared some of the machinery that was missing in libaom and Luc tried his hand at Altivec. We still had some pending libvpx work sponsored by IBM and I asked him if he wanted to join in.

What’s libvpx?

For those less familiar, libvpx is the official Google implementation of the VP9 video format. VP9 is most notably used by YouTube and Netflix. VP9 playback is available in some browsers including Chrome, Edge and Firefox, and also on Android devices, covering 75.31% of the global user base.

Ref: caniuse.com VP9 support in browsers.

Why use VP9, when the de facto video format is H.264/AVC?

Because VP9 is royalty free and the bandwidth savings are substantial compared to H.264 where playback is available (an estimated 3.3B devices support VP9). In other words, having VP9 as a secondary codec can pay for itself in bandwidth savings by not having to send H.264 to most users.

Ref: Netflix VP9 compression analysis.

Why care about libvpx on Power?

Dynamic adaptive streaming formats like HLS and MPEG DASH have completely changed the game of streaming video over the internet. Streaming hardware and custom multimedia servers are being replaced by web servers.

From the servers’ perspective, streaming video is akin to serving small video files; lots of small video files! To cover all clients and most network conditions a considerable number of video files must be encoded, stored and distributed.

Things are changing fast and while the total cost of ownership of video content for previous generation video formats, like H.264, was mostly made up of bandwidth and hosting, encoding costs are growing with more complex video formats like HEVC and VP9.

This complexity is reported to have grown exponentially with the upcoming AV1 video format, a format built on the libvpx code base by the Alliance for Open Media, of which IBM is a founding member.

Ref: Facebook’s AV1 complexity analysis

At the same time, IBM and its partners in the OpenPower Foundation are releasing some very impressive hardware with the new Power9 processor line-up. Big Iron Power9 systems, like the Talos II from Raptor Computing Systems and the collaboration between Google and Rackspace on Zaius/Barreleye servers, are ideal solutions to tackle the growing complexity of video format encoding.

However, these awesome machines are currently at a disadvantage when encoding video. Without the platform-specific optimizations that their competitors enjoy, the Power9 architecture can’t be fully utilized. This is clearly illustrated in the x264 benchmark released in a recent Phoronix article.

Ref: Phoronix x264 server benchmark.

Thanks to the optimization bounties sponsored by IBM, we are hard at work bridging the gap in libvpx.

Optimization bounties?

Just like bug bounty programs, optimizations make for great bounties. Companies that see benefit in platform-specific optimizations for video codecs can sponsor our bounties on the Bountysource platform.

Multiple companies can sponsor the same bounty, thus sharing the cost of the more important bounties. Furthermore, bounties are a minimal-risk investment for sponsors, as they are only paid out when the work is completed (and peer reviewed by the libvpx maintainers).

Not only is the Bountysource platform a win for companies that directly benefit from the bounties they are sponsoring, it’s also a win for developers (like us) who can get paid to work on free and open source projects that we are passionate about. Optimization bounties are a source of sustainability in the free and open source software ecosystem.

How do you choose bounties?

Since we’re a small team of bounty hunters (Luca Barbato, Alexandra Hájková, Rafael de Lucena Valle and Luc Trudeau), we need to play it smart and maximize the impact of our work. We’ve identified two common use cases related to streaming on the Power architecture: YouTube-like encodes and real time (a.k.a. low latency) encodes.

By profiling libvpx under these conditions, we can determine the key functions to optimize. The following charts show the percentage of time spent in the top 20 functions of the libvpx encoder (without Altivec/VSX optimisations) on a Power8 system, for both YouTube-like and real time settings.

It’s interesting to see that the top 20 functions make up about 80% of the encoding time. That’s similar in spirit to the Pareto principle, in that we don’t have to optimize the whole encoder to make the Power architecture competitive for video encoding.

We see a similar distribution between YouTube-like encoding settings and real time video encoding. In other words, optimization bounties for libvpx benefit both Video on Demand (VOD) and live broadcast services.

We add bounties on the Bountysource platform around commonly themed functions like convolution, sum of absolute differences (SAD), variance, etc. Companies interested in libvpx optimization can go and fund these bounties.
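
To give an idea of what these hot functions look like, here is a plain scalar sum-of-absolute-differences kernel (illustration only, not libvpx code; the 16×16 block size and the names are mine). Tight per-pixel loops like this map very naturally onto Altivec/VSX vector instructions, which is what makes them good bounty targets:

#include <stdlib.h>

static unsigned sad_16x16(const unsigned char *src, int src_stride,
                          const unsigned char *ref, int ref_stride)
{
    unsigned sad = 0;
    for (int row = 0; row < 16; row++) {
        for (int col = 0; col < 16; col++)
            sad += abs(src[col] - ref[col]);
        src += src_stride;
        ref += ref_stride;
    }
    return sad;
}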

What’s the impact of this project so far?

So far, we delivered multiple libvpx bounties including:

  • Convolution
  • Sum of absolute differences (SAD)
  • Quantization
  • Inverse transforms
  • Intra prediction
  • etc.

To see the benefit of our work, we compiled the latest version of libvpx with and without VSX optimizations and ran it on a Power8 machine. Note that the C-compiled versions can produce Altivec/VSX code via auto-vectorization. The results, in frames per minute, are shown below for both YouTube-like encoding and real time encoding.

Our current VSX optimizations give approximately a 40% and 30% boost in encoding speed for YouTube-like and real time encoding respectively. Encoding speed increases in the range of 10 to 14 frames per minute can considerably reduce cloud encoding costs for Power architecture users.

In the context of real time encoding, the time saved by the platform optimization can be put to good use to improve compression efficiency. Concretely, a real time encoder will still encode at real time speed, but speeding up the encoder allows operators to enable more coding tools, resulting in better quality for the viewers and bandwidth savings for operators.

What’s next?

We’re energized by the impact that our small team of bounty hunters is having on libvpx performance for the Power architecture and we wanted to share it in this blog post. We look forward to getting even more performance from libvpx on the Power architecture. Expect considerable performance improvement for the Power architecture in the next libvpx release (1.8).

As IBM targets its Power9 line of systems at heavy cloud computations, it seems natural to also aim all that power at tackling the growing costs of AV1 encodes. This won’t happen without platform specific optimizations and the time to start is now; as the AV1 format is being finalized, everyone is still in the early phases of optimization. We are currently working with our sponsors to set up AV1 bounties, so stay tuned for an upcoming post.

May 25, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)
The story of Gentoo management (May 25, 2018, 15:43 UTC)

I have recently made a tabular summary of (probably) all Council members and Trustees in the history of Gentoo. I think that this table provides a very succinct way of expressing the changes within management of Gentoo. While it can’t express the complete history of Gentoo, it can serve as a useful tool of reference.

What questions can it answer? For example, it provides an easy way to see how many terms individuals have served, or how long Trustee terms were. You can clearly see who served both on the Council and on the Board and when those two bodies had common members. Most notably, it collects a fair amount of hard-to-find data in a single table.

Can you trust it? I’ve put effort into making the developer lists correct, but given the bad quality of the data (see below), I can’t guarantee complete correctness. The Trustee term dates are approximate at best, and oriented around elections rather than the actual terms (which are hard to find). Finally, I’ve merged a few short-time changes such as empty seats between a resignation and the appointment of a replacement, as expressing them one by one made little sense and would cause the tables to grow even longer.

This article aims to be the text counterpart to the table. I would like to tell the history of the presented management bodies, explain the sources that I’ve used to get the data and the problems that I’ve found while working on it.

As you could suspect, the further back I had to go, the worse the data I was able to find. The problems included the limited scope of our archives and some apparent secrecy of decision-making processes in the early days (judging by some cross-posts, the traffic on the -core mailing list was significant, and it was not archived before late 2004). Both due to the lack of data and due to my specific interest in developer self-government, this article starts in mid-2003.

Continue reading

May 10, 2018
Sebastian Pipping a.k.a. sping (homepage, bugs)
Bash: Command output to array of lines (May 10, 2018, 12:49 UTC)

We had a case at work where the multi-line output of a command should be turned into an array of lines. Here's one way to do it. Two Bash features take part in this approach:

  • $'....' syntax (a dollar right in front of a single-tick literal) activates interpolation of C-like escape sequences (see below)
  • Bash variable IFS — the internal field separator affecting the way Bash applies word splitting — is temporarily changed from default spaces-tabs-and-newlines to just newlines so that we get one array entry per line

Let me demo that:

# f() { echo $'one\ntwo  spaces' ; }

# f
one
two  spaces

# IFS=$'\n' lines=( $(f) )

# echo ${#lines[@]}
2

# echo "${lines[0]}"
one

# echo "${lines[1]}"
two  spaces

April 21, 2018
Rafael G. Martins a.k.a. rafaelmartins (homepage, bugs)
Updates (April 21, 2018, 14:35 UTC)

Since I haven't written anything here for almost 2 years, I think it is time for some short updates:

  • I left RedHat and moved to Berlin, Germany, in March 2017.
  • The series of posts about balde was stopped. The first post got a lot of Hacker News attention, and I will come back to it as soon as I can implement the required changes in the framework. Not going to happen very soon, though.
  • I've been spending most of my free time with flight simulation. You can expect a few related posts soon.
  • I left the Gentoo GSoC administration this year.
  • blogc is the only project that is currently getting some frequent attention from me, as I use it for most of my websites. Check it out! ;-)

That's all for now.

April 18, 2018
Zack Medico a.k.a. zmedico (homepage, bugs)

In portage-2.3.30, portage’s python API provides an asyncio event loop policy via a DefaultEventLoopPolicy class. For example, here’s a little program that uses portage’s DefaultEventLoopPolicy to do the same thing as emerge --regen, using an async_iter_completed function to implement the --jobs and --load-average options:

#!/usr/bin/env python

from __future__ import print_function

import argparse
import functools
import multiprocessing
import operator

import portage
from portage.util.futures.iter_completed import (
    async_iter_completed,
)
from portage.util.futures.unix_events import (
    DefaultEventLoopPolicy,
)


def handle_result(cpv, future):
    metadata = dict(zip(portage.auxdbkeys, future.result()))
    print(cpv)
    for k, v in sorted(metadata.items(),
        key=operator.itemgetter(0)):
        if v:
            print('\t{}: {}'.format(k, v))
    print()


def future_generator(repo_location, loop=None):

    portdb = portage.portdb

    for cp in portdb.cp_all(trees=[repo_location]):
        for cpv in portdb.cp_list(cp, mytree=repo_location):
            future = portdb.async_aux_get(
                cpv,
                portage.auxdbkeys,
                mytree=repo_location,
                loop=loop,
            )

            future.add_done_callback(
                functools.partial(handle_result, cpv))

            yield future


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--repo',
        action='store',
        default='gentoo',
    )
    parser.add_argument(
        '--jobs',
        action='store',
        type=int,
        default=multiprocessing.cpu_count(),
    )
    parser.add_argument(
        '--load-average',
        action='store',
        type=float,
        default=multiprocessing.cpu_count(),
    )
    args = parser.parse_args()

    try:
        repo_location = portage.settings.repositories.\
            get_location_for_name(args.repo)
    except KeyError:
        parser.error('unknown repo: {}\navailable repos: {}'.\
            format(args.repo, ' '.join(sorted(
            repo.name for repo in
            portage.settings.repositories))))

    policy = DefaultEventLoopPolicy()
    loop = policy.get_event_loop()

    try:
        for future_done_set in async_iter_completed(
            future_generator(repo_location, loop=loop),
            max_jobs=args.jobs,
            max_load=args.load_average,
            loop=loop):
            loop.run_until_complete(future_done_set)
    finally:
        loop.close()



if __name__ == '__main__':
    main()

April 14, 2018
Sergei Trofimovich a.k.a. slyfox (homepage, bugs)
crossdev and GNU Hurd (April 14, 2018, 00:00 UTC)


crossdev and GNU Hurd

Tl;DR: crossdev is a tool to generate a cross-compiler for you in gentoo and with some hacks (see below) you can even cross-compile to hurd!

The FOSDEM 2018 conference happened recently and a lot of cool talks took place there. The full list counts 689 events!

Hurd’s PCI arbiter talk was a nice one. I had never actually tried hurd before and decided to give it a try in a VM.

Debian already provides a full hurd installer (installation manual) and I picked it. Hurd works surprisingly well for such an understaffed project! The installation process is very simple: it’s a typical debian CD which asks you for a few details about the final system (same as for linux) and you get your OS booted.

Hurd has a ton of debian software already built and working (like 80% of the whole repo). Even GHC is ported there. While at it I grabbed all the tiny GHC patches related to hurd from Debian and pushed them upstream:

Now a plain ./configure && make && make test just works.

Hurd supports only 32-bit x86 CPUs and does not support SMP (only one CPU is available). That makes building heavyweight stuff (like GHC) in a virtual machine a slow process.

To speed things up a bit I decided to build a cross-compiler from gentoo linux to hurd with the help of crossdev. What does it take to support bootstrap like that? Let’s see!

To get an idea of how to cross-compile to another OS, let’s check how the typical linux-to-linux case looks.

Normally aiming gcc at another linux-glibc target takes the following steps:

- install cross-binutils
- install system headers (kernel headers and glibc headers):
- install minimal gcc without glibc support (not able to link final executables yet)
- install complete glibc (gcc will need crt.o files)
- install full gcc (able to link final binaries for C and C++)

In gentoo crossdev does all the above automatically by running emerge a few times for you. I wrote a more up-to-date crossdev README to describe a few details of what is happening when you run crossdev -t <target>.

hurd-glibc is not fundamentally different from linux-glibc case. Only a few packages need to change their names, namely:

- install cross-binutils
- install gnumach-headers (kernel headers part 1)
- [NEW] install cross-mig tool (Mach Interface Generator, a flavour of IDL compiler)
- install hurd-headers and glibc-headers (kernel headers part 2 and libc headers)
- install minimal gcc without libc support (not able to link final executables yet)
- install complete libc (gcc will need crt.o files)
- install full gcc (able to link final binaries for C and C++)

The only change from linux is the cross-mig tool. I’ve collected ebuilds needed in gentoo-hurd overlay.

Here is how one gets hurd cross-compiler with today’s crossdev-99999999:

git clone https://github.com/trofi/gentoo-hurd.git
HURD_OVERLAY=$(pwd)/gentoo-hurd
CROSS_OVERLAY=$(portageq get_repo_path / crossdev)
TARGET_TUPLE=i586-pc-gnu
# this will fail around glibc, it's ok we'll take on manually from there
crossdev --l 9999 -t ${TARGET_TUPLE}
ln -s "${HURD_OVERLAY}"/sys-kernel/gnumach ${CROSS_OVERLAY}/cross-${TARGET_TUPLE}/
ln -s "${HURD_OVERLAY}"/dev-util/mig ${CROSS_OVERLAY}/cross-${TARGET_TUPLE}/
ln -s "${HURD_OVERLAY}"/sys-kernel/hurd ${CROSS_OVERLAY}/cross-${TARGET_TUPLE}/
emerge -C cross-${TARGET_TUPLE}/linux-headers
ACCEPT_KEYWORDS='**' USE=headers-only emerge -v1 cross-${TARGET_TUPLE}/gnumach
ACCEPT_KEYWORDS='**' USE=headers-only emerge -v1 cross-${TARGET_TUPLE}/mig
ACCEPT_KEYWORDS='**' USE=headers-only emerge -v1 cross-${TARGET_TUPLE}/hurd
ACCEPT_KEYWORDS='**' USE=headers-only emerge -v1 cross-${TARGET_TUPLE}/glibc
USE="-*" emerge -v1 cross-${TARGET_TUPLE}/gcc
ACCEPT_KEYWORDS='**' USE=-headers-only emerge -v1 cross-${TARGET_TUPLE}/glibc
USE="-sanitize" emerge -v1 cross-${TARGET_TUPLE}/gcc

Done!

A few things to note here:

  • gentoo-hurd overlay is used for new gnumach, mig and hurd packages
  • symlinks to new packages are created manually (crossdev fix pending, wait on packages to get into ::gentoo)
  • uninstall linux-headers as our target is not linux (crossdev fix pending)
  • use ACCEPT_KEYWORDS=’**’ for many packages (need to decide on final keywords, maybe x86-hurd)
  • all crossdev phases are ran manually
  • only glibc git version is working as changes were merged upstream very recently.
  • USE=“sanitize” is disabled in final gcc because it’s broken for hurd

Now you can go to /usr/${TARGET_TUPLE}/etc/portage/ and tweak the defaults for ELIBC, KERNEL and other things.

Basic sanity check for a toolchain:

$ i586-pc-gnu-gcc main.c -o main
$ file main
main: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld.so, for GNU/Hurd 0.0.0, with debug_info, not stripped

Copying the result to the target hurd VM and running it there also works as expected.

I use crossdev -t x86_64-HEAD-linux-gnu to have GHC built against HEAD in parallel to the system’s GHC. Let’s use that for a more heavyweight test: building a GHC cross-compiler to hurd:

$ EXTRA_ECONF=--with-ghc=x86_64-HEAD-linux-gnu-ghc emerge -v1 cross-i586-pc-gnu/ghc --quiet-build=n

This fails as:

rts/posix/Signals.c:398:28: error:
     note: each undeclared identifier is reported only once for each function it appears in
    |
398 |         action.sa_flags |= SA_SIGINFO;
    |                            ^

Which hints at a lack of SA_SIGINFO support in upstream glibc.git. Debian has an out-of-tree tg-hurdsig-SA_SIGINFO.diff patch to provide these defines (at least it’s not our local toolchain breakage). The outcome is positive: we have got very far into cross-compiling and hit real portability issues. Woohoo!

Final words

As long as the underlying toolchains are not too complicated, building cross-compilers in gentoo is trivial. The next tiny step is to cross-build the hurd kernel itself and run it in qemu. Ebuilds in gentoo-hurd are not yet ready for that, but tweaking them should be easy.

Have fun!

Posted on April 14, 2018

April 11, 2018
Hanno Böck a.k.a. hanno (homepage, bugs)

A few days ago I figured out that several blogs operated by T-Mobile Austria had a Git repository exposed which included their wordpress configuration file. Due to the fact that a phpMyAdmin installation was also accessible, this would have allowed me to change or delete their database and subsequently take over their blogs.

Git Repositories, Private Keys, Core Dumps

Last year I discovered that the German postal service exposed a database with 200,000 addresses on their webpage, simply because it was named dump.sql (which is the default filename for database exports in the documentation example of mysql). An Australian online pharmacy exposed a database under the filename xaa, which is the default output name of the "split" tool on Unix systems.

It also turns out that plenty of people store their private keys for TLS certificates on their servers - or their SSH keys. Crashing web applications can leave behind coredumps that may expose application memory.

For a while now I have been interested in this class of surprisingly trivial vulnerabilities: people leave files accessible on their web servers that shouldn't be public. I've given talks at a couple of conferences (recordings available from Bornhack, SEC-T, Driving IT). I scanned for these issues with a python script that I extended with more and more such checks.

Scan your Web Pages with snallygaster

It's taken a bit longer than intended, but I finally released it: it's called Snallygaster and is available on Github and PyPI.

Apart from many checks for secret files it also contains checks for related issues, like checking for invalid src references which can lead to domain takeover vulnerabilities, for the Optionsbleed vulnerability which I discovered during this work, and for a couple of other vulnerabilities I found interesting and easily testable.

Some may ask why I wrote my own tool instead of extending an existing project. I thought about it, but I didn't really find any existing free software vulnerability scanner that I found suitable. The tool that comes closest is probably Nikto, but testing it I felt it comes with a lot of checks - thus it's slow - and few results. I wanted a tool with a relatively high impact that doesn't take forever to run. Another commonly mentioned free vulnerability scanner is OpenVAS - a fork from Nessus back when that was free software - but I found that always very annoying to use and overengineered. It's not a tool you can "just run". So I ended up creating my own tool.

A Dragon Legend in US Maryland

Finally you may wonder what the name means. The Snallygaster is a dragon that according to some legends was seen in Maryland and other parts of the US. Why that name? There's no particular reason, I just searched for a suitable name, I thought a mythical creature may make a good name. So I searched Wikipedia for potential names and checked for name collisions. This one had none and also sounded funny and interesting enough.

I hope snallygaster turns out to be useful for administrators and pentesters and helps exposing this class of trivial, but often powerful, vulnerabilities. Obviously I welcome new ideas of further tests that could be added to snallygaster.

April 03, 2018
Alexys Jacob a.k.a. ultrabug (homepage, bugs)
py3status v3.8 (April 03, 2018, 12:06 UTC)

Another long awaited release has come true thanks to our community!

The changelog is so huge that I had to open an issue and cry for help to make it happen… thanks again @lasers for stepping up once again 🙂

Highlights

  • gevent support (-g option) to switch from threads scheduling to greenlets and reduce resources consumption
  • environment variables support in i3status.conf to remove sensitive information from your config
  • modules can now leverage a persistent data store
  • hundreds of improvements for various modules
  • we now have an official debian package
  • we reached 500 stars on github #vanity

Milestone 3.9

  • try to release a version faster than every 4 months (j/k) 😉

The next release will focus on bugs and modules improvements / standardization.

Thanks contributors!

This release is their work, thanks a lot guys!

  • alex o’neill
  • anubiann00b
  • cypher1
  • daniel foerster
  • daniel schaefer
  • girst
  • igor grebenkov
  • james curtis
  • lasers
  • maxim baz
  • nollain
  • raspbeguy
  • regnat
  • robert ricci
  • sébastien delafond
  • themistokle benetatos
  • tobes
  • woland

March 30, 2018
Sebastian Pipping a.k.a. sping (homepage, bugs)

I recently dockerized a small Django application. I built the Dockerfile in a way that the resulting image would allow running the container as if it were plain manage.py, e.g. so that besides docker-compose up I could also do:

# For a psql session into the database:
docker-compose run <image_name> dbshell

# Or, to run the test suite:
docker-compose run <image_name> test

To make that work, I made this Docker entrypoint script:

#! /bin/bash
# Copyright (C) 2018 Sebastian Pipping <sebastian@pipping.org>
# Licensed under CC0 1.0 Public Domain Dedication.
# https://creativecommons.org/publicdomain/zero/1.0/

set -e
set -u

RUN() {
    ( PS4='# ' && set -x && "$@" )
}

RUN wait-for-it "${POSTGRES_HOST}:${POSTGRES_PORT}" -t 30

cd /app

if [[ $# -gt 0 ]]; then
    RUN ./manage.py "$@"
else
    RUN ./manage.py makemigrations
    RUN ./manage.py migrate
    RUN ./manage.py createcustomsuperuser  # self-made

    RUN ./manage.py runserver 0.0.0.0:${APP_PORT}
fi

Management command createcustomsuperuser is something simple that I built myself for this very purpose: create a super user, support scripting, accept passwords as bad as "password" or "demo" without complaints, and be okay if the user already exists with the same credentials (idempotency). I uploaded createcustomsuperuser.py as a Gist to GitHub as it's a few lines more.

Back to the entrypoint script: for the RUN ./manage.py "$@" part to work, both ENTRYPOINT and CMD in the Dockerfile need to use the [..] syntax, e.g.:

ENTRYPOINT ["/app/docker-entrypoint.sh"]
CMD []

For more details on ENTRYPOINT quirks like that I recommend John Zaccone's well-written article "ENTRYPOINT vs CMD: Back to Basics".