Welcome to Gentoo Universe, an aggregation of weblog articles on all topics written by Gentoo developers. For a more refined aggregation of Gentoo-related topics only, you might be interested in Planet Gentoo.

Disclaimer:
Views expressed in the content published here do not necessarily represent the views of Gentoo Linux or the Gentoo Foundation.
   
February 13, 2019
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

After about two weeks of trying to figure out what the problem was with the amdgpu driver on my RX590 on my Ryzen mainboard on Linux, prOMiNd in the #radeon channel on IRC (Freenode) suggested I try the kernel command line option mem_encrypt=off .. and it fixed it! The issue manifested itself as the screen getting "stuck" during boot once KMS (kernel mode setting) tried to use amdgpu. (nomodeset did work, but left me with no X.)

My Hardware:

  • AMD Ryzen 7 2700X
  • MSI X470 Gaming Plus
  • G.Skill 16GB Kit
  • Sapphire Nitro+ Radeon RX590 8GB Special Edition

I expect that disabling one or both of these kernel config options would have the same effect:

CONFIG_AMD_MEM_ENCRYPT=y
CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y
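If you prefer not to rebuild the kernel, the workaround can also be made permanent by adding mem_encrypt=off to the kernel command line in the bootloader. A minimal sketch for a GRUB-based setup (assuming the usual /etc/default/grub plus grub-mkconfig workflow; adjust paths for your bootloader):

# /etc/default/grub (append to your existing options)
GRUB_CMDLINE_LINUX_DEFAULT="mem_encrypt=off"

# then regenerate the GRUB config
grub-mkconfig -o /boot/grub/grub.cfg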

Here's the relevant dmesg output in case someone has a similar issue (so search engines can find it):

[   14.161225] [drm] amdgpu kernel modesetting enabled.
[   14.161259] Parsing CRAT table with 1 nodes
[   14.161262] Ignoring ACPI CRAT on non-APU system
[   14.161264] Virtual CRAT table created for CPU
[   14.161264] Parsing CRAT table with 1 nodes
[   14.161265] Creating topology SYSFS entries
[   14.161269] Topology: Add CPU node
[   14.161270] Finished initializing topology
[   14.161345] checking generic (e0000000 300000) vs hw (e0000000 10000000)
[   14.161346] fb0: switching to amdgpudrmfb from EFI VGA
[   14.161372] Console: switching to colour dummy device 80x25
[   14.161546] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1DA2:0xE366 0xE1).
[   14.161552] [drm] register mmio base: 0xFE900000
[   14.161553] [drm] register mmio size: 262144
[   14.161558] [drm] add ip block number 0 <vi_common>
[   14.161558] [drm] add ip block number 1 <gmc_v8_0>
[   14.161559] [drm] add ip block number 2 <tonga_ih>
[   14.161559] [drm] add ip block number 3 <gfx_v8_0>
[   14.161559] [drm] add ip block number 4 <sdma_v3_0>
[   14.161560] [drm] add ip block number 5 <powerplay>
[   14.161560] [drm] add ip block number 6 <dm>
[   14.161560] [drm] add ip block number 7 <uvd_v6_0>
[   14.161561] [drm] add ip block number 8 <vce_v3_0>
[   14.161568] [drm] UVD is enabled in VM mode
[   14.161568] [drm] UVD ENC is enabled in VM mode
[   14.161569] [drm] VCE enabled in VM mode
[   14.161743] amdgpu 0000:1d:00.0: No more image in the PCI ROM
[   14.161756] ATOM BIOS: 113-4E3661U-X6I
[   14.161774] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[   14.161775] amdgpu 0000:1d:00.0: SME is active, device will require DMA bounce buffers
[   14.161775] amdgpu 0000:1d:00.0: SME is active, device will require DMA bounce buffers
[   14.311979] amdgpu 0000:1d:00.0: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[   14.311981] amdgpu 0000:1d:00.0: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[   14.311988] [drm] Detected VRAM RAM=8192M, BAR=256M
[   14.311989] [drm] RAM width 256bits GDDR5
[   14.312063] [TTM] Zone  kernel: Available graphics memory: 8185614 kiB
[   14.312064] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[   14.312064] [TTM] Initializing pool allocator
[   14.312069] [TTM] Initializing DMA pool allocator
[   14.312103] [drm] amdgpu: 8192M of VRAM memory ready
[   14.312104] [drm] amdgpu: 8192M of GTT memory ready.
[   14.312123] software IO TLB: SME is active and system is using DMA bounce buffers
[   14.312124] [drm] GART: num cpu pages 65536, num gpu pages 65536
[   14.313844] [drm] PCIE GART of 256M enabled (table at 0x000000F400300000).
[   14.313934] [drm:amdgpu_device_init.cold.34 [amdgpu]] *ERROR* sw_init of IP block <tonga_ih> failed -12
[   14.313935] amdgpu 0000:1d:00.0: amdgpu_device_ip_init failed
[   14.313937] amdgpu 0000:1d:00.0: Fatal error during GPU init
[   14.313937] [drm] amdgpu: finishing device.
[   14.314020] ------------[ cut here ]------------
[   14.314021] Memory manager not clean during takedown.
[   14.314045] WARNING: CPU: 6 PID: 4541 at drivers/gpu/drm/drm_mm.c:950 drm_mm_takedown+0x1a/0x20 [drm]
[   14.314045] Modules linked in: amdgpu(+) mfd_core snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device chash i2c_algo_bit gpu_sched drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm snd_hda_codec_realtek snd_hda_codec_generic drm snd_hda_intel snd_hda_codec agpgart snd_hwdep snd_hda_core snd_pcm nct6775 snd_timer hwmon_vid kvm snd irqbypass k10temp macvlan r8169 pcnet32 mii e1000 efivarfs dm_snapshot dm_bufio
[   14.314061] CPU: 6 PID: 4541 Comm: udevd Not tainted 4.20.2-gentooamdgpu #2
[   14.314062] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PLUS (MS-7B79), BIOS A.40 06/28/2018
[   14.314070] RIP: 0010:drm_mm_takedown+0x1a/0x20 [drm]
[   14.314072] Code: 1c b1 a5 ca 66 66 2e 0f 1f 84 00 00 00 00 00 90 48 8b 47 38 48 83 c7 38 48 39 c7 75 01 c3 48 c7 c7 30 88 23 c0 e8 4d b3 a5 ca <0f> 0b c3 0f 1f 00 41 57 41 56 49 89 f6 41 55 41 54 49 89 fd 55 53
[   14.314073] RSP: 0018:ffffaf2d839b7a08 EFLAGS: 00010286
[   14.314074] RAX: 0000000000000000 RBX: ffff95a68c102b00 RCX: ffffffff8be47158
[   14.314075] RDX: 0000000000000001 RSI: 0000000000000096 RDI: ffffffffa7ec6e2c
[   14.314076] RBP: ffff95a68a9229e8 R08: 000000000000003c R09: 0000000000000001
[   14.314077] R10: 0000000000000000 R11: 0000000000000001 R12: ffff95a68a9229c8
[   14.314077] R13: 0000000000000000 R14: 0000000000000170 R15: ffff95a686289930
[   14.314079] FS:  00007fe4117017c0(0000) GS:ffff95a68eb80000(0000) knlGS:0000000000000000
[   14.314080] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   14.314081] CR2: 00007ffc0740f8e8 CR3: 000080040c5d0000 CR4: 00000000003406e0
[   14.314081] Call Trace:
[   14.314149]  amdgpu_vram_mgr_fini+0x1d/0x40 [amdgpu]
[   14.314154]  ttm_bo_clean_mm+0x9d/0xb0 [ttm]
[   14.314216]  amdgpu_ttm_fini+0x6c/0xe0 [amdgpu]
[   14.314277]  amdgpu_bo_fini+0x9/0x30 [amdgpu]
[   14.314344]  gmc_v8_0_sw_fini+0x2d/0x50 [amdgpu]
[   14.314416]  amdgpu_device_fini+0x235/0x3d6 [amdgpu]
[   14.314477]  amdgpu_driver_unload_kms+0xab/0x150 [amdgpu]
[   14.314536]  amdgpu_driver_load_kms+0x181/0x250 [amdgpu]
[   14.314543]  drm_dev_register+0x10e/0x150 [drm]
[   14.314602]  amdgpu_pci_probe+0xb8/0x120 [amdgpu]
[   14.314606]  local_pci_probe+0x3c/0x90
[   14.314609]  pci_device_probe+0xdc/0x160
[   14.314612]  really_probe+0xee/0x2a0
[   14.314613]  driver_probe_device+0x4a/0xb0
[   14.314615]  __driver_attach+0xaf/0xd0
[   14.314617]  ? driver_probe_device+0xb0/0xb0
[   14.314619]  bus_for_each_dev+0x71/0xb0
[   14.314621]  bus_add_driver+0x197/0x1e0
[   14.314623]  ? 0xffffffffc0369000
[   14.314624]  driver_register+0x66/0xb0
[   14.314626]  ? 0xffffffffc0369000
[   14.314628]  do_one_initcall+0x41/0x1b0
[   14.314631]  ? _cond_resched+0x10/0x20
[   14.314633]  ? kmem_cache_alloc_trace+0x35/0x170
[   14.314636]  do_init_module+0x55/0x1e0
[   14.314639]  load_module+0x2242/0x2480
[   14.314642]  ? __do_sys_finit_module+0xba/0xe0
[   14.314644]  __do_sys_finit_module+0xba/0xe0
[   14.314646]  do_syscall_64+0x43/0xf0
[   14.314649]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   14.314651] RIP: 0033:0x7fe411a7f669
[   14.314652] Code: 00 00 75 05 48 83 c4 18 c3 e8 b3 b7 01 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e7 a7 0c 00 f7 d8 64 89 01 48
[   14.314653] RSP: 002b:00007ffe7cb639e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   14.314655] RAX: ffffffffffffffda RBX: 000056165f9c3150 RCX: 00007fe411a7f669
[   14.314656] RDX: 0000000000000000 RSI: 00007fe411b6190d RDI: 0000000000000016
[   14.314656] RBP: 00007fe411b6190d R08: 0000000000000000 R09: 0000000000000002
[   14.314657] R10: 0000000000000016 R11: 0000000000000246 R12: 0000000000000000
[   14.314658] R13: 000056165f9d3270 R14: 0000000000020000 R15: 000056165f9c3150
[   14.314659] ---[ end trace 9db69ba000fb2712 ]---
[   14.314664] [TTM] Finalizing pool allocator
[   14.314666] [TTM] Finalizing DMA pool allocator
[   14.314700] [TTM] Zone  kernel: Used memory at exit: 124 kiB
[   14.314703] [TTM] Zone   dma32: Used memory at exit: 124 kiB
[   14.314704] [drm] amdgpu: ttm finalized
[   14.314868] amdgpu: probe of 0000:1d:00.0 failed with error -12

February 01, 2019
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

I first tried just exporting from PostgreSQL like this:

COPY (SELECT ROW_TO_JSON(t)
FROM (SELECT * FROM llx_societe) t) to '/path/to/file/llx_societe_extrafields.json';

but that gives me a lot of data that I do not need, and it also keeps the half-French column names (which, as someone who doesn't speak French, drives me mad and slows me down..)

Warning: PostgreSQL does not seem to escape " from the HTML content, so you need to escape it or remove it (which is what I did, since I do not need it)

So I'll just make a query and/or view to deal with this:

SELECT  s.rowid AS s_row,
        s.nom AS s_name,
        s.phone AS s_phone,
        s.fax AS s_fax,
        s.email AS s_email,
        s.url AS s_url,
        s.address AS s_address,
        s.town AS s_town,
        s.zip AS s_zip,
        s.note_public AS s_note_public,
        s.note_private AS s_note_private,
        s.ape AS s_fbno,
        s.idprof4 AS s_dvrno,
        s.tva_assuj AS s_UST,
        s.tva_intra AS s_uid,
        s.code_client AS s_code_client,
        s.name_alias AS s_name_alias,
        s.siren AS s_siren,
        s.siret AS s_siret,
        s.client AS s_client,
        s_dep.nom AS s_county,
        s_reg.nom AS s_country,
        s.fk_forme_juridique,
        se.pn_name AS s_pn_name,
        sp.rowid AS sp_rowid,
        sp.lastname AS sp_lastname,
        sp.firstname AS sp_firstname,
        sp.address AS sp_address,
        sp.civility AS sp_civility,
        sp.zip AS sp_zip,
        sp.town AS sp_town,
        sp_dep.nom AS sp_county,
        sp_reg.nom AS sp_country,
        sp.fk_pays AS sp_fk_pays,
        sp.birthday AS sp_birthday,
        sp.poste AS sp_poste,
        sp.phone AS sp_phone,
        sp.phone_perso AS sp_phone_perso,
        sp.phone_mobile AS sp_phone_mobile,
        sp.fax AS sp_fax,
        sp.email AS sp_email,
        sp.priv AS sp_priv,
        sp.note_private AS sp_note_private,
        sp.note_public AS sp_note_public
        

FROM llx_societe AS s
INNER JOIN llx_societe_extrafields AS se ON se.fk_object = s.rowid
LEFT JOIN llx_socpeople AS sp ON sp.fk_soc = s.rowid
LEFT JOIN llx_c_departements AS s_dep ON s.fk_departement = s_dep.rowid
LEFT JOIN llx_c_regions AS s_reg ON s_dep.fk_region = s_reg.rowid
LEFT JOIN llx_c_departements AS sp_dep ON sp.fk_departement = sp_dep.rowid
LEFT JOIN llx_c_regions AS sp_reg ON sp_dep.fk_region = sp_reg.rowid
ORDER BY s_name, sp_lastname, sp_firstname;
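To get the JSON export mentioned at the top with these saner column names, the same query can be wrapped into the COPY ... ROW_TO_JSON construct from above (the output path is just an example):

COPY (SELECT ROW_TO_JSON(t)
      FROM (
        -- paste the full SELECT ... FROM llx_societe ... query from above here
        SELECT s.rowid AS s_row, s.nom AS s_name
        FROM llx_societe AS s
      ) t) TO '/path/to/file/llx_societe_export.json';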

January 31, 2019
Michał Górny a.k.a. mgorny (homepage, bugs)

This article describes a UI deficiency of the Evolution mail client that extrapolates the trust of one of an OpenPGP key's UIDs onto the key itself, and reports it along with the (potentially untrusted) primary UID. This creates the possibility of tricking the user into trusting a phished mail by adding a forged UID to a key that has a previously trusted UID.

Continue reading

January 29, 2019
Michał Górny a.k.a. mgorny (homepage, bugs)
Identity with OpenPGP trust model (January 29, 2019, 13:50 UTC)

Let’s say you want to send a confidential message to me, and possibly receive a reply. Through employing asymmetric encryption, you can prevent a third party from reading its contents, even if it can intercept the ciphertext. Through signatures, you can verify the authenticity of the message, and therefore detect any possible tampering. But for all this to work, you need to be able to verify the authenticity of the public keys first. In other words, we need to be able to prevent the aforementioned third party — possibly capable of intercepting your communications and publishing a forged key with my credentials on it — from tricking you into using the wrong key.

This renders key authenticity the fundamental problem of asymmetric cryptography. But before we start discussing how key certification is implemented, we need to cover another fundamental issue — identity. After all, who am I — who is the person you are writing to? Are you writing to a person you’ve met? Or to a specific Gentoo developer? Author of some project? Before you can distinguish my authentic key from a forged key, you need to be able to clearly distinguish me from an impostor.

Forms of identity

Identity via e-mail address

If your primary goal is to communicate with the owner of the particular e-mail address, it seems obvious to associate the identity with the owner of the e-mail address. However, how in reality would you distinguish a ‘rightful owner’ of the e-mail address from a cracker who managed to obtain access to it, or to intercept your network communications and inject forged mails?

The truth is, the best you can certify is that the owner of a particular key is able to read and/or send mails from a particular e-mail address, at a particular point in time. Then, if you can certify the same for a long enough period of time, you may reasonably assume the address is continuously used by the same identity (which may qualify as a legitimate owner or a cracker with a lot of patience).

Of course, all this relies on your trust in mail infrastructure not being compromised.

Identity via personal data

A stronger protection against crackers may be provided by associating the identity with personal data, as confirmed by government-issued documents. In case of OpenPGP, this is just the real name; X.509 certificates also provide fields for street address, phone number, etc.

The use of real names seems to be based on two assumptions: that your real name is reasonably well-known (e.g. it can be established with little risk of being replaced by a third party), and that the attacker does not wish to disclose his own name. Besides that, using real names meets with some additional criticism.

Firstly, requiring one to use his real name may be considered an invasion of privacy. Most notably, some people wish not to disclose or use their real names, and this effectively prevents them from ever being certified.

Secondly, real names are not unique. After all, the naming systems developed from the necessity of distinguishing individuals in comparatively small groups, and they simply don’t scale to the size of the Internet. Therefore, name collisions are entirely possible and we are relying on sheer luck that the attacker wouldn’t happen to have the same name as you do.

Thirdly and most importantly, verifying identity documents is non-trivial and untrained individuals are likely to fall victim to mediocre-quality fakes. After all, we’re talking about people who have hopefully read some article on verifying a particular kind of document but have no experience recognizing forgeries, no specialized hardware (I suppose most of you don't carry a magnifying glass and a UV light with you) and who may lack the skills to compare signatures or photographs (not to mention some people have really old photographs in their documents). Some countries don't even issue any official documentation on document verification in English!

Finally, even besides the point of forged documents, this relies on trust in administration.

Identity via photographs

This one I’m mentioning merely for completeness. OpenPGP keys allow adding a photo as one of your UIDs. However, this is rather rarely used (of the keys my GnuPG has fetched so far, less than 10% have photographs). The concerns are similar as for personal data: it assumes that others reliably know what you look like, and that they are capable of reliably comparing faces.

Online identity

An interesting concept is to use your public online activity to prove your identity — such as websites or social media. This is generally based on cross-referencing multiple resources with cryptographically proven publishing access, and assuming that an attacker would not be able to compromise all of them simultaneously.

A form of this concept is utilized by keybase.io. This service builds trust in user profiles via cryptographically cross-linking your profiles on some external sites and/or your websites. Furthermore, it actively encourages other users to verify those external proofs as well.

This identity model relies entirely on trust in the network infrastructure and external sites. The likelihood of it being compromised is reduced by (potentially) relying on multiple independent sites.

Web of Trust model

Most of the time, you won’t be able to directly verify the identity of everyone you’d like to communicate with. This creates the necessity of obtaining indirect proof of authenticity, and the model normally used for that purpose in OpenPGP is the Web of Trust. I won’t be getting into the fine details — you can find them e.g. in the GNU Privacy Handbook. For our purposes, it suffices to say that in the WoT the authenticity of keys you haven’t verified may be assessed by people whose keys you trust already, or people they know, with a limited level of recursion.

The more key holders you can trust, the more keys you can have verified indirectly and the more likely it is that your future recipient will be in that group. Or that you will be able to get someone from across the world into your WoT by meeting someone residing much closer to you. Therefore, you’d naturally want the WoT to grow fast and include more individuals. You’d want to preach OpenPGP to non-crypto-aware people. However, this comes with an inherent danger: can you really trust that they will properly verify the identity of the keys they sign?

I believe this is the most fundamental issue with the WoT model: for it to work outside of small specialized circles, it has to include more and more individuals across the world. But this growth inevitably makes it easier for a malicious third party to find people who can be tricked into certifying keys with forged identities.

Conclusion

The fundamental problem in OpenPGP usage is finding the correct key and verifying its authenticity. This becomes especially complex given there is no single clear way of determining one’s identity in the Internet. Normally, OpenPGP uses a combination of real name and e-mail address, optionally combined with a photograph. However, all of them have their weaknesses.

Direct identity verification for all recipients is non-practical, and therefore requires indirect certification solutions. While the WoT model used by OpenPGP attempts to avoid centralized trust specific to PKI, it is not clear whether it’s practically manageable. On one hand, it requires trusting more people in order to improve coverage; on the other, it makes it more vulnerable to fraud.

Given all the above, the trust-via-online-presence concept may be of some interest. Most importantly, it establishes a closer relationship between the identity you actually need and the identity you verify — e.g. you want to mail the person who is an open source developer and author of some specific projects, rather than an arbitrary person with a common enough name. However, this concept is not broadly established yet.

January 26, 2019
Michał Górny a.k.a. mgorny (homepage, bugs)

This article briefly explains a historical git weakness regarding the handling of commits with multiple OpenPGP signatures in git older than v2.20. The method of creating such commits is presented, and the results of using them are described and analyzed.

Continue reading

January 20, 2019
Alexys Jacob a.k.a. ultrabug (homepage, bugs)
py3status v3.16 (January 20, 2019, 21:10 UTC)

Two py3status versions in less than a month? That's the holiday effect, but not only!

Our community has been busy discussing our way forward to 4.0 (see below) as well as our organization, so it was time I wrote a bit about that.

Community

A new collaborator

First of all we have the great pleasure and honor to welcome Maxim Baz @maximbaz as a new collaborator on the project!

His engagement, numerous contributions and insightful reviews to py3status have made him a well-known community member, not to mention his IRC support 🙂

Once again, thank you for being there Maxim!

Zen of py3status

As a result of an interesting discussion, we worked on better defining how to contribute to py3status, as well as a set of guidelines we agree on to keep the project moving smoothly.

Thus was born the zen of py3status, which extends the philosophy from the user's point of view to the contributor's point of view!

This allowed us to handle the numerous open pull requests and get their number down to 5 at the time of writing this post!

Even our dear @lasers doesn't have any open PRs anymore 🙂

3.15 + 3.16 versions

Our magic @lasers has worked a lot on general module options as well as on supporting features added by i3-gaps such as border coloring and fine tuning.

Also interesting is the work of Thiago Kenji Okada @m45t3r around NixOS packaging of py3status. Thanks a lot for this work and for sharing, Thiago!

I also liked the question from Andreas Lundblad @aioobe, asking whether we could have a feature allowing a custom graphical output (such as a small PNG) to be displayed upon clicking on the i3bar; you might be interested in following up on the i3 issue he opened.

Make sure to read the amazing changelog for details, a lot of modules have been enhanced!

Highlights

  • You can now set background and border colors (and their urgent counterparts) globally or per module (see the config sketch right after this list)
  • CI now checks modules for black formatting, so the whole code base now obeys the black style!
  • All HTTP request based modules now have a standard way to define an HTTP timeout, as well as the same 10 seconds default timeout
  • py3-cmd now allows sending click events with modifiers
  • The py3status -n / --interval command line argument has been removed as it was obsolete. We will ignore it if you have set it, but better to remove it to keep things clean
  • You can specify your own i3status binary path using the new -u, --i3status command line argument thanks to @Dettorer and @lasers
  • Since Yahoo! decided to retire its public & free weather API, the weather_yahoo module has been removed
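Here is a minimal sketch of what the per-module color options could look like in a py3status configuration. The option names (background, border and the urgent_ variants) are inferred from the changelog above, so double-check them against the documentation of your installed version:

clock {
    background = "#285577"
    border = "#4c7899"
    urgent_background = "#900000"
}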

New modules

  • new conky module: display conky system monitoring (#1664), by lasers
  • new module emerge_status: display information about running gentoo emerge (#1275), by AnwariasEu
  • new module hueshift: change your screen color temperature (#1142), by lasers
  • new module mega_sync: to check for MEGA service synchronization (#1458), by Maxim Baz
  • new module speedtest: to check your internet bandwidth (#1435), by cyrinux
  • new module usbguard: control usbguard from your bar (#1376), by cyrinux
  • new module velib_metropole: display velib metropole stations and (e)bikes (#1515), by cyrinux

A word on 4.0

Do you wonder what’s gonna be in the 4.0 release?
Do you have ideas that you’d like to share?
Do you have dreams that you’d love to become true?

Then make sure to read and participate in the open RFC on 4.0 version!

Development has not started yet; we really want to hear from you.

Thank you contributors!

There would be no py3status release without our amazing contributors, so thank you guys!

  • AnwariasEu
  • cyrinux
  • Dettorer
  • ecks
  • flyingapfopenguin
  • girst
  • Jack Doan
  • justin j lin
  • Keith Hughitt
  • L0ric0
  • lasers
  • Maxim Baz
  • oceyral
  • Simon Legner
  • sridhars
  • Thiago Kenji Okada
  • Thomas F. Duellmann
  • Till Backhaus

January 13, 2019
Alice Ferrazzi a.k.a. alicef (homepage, bugs)
Getting control on my data (January 13, 2019, 19:28 UTC)

Since 2009 I have been getting more and more interested in privacy,
radical decentralization and self-hosting.
But only recently did I start to actively work on protecting
my own privacy and making my open source usage stricter
(no Dropbox, no Google services).
The point of this more radical change is not only privacy.
It is partially that I don't want corporations1 to use
my data for their business, and partially that I think
open source and decentralization are the way to go.
Not using open source is giving corporations the ability to automate us27.
Continuing to use centrally controlled entities is giving away our
freedom and our privacy (yes, also in our lives).
Corporations that have many users and that dominate our communication services
can control every aspect of our lives.
They can remove and censor whatever content is against their views,
add crazily expensive service features and own your data.
I prefer to use good open source, taking control back over my data
and being sure that they are contributing back to the ecosystem.
Taking control back over my freedom means having the possibility to
contribute back and help out.
I prefer to donate to a service that gives users freedom rather than
give money to a service that removes user rights.

dontspyonme image from: https://www.reddit.com/r/degoogle/comments/8bzumr/dont_spy_on_me/

Unfortunately, the server hosting my IRC ZNC bouncer and my previous
website also started to get too full for what I wanted to do,
so I got a new VPS for hosting my services and I'm
keeping the old one just for my email server.
Before moving out I also had a Google account that was already asking for money
to keep my Google email account space (I would have had to pay Google for doing
data analysis on my email...).

So I decided to quit.
Quitting facebook5, google6 and dropbox18.
Probably also quitting twitter in the near future.

I started setting up my servers, but I wanted something simple to set up
that I could easily move away from if I had any kind of problem
(like moving to a different server or just keeping simple data backups).

For now I'm heavily relying on Docker.
I changed my Google email to mailcow17, and having control over my own mail service is
a really good experience.
Mailcow uses only open source, like SOGo, which is also really easy to use
and offers the possibility to create mail filters similar to Google Mail.
The move to mailcow was straightforward, but I still need to finish moving
all my mail to the new account.
I moved away from Google Drive and Dropbox to Nextcloud + Collabora Online (stable LibreOffice Online)7.
I installed ZNC and a Quassel core again for my IRC bouncer.
I used a grammar checker in the browser for some time and now I'm using language-tool9
on my own Docker server.
I stopped searching for videos on YouTube and just use PeerTube10.
I'm still unfortunately using Twitter, but I opened an account on Mastodon11 (fosstodon);
I could talk with the owner and it looks like a reasonable setup.
Google Search became searx12 + yacy13.
Android became Lineage OS19 + F-Droid20 + Aurora Store28 (unfortunately not all the applications that I need are open source). Also my passwords have been moved away from LastPass to Bitwarden21,
KeePassXC22 and pass23.
The feeling I get by self-hosting most of the services that I use
is definitely, as Douglas Rushkoff (Team Human24) would say, more human.
It is less the internet of the corporations and feels more like what the internet needs to be:
something that is managed and owned by humans, not by algorithms trying to
use your data to drive growth.

A nice inspiration for quitting was privacytools.io1

Read also:
Nothing to hide argument (Wikipedia)2
How do you counter the "I have nothing to hide?" argument? (reddit.com)3
'I've Got Nothing to Hide' and Other Misunderstandings of Privacy (Daniel J. Solove - San Diego Law Review)4
Richard Stallman on privacy16

I also moved from ikiwiki to Pelican, but this was more of a personal preference; ikiwiki is really good, but pelican25 is simpler for me to customize as it is made in Python.
I also went back to Qtile26 from i3.

So enjoy my new blog and my new rants :)

January 12, 2019
Alice Ferrazzi a.k.a. alicef (homepage, bugs)
My First Review (January 12, 2019, 18:06 UTC)

This was a test post, but I think it can become a real post.
The test site post was this:
"Following is a review of my favorite mechanical keyboard."

I actually recently bought a new keyboard.
Usually I use a Happy Hacking Keyboard Professional 2 with English layout; I have two of them.
HHKB_work HHKB_home

I really like my HHKB and I have no drawbacks using it.
Both keyboards are modded with the Hasu TMK controller.
The firmware is configured with a Colemak layout.

But recently I saw an advertisement for the Ultimate Hacking
Keyboard.
It is interesting that it was made via crowdfunding and that it looks
heavily customizable.
It looked pretty solid, so I bought one.
Here it is:
HHKB_work

Having the key markings on the keys is, for a Colemak user,
fairly useless.
But I had no problem remapping the firmware to follow
the Colemak layout.

January 09, 2019
FOSDEM 2019 (January 09, 2019, 00:00 UTC)

FOSDEM logo

It’s FOSDEM time again! Join us at Université libre de Bruxelles, Campus du Solbosch, in Brussels, Belgium. This year’s FOSDEM 2019 will be held on February 2nd and 3rd.

Our developers will be happy to greet all open source enthusiasts at our Gentoo stand in building K. Visit this year’s wiki page to see who’s coming. So far eight developers have specified their attendance, with most likely many more on the way!

January 05, 2019
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

I'm going to be using this blog post to collect bits and pieces of how to use proteus to handle data in Tryton.

Just noticed the proteus README is quite good: here's a link to the proteus GitHub

IMPORTANT: One thing I noticed (the hard way) is that if you are connected with a proteus session and you add & activate a module (at least when it is not done through proteus), you need to re-connect, as otherwise it does not seem to pick up things like extra fields added to models.

First thing: connect:

from proteus import config, Model, Wizard, Report
pcfg = config.set_trytond(database='trytond', config_file='/etc/tryton/trytond.conf')

Then we just get ourselves our parties:

Party = Model.get('party.party')
all_parties = Party.find()
for p in all_parties:
    print(p.name)
    print(p.addresses[0].full_address)

This will print out all names and the first full address of each.
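find() also accepts a Tryton domain in case you only want a subset of parties; a small illustrative example (the filter value is made up):

filtered_parties = Party.find([('name', 'ilike', 'Test%')])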

Party Relations (a separate module):

p.relations

Would give you output similar to this (if there are relations - in my case 2):

[proteus.Model.get('party.relation.all')(2),
 proteus.Model.get('party.relation.all')(4)]

Interesting fields there (for me):

p.relations[0].type.name # returns the name of the relation as entered
p.relations[0].reverse # reverse relation as entered
# the next 2 are self-explanatory anyway, just note the trailing '_' in from_
p.relations[0].to
p.relations[0].from_

Now to add a new one:

np = Party()
np.name='Test Customer from Proteus'
np.save()

This just creates a new party with just a name; default values that are configured (like the default language) get set automatically. Until it is saved, the id (np.id) is -1. By default it also comes with one (empty) address.

Here's how to edit/add:

np.addresses[0].zip='1234'
np.addresses.new(zip='2345')
np.save() # don't forget this

Extra fields from other (possibly your own) modules can be accessed exactly the same way as the normal ones (just don't forget to reconnect - like I did ;) ), for example:
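# hypothetical extra field added by a custom module - the field name is made up
np.my_custom_field = 'some value'
np.save()
print(np.my_custom_field)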

Here's how you refresh the data:

np.reload()


January 04, 2019
Sergei Trofimovich a.k.a. slyfox (homepage, bugs)
page fault handling on alpha (January 04, 2019, 00:00 UTC)

Bug

This was a quiet evening on #gentoo-alpha. Matt Turner shared an unusual kernel crash report seen by Dmitry V. Levin.

Dmitry noticed that one of the AlphaServer ES40 machines could not handle the strace test suite and generated kernel oopses:

Unable to handle kernel paging request at virtual address ffffffffffff9468
CPU 3 
aio(26027): Oops 0
pc = [<fffffc00004eddf8>]  ra = [<fffffc00004edd5c>]  ps = 0000    Not tainted
pc is at sys_io_submit+0x108/0x200
ra is at sys_io_submit+0x6c/0x200
v0 = fffffc00c58e6300  t0 = fffffffffffffff2  t1 = 000002000025e000
t2 = fffffc01f159fef8  t3 = fffffc0001009640  t4 = fffffc0000e0f6e0
t5 = 0000020001002e9e  t6 = 4c41564e49452031  t7 = fffffc01f159c000
s0 = 0000000000000002  s1 = 000002000025e000  s2 = 0000000000000000
s3 = 0000000000000000  s4 = 0000000000000000  s5 = fffffffffffffff2
s6 = fffffc00c58e6300
a0 = fffffc00c58e6300  a1 = 0000000000000000  a2 = 000002000025e000
a3 = 00000200001ac260  a4 = 00000200001ac1e8  a5 = 0000000000000001
t8 = 0000000000000008  t9 = 000000011f8bce30  t10= 00000200001ac440
t11= 0000000000000000  pv = fffffc00006fd320  at = 0000000000000000
gp = 0000000000000000  sp = 00000000265fd174
Disabling lock debugging due to kernel taint
Trace:
[<fffffc0000311404>] entSys+0xa4/0xc0

Oopses should never happen against userland workloads.

Here the crash happened right in the io_submit() syscall. "Should be a very simple arch-specific bug. Can't take much time to fix." was my thought. Haha.

Reproducer

Dmitry provided a very nice reproducer of the problem (extracted from the strace test suite):

The idea of the test is simple: create a valid context for asynchronous IO and pass an invalid pointer ptr to it. The mmap()/munmap() trick makes sure that ptr points at an invalid, non-NULL user memory location.
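The original aio.c is not quoted above; a minimal reproducer along those lines (an illustrative sketch, not the strace test code) might look like this:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/aio_abi.h>

int main(void)
{
    aio_context_t ctx = 0;

    /* create a valid AIO context */
    if (syscall(__NR_io_setup, 1, &ctx))
        return 1;

    /* map and immediately unmap a page: ptr is now a non-NULL,
       invalid userspace address */
    void *ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    munmap(ptr, 4096);

    /* the kernel faults while reading the iocb pointer array */
    long rc = syscall(__NR_io_submit, ctx, 1, ptr);
    printf("io_submit returned %ld\n", rc);
    return 0;
}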

To reproduce and explore the bug locally I picked qemu alpha system emulation. To avoid the complexity of finding a proper IDE driver for the root filesystem I built a minimal Linux kernel with only initramfs support and no filesystem or block device support.

Then I put statically linked reproducer and busybox into initramfs:

$ LANG=C tree root/
root/
|-- aio (statically linked aio.c)
|-- aio.c (source above)
|-- bin
|   |-- busybox (statically linked busybox)
|   `-- sh -> busybox
|-- dev (empty dir)
|-- init (script below)
|-- proc (empty dir)
`-- sys (empty dir)

4 directories, 5 files

$ cat root/init
#!/bin/sh

mount -t proc none /proc
mount -t sysfs none /sys
exec bin/sh

To run qemu system emulation against the above I used the following one-liner:

run-qemu.sh builds the initramfs image and runs the kernel against it.
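The script is not reproduced here; a rough equivalent (an illustrative sketch, assuming qemu-system-alpha is installed and the initramfs is packed from the root/ directory above) would be:

# pack root/ into an initramfs and boot the cross-built kernel in qemu
( cd root && find . | cpio -o -H newc | gzip ) > initramfs.cpio.gz
qemu-system-alpha -kernel vmlinux -initrd initramfs.cpio.gz \
    -append 'console=ttyS0' "$@"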

Cross-compiling vmlinux for alpha is also straightforward:
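In essence this is just a kernel build with ARCH and CROSS_COMPILE set; an illustrative sketch of what a wrapper like mk.sh might do (the exact toolchain prefix depends on the installed alpha cross-compiler):

# configure and cross-build the kernel for alpha
make ARCH=alpha CROSS_COMPILE=alpha-unknown-linux-gnu- defconfig
make ARCH=alpha CROSS_COMPILE=alpha-unknown-linux-gnu- "$@" vmlinux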

I built kernel and started a VM as:

# build kernel
$ ./mk.sh -j$(nproc)

# run kernel
$ ./run-qemu.sh -curses
...
[    0.650390] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
/ #

That was simple. I got the prompt! Then I ran the statically linked /aio reproducer as:

/ # /aio
Unable to handle kernel paging request at virtual address 0000000000000000
aio(26027): Oops -1
...

Woohoo! Crashed \o/ This allowed me to explore failure in more detail.

I used -curses (instead of the default -sdl) to ease copying text back from the VM.

The fault address pattern was slightly different from the original report. I hoped it was a manifestation of the same bug. Worst case, I would find another bug to fix and get back to the original one again :)

Into the rabbit hole

The oops was happening every time I ran /aio on the 4.20 kernel. The io_submit(2) man page claims it's an old system call from the 2.5 kernel era. Thus it should not be a recent addition.

How about older kernels? Did they also fail?

I was still not sure I had a correct qemu/kernel setup. I decided to pick the older 4.14 kernel version, known to run without major problems on our alpha box. The 4.14 kernel did not crash in qemu either. This reassured me that my setup was not completely broken.

I got my first suspect: a kernel regression.

The reproducer was very stable. Kernel bisection got me to the first regressed commit:

commit 95af8496ac48263badf5b8dde5e06ef35aaace2b
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sat May 26 19:43:16 2018 -0400

    aio: shift copyin of iocb into io_submit_one()

    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

:040000 040000 20dd44ac4706540b1c1d4085e4269bd8590f4e80 05d477161223e5062f2f781b462e0222c733fe3d M      fs

The commit clearly touched io_submit() syscall handling. But there was a problem: the change was not alpha-specific at all. If the commit had any problems it should also have caused problems on other systems.

To get a better understanding of the probable cause I decided to look at the failure mechanics. The actual values of local variables in io_submit() right before the crash might get me somewhere. I started adding printk() statements around the SYSCALL_DEFINE3(io_submit, …) implementation.

At some point, after enough printk() calls were added, the crashes disappeared. This confirmed it was not just a logical bug but something more subtle.

I was also not able to analyze the generated code difference between the printk()/no-printk() versions.

Then I attempted to isolate the faulty code into a separate function, but with not much success either. Any attempt to factor out a subset of io_submit() into a separate function made the bug go away.

It was time for the next hypothesis: mysterious incorrect compiler code generation, or invalid __asm__ constraints in some kernel macro exposed by minor code motion.

Single stepping through kernel

How to get an insight into the details without affecting original code too much?

Having failed at reducing the problem to a minimal code snippet, I attempted to catch the exact place of the page fault by single-stepping through the kernel using gdb.

For qemu-loadable kernels the procedure is very straightforward:

  • start gdb server on qemu side with -s option
  • start gdb client on host side with target remote localhost:1234

The same procedure in exact commands (I’m hooking into sys_io_submit()):

<at tty1>
$ ./run-qemu.sh -s

<at tty2>
$ gdb --quiet vmlinux
(gdb) target remote localhost:1234
  Remote debugging using localhost:1234
  0xfffffc0000000180 in ?? ()
(gdb) break sys_io_submit 
  Breakpoint 1 at 0xfffffc000117f890: file ../linux-2.6/fs/aio.c, line 1890.
(gdb) continue
  Continuing.

<at qemu>
  # /aio

<at tty2 again>
  Breakpoint 1, 0xfffffc000117f89c in sys_io_submit ()
(gdb) bt
  Breakpoint 1, __se_sys_io_submit (ctx_id=2199023255552, nr=1, iocbpp=2199023271936) at ../linux-2.6/fs/aio.c:1890
  1890    SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
(gdb) bt
  #0  __se_sys_io_submit (ctx_id=2199023255552, nr=1, iocbpp=2199023271936) at ../linux-2.6/fs/aio.c:1890
  #1  0xfffffc0001011254 in entSys () at ../linux-2.6/arch/alpha/kernel/entry.S:476

Now we can single-step through every instruction with nexti and check where things go wrong.

To poke around efficiently I kept looking at these cheat sheets:

Register names are especially useful as each alpha register has two names: numeric and mnemonic. Source code might use one form and gdb disassembly might use another. For example $16/a0 for gas ($r16/$a0 for gdb) is the register used to pass the first integer argument to a function.

After much going back and forth I found suspicious behaviour in the handling of a single instruction:

(gdb) disassemble
  => 0xfffffc000117f968 <+216>:   ldq     a1,0(t1)
     0xfffffc000117f96c <+220>:   bne     t0,0xfffffc000117f9c0 <__se_sys_io_submit+304>
(gdb) p $gp
    $1 = (void *) 0xfffffc0001c70908 # GOT
(gdb) p $a1
    $2 = 0
(gdb) p $t0
    $3 = 0
(gdb) nexti
     0xfffffc000117f968 <+216>:   ldq     a1,0(t1)
  => 0xfffffc000117f96c <+220>:   bne     t0,0xfffffc000117f9c0 <__se_sys_io_submit+304>
(gdb) p $gp
    $4 = (void *) 0x0
(gdb) p $a1
    $5 = 0
(gdb) p $t0
   $6 = -14 # -EFAULT

The above gdb session executes the single ldq a1,0(t1) instruction and observes its effect on the registers gp, a1 and t0.

Normally ldq a1, 0(t1) would read the 64-bit value pointed to by t1 into the a1 register and leave t0 and gp untouched.

The main effect seen here that causes the later oops is the sudden gp change. gp is supposed to point to the GOT (global offset table) of the current “program” (the kernel in this case). Something managed to corrupt it.

By construction of the /aio test case, the instruction ldq a1,0(t1) is not supposed to read any valid data: our test case passes an invalid memory location there. All the register-changing effects are the result of page fault handling.

The smoking gun

Grepping around the arch/alpha directory I noticed the entMM page fault handling entry.

It claims to handle page faults and keeps the gp value on the stack. Let's trace the fate of that on-stack value as the page fault happens:

(gdb) disassemble
  => 0xfffffc000117f968 <+216>:   ldq     a1,0(t1)
     0xfffffc000117f96c <+220>:   bne     t0,0xfffffc000117f9c0 <__se_sys_io_submit+304>
(gdb) p $gp
    $1 = (void *) 0xfffffc0001c70908 # GOT

(gdb) break entMM
    Breakpoint 2 at 0xfffffc0001010e10: file ../linux-2.6/arch/alpha/kernel/entry.S, line 200
(gdb) continue
    Breakpoint 2, entMM () at ../linux-2.6/arch/alpha/kernel/entry.S:200
(gdb) x/8a $sp
    0xfffffc003f51be78:     0x0     0xfffffc000117f968 <__se_sys_io_submit+216>
    0xfffffc003f51be88:     0xfffffc0001c70908 <# GOT> 0xfffffc003f4f2040
    0xfffffc003f51be98:     0x0     0x20000004000 <# userland address>
    0xfffffc003f51bea8:     0xfffffc0001011254 <entSys+164> 0x120001090
(gdb) watch -l *0xfffffc003f51be88
    Hardware watchpoint 3: -location *0xfffffc003f51be88
(gdb) continue
    Old value = 29821192
    New value = 0
    0xfffffc00010319d0 in do_page_fault (address=2199023271936, mmcsr=<optimized out>, cause=0, regs=0xfffffc003f51bdc0)
       at ../linux-2.6/arch/alpha/mm/fault.c:199
    199                     newpc = fixup_exception(dpf_reg, fixup, regs->pc);

Above gdb session does the following:

  • break entMM: break at page fault
  • x/8a $sp: print 8 top stack values at entMM call time
  • spot gp value at 0xfffffc003f51be88 (sp+16) address
  • watch -l *0xfffffc003f51be88: set hardware watchpoint at a memory location where gp is stored.

The watchpoint triggers at a seemingly relevant place: fixup_exception(), where the exception handler adjusts registers before resuming the faulted task.

Looking around I found an off-by-two bug in the page fault handling code. The fix was simple:

The patch was proposed upstream as https://lkml.org/lkml/2018/12/31/83.

The effect of the patch is to write 0 into the on-stack location of a1 (the $17 register) instead of the location of gp.

That’s it!

Page fault handling magic

I always wondered how the kernel reads data from userspace when it's needed. How does it do swap-in if the data is not available? How does it check permissions and privileged access? That kind of stuff.

The above investigation covers most of the involved components:

  • the ldq instruction is used to force the read from userspace (just as one would read from the kernel's own memory)
  • entMM/do_page_fault() handles the userspace fault as if the fault had not happened

The few minor missing details are:

  • How does the kernel know which instructions are expected to generate user page faults?
  • What piece of hardware holds a pointer to the page fault handler on alpha?

Let’s expand the code involved in page fault handling. Call site:

which is translated into the already familiar pair of instructions:

=> 0xfffffc000117f968 <+216>:   ldq     a1,0(t1)
   0xfffffc000117f96c <+220>:   bne     t0,0xfffffc000117f9c0 <__se_sys_io_submit+304>

Fun fact: get_user() has two outputs: the normal function return value (stored in the t0 register) and the user_iocb value (stored in the a1 register).

Let’s expand get_user() implementation on alpha:

A lot of simple code above does two things:

  1. use __access_ok() to check that the address is a userspace address, to prevent data exfiltration from the kernel.
  2. dispatch across the different supported sizes to do the rest of the work. Our case is a simple 64-bit read.

Looking at __get_user_64() in more detail:

A few observations:

  • The actual check for address validity is done by the CPU: the load-8-bytes instruction (ldq %0,%2) is executed and the MMU handles the page fault
  • There is no explicit code to recover from the exception. All auxiliary information is put into the __ex_table section.
  • The ldq %0,%2 instruction uses only parameters "0" (__gu_val) and "2" (addr) but does not use the "1" (__gu_err) parameter directly.
  • __ex_table uses a cool lda instruction hack to encode auxiliary data:
    • the __gu_err error register
    • a pointer to the next instruction after the faulting instruction: cont-label (or 2b-1b)
    • the result register

The page fault handling mechanism knows how to get to the __ex_table data where "1" (__gu_err) is encoded, and is able to reach that data and use it later in the mysterious fixup_exception() we saw before.

In case of alpha (and many other targets) the __ex_table collection is defined by the arch/alpha/kernel/vmlinux.lds.S linker script using the EXCEPTION_TABLE() macro:

#define EXCEPTION_TABLE(align)                         \
    . = ALIGN(align);                                  \
    __ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {  \
        __start___ex_table = .;                        \
        KEEP(*(__ex_table))                            \
        __stop___ex_table = .;                         \
    }
//...

Here all __ex_table sections are gathered between the __start___ex_table and __stop___ex_table symbols. Those are handled by generic kernel/extable.c code:

search_exception_tables() resolves the fault address to the relevant struct exception_table_entry.

Let’s look at the definition of struct exception_table_entry:
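(What follows is an approximate reconstruction based on the gdb dump shown further down, not a verbatim quote of the kernel header.)

struct exception_table_entry
{
        signed int insn;
        union exception_fixup {
                unsigned unit;
                struct {
                        signed int nextinsn : 16;
                        unsigned int errreg : 5;
                        unsigned int valreg : 5;
                } bits;
        } fixup;
};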

Note how the lda in-memory instruction format is used to encode all the details needed by fixup_exception()! In our sys_io_submit() case it would be lda a1, 4(t0) (lda r17, 4(r1)):

(gdb) bt
  #0  0xfffffc00010319d0 in do_page_fault (address=2199023271936, mmcsr=<optimized out>, cause=0, 
      regs=0xfffffc003f51bdc0) at ../linux-2.6/arch/alpha/mm/fault.c:199
  #1  0xfffffc0001010eac in entMM () at ../linux-2.6/arch/alpha/kernel/entry.S:222
(gdb) p *fixup
    $4 = {insn = -2584576, fixup = {unit = 572588036, bits = {nextinsn = 4, errreg = 1, valreg = 17}}}

Note how page fault handling also advances pc (the program counter, or instruction pointer) nextinsn=4 bytes forward to skip the failed ldq instruction.

arch/alpha/mm/fault.c does all the heavy-lifting of handling page faults. Here is a small snippet that handles our case of faults covered by exception handling:
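(The snippet below is an approximation of that code path, pieced together from the gdb session above rather than quoted verbatim.)

        const struct exception_table_entry *fixup;

        fixup = search_exception_tables(regs->pc);
        if (fixup != 0) {
                unsigned long newpc;
                newpc = fixup_exception(dpf_reg, fixup, regs->pc);
                regs->pc = newpc;
                return;
        }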

do_page_fault() also does a few other page-fault related things I carefully skipped here:

  • page fault accounting
  • handling of missing support for “prefetch” instruction
  • stack growth
  • OOM handling
  • SIGSEGV, SIGBUS propagation

Once do_page_fault() gets control it updates the regs struct in memory for the faulted task using the dpf_reg() macro. It looks unusual:

  • refers to negative offsets sometimes: (r) <= 15 ? (r)-16 (out of struct pt_regs)
  • defines not one but a few ranges of registers: 0-8, 9-15, 16-18, 19-…

struct pt_regs as is:

Now the meaning of dpf_reg() should be clearer. As pt_regs keeps only a subset of registers, it has to account for gaps and offsets.

Here I noticed the bug: r16-r18 range is handled incorrectly by dpf_reg(): r16 “address” is regs+10 (26-16), not regs+8.
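(For reference, an illustrative sketch of the macro with the fix applied; the off-by-two change turns the (r)+8 offset for the r16-r18 range into (r)+10. This is a reconstruction from the description above, not the verbatim patch.)

/* r0-r8 map directly, r9-r15 live before the struct, and
   r16-r18 live 10 slots in, not 8 as the buggy version assumed */
#define dpf_reg(r)                                                       \
        (((unsigned long *)regs)[(r) <= 8 ? (r) : (r) <= 15 ? (r)-16 :   \
                                 (r) <= 18 ? (r)+10 : (r)-10])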

The implementation also means that dpf_reg() can't handle the gp (r29) and sp (r30) registers as value registers. That should not normally be a problem as gcc never assigns those registers for temporary computations and keeps them holding the GOT pointer and stack pointer at all times. But one could write assembly code that does it :)

If all of the above makes no sense to you, that's ok. Check the kernel documentation for x86 exception handling instead, which uses a very similar technique.

To be able to handle all registers we need to bring in r9-r15. Those are written right before struct pt_regs right at entMM entry:

Here there are a few subtle things going on:

  1. at entry entMM already has a frame of the last 6 values: ps, pc, gp, r16-r18.
  2. then SAVE_ALL (not pasted above) stores r0-r8, r19-r28, hae, trap_a0-trap_a2
  3. and only then r9-r15 are stored (note the subq $sp, 56, $sp to place them before).

In C land only 2. and 3. constitute struct pt_regs. 1. happens to be outside and needs the negative addressing we saw in dpf_reg().

As I understand it, the original idea was to share the ret_from_sys_call part across various kernel entry points:

  • system calls: entSys
  • arithmetic exceptions: entArith
  • external interrupts: entInt
  • internal faults (bad opcode, FPU failures, breakpoint traps, …): entIF
  • page faults: entMM
  • handling of unaligned access: entUna
  • MILO debug break: entDbg

Of the above only page faults and unaligned faults need read/write access to every register.

In practice entUna uses a different layout and simpler code patching.

The last step to get entMM executed as the fault handler is to register it in alpha's PALcode subsystem (Privileged Architecture Library code).

It’s done in trap_init(). along with other handlers. Simple!

Or not so simple. What is that PALcode thing (wiki’s link)? It looks like a tiny hypervisor that provides service points for CPU you can access with call_pal <number> instruction.

It puzzled me a lot of what call_pal was supposed to do. Should it transfer control somwehre else or is it a normal call?

Actually given it’s a generic mechanism to do “privileged service calls” it can do both. I was not able to quickly find the details on how different service calls affect registers and found it simplest to navigate through qemu’s PAL source.

AFAIU PALcode of real alpha machine is a proprietary process-specific blob that could have it’s own quirks.

Back to out qemu-palcode let’s looks at a few examples.

First is function-like call_pal PAL_swpipl used in entMM and others:

I know almost nothing about PAL but I suspect mfpr means move-from-physical-register. hw_rei/hw_ret is a branch from the PAL service routine back to the "unprivileged" user/kernel.

hw_rei does a normal return from call_pal to the instruction following call_pal.

Here call_pal PAL_rti is an example of a task-switch-like routine:

Here the target (p5, some service-only hardware register) was passed on the stack in FRM_Q_PC($sp).

That PAL_rti managed to confuse me a lot as I was trying to single-step through it as a normal function. I did not notice how I was jumping from page fault handling code to timer interrupt handling code.

But it all became clear once I found its definition.

Parting words

  • qemu can emulate alpha well enough to debug obscure kernel bugs
  • the gdb server is very powerful for debugging unmodified kernel code, including hardware watchpoints, dumping registers and watching interrupt handling routines
  • My initial guesses were all incorrect: it was not a kernel regression, not a compiler deficiency and not an __asm__ constraint annotation bug.
  • PALcode, while a nice way to abstract low-level details of the CPU implementation, complicates debugging of the operating system. PALcode also happens to be OS-dependent!
  • This was another one-liner fix :)
  • The bug has always been present in the kernel (for about 20 years?).

Have fun!


December 30, 2018
Luca Barbato a.k.a. lu_zero (homepage, bugs)

Since there are plenty of blog posts about what people would like to have or will implement in Rust in 2019, here is mine.

I spent the last few weeks of my spare time making a C API for rav1e called crav1e. Overall the experience has been a mixed bag and there is large room for improvement.

Ideally I’d like to have by the end of the year something along the lines of:

$ cargo install-library --prefix=/usr --libdir=/usr/lib64 --destdir=/staging/place

So that it would:
– build a valid cdylib+staticlib
– produce a correct header
– produce a correct pkg-config file
– install all of it in the right paths

All of this should require only a quite basic build.rs and, probably, an applet.

What is it all about?

Building and installing shared libraries properly is quite hard, even more so on multiple platforms.

Right now cargo has quite limited install capabilities, with some work pending on extending them; there is an open issue and a patch.

Distributions that are experimenting with building everything as shared libraries (probably way too early, since the Rust ABI is neither stable nor being stabilized yet) also have those problems.

Why it is important

Rust is a pretty good language and has a fairly simple way to interact in both directions with any other language that can produce or consume C-ABI-compatible object code.

This is already quite useful if you want to build a small static archive and link it in your larger application and/or library.

An example of this use-case is librsvg.

Such a heterogeneous environment warrants a modicum of additional machinery and complication.

But if your whole library is written in rust, it is a fairly annoying amount of boilerplate that you would rather avoid.

Current status

If you want to provide C bindings for your crates, there is no single perfect solution right now.

What works well already

Currently building the library itself works fine and it is really straightforward:

  • It is quite easy to mark data types and functions to be C-compatible:
#[repr(C)]
pub struct Foo {
    a: Bar,
    ...
}

#[no_mangle]
pub unsafe extern "C" fn make_foo() -> *mut Foo {
   ...
}
  • rustc and cargo are aware of different crate-types, selecting staticlib produces a valid library
[lib]
name = "rav1e"
crate-type = ["staticlib"]
  • cbindgen can produce a usable C-header from a crate using few lines of build.rs or a stand-alone applet and a toml configuration file.
extern crate cbindgen;

fn main() {
    let crate_dir = std::env::var("CARGO_MANIFEST_DIR").unwrap();
    let header_path: std::path::PathBuf = ["include", "rav1e.h"].iter().collect();

    cbindgen::generate(crate_dir).unwrap().write_to_file(header_path);

    println!("cargo:rerun-if-changed=src/lib.rs");
    println!("cargo:rerun-if-changed=cbindgen.toml");
}
header = "// SPDX-License-Identifier: MIT"
sys_includes = ["stddef.h"]
include_guard = "RAV1E_H"
tab_width = 4
style = "Type"
language = "C"

[parse]
parse_deps = true
include = ['rav1e']
expand = ['rav1e']

[export]
prefix = "Ra"
item_types = ["enums", "structs", "unions", "typedefs", "opaque", "functions"]

[enum]
rename_variants = "ScreamingSnakeCase"
prefix_with_name = true

Now issuing cargo build --release will get you a .h in the include/ dir and a .a library in target/release, so far it is simple enough.

What sort of works

Once you have a static library, you need an external means to track its dependencies.

Back in the old days there were libtool archives (.la); now we have pkg-config files, which provide more information in a format that is way easier to parse and use.

rustc has --print native-static-libs to produce the list of additional libraries to link, BUT it prints it to stderr and only as a side effect of the actual build process.

My fairly ugly hack has been adding a dummy empty subcrate just to produce the link line using

cargo rustc -- --print native-static-libs 2>&1| grep native-static-libs | cut -d ':' -f 3

And then generate the .pc file from a template.
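The template itself is nothing fancy; a sketch of what it could look like, with @NATIVE_STATIC_LIBS@ standing in for whatever placeholder the templating step substitutes and the remaining values being illustrative only:

prefix=/usr
libdir=${prefix}/lib64
includedir=${prefix}/include

Name: rav1e
Description: C API for the rav1e AV1 encoder
Version: 0.1.0
Libs: -L${libdir} -lrav1e @NATIVE_STATIC_LIBS@
Cflags: -I${includedir}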

This is anything but straightforward, and because of how cargo rustc works, you may end up adding an empty subcrate just to extract this information quickly.

What is missing

Once you have your library, your header and your pkg-config file, you probably would like to install the library somehow and/or make a package out of it.

cargo install does not currently cover it. It works only for binaries, and binaries alone. This will hopefully change, but right now you just have to pick the external build system you are happy with and hack your way into integrating the steps mentioned above.

For crav1e I ended up hacking a quite crude Makefile.

And with that at least a pure-rust static library can be built and installed with the common:

make DESTDIR=/staging/place prefix=/usr libdir=/usr/lib64

Dynamic libraries

Given rustc and cargo have the cdylib crate type, one would assume we could just add the type, modify our build-system contraption a little and go our merry way.

Sadly not. A dynamic library (or shared object) requires, on most common platforms, some additional metadata to guide the runtime linker.

The current widespread practice is to use tools such as patchelf or install_name_tool, but it is quite brittle and requires additional tooling.

My plans for the 2019

rustc has a mean to pass the information to the compile-time linker but there is no proper way to pass it in cargo, I already tried to provide a solution, but I’ll have to go through the rfc route to make sure there is community consensus and the feature is properly documented.

Since kind of metadata is platform-specific so would be better to have this information produced and passed on by something external to the main cargo. Having it as applet or a build.rs dependency makes easier to support more platforms little by little and have overrides without having to go through a main cargo update.

The applet could also take care of properly creating the .pc file and installing it, since it would have access to all the required information.

Some effort could also be put into streamlining the process of extracting the library link line for static libraries and sparing some roundtrips.

I guess that’s all for what I’d really like to have next year in rust, and I’m confident I’ll have the time to deliver it myself 🙂

December 06, 2018
Alexys Jacob a.k.a. ultrabug (homepage, bugs)
Scylla Summit 2018 write-up (December 06, 2018, 22:53 UTC)

It’s been almost one month since I had the chance to attend and speak at Scylla Summit 2018 so I’m relieved to finally publish a short write-up on the key things I wanted to share about this wonderful event!

Make Scylla boring

This statement of Glauber Costa sums up what looked to me to be the main driver of the engineering efforts put into Scylla lately: making it work so consistently well on any kind of workload that it’s boring to operate 🙂

I will follow up on this statement to highlight the things I heard and (hopefully) understood during the summit. I hope you’ll find it insightful.

Reduced operational efforts

The thread-per-core and queues design still has a lot of possibilities to be leveraged.

The recent addition of RPC streaming capabilities to seastar allows a drastic reduction in the time it takes the cluster to grow or shrink (data rebalancing / resynchronization).

Incremental compaction is also very promising as this background process is one of the most expensive there is in the database’s design.

I was happy to hear that scylla-manager will soon be made available and free to use with basic features, while more advanced ones (like backup/restore) will be retained for the enterprise version.
I also noticed that the current version did not support storing its configuration on SSL-enabled clusters. So I asked Michał directly about it, and I’m glad that it will be released in version 1.3.1.

Performant multi-tenancy

Why choose between real-time OLTP & analytics OLAP workloads?

The goal here is to be able to run both on the same cluster by giving users the ability to assign “SLA” shares to ROLES. That’s basically like pools on Hadoop at a much finer grain since it will create dedicated queues that will be weighted by their share.

Having one queue per usage and full accounting will make it possible to limit resources efficiently and to let users have their say on their latency SLAs.

But Scylla also has a lot to do in the background to run smoothly. So while this design pattern was already applied to temper compactions, a lot of work has also been done on automatic flow control and back pressure.

For instance, Materialized Views are updated asynchronously, which means that while we can interact with and put a lot of pressure on the table they are based on (called the Main Table), we could overwhelm the background work that’s needed to keep the MVs’ View Tables in sync. To mitigate this, a smart back pressure approach was developed that will throttle the clients to make sure that Scylla can manage to do everything at the best performance the hardware allows!

I was happy to hear that work on tiered storage is also planned to better optimize disk space costs for certain workloads.

Last but not least, columnar storage optimized for time series and analytics workloads is also something the developers are looking at.

Latency is expensive

If you care for latency, you might be happy to hear that a new polling API (named IOCB_CMD_POLL) has been contributed by Christoph Hellwig and Avi Kivity to the 4.19 Linux kernel which avoids context switching I/O by using a shared ring between kernel and userspace. Scylla will be using it by default if the kernel supports it.

The iotune utility has been upgraded since 2.3 to generate an enhanced I/O configuration.

Also, persistent (disk backed) in-memory tables are getting ready and are very promising for latency sensitive workloads!

A word on drivers

ScyllaDB has been relying on the Datastax drivers since the start. While it’s a good thing for the whole community, it’s important to note that the shard-per-CPU approach on data that Scylla is using is not known nor leveraged by the current drivers.

Discussions took place and it seems that Datastax will not allow the protocol to evolve so that drivers could discover if the connected cluster is shard aware or not and then use this information to be more clever in which write/read path to use.

So for now ScyllaDB has been forking and developing their shard aware drivers for Java and Go (no Python yet… I was disappointed).

Kubernetes & containers

The ScyllaDB guys of course couldn’t avoid the Kubernetes frenzy so Moreno Garcia gave a lot of feedback and tips on how to operate Scylla on docker with minimal performance degradation.

Kubernetes has been designed for stateless applications, not stateful ones, and Docker does some automatic magic that has rather big performance hits on Scylla. You will basically have to play with affinities to dedicate one Scylla instance to run on one server with a “retain” reclaim policy.

Remember that the official Scylla docker image runs with dev-mode enabled by default which turns off all performance checks on start. So start by disabling that and look at all the tips and literature that Moreno has put online!

Scylla 3.0

A lot has been written on it already, so I will just be brief on the things that are important to understand from my point of view.

  • Materialized Views do backfill the whole data set
    • this job is done by the view building process
    • you can watch its progress in the system_distributed.view_build_status table
  • Secondary Indexes are Materialized Views under the hood
    • it’s like a reverse pointer to the primary key of the Main Table
    • so if you read the whole row by selecting on the indexed column, two reads will be issued under the hood: one on the indexed MV view table to get the primary key and one on the main table to get the rest of the columns
    • so if your workload is mostly interested in the whole row, you’re better off creating a complete MV to read from than using an SI
    • this is even more true if you plan to do range scans as this double query could lead you to read from multiple nodes instead of one
  • Range scan is way more performant
    • ALLOW FILTERING finally allows a great flexibility by providing server-side filtering!

Random notes

Support for LWT (lightweight transactions) will be relying on a future implementation of the Raft consensus algorithm inside Scylla. This work will also benefit Materialized Views consistency. Duarte Nunes will be the one working on this and I envy him very much!

Support for search workloads is high in the ScyllaDB devs priorities so we should definitely hear about it in the coming months.

Support for “mc” sstables (new generation format) is done and will reduce storage requirements thanks to metadata / data compression. Migration will be transparent because Scylla can read previous formats as well so it will upgrade your sstables as it compacts them.

ScyllaDB developers have not settled on how to best implement CDC. I hope they do rather soon because it is crucial in their ability to integrate well with Kafka!

Materialized Views, Secondary Indexes and filtering will benefit from the work on partition key and indexes intersections to avoid server side filtering on the coordinator. That’s an important optimization to come!

Last but not least, I’ve had the pleasure to discuss with Takuya Asada who is the packager of Scylla for RedHat/CentOS & Debian/Ubuntu. We discussed Gentoo Linux packaging requirements as well as the recent and promising work on a relocatable package. We will collaborate more closely in the future!

November 25, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)
Portability of tar features (November 25, 2018, 14:26 UTC)

The tar format is one of the oldest archive formats in use. It comes as no surprise that it is ugly — built as layers of hacks on the older format versions to overcome their limitations. However, given the POSIX standardization in the late 80s and the popularity of GNU tar, you would expect the interoperability problems to be mostly resolved nowadays.

This article is directly inspired by my proof-of-concept work on a new binary package format for Gentoo. My original proposal used the volume label to provide a user- and file(1)-friendly way of distinguishing our binary packages. While it is a GNU tar extension, it falls within the POSIX ustar implementation-defined file format, and you would expect non-compliant implementations to extract it as a regular file. What I did not anticipate is that some implementations reject the whole archive instead.

This naturally raised more questions on how portable various tar formats actually are. To verify that, I have decided to analyze the standards for possible incompatibility dangers and build a suite of test inputs that could be used to check how various implementations cope with that. This article describes those points and provides test results for a number of implementations.

Please note that this article is focused merely on read-wise format compatibility. In other words, it establishes how tar files should be written in order to achieve the best probability that they will be read correctly afterwards. It does not investigate what formats the listed tools can write and whether they can correctly create archives using specific features.

Continue reading

November 16, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

So I recently had a problem where postgresql would run out of max concurrent connections .. and I wasn't sure what caused it.

So to find out what the problem was I wanted to know what connections were open. After a short search I found the pg_stat_activity table.

Of course most of the info in there is not needed for my case (it has the database id, name, pid, usename, application_name, client_addr, state, ...)

But for me, this was all I needed:

postgres=# select count(*), datname,state,pid from pg_stat_activity group by datname, state, pid order by datname;
 count |  datname   |        state        |  pid
-------+------------+---------------------+-------
     1 | dbmail     | idle                | 30092
     1 | dbmail     | idle                | 30095
..

or shorter just the connections by state and db

postgres=# select count(*), datname,state from pg_stat_activity group by datname, state order by datname;
 count | datname  |        state
-------+----------+---------------------
    15 | dbmail   | idle
..

Of course one could go into more detail, but this made me realize that I could limit some processes that used a lot of connections but don't cause heavy load. Really simple once you know where to look - as usual :)

November 13, 2018
Luca Barbato a.k.a. lu_zero (homepage, bugs)

Over the year I contributed to an AV1 encoder written in rust.

Here is a small tutorial about what is available right now. There is still lots to do, but I think we could enjoy more user feedback (and possibly also some help).

Setting up

Install the rust toolchain

If you do not have rust installed, it is quite simple to get a full environment using rustup

$ curl https://sh.rustup.rs -sSf | sh
# Answer the questions asked and make sure you source the `.profile` file created.
$ source ~/.profile

Install cmake, perl and nasm

rav1e uses libaom for testing, and on x86/x86_64 some components have SIMD variants written directly using nasm.

You may follow the instructions, or just install:
nasm (version 2.13 or better)
perl (any recent perl5)
cmake (any recent version)
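On Gentoo, for instance, something like the following should cover it:

$ emerge --ask dev-lang/nasm dev-lang/perl dev-util/cmake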

Once you have those dependencies installed, you are set.

Building rav1e

We use cargo, so the process is straightforward:

## Pull in the customized libaom if you want to run all the tests
$ git submodule update --init

## Build everything
$ cargo build --release

## Test to make sure everything works as intended
$ cargo test --features decode_test --release

## Install rav1e
$ cargo install

Using rav1e

Right now rav1e has a quite simple interface:

rav1e 0.1.0
AV1 video encoder

USAGE:
    rav1e [OPTIONS] <INPUT> --output <OUTPUT>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -I, --keyint <keyint>              Keyframe interval [default: 30]
    -l, --limit <limit>                Maximum number of frames to encode [default: 0]
        --low_latency <low_latency>    low latency mode. true or false [default: true]
    -o, --output <output>              Compressed AV1 in IVF video output
        --quantizer <quantizer>        Quantizer (0-255) [default: 100]
    -r
    -s, --speed <speed>                Speed level (0(slow)-10(fast)) [default: 3]
        --tune <tune>                  Quality tuning (Will enforce partition sizes >= 8x8) [default: psnr]
                                       [possible values: Psnr, Psychovisual]

ARGS:
    <INPUT>    Uncompressed YUV4MPEG2 video input

It accepts y4m raw source and produces ivf files.

You can configure the encoder by setting the speed and quantizer levels.

The low_latency flag can be turned off to run some additional analysis over a set of frames and have additional quality gains.
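For example, a basic encode using only the options shown above could look like this:

$ rav1e input.y4m --output output.ivf --speed 4 --quantizer 110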

Crav1e

While ave and gst-rs will use the rav1e crate directly, there is a lot of other software, such as handbrake or vlc, that would be much happier to consume a C API.

Thanks to the staticlib target and cbindgen, it is quite easy to produce a C-ABI library and its matching header.

Setup

crav1e is built using cargo, so nothing special is needed right now besides nasm if you are building it on x86/x86_64.

Build the library

This step is completely straightforward; you can build it as release:

$ cargo build --release

or as debug

$ cargo build

It will produce a target/release/librav1e.a or a target/debug/librav1e.a.
The C header will be in include/rav1e.h.

Try the example code

I provided a quite minimal sample case.

cc -Wall c-examples/simple_encoding.c -Ltarget/release/ -lrav1e -Iinclude/ -o c-examples/simple_encoding
./c-examples/simple_encoding

If it builds and runs correctly you are set.

Manually copy the .a and the .h

Currently cargo install does not work for our purposes, but it will change in the future.

$ cp target/release/librav1e.a /usr/local/lib
$ cp include/rav1e.h /usr/local/include/

Missing pieces

Right now crav1e works well enough, but there are a few shortcomings I’m trying to address.

Shared library support

The cdylib target does exist and produces a nearly usable library, but there are some issues with soname support. I’m trying to address them with upstream, but it might take some time.

Meanwhile, some people suggest using patchelf or similar tools to fix the library after the fact.

Install target

cargo is generally awesome, but sadly its support for installing arbitrary files to arbitrary paths is limited; luckily, there are people proposing solutions.

pkg-config file generation

I do not consider a library proper if a .pc file is not provided with it.

Right now there are means to extract the information needed to build a pkg-config file, but there isn’t a simple way to do it.

$ cargo rustc -- --print native-static-libs

This provides what is needed for Libs.private; ideally the .pc file should be created as part of the install step, since you need to know the prefix, libdir and includedir paths.

Coming next

Probably the next blog post will be about my efforts to make cargo able to produce a proper cdylib, or about something quite different.

PS: If somebody feels like helping me with matroska in AV1, that would be great 🙂

November 12, 2018
Hanno Böck a.k.a. hanno (homepage, bugs)

HackerOne is currently one of the most popular bug bounty program platforms. While the usual providers of bug bounty programs are companies, a while ago I noted that some people were running bug bounty programs on HackerOne for their private projects without payouts. It made me curious, so I decided to start one with some of my private web pages in scope.

The HackerOne process requires programs to be private at first, starting with a limited number of invites. Soon after I started the program the first reports came in. Not surprisingly I got plenty of false positives, which I tried to limit by documenting the scope better in the program description. I also got plenty of web security scanner payloads via my contact form. But more to my surprise I also got a number of very high quality reports.

S9Y

This blog and two other sites in scope use Serendipity (also called S9Y), a blog software written in PHP. Through the bug bounty program I got reports for an Open Redirect, an XSS in the start page, an XSS in the back end, an SQL injection in the back end and another SQL injection in the freetag plugin. All of those were legitimate vulnerabilities in Serendipity and some of them quite severe. I forwarded the reports to the Serendipity developers.

Fixes are available by now: the first round of fixes was released with Serendipity 2.1.3, and another issue got fixed in 2.1.4. The freetag plugin was updated to version 2.69. If you use Serendipity, please make sure you run the latest versions.

I'm not always happy with the way the bug bounty platforms work, yet it seems they have attracted an active community of security researchers who are also willing to occasionally look at projects without financial reward. While it's questionable when large corporations run bug bounty programs without rewards, I think that it's totally fine for private projects and volunteer-run free and open source projects.

The conclusion I take from this is that more projects should likely try to make use of the bug bounty community. Essentially Serendipity got a free security audit and is more secure now. It got this through the indirection of my personal bug bounty program, but of course this could also work directly. Free software projects could start their own bug bounty programs, and when it's about web applications they should ideally have a live installation of their own product in scope.

In case you find some security issue with my web pages I welcome reports. And special thanks to Brian Carpenter (Geeknik), Julio Cesar and oreamnos for making my blog more secure.

November 10, 2018
Alexys Jacob a.k.a. ultrabug (homepage, bugs)
py3status v3.14 (November 10, 2018, 21:08 UTC)

I’m happy to announce this release as it contains some very interesting developments in the project. This release was focused on core changes.

IMPORTANT notice

There are now two optional dependencies to py3status:

  • gevent
    • will monkey patch the code to make it concurrent
    • the main benefit is to use an asynchronous loop instead of threads
  • pyudev
    • will enable a udev monitor if a module asks for it (only xrandr so far)
    • the benefit is described below

To install them all using pip, simply do:

pip install py3status[all]

Modules can now react/refresh on udev events

When pyudev is available, py3status will allow modules to subscribe and react to udev events!

The xrandr module uses this feature by default, which allows the module to refresh instantly when you plug in or unplug a secondary monitor. This also allows py3status to stop running the xrandr command in the background, which saves a lot of CPU!

Highlights

  • py3status core uses black formatter
  • fix default i3status.conf detection
    • add ~/.config/i3 as a default config directory, closes #1548
    • add .config/i3/py3status in default user modules include directories
  • add markup (pango) support for modules (#1408), by @MikaYuoadas
  • py3: notify_user module name in the title (#1556), by @lasers
  • print module information to stdout instead of stderr (#1565), by @robertnf
  • battery_level module: default to using sys instead of acpi (#1562), by @eddie-dunn
  • imap module: fix output formatting issue (#1559), by @girst

Thank you contributors!

  • eddie-dunn
  • girst
  • MikaYuoadas
  • robertnf
  • lasers
  • maximbaz
  • tobes

October 31, 2018
Arun Raghavan a.k.a. ford_prefect (homepage, bugs)
Update from the PipeWire hackfest (October 31, 2018, 15:49 UTC)

As the third and final day of the PipeWire hackfest draws to a close, I thought I’d summarise some of my thoughts on the goings-on and the future.

Thanks

Before I get into the details, I want to send out a big thank you to:

  • Christian Schaller for all the hard work of organising the event and Wim Taymans for the work on PipeWire so far (and in the future)
  • The GNOME Foundation, for sponsoring the event as a whole
  • Qualcomm, who are funding my presence at the event
  • Collabora, for sponsoring dinner on Monday
  • Everybody who attended and participated, for their time and thoughtful comments

Background

For those of you who are not familiar with it, PipeWire (previously Pinos, previously PulseVideo) was Wim’s effort at providing secure, multi-program access to video devices (like webcams, or the desktop for screen capture). As he went down that rabbit hole, he wrote SPA, a lightweight general-purpose framework for representing a streaming graph, and this led to the idea of expanding the project to include support for low latency audio.

The Linux userspace audio story has, for the longest time, consisted of two top-level components: PulseAudio which handles consumer audio (power efficiency, wide range of arbitrary hardware), and JACK which deals with pro audio (low latency, high performance). Consolidating this into a good out-of-the-box experience for all use-cases has been a long-standing goal for myself and others in the community that I have spoken to.

An Opportunity

From a PulseAudio perspective, it has been hard to achieve the 1-to-few millisecond latency numbers that would be absolutely necessary for professional audio use-cases. A lot of work has gone into improving this situation, most recently with David Henningsson’s shared-ringbuffer channels that made client/server communication more efficient.

At the same time, application sandboxing frameworks such as Flatpak have added security requirements of us that were not accounted for when PulseAudio was written. Examples include choosing which devices an application has access to (or can even know of), or which applications can act as control entities (set routing etc., enable/disable devices). Some work has gone into this — Ahmed Darwish did some key work to get memfd support in PulseAudio, and Wim has prototyped an access-control mechanism module to enable a Flatpak portal for sound.

All this said, there are still fundamental limitations in architectural decisions in PulseAudio that would require significant plumbing to address. With Wim’s work on PipeWire and his extensive background with GStreamer and PulseAudio itself, I think we have an opportunity to revisit some of those decisions with the benefit of a decade’s worth of learning deploying PulseAudio in various domains starting from desktops/laptops to phones, cars, robots, home audio, telephony systems and a lot more.

Key Ideas

There are some core ideas of PipeWire that I am quite excited about.

The first of these is the graph. Like JACK, the entities that participate in the data flow are represented by PipeWire as nodes in a graph, and routing between nodes is very flexible — you can route applications to playback devices and capture devices to applications, but you can also route applications to other applications, and this is notionally the same thing.

The second idea is a bit more radical — PipeWire itself only “runs” the graph. The actual connections between nodes are created and managed by a “session manager”. This allows us to completely separate the data flow from policy, which means we could write completely separate policy for desktop use cases vs. specific embedded use cases. I’m particularly excited to see this be scriptable in a higher-level language, which is something Bastien has already started work on!

A powerful idea in PulseAudio was rewinding — the ability to send out huge buffers to the device, but the flexibility to rewind that data when things changed (a new stream got added, or the stream moved, or the volume changed). While this is great for power saving, it is a significant amount of complexity in the code. In addition, with some filters in the data path, rewinding can break the algorithm by introducing non-linearity. PipeWire doesn’t support rewinds, and we will need to find a good way to manage latencies to account for low power use cases. One example is that we could have the session manager bump up the device latency when we know latency doesn’t matter (Android does this when the screen is off).

There are a bunch of other things that are in the process of being fleshed out, like being able to represent the hardware as a graph as well, to have a clearer idea of what is going on within a node. More updates as these things are more concrete.

The Way Forward

There is a good summary by Christian about our discussion about what is missing and how we can go about trying to make a smooth transition for PulseAudio users. There is, of course, a lot to do, and my ideal outcome is that we one day flip a switch and nobody knows that we have done so.

In practice, we’ll need to figure out how to make this transition seamless for most people, while folks with custom setups will need to be given a long runway and clear documentation to know what to do. It’s way too early to talk about this in more specifics, however.

Configuration

One key thing that PulseAudio does right (I know there are people who disagree!) is having a custom configuration that automagically works on a lot of Intel HDA-based systems. We’ve been wondering how to deal with this in PipeWire, and the path we think makes sense is to transition to ALSA UCM configuration. This is not as flexible as we need it to be, but I’d like to extend it for that purpose if possible. This would ideally also help consolidate the various methods of configuration being used by the various Linux userspaces.

To that end, I’ve started trying to get a UCM setup on my desktop that PulseAudio can use, and be functionally equivalent to what we do with our existing configuration. There are missing bits and bobs, and I’m currently focusing on the ones related to hardware volume control. I’ll write about this in the future as the effort expands out to other hardware.

Onwards and upwards

The transition to PipeWire is unlikely to be quick, completely painless, or free of contention. For those who are worried about the future, know that any switch is still a long way away. In the meantime, however, constructive feedback and comments are welcome.

October 18, 2018
Andreas K. Hüttel a.k.a. dilfridge (homepage, bugs)

We're happy to announce that our article "Lab::Measurement — a portable and extensible framework for controlling lab equipment and conducting measurements", describing our measurement software package Lab::Measurement, has been published in Computer Physics Communications.

Lab::Measurement is a collection of object-oriented Perl 5 modules for controlling lab instruments, performing measurements, and recording and plotting the resultant data. Its operating system independent driver stack makes it possible to use nearly identical measurement scripts both on Linux and Windows. Foreground operation with live plotting and background operation for, e.g., process control are supported. For more details, please read our article, visit the Lab::Measurement homepage, or visit Lab::Measurement on CPAN!

"Lab::Measurement - a portable and extensible framework for controlling lab equipment and conducting measurements"
S. Reinhardt, C. Butschkow, S. Geissler, A. Dirnaichner, F. Olbrich, C. Lane, D. Schröer, and A. K. Hüttel
Comp. Phys. Comm. 234, 216 (2019); arXiv:1804.03321 (PDF)

October 14, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)
tryton -- ipython, proteus (October 14, 2018, 09:22 UTC)

So after being told on IRC that you can use (i)python and proteus to poke around a running tryton instance (thanks for that hint btw), I tried it and had some "fun" right away:
from proteus import config,Model
pcfg = config.set_trytond(database='trytond', config_file='/etc/tryon/trytond.conf')

gave me this:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/lib64/python3.5/site-packages/trytond/backend/__init__.py in get(prop)
     31                 ep, = pkg_resources.iter_entry_points(
---> 32                     'trytond.backend', db_type)
     33             except ValueError:

ValueError: not enough values to unpack (expected 1, got 0)

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
<ipython-input-2-300353cf02f5> in <module>()
----> 1 pcfg = config.set_trytond(database='trytond', config_file='/etc/tryon/trytond.conf')

/usr/lib64/python3.5/site-packages/proteus/config.py in set_trytond(database, user, config_file)
    281         config_file=None):
    282     'Set trytond package as backend'
--> 283     _CONFIG.current = TrytondConfig(database, user, config_file=config_file)
    284     return _CONFIG.current
    285

/usr/lib64/python3.5/site-packages/proteus/config.py in __init__(self, database, user, config_file)
    232         self.config_file = config_file
    233
--> 234         Pool.start()
    235         self.pool = Pool(database_name)
    236         self.pool.init()

/usr/lib64/python3.5/site-packages/trytond/pool.py in start(cls)
    100             for classes in Pool.classes.values():
    101                 classes.clear()
--> 102             register_classes()
    103             cls._started = True
    104

/usr/lib64/python3.5/site-packages/trytond/modules/__init__.py in register_classes()
    339     Import modules to register the classes in the Pool
    340     '''
--> 341     import trytond.ir
    342     trytond.ir.register()
    343     import trytond.res

/usr/lib64/python3.5/site-packages/trytond/ir/__init__.py in <module>()
      2 # this repository contains the full copyright notices and license terms.
      3 from ..pool import Pool
----> 4 from .configuration import *
      5 from .translation import *
      6 from .sequence import *

/usr/lib64/python3.5/site-packages/trytond/ir/configuration.py in <module>()
      1 # This file is part of Tryton.  The COPYRIGHT file at the top level of
      2 # this repository contains the full copyright notices and license terms.
----> 3 from ..model import ModelSQL, ModelSingleton, fields
      4 from ..cache import Cache
      5 from ..config import config

/usr/lib64/python3.5/site-packages/trytond/model/__init__.py in <module>()
      1 # This file is part of Tryton.  The COPYRIGHT file at the top level of
      2 # this repository contains the full copyright notices and license terms.
----> 3 from .model import Model
      4 from .modelview import ModelView
      5 from .modelstorage import ModelStorage, EvalEnvironment

/usr/lib64/python3.5/site-packages/trytond/model/model.py in <module>()
      6 from functools import total_ordering
      7
----> 8 from trytond.model import fields
      9 from trytond.error import WarningErrorMixin
     10 from trytond.pool import Pool, PoolBase

/usr/lib64/python3.5/site-packages/trytond/model/fields/__init__.py in <module>()
      2 # this repository contains the full copyright notices and license terms.
      3
----> 4 from .field import *
      5 from .boolean import *
      6 from .integer import *

/usr/lib64/python3.5/site-packages/trytond/model/fields/field.py in <module>()
     18 from ...rpc import RPC
     19
---> 20 Database = backend.get('Database')
     21
     22

/usr/lib64/python3.5/site-packages/trytond/backend/__init__.py in get(prop)
     32                     'trytond.backend', db_type)
     33             except ValueError:
---> 34                 raise exception
     35             mod_path = os.path.join(ep.dist.location,
     36                 *ep.module_name.split('.')[:-1])

/usr/lib64/python3.5/site-packages/trytond/backend/__init__.py in get(prop)
     24     if modname not in sys.modules:
     25         try:
---> 26             __import__(modname)
     27         except ImportError as exception:
     28             if not pkg_resources:

ImportError: No module named 'trytond.backend.'

It took me a while to figure out that I just had a typo in the config file path. Since that cost me some time, I thought I'd put it on here so that maybe someone else who makes the same mistake doesn't waste as much time on it as I did ;) -- and thanks to the always helpful people on IRC #tryton@freenode
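For reference, the working call only differs in the (correctly spelled) config path:

from proteus import config, Model
pcfg = config.set_trytond(database='trytond', config_file='/etc/tryton/trytond.conf')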

October 04, 2018
Nathan Zachary a.k.a. nathanzachary (homepage, bugs)
Austria: Trip Summary (October 04, 2018, 05:00 UTC)

Well, our 2018 trip to Austria, Slovenia, and Hungary ends today and we have to head back home, but not before one last moment of indulgence. We woke up early so that we could partake in the included breakfast at the hotel. As with everything else at the Aria Hotel, the breakfast was incredible! There was a full buffet of items, and we also were able to order some eggs. We each got an egg white omelette with some vegetables, had a couple breads, and ordered coffee / tea. I really enjoyed my croissant with some local fruit jams (especially the Apricot jam).

Vegetarian omelette at the Hotel Aria’s complimentary breakfast

The staff at the Aria brought up the car from the valet parking lot, brought down our bags from the room, and loaded them for us. The whole experience there made it the very best hotel that I have ever had the pleasure of staying at!

We checked out, and drove back to Budapest airport. Despite the bit of traffic leaving the city centre, it was quite easy to get to the airport, and everything was clearly marked for returning the rental car. We got through security and on the flight without any problems at all.

So, what were the Top 3s of the trip (in my opinion)?

FOOD (okay, so I had to have 4 for this category)

  1. Our main dish at Zum Kaiser von Österreich in the Wachau
  2. The salad with seeds and roasted walnuts at Weinhaus Attwenger in Bad Ischl
  3. The spinach dumplings at Sixta in Vienna
  4. The mushroom tartare at Kirchenwirt an der Weinstraße in Ehrenhausen

 
 
WINE

  1. Domäne Wachau’s Pinot Noir
  2. Domäne Wachau’s Kellerberg Riesling
  3. Weingut Tement’s Vinothek Reserve Sauvignon Blanc

 
 
EXPERIENCES

  1. The winery tours (Domäne Wachau, Schloss Gobelsburg, and Tement were amazing)
  2. Going up into the mountains of Hallstatt
  3. The entire experience that was the Aria Hotel Budapest—a music lover’s dream and simply the most amazing hotel I’ve ever seen!

ANSSI, the National Cybersecurity Agency of France, has released the sources of CLIP OS, which aims to build a hardened, multi-level operating system, based on the Linux kernel and a lot of free and open source software. We are happy to hear that it is based on Gentoo Hardened!

October 03, 2018
Nathan Zachary a.k.a. nathanzachary (homepage, bugs)

As in Südsteiermark and many other places on this trip, we unfortunately only had one full day in the great city of Budapest. I had come up with a list of activities for us, and we sat down to figure out which ones we wanted to do (since there was absolutely no way to do everything in a mere 24 hours). We ended up spending much of the day just walking around and taking photos of the area. Our first spot for photos was right outside of our hotel at St. Stephen’s Basilica.

Budapest – St. Stephen’s Basilica adjacent to the Aria Hotel

From there, we ventured across the Széchenyi Bridge to see an area known as The Fisherman’s Bastion (or Halászbástya in Hungarian). It’s a terrace near Matthias Church that is steeped in history and culture, and it also provides some beautiful views of the city. Down closer to the river, I think that I got some great shots of the Hungarian Parliament Building from a nice location on the western bank of the Danube.

Budapest – beautiful view of the Hungarian Parliament Building from west of the Danube

We also wanted to “live it up” on the last night of our trip, so we asked the concierge for a recommendation of a bakery for cakes and treats. Café Gerbeaud came with the highest praises, so we walked to the neighbouring square to check it out. There were many stunning desserts to satisfy just about any type of sweet tooth! We couldn’t decide, so we ended up each getting a slice of three different cakes. Talk about a splurge!

Our numerous desserts (The Dobos, Esterházy, and Émile cakes) from Café Gerbeaud

Right about that time, I received an email from one of the restaurants that I had contacted, and they were asking me to confirm our reservations. It was the first time that I had heard from them, so I didn’t think that my reservations had actually gone through. We now had a decision to make between the two restaurants, and I think that I chose poorly. More on that in just a little bit.

We wanted to walk to Városliget Park (the City Park) in order to just take some more photos and enjoy the day, but soon realised that we wouldn’t have the time necessary to get there and not feel rushed. So we ended up just looking in some of the shops along Andrássy street. Boggi had a storefront there, and I really like that Milanese designer, so we went in. I didn’t expect to, but I ended up purchasing a gorgeous sport shirt because it fit me like a glove! A bit impulsive, but sometimes things like that have to be done when on holiday.

We made it back to the Aria Hotel in time to experience the afternoon wine and piano reception (that we missed yesterday due to the travel problems). It was lovely to just sit in the music parlour and listen to the performance. We didn’t partake in any of the food because we had dinner reservations soon thereafter.

The afternoon reception in the music garden at the Aria Hotel

After that incredibly relaxing reception, we got ready and walked to dinner at Caviar & Bull. The food was over-the-top delicious, but we shared quite a few starters and just left without ordering any mains. If the food was that great, why would we leave without ordering entrées? Well, in my opinion, the prices were exorbitant for the portion size. We added it up, and the four starters came out to 10 bites per person. That being said, the food that we had was extremely creative and fun—like the molecular spheres:

Budapest – molecular sphere starter at Caviar & Bull

On our walk back to the hotel, we realised that we needed some actual food, so we went to this little Japanese place called Wasabi Extra, which was directly across from our hotel. It was a conveyor belt sushi joint (all-you-can-eat), but we opted to just get some Japanese curry dishes. They were mediocre at best, but at least provided some sustenance.

We wandered back up to the room, and the hotel staff had delivered the wines they had been chilling for us in their walk-in. They also delivered the wine glasses and an ice bucket. Which wines did we choose for the last evening of our trip? Of course they had to be special, so we went with the 1995 vintage of the Domäne Wachau Kellerberg Riesling. We also opened a bottle of the 2017 vintage for comparison. It was a great experience, and one that we likely won’t be able to ever have again. That particular Riesling is my favourite of theirs, and arguably my favourite expression of the grape outside of Alsace and Germany. Having one with such bottle age transformed it into a golden yellow colour with aromas of overly ripe tropical fruits and petrol, along with the creamy mouthfeel that softens the typical blinding acidity of Riesling; it was a truly remarkable wine!

The perfect ending to a trip – enjoying Domäne Wachau’s 1995 Ried Kellerberg Riesling and desserts from Café Gerbeaud

We also had our desserts from Café Gerbeaud. They were all good, but I think that we agreed that the Émile was undoubtedly our favourite. That’s the one that Deb lovingly calls “the Pringle dessert” because of the chocolate garnish on the top that looks a bit like a Pringles crisp. A pretty darn good way to end a trip, if I do say so myself… and I do!

October 02, 2018
Nathan Zachary a.k.a. nathanzachary (homepage, bugs)

We woke up a bit early to check out of Weingut Tement, but before doing so had a tour of the facility with Monika Tement (the wife of Armin Tement, who, with his brother Stefan, is the current winemaker and proprietor of the estate). It was rainy and damp outside, so we couldn’t go through the vineyards. Thankfully I was able to get some photos of the beautiful Zieregg vineyard yesterday when the weather was nicer.

Südsteiermark – The stunning panorama of Ried Zieregg at Weingut Tement

Even though we weren’t able to go through the vineyards together due to the rains, Monika improvised and shared so much incredible information about their land and winemaking practices. In their cellar, there is a portion where there isn’t a concrete wall, and one can see the open soil that comprises the Zieregg STK Grand Cru vineyard site (so… very… cool!).

Südsteiermark – inside Weingut Tement’s cellar with the wall exposing the soils of Ried Zieregg

Before leaving the cellar, we were fortunate enough to see brothers Armin and Stefan Tement checking the status of fermentation of many of the wines that were in-barrel. They were testing the sugar content, alcohol content, and various other components of the wine using instruments designed specifically for the tasks. Monika also told us about the story of the Cellar Cat that, according to lore, will choose the best barrel of wine and sit atop it. In this case, it chose wisely (or, truthfully, whomever placed this cat statue on the barrel chose wisely) by selecting a lovely barrel of Zieregg Grosse Lage STK Sauvignon Blanc.

Südsteiermark – the cellar cat chooses his barrel of Ried Zieregg Sauvignon Blanc at Weingut Tement

We got in the car and headed out for what was the longest drive of the trip. Going from Südsteiermark back to Budapest was supposed to be about 3.5 hours, but yet again, the GPS that we rented with the car was TERRIBLE. That problem, coupled with traffic, road construction, poor road conditions, and nearly running out of fuel resulted in the trip taking nearly 5.5 hours. We missed the afternoon wine and piano reception at the Aria Hotel, but at least didn’t miss out on the massage that I had scheduled. We had to cut it a little short so as not to interfere with our dinner plans, but we still got to enjoy it.

Budapest – The custom-built grand piano in the music garden at the Hotel Aria

After the massage, we freshened up and walked to our dinner reservations at Aszu, which was just two blocks over from the hotel. We started our meal by sharing a summer salad with carrots, and radishes, along with a Hungarian chicken pancake dish called Hortobágyi. We then decided to order three mains and just share them as well. We went with: 1) fresh pasta with mascarpone and spinach mousse, garlic, and dried tomatoes; 2) a farmhouse chicken breast with corn variations (including popcorn) and truffle pesto; 3) a pork shoulder with cauliflower cream, apricots, and yoghurt. After trying each of them, it so happened that Deb really liked the pork shoulder, and I preferred the pasta dish. So, we didn’t share those two, but only the farmhouse chicken. I had wanted to try one of their desserts, but we didn’t have time (the service was impeccable, but a bit slow) before our reservations back at the hotel’s rooftop Sky Bar.

I had arranged for a private violin soloist performance (since the Aria is known for its complete music theme), and it was absolutely astonishing! After that show, we had our own little table inside the High Note Sky Bar. It was cosy, and our waiter brought out our wines along with some complimentary baggies of popcorn. As I believe that one should always have the wines of the region, Deb had the 2016 Demeter Zoltán Szerelmi Hárslevelú, and I had the 2016 St. Andrea Áldás Egri Bikavér, which translates to “Bulls Blood”. It’s a mix of a lot of different grapes (in this case, Kékfrankos, Merlot, Cabernet Franc, Pinot Noir, Syrah, Cabernet Sauvignon, Kadarka, and Turán), and it was very interesting. I hope to never encounter it in a blind tasting, though, because it would be essentially impossible to identify. 😛 After that bottle, we each wanted one additional glass. Deb had the 2016 István Szepsy Dry Furmint, and I went with the 2016 Etyeki Kúria Pinot Noir. Both were lovely, and I was surprised to find yet another gorgeous representation of cool climate Pinot!

We headed back downstairs to our beautiful room, but stopped to take one more look at the lovely terraces and music garden below.

Budapest – Hotel Aria’s stunning music garden courtyard

October 01, 2018
Nathan Zachary a.k.a. nathanzachary (homepage, bugs)

Unfortunately, as with so many of our stays in Austria, we only had one full day in the Ciringa / Südsteiermark region, so we had to make the best of it by seeing some of the cool attractions. We drove about 45 minutes or so away to the town of Riegersburg to visit their castle. If you walk up the hill to the top (instead of taking the funicular) and forego the castle museums, there are no fees involved. However, we opted to take the lift for €6 and to see all three museums for €13. The lift was rickety and a bit frightening, but we made it! The castle was really a neat experience, and the museums (one about the castle itself, one about witches, and one about medieval arms) were informative, but they were primarily in German. The English pamphlets only gave basic overviews of each room, so I feel like we missed out on a lot of the fascinating details. Though the castle experience was fun, I think it was a bit overpriced.

Südsteiermark – Riegersburg Castle entrance

One of the most interesting aspects of the castle was the various ornate stoves in some of the rooms. I often forget that there was no such thing as central heating and cooling during these times, so it was certainly a must to have some form of heating throughout the castle during the winter months. These stoves likely provided ample heat for taking the chill out of the air, and at Riegersburg, they likely served as discussion pieces given their elaborate and intricate designs.

Südsteiermark – lovely tile stove inside Riegersburg Castle

After going through the three museums, we spent a little time looking around the outside of the castle. The views of the surrounding areas were really beautiful and pastoral. Once finished with Riegersburg, we drove a little bit down the road to Zotter Schokoladen (a chocolate manufacturer) for a tour of their facility. It started with a really great video that outlined the chocolate making process beginning with harvesting the cacao pods. We then went through the factory with an English audio guide that explained every step of the process in a lot more detail.

Südsteiermark – one of many chocolate machines at Zotter Schokoladen

During each stage of the chocolate production, we were able to taste the “chocolate”. I use the word “chocolate” loosely because at many of the stages in the process, it didn’t taste much like the chocolate that we’re all used to. We did, however, get the opportunity to taste a bunch of their finished products. Some were good, some were great, and a few were absolutely fantastic! Deb ended up getting this solid 72% Milk chocolate bar sourced from Peru, some white chocolate bark with pistachios and almonds, and we each bought one of the tasting spoons that we used throughout the tour. I didn’t buy anything because the one that I loved the most wasn’t available for purchase. It was called the White Goddess and was white chocolate with Tonka Beans and honey crisps. It looks like it’s available online, so I may consider it at some point. The other ones that I enjoyed were the coconut nougat and the white chocolate bar with coconut and raspberries. One aspect of Zotter that I really found fascinating was the number of vegan options that they had available.

Südsteiermark – some of the vegan offerings at Zotter Schokoladen

At the tail end of the Zotter tour, there was a really great experience where they had large glass jars with various items that have rather distinct aromas (like rose petals, some baking spices such as cloves, and so on). The object of this particular hallway was to smell the contents of each jar and see if you could name the aroma without looking at the answer printed on the underside of the lid. Deb and I made it into a bit of a game by loosely keeping score, and I found it to be a lot of fun because many of the aromas that can be found in chocolate can also be found in red wines. As a side note, there was a really fun “chocolate bath” at the exit of the tour. Sadly, it was only for show, but I can imagine that chocoholics everywhere would swoon at the thought. 😛

Südsteiermark – the chocolate bathtub at Zotter Schokoladen

The other portion of the Zotter tour is a farm / petting zoo, but darn the bad luck, it started raining so we didn’t get a chance to go through it. After Zotter, we went back to the same restaurant that we ate at the previous night (Kirchenwirt an der Weinstraße in Ehrenhausen) because we enjoyed it so much! We didn’t have the same waiter this time, and our waitress tonight spoke VERY little English. It made it more difficult to order, but everything came out like we wanted. We each started with the mushroom tartare (which was my favourite), and then Deb went with Wiener Schnitzel and I had a custom order similar to what she had the evening before. I ordered the Pork Medallions, but without the pork. I know, it sounds ridiculous, but I wanted the dish with just a boatload of trumpet mushrooms and some extra German pretzel dumplings. I ordered by using Google Translate on my mobile, and my custom dish came out just as I had intended. Success!

Südsteiermark – Ehrenhausen – Kirchenwirt an der Weinstraße – Mushroom tartare starter

Südsteiermark – Ehrenhausen – Kirchenwirt an der Weinstraße – Trumpet mushrooms and pretzel dumplings

Back at the beautiful chalet, we enjoyed our wines of the evening. This time we went with two of the special, limited production wines from Weingut Tement. We wanted to compare two of their higher-end Sauvignon Blancs, so we had a bottle of the 2012 Zieregg “IZ” Reserve and a bottle of the 2015 Zieregg Vinothek Reserve. I thought that Deb would like the Vinothek and that I would like the “IZ” (which is made via a process similar to carbonic maceration [often used in Beaujolais]), but I had it completely backwards. I preferred the Vinothek and Deb liked the “IZ” more. I found the Vinothek to be a more pure expression of the grape and the place, which are two aspects that I highly value in wine.

September 30, 2018
Nathan Zachary a.k.a. nathanzachary (homepage, bugs)

Today we woke up extra early (before the sun had even peeked over the mountain crest) to depart Hallstatt for Maribor, Slovenia. The reason for getting up before the rooster’s crow is that it’s a special day in Maribor—the annual Harvest Festival of the oldest grape vine in the world. We started out for the ~3-hour drive, but met a problem right off the bat in that the road leaving Obertraun toward Graz was closed due to an avalanche. Yes, an avalanche… let’s not forget that we’re in the Austrian Alps at the end of autumn. I had to figure out an alternative route, but fortunately we still made it to Maribor in time. Actually, right as we arrived in the city centre, we pulled in behind the pre-festival wagon complete with an accordion player! We parked the car, and saw the event from a fairly nice perspective on the side line.

Maribor – Harvest Festival – Pre-show celebration

Being an absolute wine fanatic, and one with a strong interest in viticulture and oenology, I geeked out a little bit at the Harvest Festival because it is the oldest fruit-bearing grapevine on the planet! Not only that, but we just happened to be heading to southeastern Austria on the same day; a perfect coincidence. The festival started with members of the Slovenian Wine Council (formally known as the PSVVS [the Business Association for Viticulture and Wine Production]) speaking to the quality of the country’s various wine regions. It was wonderful to see them take such pride in their indigenous grapes and wines!

Maribor – Harvest Festival – Slovenian Wine Council

After the speakers (including diplomats and industry representatives from foreign nations) discussed the impact of Slovenian wines on the global marketplace, the festivities continued with live music, dancers wearing traditional garb, and importantly, the ceremonial first cutting of the grapes. We didn’t stay too much after the first cutting as most of the activities were in Slovenian and likely lost in translation, but I’m glad that we were there to see it firsthand; it was very likely a once-in-a-lifetime experience.

Maribor – Harvest Festival – first cutting of the grapes

After the Harvest Festival, we went to Mestni Park (meaning “City Park”) so that we could climb to the top of Piramida Hill. It’s a high ground and, though not anything like the mountains we just saw in Hallstatt, it has quite a steep grade. The top of Piramida is considered one of the best views of the city. It was a fun hike, and the views certainly were impressive, so I’m glad that we took the time to do it. However, since there was a minimum of €15 for the Vignette pass (for driving on Slovenian motorways), it seemed a bit expensive just for the few hours of the festival and the park. Nevertheless, it was a good experience.

View of Maribor from atop Piramida at Mestni Park

As it was mid-afternoon, we then got back in the car and drove up to the Slovenian-Austrian border for our stay at Weingut Tement. Tement offers a few different accommodation options, and we actually stayed on the only part of their property that is technically in Slovenia (the Winzarei Ciringa chalets) instead of on the Austrian side of the border. We had a lovely reception where we were able to taste some of their wines, and then saw our gorgeous chalet.

Südsteiermark – Weingut Tement’s Chalet Ciringa – Living room

There was a sizeable bedroom, full kitchen, extremely luxurious bathroom, and a lovely little breakfast nook before walking out the door to the patio. From our patio, we could readily see some of Tement’s vineyards, and even though they weren’t their esteemed Grosse Lage STK Zieregg vineyards, they were beautiful nonetheless.

Südsteiermark – Weingut Tement’s Chalet Ciringa – our breakfast nook

Südsteiermark – Weingut Tement’s Chalet Ciringa – fantastic vineyard view from the patio

We spent a little time just walking the Zieregg Vineyard (adjacent to the winery itself), and then headed to Ehrenhausen for dinner at Die Weinbank, which is directly affiliated with Weingut Tement. Unfortunately, when we arrived, it was closed despite the confirmation of our reservations. I looked on my mobile and found that there was one other restaurant named Kirchenwirt an der Weinstraße a mere block away from our car park, so we went there instead. We were expecting pub food, but boy were we wrong! It was elevated and outstanding, and our waiter was extremely accommodating by reading the entire menu to us in English. Deb and I shared some pumpkin soup, a salad with pumpkin, and mushroom tartare. She then ordered pork cutlets with trumpet mushrooms, and I went with pesto linguine with vegetables and, yup, more pumpkin. We ordered a couple pieces of house-made apple strudel to take away with us for later.

Back at our chalet, we enjoyed our wines of the evening. We each had the current vintage (2016) of Weingut Tement Zieregg Morillon (which is the local name for Chardonnay). It was a lovely mix of styles (not heavily oaky like many California Chards, but not as sharply crisp as Chablis either) and exhibited a character all of its own. The apple strudel was interesting, but I personally found it to be a bit like apple sauce inside instead of a strudel filling. It might have been better at the restaurant, where it would be served warm and with vanilla ice cream, but neither of us likes to have sweets before wine.

September 28, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)
Tryton Module Development (September 28, 2018, 12:03 UTC)

So I've finally gotten around to really starting Tryton module development to customize it to what we need.

I plan to put stuff that is useful as examples or maybe directly as-is on my github: https://github.com/LordVan/tryton-modules

On a side note this is trytond-4.8.4 running on python 3.5 at the moment.

The first module just (re-)adds the description field to the sale lines in the sale module (entry form). This by itself is only vaguely useful for me, but it was mostly about figuring out how this works. I have to say that once figured out it is really easy - the hardest part was getting the XML right for someone who is not familiar with the structure. I'd like to thank the people who helped me on IRC (#tryton@freenode)
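
For reference, here is a rough sketch of what such a small extension module can look like on the Python side (the module and field names below are made up for illustration and are not my actual code; the view XML that puts the field back on the form lives in a separate file):

# mymodule/sale.py - extend the existing sale.line model from the sale module
from trytond.model import fields
from trytond.pool import PoolMeta

__all__ = ['SaleLine']


class SaleLine(metaclass=PoolMeta):
    __name__ = 'sale.line'

    # hypothetical extra field; the description field itself already exists
    # on sale.line and only needs to be re-added in the view XML
    internal_note = fields.Char('Internal note')

On top of that the module needs the usual __init__.py calling Pool.register() and a tryton.cfg listing the dependency on the sale module.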

The next step will be to add some custom fields to this and products.

To add this module you can follow the steps in the documentation: Tryton by example

Alexys Jacob a.k.a. ultrabug (homepage, bugs)
py3status v3.13 (September 28, 2018, 11:56 UTC)

I am once again lagging behind the release blog posts but this one is an important one.

I’m proud to announce that our long time contributor @lasers has become an official collaborator of the py3status project!

Dear @lasers, your amazing energy and overwhelming ideas have served our little community for a while. I’m sure we’ll have a great way forward as we learn to work together with @tobes 🙂 Thank you again very much for everything you do!

This release is as much dedicated to you as it is yours 🙂

IMPORTANT notice

After this release, the py3status coding style CI will enforce the 'black' formatter style.

Highlights

Needless to say, the changelog is huge as usual; here is a very condensed view:

  • documentation updates, especially on the formatter (thanks @L0ric0)
  • py3 storage: use $XDG_CACHE_HOME or ~/.cache
  • formatter: multiple variable and feature fixes and enhancements
  • better config parser
  • new modules: lm_sensors, loadavg, mail, nvidia_smi, sql, timewarrior, wanda_the_fish

Thank you contributors!

  • lasers
  • tobes
  • maximbaz
  • cyrinux
  • Lorenz Steinert @L0ric0
  • wojtex
  • horgix
  • su8
  • Maikel Punie

September 27, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)
New copyright policy explained (September 27, 2018, 06:47 UTC)

At the 2018-09-15 meeting, the Trustees gave the final stamp of approval to the new Gentoo copyright policy outlined in GLEP 76. This policy is the result of work that has been slowly progressing since 2005, and that picked up considerable speed by the end of 2017. It is a major step forward from the status quo that has been used since the forming of the Gentoo Foundation, and that mostly has been inherited from the earlier Gentoo Technologies.

The policy aims to cover all copyright-related aspects, bringing Gentoo in line with the practices used in many other large open source projects. Most notably, it introduces a concept of Gentoo Certificate of Origin that requires all contributors to confirm that they are entitled to submit their contributions to Gentoo, and corrects the copyright attribution policy to be viable under more jurisdictions.

This article aims to briefly reiterate the most important points of the new copyright policy, and to provide a detailed guide on following it, in Q&A form.

Continue reading

September 15, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)

With Qt5 gaining support for high-DPI displays, and applications starting to exercise that support, it’s easy for applications to suddenly become unusable with some screens. For example, my old Samsung TV reported itself as a 7″ screen. While this used to not really matter (websites effectively forced you to assume a resolution of 96 DPI), high-DPI applications now started scaling themselves to occupy most of my screen, with elements becoming really huge (and ugly, apparently due to some poor scaling).

It turns out that it is really hard to find a solution for this. Most of the guides and tips are focused either on proprietary drivers or on getting custom resolutions. The DisplaySize specification in xorg.conf apparently did not change anything either. Finally, I was able to resolve the issue by overriding the EDID data for my screen. This guide explains how I did it.

Step 1: dump EDID data

Firstly, you need to get the EDID data from your monitor. Supposedly the read-edid tool could be used for this purpose, but it did not work for me. With only a little bit more effort, you can get it e.g. from xrandr:

$ xrandr --verbose
[...]
HDMI-0 connected primary 1920x1080+0+0 (0x57) normal (normal left inverted right x axis y axis) 708mm x 398mm
[...]
  EDID:
    00ffffffffffff004c2dfb0400000000
    2f120103804728780aee91a3544c9926
    0f5054bdef80714f8100814081809500
    950fb300a940023a801871382d40582c
    4500c48e2100001e662150b051001b30
    40703600c48e2100001e000000fd0018
    4b1a5117000a2020202020200000000a
    0053414d53554e470a20202020200143
    020323f14b901f041305140312202122
    2309070783010000e2000f67030c0010
    00b82d011d007251d01e206e285500c4
    8e2100001e011d00bc52d01e20b82855
    40c48e2100001e011d8018711c162058
    2c2500c48e2100009e011d80d0721c16
    20102c2580c48e2100009e0000000000
    00000000000000000000000000000029
[...]

If you have multiple displays connected, make sure to use the EDID for the one you’re overriding. Copy the hexdump and convert it to a binary blob. You can do this by passing it through xxd -p -r (installed by vim).
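
Alternatively, if xxd is not at hand, a couple of lines of Python do the same conversion (just a convenience sketch; it reads the copied hexdump on stdin and writes edid.bin):

import sys

# join the hexdump lines copied from xrandr and decode them into raw bytes
data = bytes.fromhex(''.join(sys.stdin.read().split()))
with open('edid.bin', 'wb') as f:
    f.write(data)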

Step 2: fix screen dimensions

Once you have the EDID blob ready, you need to update the screen dimensions inside it. Initially, I did it using a hex editor, which involved finding all the occurrences, updating them (and manually encoding the weird split integers) and correcting the checksums. Then I wrote edid-fixdim so you wouldn’t have to repeat that experience.
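
For the curious, here is a rough Python sketch (not the tool itself) of how those dimensions are encoded in the 128-byte base block, going by the EDID 1.3 layout; edid-fixdim does the equivalent for every descriptor in both the base block and the CEA extension, and fixes both checksums:

def patch_base_block(edid: bytearray, width_mm: int, height_mm: int) -> None:
    # bytes 21/22 of the base block: screen size in centimetres
    edid[21] = width_mm // 10
    edid[22] = height_mm // 10

    # four 18-byte descriptors start at offset 54; only those with a
    # non-zero pixel clock are detailed timing descriptors
    for off in range(54, 54 + 4 * 18, 18):
        if edid[off] == 0 and edid[off + 1] == 0:
            continue  # a display descriptor, no image size stored here
        # image size in millimetres: low 8 bits in bytes 12/13, and the
        # high 4 bits of each packed into byte 14 (the "split integers")
        edid[off + 12] = width_mm & 0xff
        edid[off + 13] = height_mm & 0xff
        edid[off + 14] = ((width_mm >> 8) << 4) | ((height_mm >> 8) & 0xf)

    # byte 127: checksum - all 128 bytes must sum to 0 modulo 256
    edid[127] = (-sum(edid[:127])) & 0xff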

First, use --get option to verify that your EDID is supported correctly:

$ edid-fixdim -g edid.bin
EDID structure: 71 cm x 40 cm
Detailed timing desc: 708 mm x 398 mm
Detailed timing desc: 708 mm x 398 mm
CEA EDID found
Detailed timing desc: 708 mm x 398 mm
Detailed timing desc: 708 mm x 398 mm
Detailed timing desc: 708 mm x 398 mm
Detailed timing desc: 708 mm x 398 mm

So your EDID consists of basic EDID structure, followed by one extension block. The screen dimensions are stored in 7 different blocks you’d have to update, and referenced in two checksums. The tool will take care of updating it all for you, so just pass the correct dimensions to --set:

$ edid-fixdim -s 1600x900 edid.bin
EDID structure updated to 160 cm x 90 cm
Detailed timing desc updated to 1600 mm x 900 mm
Detailed timing desc updated to 1600 mm x 900 mm
CEA EDID found
Detailed timing desc updated to 1600 mm x 900 mm
Detailed timing desc updated to 1600 mm x 900 mm
Detailed timing desc updated to 1600 mm x 900 mm
Detailed timing desc updated to 1600 mm x 900 mm

Afterwards, you can use --get again to verify that the changes were made correctly.

Step 3: overriding EDID data

Now it’s just the matter of putting the override in motion. First, make sure to enable CONFIG_DRM_LOAD_EDID_FIRMWARE in your kernel:

Device Drivers  --->
  Graphics support  --->
    Direct Rendering Manager (XFree86 4.1.0 and higher DRI support)  --->
      [*] Allow to specify an EDID data set instead of probing for it

Then, determine the correct connector name. You can find it in dmesg output:

$ dmesg | grep -C 1 Connector
[   15.192088] [drm] ib test on ring 5 succeeded
[   15.193461] [drm] Radeon Display Connectors
[   15.193524] [drm] Connector 0:
[   15.193580] [drm]   HDMI-A-1
--
[   15.193800] [drm]     DFP1: INTERNAL_UNIPHY1
[   15.193857] [drm] Connector 1:
[   15.193911] [drm]   DVI-I-1
--
[   15.194210] [drm]     CRT1: INTERNAL_KLDSCP_DAC1
[   15.194267] [drm] Connector 2:
[   15.194322] [drm]   VGA-1

Copy the new EDID blob into location of your choice inside /lib/firmware:

$ mkdir /lib/firmware/edid
$ cp edid.bin /lib/firmware/edid/samsung.bin

Finally, add the override to your kernel command-line:

drm.edid_firmware=HDMI-A-1:edid/samsung.bin

If everything went fine, xrandr should report correct screen dimensions after next reboot, and dmesg should report that EDID override has been loaded:

$ dmesg | grep EDID
[   15.549063] [drm] Got external EDID base block and 1 extension from "edid/samsung.bin" for connector "HDMI-A-1"

If it didn't, check dmesg for error messages.

September 09, 2018
Sven Vermeulen a.k.a. swift (homepage, bugs)
cvechecker 3.9 released (September 09, 2018, 11:04 UTC)

Thanks to updates from Vignesh Jayaraman, Anton Hillebrand and Rolf Eike Beer, a new release of cvechecker is now made available.

This new release (v3.9) is a bugfix release.

September 08, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)
SIP & STUN .. (September 08, 2018, 07:26 UTC)

Note to self .. it is not very useful when one leaves a (public) STUN server activated in a SIP client after changing it from using the VoIP Server's IP to the (internal) DNS .. leads to working signalling, but no audio ^^ - Took me a few days to figure out what had happened (including capturing stuff with Wireshark, ..)

September 07, 2018
Gentoo congratulates our GSoC participants (September 07, 2018, 00:00 UTC)

GSOC logo Gentoo would like to congratulate Gibix and JSteward for finishing and passing Google’s Summer of Code for the 2018 calendar year. Gibix contributed by enhancing Rust (programming language) support within Gentoo. JSteward contributed by making a full Gentoo GNU/Linux distribution, managed by Portage, run on devices which use the original Android-customized kernel.

The final reports of their projects can be reviewed on their personal blogs:

August 24, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)

I have recently worked on enabling 2-step authentication via SSH on the Gentoo developer machine. I have selected google-authenticator-libpam amongst different available implementations as it seemed the best maintained and having all the necessary features, including a friendly tool for users to configure it. However, its design has a weakness: it stores the secret unprotected in user’s home directory.

This means that if an attacker manages to gain at least temporary access to the filesystem with user’s privileges — through a malicious process, vulnerability or simply because someone left the computer unattended for a minute — he can trivially read the secret and therefore clone the token source without leaving a trace. It would completely defeat the purpose of the second step, and the user may not even notice until the attacker makes real use of the stolen secret.

In order to protect against this, I’ve created google-authenticator-wrappers (as upstream decided to ignore the problem). This package provides a rather trivial setuid wrapper that manages a write-only, authentication-protected secret store for the PAM module. Additionally, it comes with a test program (so you can test the OTP setup without jumping through the hoops or risking losing access) and friendly wrappers for the default setup, as used on Gentoo Infra.

The recommended setup (as utilized by the sys-auth/google-authenticator-wrappers package) is to use a dedicated user for the password store. In this scenario, the users are unable to read their secrets, and all secret operations (including authentication via the PAM module) are done using an unprivileged user. Furthermore, any operation regarding the configuration (either updating it or removing the second step) requires regular PAM authentication (e.g. typing your own password).

This is consistent with e.g. how shadow operates (users can’t read their passwords, nor update them without authenticating first), how most sites using 2-factor authentication operate (again, users can’t read their secrets) and follows the RFC 6238 recommendation (that keys […] SHOULD be protected against unauthorized access and usage). It solves the aforementioned issue by preventing user-privileged processes from reading the secrets and recovery codes. Furthermore, it prevents the attacker with this particular level of access from disabling 2-step authentication, changing the secret or even weakening the configuration.

August 17, 2018
Luca Barbato a.k.a. lu_zero (homepage, bugs)
Gentoo on Integricloud (August 17, 2018, 22:44 UTC)

Integricloud gave me access to their infrastructure to track some issues on ppc64 and ppc64le.

Since some of the issues are related to the compilers, I obviously installed Gentoo on it and in the process I started to fix some issues with catalyst to get a working install media, but that’s for another blogpost.

Today I’m just giving a walk-through on how to get a ppc64le (and ppc64 soon) VM up and running.

Preparation

Read this and get your install media available to your instance.

Install Media

I’m using the Gentoo installcd I’m currently refining.

Booting

You have to append console=hvc0 to your boot command; the boot process might figure it out for you on newer install media (I still have to send patches to update livecd-tools)

Network configuration

You have to manually set up the network.
You can use either ifconfig and route or ip, as you like; refer to your instance setup for the parameters (both variants are shown below).

ifconfig enp0s0 ${ip}/16
route add -net default gw ${gw}
echo "nameserver 8.8.8.8" > /etc/resolv.conf
ip a add ${ip}/16 dev enp0s0
ip l set enp0s0 up
ip r add default via ${gw}
echo "nameserver 8.8.8.8" > /etc/resolv.conf

Disk Setup

OpenFirmware seems to like gpt much better:

parted /dev/sda mklabel gpt

You may use fdisk to create:
– a PowerPC PReP boot partition of 8M
– a root partition with the remaining space

Device     Start      End  Sectors Size Type
/dev/sda1   2048    18431    16384   8M PowerPC PReP boot
/dev/sda2  18432 33554654 33536223  16G Linux filesystem

I’m using btrfs and zstd-compressing /usr/portage and /usr/src/.

mkfs.btrfs /dev/sda2

Initial setup

It is pretty much the usual.

mount /dev/sda2 /mnt/gentoo
cd /mnt/gentoo
wget https://dev.gentoo.org/~mattst88/ppc-stages/stage3-ppc64le-20180810.tar.xz
tar -xpf stage3-ppc64le-20180810.tar.xz
mount -o bind /dev dev
mount -t devpts devpts dev/pts
mount -t proc proc proc
mount -t sysfs sys sys
cp /etc/resolv.conf etc
chroot .

You just have to emerge grub and gentoo-sources; I diverge from the defconfig by making btrfs builtin.

My /etc/portage/make.conf:

CFLAGS="-O3 -mcpu=power9 -pipe"
# WARNING: Changing your CHOST is not something that should be done lightly.
# Please consult https://wiki.gentoo.org/wiki/Changing_the_CHOST_variable before changing.
CHOST="powerpc64le-unknown-linux-gnu"

# NOTE: This stage was built with the bindist Use flag enabled
PORTDIR="/usr/portage"
DISTDIR="/usr/portage/distfiles"
PKGDIR="/usr/portage/packages"

USE="ibm altivec vsx"

# This sets the language of build output to English.
# Please keep this setting intact when reporting bugs.
LC_MESSAGES=C
ACCEPT_KEYWORDS=~ppc64

MAKEOPTS="-j4 -l6"
EMERGE_DEFAULT_OPTS="--jobs 10 --load-average 6 "

My minimal set of packages I need before booting:

emerge grub gentoo-sources vim btrfs-progs openssh

NOTE: You will want to re-emerge openssh and make sure bindist is not in your USE.

Kernel & Bootloader

cd /usr/src/linux
make defconfig
make menuconfig # I want btrfs builtin so I can avoid a initrd
make -j 10 all && make install && make modules_install
grub-install /dev/sda1
grub-mkconfig -o /boot/grub/grub.cfg

NOTE: make sure you pass /dev/sda1, otherwise grub will happily assume OpenFirmware knows about btrfs and just point it at your directory.
That’s unfortunately not the case.

Networking

I’m using netifrc and the eth0 naming convention.

touch /etc/udev/rules.d/80-net-name-slot.rules
ln -sf /etc/init.d/net.{lo,eth0}
echo -e "config_eth0=\"${ip}/16\"\nroutes_eth0="default via ${gw}\"\ndns_servers_eth0=\"8.8.8.8\"" > /etc/conf.d/net

Password and SSH

Even though the mticlient is quite nice, you will want to use ssh as much as you can.

passwd 
rc-update add sshd default

Finishing touches

Right now sysvinit does not add the hvc0 console as it should, due to a profile quirk; for now check /etc/inittab and, if it is missing, add:

echo 'hvc0:2345:respawn:/sbin/agetty -L 9600 hvc0' >> /etc/inittab

Add your user and add your ssh key and you are ready to use your new system!

August 15, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)
new* helpers can read from stdin (August 15, 2018, 09:21 UTC)

Did you know that new* helpers can read from stdin? Well, now you know! So instead of writing to a temporary file you can install your inline text straight to the destination:

src_install() {
  # old code
  cat <<-EOF >"${T}"/mywrapper || die
    #!/bin/sh
    exec do-something --with-some-argument
  EOF
  dobin "${T}"/mywrapper

  # replacement
  newbin - mywrapper <<-EOF
    #!/bin/sh
    exec do-something --with-some-argument
  EOF
}

August 13, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)

The recent efforts on improving the security of different areas of Gentoo have brought up some arguments. Some time ago one of the developers considered whether he would withstand physical violence if an attacker were to use it in order to compromise Gentoo. A few days later another developer suggested that an attacker could pay Gentoo developers to compromise the distribution. Is this a real threat to Gentoo? Are we all doomed?

Before I answer this question, let me make an important presumption. Gentoo is a community-driven open source project. As such, it has certain inherent weaknesses and there is no way around them without changing what Gentoo fundamentally is. Those weaknesses are common to all projects of the same nature.

Gentoo could indeed be compromised if developers are subject to the threat of violence to themselves or their families. As for money, I don’t want to insult anyone and I don’t think it really matters. The fact is, Gentoo is vulnerable to any adversary resourceful enough, and there are certainly both easier and cheaper ways than the two mentioned. For example, the adversary could get a new developer recruited, or simply trick one of the existing developers into compromising the distribution. It just takes one developer out of ~150.

As I said, there is no way around that without making major changes to the organizational structure of Gentoo. Those changes would probably do more harm to Gentoo than good. We can just admit that we can’t fully protect Gentoo from focused attack of a resourceful adversary, and all we can do is to limit the potential damage, detect it quickly and counteract the best we can. However, in reality random probes and script kiddie attacks that focus on trivial technical vulnerabilities are more likely, and that’s what the security efforts end up focusing on.

There seems to be some recurring confusion among Gentoo developers regarding the topic of OpenPGP key expiration dates. Some developers seem to believe them to be some kind of security measure — and start arguing about their weaknesses. Furthermore, some people seem to think of them as a rotation mechanism, and believe that they are expected to generate new keys. The truth is, the expiration date is neither of those.

The key expiration date can be updated at any time (both lengthened or shortened), including past the previous expiration date. This is a feature, not a bug. In fact, you are expected to update your expiration dates periodically. You certainly should not rotate your primary key unless really necessary, as switching to a new key usually involves a lot of hassle.

If an attacker manages to compromise your primary key, he can easily update the expiration date as well (even if it expires first). Therefore, expiration date does not really provide any added protection here. Revocation is the only way of dealing with compromised keys.

Expiration dates really serve two purposes: naturally eliminating unused keys, and enforcing periodical checks on the primary key. By requiring the developers to periodically update their expiration dates, we also implicitly force them to check whether their primary secret key (which we recommend storing offline, in a secure place) is still present and working. Now, if it turns out that the developer can neither update the expiration date nor revoke the key (because the key, its backups and the revocation certificate are all lost or damaged, or the developer goes MIA), the key will eventually expire instead of remaining a ‘ghost’ forever.

Even then, developers argue that we have LDAP and retirement procedures to deal with that. However, OpenPGP keys go beyond Gentoo and beyond Gentoo Infrastructure. We want to encourage good practices that will also affect our users and other people with whom developers are communicating, and who have no reason to know about internal Gentoo key management.

August 12, 2018

Pwnies logo

Congratulations to security researcher and Gentoo developer Hanno Böck and his co-authors Juraj Somorovsky and Craig Young for winning one of this year’s coveted Pwnie awards!

The award is for their work on the Return Of Bleichenbacher’s Oracle Threat or ROBOT vulnerability, which at the time of discovery affected such illustrious sites as Facebook and Paypal. Technical details can be found in the full paper published at the Cryptology ePrint Archive.

FroSCon logo

As last year, there will be a Gentoo booth again at the upcoming FrOSCon “Free and Open Source Conference” in St. Augustin near Bonn! Visitors can meet Gentoo developers to ask any question, get Gentoo swag, and prepare, configure, and compile their own Gentoo buttons.

The conference is 25th and 26th of August 2018, and there is no entry fee. See you there!

August 04, 2018
Sergei Trofimovich a.k.a. slyfox (homepage, bugs)
ptrace() and accidental boot fix on ia64 (August 04, 2018, 00:00 UTC)

This story is another dive into linux kernel internals. It has started as a strace hangup on ia64 and ended up being an unusual case of gcc generating garbage code for linux kernel (not perfectly valid C either). I’ll try to cover a few ptrace() system call corners on x86_64 and ia64 for comparison.

Intro

I updated elilo and the kernel on the ia64 machine recently.

Kernel boot times shrank from 10 minutes (kernel 3.14.14) down to 2 minutes (kernel 4.9.72). The 3.14.14 kernel had a large 8-minute pause during which the early console was not accessible. Every time this pause happened I thought I had bricked the machine. And now the delays are gone \o/

One new thing broke (so far): every time I ran strace it hung without printing any output. Mike Frysinger pointed out that the strace hangup was likely related to the gdb problems on ia64 reported earlier by Émeric Maschino.

And he was right!

Reproducing

Using a ski image I booted the fresh kernel to make sure the bug was still there:

# strace ls
<no response, hangup>

Yay! ski was able to reproduce it: no need to torture the physical machine while debugging. The next step was to find where strace got stuck. As strace and gdb were broken I had to resort to printf() debugging.

Before doing that I tried strace’s -d option to enable debug mode where it prints everything it expects from tracee process:

root@ia64 / # strace -d ls
strace: ptrace_setoptions = 0x51
strace: new tcb for pid 52, active tcbs:1
strace: [wait(0x80137f) = 52] WIFSTOPPED,sig=SIGSTOP,EVENT_STOP (128)
strace: pid 52 has TCB_STARTUP, initializing it
strace: [wait(0x80057f) = 52] WIFSTOPPED,sig=SIGTRAP,EVENT_STOP (128)
strace: [wait(0x00127f) = 52] WIFSTOPPED,sig=SIGCONT
strace: [wait(0x00857f) = 52] WIFSTOPPED,sig=133
????

Cryptic output. I tried to compare this output against correctly working x86_64 system to understand what went wrong:

amd64 $ strace -d ls
strace: ptrace_setoptions = 0x51
strace: new tcb for pid 29343, active tcbs:1
strace: [wait(0x80137f) = 29343] WIFSTOPPED,sig=SIGSTOP,EVENT_STOP (128)
strace: pid 29343 has TCB_STARTUP, initializing it
strace: [wait(0x80057f) = 29343] WIFSTOPPED,sig=SIGTRAP,EVENT_STOP (128)
strace: [wait(0x00127f) = 29343] WIFSTOPPED,sig=SIGCONT
strace: [wait(0x00857f) = 29343] WIFSTOPPED,sig=133
execve("/bin/ls", ["ls"], 0x60000fffffa4f1f8 /* 36 vars */strace: [wait(0x04057f) = 29343] WIFSTOPPED,sig=SIGTRAP,EVENT_EXEC (4)
strace: [wait(0x00857f) = 29343] WIFSTOPPED,sig=133
...

Up to execve call both logs are identical. Still no clue.

I spent some time looking at the ptrace state machine in the kernel and gave up trying to understand what was wrong. I then asked the strace maintainer what could be wrong and got an almost immediate response from Dmitry V. Levin: strace did not show the actual error.

After a source code tweak he pointed at ptrace() syscall failure returning -EIO:

$ ./strace -d /
./strace: ptrace_setoptions = 0x51
./strace: new tcb for pid 11080, active tcbs:1
./strace: [wait(0x80137f) = 11080] WIFSTOPPED,sig=SIGSTOP,EVENT_STOP (128)
./strace: pid 11080 has TCB_STARTUP, initializing it
./strace: [wait(0x80057f) = 11080] WIFSTOPPED,sig=SIGTRAP,EVENT_STOP (128)
./strace: [wait(0x00127f) = 11080] WIFSTOPPED,sig=SIGCONT
./strace: [wait(0x00857f) = 11080] WIFSTOPPED,sig=133
./strace: get_regs: get_regs_error: Input/output error
????
...
"Looks like ptrace(PTRACE_GETREGS) always fails with EIO on this new kernel."

Now I got a more specific signal: ptrace(PTRACE_GETREGS,…) syscall failed.

Into the kernel

I felt I had finally found the smoking gun: getting registers of WIFSTOPPED tracee task should never fail. All registers must be already stored somewhere in memory.

Otherwise how would kernel be able to resume executing tracee task when needed?

Before diving into ia64 land let’s look into x86_64 ptrace(PTRACE_GETREGS, …) implementation.

x86_64 ptrace(PTRACE_GETREGS)

To find a <foo> syscall implementation in kernel we can search for sys_<foo>() function definition. The lazy way to find a definition is to interrogate built kernel with gdb:

$ gdb --quiet ./vmlinux
(gdb) list sys_ptrace
1105
1106    #ifndef arch_ptrace_attach
1107    #define arch_ptrace_attach(child)       do { } while (0)
1108    #endif
1109
1110    SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
1111                    unsigned long, data)
1112    {
1113            struct task_struct *child;
1114            long ret;

SYSCALL_DEFINE4(ptrace, …) macro defines actual sys_ptrace() which does a few sanity checks and dispatches to arch_ptrace():

x86_64 implementation does copy_regset_to_user() call and takes a few lines of code to fetch registers:

Let’s look at it in detail to get the idea where registers are normally stored.

Here copy_regset_to_user() is just a dispatcher to view argument. Moving on:

A bit of boilerplate to tie genregs_get() and genregs_set() to 64-bit (or 32-bit) caller. Let’s look at 64-bit variant of genregs_get() as it’s used in our PTRACE_GETREGS case:

From the task_pt_regs() definition we see that the actual register contents are stored in the task’s kernel stack. And genregs_get() copies the register contents one by one in a while() loop.

How do a task’s registers get stored to the task’s kernel stack? There are a few paths to get there. The most frequent is perhaps interrupt handling, when the task is descheduled from the CPU and moved to the scheduler wait queue.

ENTRY(interrupt_entry) is the entry point for interrupt handling.

ENTRY(interrupt_entry)
    UNWIND_HINT_FUNC
    ASM_CLAC
    cld

    testb        $3, CS-ORIG_RAX+8(%rsp)
    jz        1f
    SWAPGS

    /*
     * Switch to the thread stack. The IRET frame and orig_ax are
     * on the stack, as well as the return address. RDI..R12 are
     * not (yet) on the stack and space has not (yet) been
     * allocated for them.
     */
    pushq        %rdi

    /* Need to switch before accessing the thread stack. */
    SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
    movq        %rsp, %rdi
    movq        PER_CPU_VAR(cpu_current_top_of_stack), %rsp

     /*
      * We have RDI, return address, and orig_ax on the stack on
      * top of the IRET frame. That means offset=24
      */
    UNWIND_HINT_IRET_REGS base=%rdi offset=24

    pushq        7*8(%rdi)                /* regs->ss */
    pushq        6*8(%rdi)                /* regs->rsp */
    pushq        5*8(%rdi)                /* regs->eflags */
    pushq        4*8(%rdi)                /* regs->cs */
    pushq        3*8(%rdi)                /* regs->ip */
    pushq        2*8(%rdi)                /* regs->orig_ax */
    pushq        8(%rdi)                        /* return address */
    UNWIND_HINT_FUNC

    movq        (%rdi), %rdi
1:

    PUSH_AND_CLEAR_REGS save_ret=1
    ENCODE_FRAME_POINTER 8

    testb        $3, CS+8(%rsp)
    jz        1f

    /*
     * IRQ from user mode.
     *
     * We need to tell lockdep that IRQs are off.  We can't do this until
     * we fix gsbase, and we should do it before enter_from_user_mode
     * (which can take locks).  Since TRACE_IRQS_OFF is idempotent,
     * the simplest way to handle it is to just call it twice if
     * we enter from user mode.  There's no reason to optimize this since
     * TRACE_IRQS_OFF is a no-op if lockdep is off.
     */
    TRACE_IRQS_OFF

    CALL_enter_from_user_mode

1:
    ENTER_IRQ_STACK old_rsp=%rdi save_ret=1
    /* We entered an interrupt context - irqs are off: */
    TRACE_IRQS_OFF

    ret
END(interrupt_entry)
; ...
.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0
    /*
     * Push registers and sanitize registers of values that a
     * speculation attack might otherwise want to exploit. The
     * lower registers are likely clobbered well before they
     * could be put to use in a speculative execution gadget.
     * Interleave XOR with PUSH for better uop scheduling:
     */
    .if \save_ret
    pushq        %rsi                /* pt_regs->si */
    movq        8(%rsp), %rsi        /* temporarily store the return address in %rsi */
    movq        %rdi, 8(%rsp)        /* pt_regs->di (overwriting original return address) */
    .else
    pushq   %rdi                /* pt_regs->di */
    pushq   %rsi                /* pt_regs->si */
    .endif
    pushq        \rdx                /* pt_regs->dx */
    xorl        %edx, %edx        /* nospec   dx */
    pushq   %rcx                /* pt_regs->cx */
    xorl        %ecx, %ecx        /* nospec   cx */
    pushq   \rax                /* pt_regs->ax */
    pushq   %r8                /* pt_regs->r8 */
    xorl        %r8d, %r8d        /* nospec   r8 */
    pushq   %r9                /* pt_regs->r9 */
    xorl        %r9d, %r9d        /* nospec   r9 */
    pushq   %r10                /* pt_regs->r10 */
    xorl        %r10d, %r10d        /* nospec   r10 */
    pushq   %r11                /* pt_regs->r11 */
    xorl        %r11d, %r11d        /* nospec   r11*/
    pushq        %rbx                /* pt_regs->rbx */
    xorl    %ebx, %ebx        /* nospec   rbx*/
    pushq        %rbp                /* pt_regs->rbp */
    xorl    %ebp, %ebp        /* nospec   rbp*/
    pushq        %r12                /* pt_regs->r12 */
    xorl        %r12d, %r12d        /* nospec   r12*/
    pushq        %r13                /* pt_regs->r13 */
    xorl        %r13d, %r13d        /* nospec   r13*/
    pushq        %r14                /* pt_regs->r14 */
    xorl        %r14d, %r14d        /* nospec   r14*/
    pushq        %r15                /* pt_regs->r15 */
    xorl        %r15d, %r15d        /* nospec   r15*/
    UNWIND_HINT_REGS
    .if \save_ret
    pushq        %rsi                /* return address on top of stack */
    .endif
.endm

Interesting effects of the interrupt_entry are:

  • registers are backed up by PUSH_AND_CLEAR_REGS macro
  • memory area used for backup is PER_CPU_VAR(cpu_current_top_of_stack) (task’s kernel stack)

To recap: ptrace(PTRACE_GETREGS, …) does elementwise copy (using __put_user()) for each general register located in a single struct pt_regs in task’s kernel stack to tracer’s userspace.

Now let’s look at how ia64 does the same.

ia64 ptrace(PTRACE_GETREGS)

“Can’t be much more complicated than on x86_64” was my thought. Haha.

I started searching for -EIO failure in kernel and sprinkling printk() statements in ptrace() handling code.

ia64 begins with the same call path as x86_64:

Again, ptrace_getregs() is supposed to copy in-memory context back to caller’s userspace. Where did it return EIO?

Quiz: while you are skimming through the ptrace_getregs() code and comments right below, try to guess which EIO exit path is taken in our case. I’ve marked the cases with [N] numbers.

static long
ptrace_getregs (struct task_struct *child, struct pt_all_user_regs __user *ppr)
{
    // ...
    // [1] check if we can write back to userspace
    if (!access_ok(VERIFY_WRITE, ppr, sizeof(struct pt_all_user_regs)))
            return -EIO;

    // [2] get pointer to register context (ok)
    pt = task_pt_regs(child);
    // [3] and tracee kernel stack (unexpected!)
    sw = (struct switch_stack *) (child->thread.ksp + 16);

    // [4] Try to unwind tracee's call chain (even more unexpected!)
    unw_init_from_blocked_task(&info, child);
    if (unw_unwind_to_user(&info) < 0) {
            return -EIO;
    }

    // [5] validate alignment of target userspace buffer
    if (((unsigned long) ppr & 0x7) != 0) {
            dprintk("ptrace:unaligned register address %p\n", ppr);
            return -EIO;
    }

    // [6] fetch special registers into local variables
    if (access_uarea(child, PT_CR_IPSR, &psr, 0) < 0
        || access_uarea(child, PT_AR_EC, &ec, 0) < 0
        || access_uarea(child, PT_AR_LC, &lc, 0) < 0
        || access_uarea(child, PT_AR_RNAT, &rnat, 0) < 0
        || access_uarea(child, PT_AR_BSP, &bsp, 0) < 0
        || access_uarea(child, PT_CFM, &cfm, 0)
        || access_uarea(child, PT_NAT_BITS, &nat_bits, 0))
            return -EIO;

    /* control regs */

    // [7] Finally start populating register contents into userspace:
    retval |= __put_user(pt->cr_iip, &ppr->cr_iip);
    retval |= __put_user(psr, &ppr->cr_ipsr);

    /* app regs */
    // [8] a few application registers
    retval |= __put_user(pt->ar_pfs, &ppr->ar[PT_AUR_PFS]);
    retval |= __put_user(pt->ar_rsc, &ppr->ar[PT_AUR_RSC]);
    retval |= __put_user(pt->ar_bspstore, &ppr->ar[PT_AUR_BSPSTORE]);
    retval |= __put_user(pt->ar_unat, &ppr->ar[PT_AUR_UNAT]);
    retval |= __put_user(pt->ar_ccv, &ppr->ar[PT_AUR_CCV]);
    retval |= __put_user(pt->ar_fpsr, &ppr->ar[PT_AUR_FPSR]);

    retval |= __put_user(ec, &ppr->ar[PT_AUR_EC]);
    retval |= __put_user(lc, &ppr->ar[PT_AUR_LC]);
    retval |= __put_user(rnat, &ppr->ar[PT_AUR_RNAT]);
    retval |= __put_user(bsp, &ppr->ar[PT_AUR_BSP]);
    retval |= __put_user(cfm, &ppr->cfm);

    /* gr1-gr3 */
    // [9] normal (general) registers
    retval |= __copy_to_user(&ppr->gr[1], &pt->r1, sizeof(long));
    retval |= __copy_to_user(&ppr->gr[2], &pt->r2, sizeof(long) *2);

    /* gr4-gr7 */
    // [10] more normal (general) registers!
    for (i = 4; i < 8; i++) {
            if (unw_access_gr(&info, i, &val, &nat, 0) < 0)
                    return -EIO;
            retval |= __put_user(val, &ppr->gr[i]);
    }

    /* gr8-gr11 */
    // [11] even more normal (general) registers!!
    retval |= __copy_to_user(&ppr->gr[8], &pt->r8, sizeof(long) * 4);

    /* gr12-gr15 */
    // [11] you've got the idea
    retval |= __copy_to_user(&ppr->gr[12], &pt->r12, sizeof(long) * 2);
    retval |= __copy_to_user(&ppr->gr[14], &pt->r14, sizeof(long));
    retval |= __copy_to_user(&ppr->gr[15], &pt->r15, sizeof(long));

    /* gr16-gr31 */
    // [12] even more of those
    retval |= __copy_to_user(&ppr->gr[16], &pt->r16, sizeof(long) * 16);

    /* b0 */
    // [13] branch register b0
    retval |= __put_user(pt->b0, &ppr->br[0]);

    /* b1-b5 */
    // [13] more branch registers
    for (i = 1; i < 6; i++) {
            if (unw_access_br(&info, i, &val, 0) < 0)
                    return -EIO;
            __put_user(val, &ppr->br[i]);
    }

    /* b6-b7 */
    // [14] even more branch registers
    retval |= __put_user(pt->b6, &ppr->br[6]);
    retval |= __put_user(pt->b7, &ppr->br[7]);

    /* fr2-fr5 */
    // [15] floating point registers
    for (i = 2; i < 6; i++) {
            if (unw_get_fr(&info, i, &fpval) < 0)
                    return -EIO;
            retval |= __copy_to_user(&ppr->fr[i], &fpval, sizeof (fpval));
    }

    /* fr6-fr11 */
    // [16] more floating point registers
    retval |= __copy_to_user(&ppr->fr[6], &pt->f6,
                             sizeof(struct ia64_fpreg) * 6);

    /* fp scratch regs(12-15) */
    // [17] more floating point registers
    retval |= __copy_to_user(&ppr->fr[12], &sw->f12,
                             sizeof(struct ia64_fpreg) * 4);

    /* fr16-fr31 */
    // [18] even more floating point registers
    for (i = 16; i < 32; i++) {
            if (unw_get_fr(&info, i, &fpval) < 0)
                    return -EIO;
            retval |= __copy_to_user(&ppr->fr[i], &fpval, sizeof (fpval));
    }

    /* fph */
    // [19] rest of floating point registers
    ia64_flush_fph(child);
    retval |= __copy_to_user(&ppr->fr[32], &child->thread.fph,
                             sizeof(ppr->fr[32]) * 96);

    /*  preds */
    // [20] predicate registers
    retval |= __put_user(pt->pr, &ppr->pr);

    /* nat bits */
    // [20] NaT status registers
    retval |= __put_user(nat_bits, &ppr->nat);

    ret = retval ? -EIO : 0;
    return ret;
}

It’s a huge function. Be afraid not! It has two main parts:

  • extraction of register values using unw_unwind_to_user()
  • copying extracted values to caller’s userspace using __put_user() and __copy_to_user() helpers.

Those two are analogous to x86_64’s copy_regset_to_user() implementation.

Quiz answer: surprisingly it’s case [4]: EIO popped up due to a failure in the unw_unwind_to_user() call. Or not so surprisingly, given it’s The Function that fetches register values from somewhere.

Let’s check where register contents are hiding on ia64. Here goes unw_unwind_to_user() definition:

The code above is more complicated than on x86_64. How is it supposed to work?

For efficiency reasons the syscall interface (and even the interrupt handling interface) on ia64 looks a lot more like a normal function call. This means that Linux does not store all general registers to a separate struct pt_regs backup area on each task switch.

Let’s peek at interrupt handling entry for completeness.

ia64 uses interrupt entrypoint to enter the kernel at ENTRY(interrupt):

The code above handles interrupts as:

  • SAVE_MIN_WITH_COVER sets kernel stack (r12), gp (r1) and so on
  • SAVE_REST stores the rest of the registers (r2 to r31) but leaves r32 to r127 to be managed by the RSE (register stack engine), like a normal function call would.
  • Hands off control to C code in ia64_handle_irq.

All the above means that in order to get register r32 or similar we would need to perform kernel stack unwinding down to the userspace boundary and read the register values from the RSE memory area (the backing store).

Into the rabbit hole

Back to our unwinder failure.

Our case is not very complicated, as the tracee is stopped at the system call boundary and there is not too much to unwind. How would one know where the user boundary starts? Linux looks at the return instruction pointer in every stack frame and checks whether that return address still points to kernel address space.

The unwinding failure seemingly happens in the depths of unw_unwind(info, &ip). From there find_save_locs(info) is called. find_save_locs() lazily builds or runs an unwind script. run_script() is a small bytecode interpreter with 11 instruction types.

If the above does not make sense to you it’s fine. It did not make sense to me either.

To get more information from the unwinder I enabled its debugging output by adding #define UNW_DEBUG:

I ran strace again:

ia64 # strace -v -d ls
strace: ptrace_setoptions = 0x51
unwind.build_script: no unwind info for ip=0xa00000010001c1a0 (prev ip=0x0)
unwind.run_script: no state->pt, dst=18, val=136
unwind.unw_unwind: failed to locate return link (ip=0xa00000010001c1a0)!
unwind.unw_unwind_to_user: failed to unwind to user-level (ip=0xa00000010001c1a0)

build_script() couldn’t resolve current ip=0xa00000010001c1a0 address. Why? No idea! I added printk() around the place where I expected a match:

I ran strace again:

ia64 # strace -v -d ls
strace: ptrace_setoptions = 0x51
unwind.build_script: looking up ip=0xa00000010001c1a0 in [start=0xa000000100009240,end=0xa000000100000000)
unwind.build_script: looking up ip=0xa00000010001c1a0 in [start=0xa000000000040720,end=0xa000000000040ad0)
unwind.build_script: no unwind info for ip=0xa00000010001c1a0 (prev ip=0x0)

Can you spot the problem? Look at this range: [start=0xa000000100009240,end=0xa000000100000000). Its end is less than its start. This renders the ip >= table->start && ip < table->end condition always false. How could that happen?

It means the ptrace() itself is not at fault here but a victim of already corrupted table->end value.

Going deeper

To find table->end corruption I checked if table was populated correctly. It is done by a simple function init_unwind_table():

Table construction happens in only a few places:

Here we see unwind tables created for:

  • one table for kernel itself
  • one table for linux-gate.so (the equivalent of linux-vdso.so.1 on x86_64)
  • one table for each kernel module

Arrays are hard

Nothing complicated, right? Actually gcc fails to generate correct code for end[-1].end_offset expression! It happens to be a rare corner case:

Both __start_unwind and __end_unwind are defined in linker script as external symbols:

# somewhere in arch/ia64/kernel/vmlinux.lds.S
# ...
SECTIONS {
    # ...
    .IA_64.unwind : AT(ADDR(.IA_64.unwind) - LOAD_OFFSET) {
            __start_unwind = .;
            *(.IA_64.unwind*)
            __end_unwind = .;
    } :code :unwind
    # ...

Here is how C code defines __end_unwind:

If we manually inline all the above into unw_init we will get the following:

If __end_unwind[] were an array defined in C, then the negative index -1 would cause undefined behaviour.

On the practical side it’s just pointer arithmetic. Is there anything special about subtracting a few bytes from an arbitrary address and then dereferencing it?

Let’s check what kind of assembly gcc actually generates.

Compiler mysteries

Still reading? Great! You got to most exciting part of this article!

Let’s look at simpler code first. And then we will grow it to be closer to our initial example.

Let’s start from global array with a negative index:

Compilation result (I’ll strip irrelevant bits and annotations):

Here two things happen:

  • The __some_table address is read from the GOT (r1 is roughly the GOT register) by performing an ld8.mov (a form of 8-byte load) into r14.
  • The final value is loaded from address r14 - 18 using ld8 (also an 8-byte load).

Simple!

We can simplify the example by avoiding GOT indirection. The typical way to do it is to use the __attribute__((visibility("hidden"))) hint:

Assembly code:

Here movl r14 = @gprel(__some_table#) is a link-time 64-bit constant: the offset of the __some_table array from the r1 value. Only a single 8-byte load happens at address @gprel(__some_table#) + r1 - 8.

Also straightforward.

Now let’s change the alignment of our table from long (8 bytes on ia64) to char (1 byte):

This is quite a blowup in code size! Here, instead of one 8-byte ld8 load, the compiler generated eight 1-byte ld1 loads to assemble a valid value with the help of arithmetic shifts and ors.

Note how each individual byte gets its own register to keep the address and the result of the load.

Here is the subset of above instructions to handle byte offset -5:

This code, while ugly and inefficient, is still correct.

Now let’s wrap our 8-byte value in a struct to make example closer to original unwinder’s table registration code:

Quiz time: do you think generated code will be exactly the same as in previous example or somehow different?

The code is different from the previous one! Seemingly not by much, but there is one suspicious detail: the offsets are now very large. Let’s look at our -5 example again:

The offset 0x1ffffffffffffffb (2305843009213693947) used here is incorrect. It should have been 0xfffffffffffffffb (-5).

We encounter (arguably) a compiler bug known as PR84184. Upstream says struct handling is different enough from direct array dereferences to trick gcc into generating incorrect byte offsets.

One day I’ll take a closer look at it to understand mechanics.

Let’s explore one more example: what if we add a bigger alignment to __some_table without changing its type?

Exactly as our original clean and fast example: single aligned load at offset -8.

Now we have a simple workaround!

What if we pass our array in a register instead of using a global reference? (effectively uninlining array address)

Also works! Note how the compiler promotes the alignment from 1 to 8 after the type cast.

In this case a few things happen at the same time to trigger bad code generation:

  • gcc infers that char __end_unwind[] is an array literal with alignment 1
  • gcc inlines __end_unwind into init_unwind_table and demotes alignment from 8 (const struct unw_table_entry) to 1 (extern char [])
  • gcc assumes that __end_unwind can’t have negative subscript and generates invalid (and inefficient) code

Workarounds (aka hacks) time!

We can work around the corner-case conditions above in a few different ways:

The fix is still not perfect, as a negative subscript is still used. But at least the load is aligned.

Note that void __init unw_init() is called early in the kernel startup sequence, even before the console is initialized.

This code generation bug causes either a garbage read from some memory location or a kernel crash when trying to access unmapped memory.

That is the strace breakage mechanics.

Parting words

  • Task switch on x86_64 and on ia64 is fun :)
  • On x86_64 implementation of ptrace(PTRACE_GETREGS, …) is very straightforward: almost a memcpy from predefined location.
  • On ia64 ptrace(PTRACE_GETREGS, …) requires many moving parts:
    • call stack unwinder for kernel (involving linker scripts to define __end_unwind and __start_unwind)
    • bytecode generator and bytecode interpreter to speedup unwinding for every ptrace() call
  • Unaligned load of register-sized value is a tricky and fragile business

Have fun!


July 24, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

So I have a Django DurationField in my model and needed to format it as HH:mm .. unfortunately Django doesn't seem to support that out of the box .. after considering template tags or writing my own filter, I decided to go for a very simple alternative and just defined a method for this in my model:

    timeslot_duration = models.DurationField(null=False,
                                             blank=False,
                                             default='00:05:00',
                                             verbose_name=_('timeslot_duration'),
                                             help_text=_('[DD] [HH:[MM:]]ss[.uuuuuu] format')
                                             )

    def timeslot_duration_HHmm(self):
        sec = self.timeslot_duration.total_seconds()
        return '%02d:%02d' % (int(sec // 3600), int((sec % 3600) // 60))

that way I can do whatever I want format-wise to get exactly what I need. Not sure if this is recommended practice, or maybe frowned upon, but it works just fine.

and in my template then just use {{ <model>.timeslot_duration_HHmm }} instead of {{ <model>.timeslot_duration }}.
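
For comparison, the template-filter route I decided against would look roughly like this (just a sketch with made-up names, not code I actually use):

# myapp/templatetags/duration_filters.py (hypothetical)
from django import template

register = template.Library()


@register.filter
def duration_hhmm(value):
    # format a timedelta as HH:mm
    hours, remainder = divmod(int(value.total_seconds()), 3600)
    return '%02d:%02d' % (hours, remainder // 60)

It would then be used as {{ <model>.timeslot_duration|duration_hhmm }} after a {% load duration_filters %} - the model method simply avoids the extra file and the load tag.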

July 19, 2018
Alexys Jacob a.k.a. ultrabug (homepage, bugs)

This quick article is a wrap up for reference on how to connect to ScyllaDB using Spark 2 when authentication and SSL are enforced for the clients on the Scylla cluster.

We encountered multiple problems, all the more since we distribute our workload using a YARN cluster, so our worker nodes need to have everything required to connect properly to Scylla.

We found very little help online, so I hope this will serve anyone facing similar issues (that’s also why I copy/pasted the error messages here).

The authentication part is easy by itself and was not the source of our problems; SSL on the client side was.

Environment

  • (py)spark: 2.1.0.cloudera2
  • spark-cassandra-connector: datastax:spark-cassandra-connector: 2.0.1-s_2.11
  • python: 3.5.5
  • java: 1.8.0_144
  • scylladb: 2.1.5

SSL cipher setup

The Datastax spark cassandra driver uses the TLS_RSA_WITH_AES_256_CBC_SHA cipher by default, which the JVM does not support out of the box. This raises the following error when connecting to Scylla:

18/07/18 13:13:41 WARN channel.ChannelInitializer: Failed to initialize a channel. Closing: [id: 0x8d6f78a7]
java.lang.IllegalArgumentException: Cannot support TLS_RSA_WITH_AES_256_CBC_SHA with currently installed providers

According to the ssl documentation we have two ciphers available:

  1. TLS_RSA_WITH_AES_256_CBC_SHA
  2. TLS_RSA_WITH_AES_128_CBC_SHA

We can get rid of the error by lowering the cipher to TLS_RSA_WITH_AES_128_CBC_SHA using the following configuration:

.config("spark.cassandra.connection.ssl.enabledAlgorithms", "TLS_RSA_WITH_AES_128_CBC_SHA")\

However, this is not really a good solution and instead we’d be inclined to use the TLS_RSA_WITH_AES_256_CBC_SHA version. For this we need to follow this Datastax procedure.

Then we need to deploy the JCE security jars on all our client nodes; if using YARN like us, this means that you have to deploy these jars to all your NodeManager nodes.

For example by hand:

# unzip jce_policy-8.zip
# cp UnlimitedJCEPolicyJDK8/*.jar /opt/oracle-jdk-bin-1.8.0.144/jre/lib/security/

Java trust store

When connecting, the clients need to be able to validate the Scylla cluster’s self-signed CA. This is done by setting up a trustStore JKS file and providing it to the spark connector configuration (note that you should protect this file with a password).

keyStore vs trustStore

In an SSL handshake, the purpose of the trustStore is to verify credentials, while the purpose of the keyStore is to provide them. The keyStore in Java stores the private key and the certificates corresponding to the public keys, and is required if you are an SSL server or if SSL requires client authentication. The trustStore stores certificates from third parties or your own self-signed certificates; your application identifies and validates them using this trustStore.

The spark-cassandra-connector documentation has two options to handle keyStore and trustStore.

When we did not use the trustStore option, we would get some obscure error when connecting to Scylla:

com.datastax.driver.core.exceptions.TransportException: [node/1.1.1.1:9042] Channel has been closed

When enabling DEBUG logging, we get a clearer error which indicated a failure in validating the SSL certificate provided by the Scylla server node:

Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

setting up the trustStore JKS

You need to have the self-signed CA public certificate file, then issue the following command:

# keytool -importcert -file /usr/local/share/ca-certificates/MY_SELF_SIGNED_CA.crt -keystore COMPANY_TRUSTSTORE.jks -noprompt
Enter keystore password:  
Re-enter new password: 
Certificate was added to keystore

using the trustStore

Now you need to configure spark to use the trustStore like this:

.config("spark.cassandra.connection.ssl.trustStore.password", "PASSWORD")\
.config("spark.cassandra.connection.ssl.trustStore.path", "COMPANY_TRUSTSTORE.jks")\

Spark SSL configuration example

This wraps up the SSL connection configuration used for spark.

This example uses pyspark2 and reads a table in Scylla from a YARN cluster:

$ pyspark2 --packages datastax:spark-cassandra-connector:2.0.1-s_2.11 --files COMPANY_TRUSTSTORE.jks

>>> spark = SparkSession.builder.appName("scylla_app")\
.config("spark.cassandra.auth.password", "test")\
.config("spark.cassandra.auth.username", "test")\
.config("spark.cassandra.connection.host", "node1,node2,node3")\
.config("spark.cassandra.connection.ssl.clientAuth.enabled", True)\
.config("spark.cassandra.connection.ssl.enabled", True)\
.config("spark.cassandra.connection.ssl.trustStore.password", "PASSWORD")\
.config("spark.cassandra.connection.ssl.trustStore.path", "COMPANY_TRUSTSTORE.jks")\
.config("spark.cassandra.input.split.size_in_mb", 1)\
.config("spark.yarn.queue", "scylla_queue").getOrCreate()

>>> df = spark.read.format("org.apache.spark.sql.cassandra").options(table="my_table", keyspace="test").load()
>>> df.show()

July 15, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

When playing with the thought of adding images to my books DB I thought: I need random file names, and would like to scale the images ..

So I looked a bit and found django-stdimage. I was pretty happy with what it could do, but the uuid4 names themselves seemed a bit .. not what I wanted .. So I came up with adding the object's pk to the filename as well.

There were already some nice ways to generate filenames, but none did exactly what I wanted.

Here is my own class UploadToClassNameDirPKUUID:

from uuid import uuid4

from stdimage.utils import UploadToClassNameDir


class UploadToClassNameDirPKUUID(UploadToClassNameDir):
    def __call__(self, instance, filename):
        # slightly modified from the UploadToUUId class from stdimage.utils
        if instance.pk:
            self.kwargs.update({
                'name': '{}-{}'.format(instance.pk, uuid4().hex),
                })
        else:
            # no pk found so just get uuid4.hex
            self.kwargs.update({
                'name': uuid4().hex
                })
        return super().__call__(instance, filename)

Basically the same as UploadToClassNameDirUUID, but with instance.pk added at the front of the filename - this is purely a convenience for me so I have the 2 pictures for my book (front & back) identifiable in the directory without looking both up in my DB. One could maybe argue it would "expose" the pk, but first, in this case I do not really care as the app is not public, and second, anyone who can access my django-admin (which is what I use for data entry, ..) would see the pk anyway, so whatever ;)
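
For context, a field using it might look roughly like this (a hypothetical Book model, assuming the class above lives in the same models.py; the variation sizes are just an example):

from django.db import models
from stdimage.models import StdImageField


class Book(models.Model):
    title = models.CharField(max_length=256)
    # files end up as <classname>/<pk>-<uuid4hex>.<ext> thanks to the
    # callable above; a thumbnail variation is generated automatically
    cover_front = StdImageField(upload_to=UploadToClassNameDirPKUUID(),
                                blank=True,
                                variations={'thumbnail': {'width': 200,
                                                          'height': 300}})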

July 14, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

Since I wanted to make an inventory app (the source of which I will likely post - as FOSS of course - at some point when I am done), I wanted to have a model for languages with their ISO 639-1 codes.

Now the model itself is of course easy, but where to get the data to populate it? I was certainly not going to do that manually. After a bit of searching and talking to people on IRC I dug around the Django I18N / L10N code and found something I could use: django.conf.locale.LANG_INFO. While this is without a doubt meant for internal Django use, I thought it would be awesome to just use that as the base for my data.

The next point was how to get the data into my DB without too much effort, but reproducibly. The first thing that came to mind was to write my own migration and populate it from there. Not something I particularly liked, since I have a tendency to wipe my migrations and start from scratch during development, and I was sure I'd delete just that one too many.

The other - and in my opinion better - option I found was more flexible as to when it is run and also beautifully simple: just write my own custom management command to do the data import for me. Using the Django documentation on custom management commands as a base I got this working very quickly. Enough rambling .. here's the code:

First the model (since it was in the source data I added name_local, as it is probably useful sometimes):

class Language(models.Model):
    '''
    List of languages by ISO code (2 letter only because the country code
    is not needed).
    This should be populated by getting data from django.conf.locale.LANG_INFO
    '''
    name = models.CharField(max_length=256,
                            null=False,
                            blank=False,
                            verbose_name=_('Language name')
                            )
    name_local = models.CharField(max_length=256,
                                  null=False,
                                  blank=True,
                                  default='',
                                  verbose_name=_('Language name (in that language)'))
    isocode = models.CharField(max_length=2,
                               null=False,
                               blank=False,
                               unique=True,
                               verbose_name=_('ISO 639-1 Language code'),
                               help_text=_('2 character language code without country')
                               )
    sorting = models.PositiveIntegerField(blank=False,
                                          null=False,
                                          default=0,
                                          verbose_name=_('sorting order'),
                                          help_text=_('increase to show at top of the list')
                                          )

    def __str__(self):
        return '%s (%s)' % (self.name, self.name_local)

    class Meta:
        verbose_name = _('language')
        verbose_name_plural = _('languages')
        ordering = ('-sorting', 'name', 'isocode', )

(of course with gettext support, but if you don't need that just remove the _(...) ;)

Edit 2018-07-15: for usability reasons I added a sorting field so that commonly used languages can be shown at the top of the list.

Then create the management/commands directory inside the app (dcollect in my case - it has to live in an app listed in INSTALLED_APPS, and both management and commands need an __init__.py), and in it a file importlanguages.py:

from django.core.management.base import BaseCommand, CommandError
from dcollect.models import Language
from django.conf.locale import LANG_INFO


class Command(BaseCommand):
    help = 'Imports language codes and names from django.conf.locale.LANG_INFO'

    def add_arguments(self, parser):
        pass

    def handle(self, *args, **options):
        cnt = 0
        for lang in LANG_INFO:
            if len(lang) == 2:
                #we only care about the 2 letter iso codes
                #self.stdout.write(lang + ' ' + LANG_INFO[lang]['name'] + ' ' + LANG_INFO[lang]['name_local'])
                try:
                    l = Language(isocode=lang,
                                 name=LANG_INFO[lang]['name'],
                                 name_local=LANG_INFO[lang]['name_local'])
                    l.save()
                    cnt += 1
                except Exception as e:
                    self.stdout.write('Error adding language %s: %s' % (lang, e))
        self.stdout.write('Added %d languages to dcollect' % cnt)

That was way easier than expected .. I initially was going to just populate 2 or 3 languages manually and leave the rest for later, but it was so simple that I just got it out of the way.

All that needs to be done now to import the languages is python manage.py importlanguages - and the really nice part: no new dependencies added ;)
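
And should you ever need it from code (e.g. in a test or a deployment script), the same import can be triggered via Django's call_command - a quick sketch:

from django.core.management import call_command

from dcollect.models import Language

call_command('importlanguages')
print('%d languages in the DB' % Language.objects.count())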

Sergei Trofimovich a.k.a. slyfox (homepage, bugs)
tracking down mysterious memory corruption (July 14, 2018, 00:00 UTC)

I bought my current desktop machine around 2011 (7 years ago) and mostly had no problems with it, save one exception: occasionally (once every 2-3 months) firefox, liferea or gcc would mysteriously crash.

Bad PTE

dmesg reports would claim that page table entries refer to already freed physical memory:

Apr 24 03:59:17 sf kernel: BUG: Bad page map in process cc1  pte:200000000 pmd:2f9d0d067
Apr 24 03:59:17 sf kernel: addr:00000000711a7136 vm_flags:00000875 anon_vma:          (null) mapping:000000003882992c index:101a
Apr 24 03:59:17 sf kernel: file:cc1 fault:filemap_fault mmap:btrfs_file_mmap readpage:btrfs_readpage
Apr 24 03:59:18 sf kernel: CPU: 1 PID: 14834 Comm: cc1 Tainted: G         C        4.17.0-rc1-00215-g5e7c7806111a #65
Apr 24 03:59:18 sf kernel: Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M-D3H, BIOS F4 02/16/2012
Apr 24 03:59:18 sf kernel: Call Trace:
Apr 24 03:59:18 sf kernel:  dump_stack+0x46/0x5b
Apr 24 03:59:18 sf kernel:  print_bad_pte+0x193/0x230
Apr 24 03:59:18 sf kernel:  ? page_remove_rmap+0x216/0x330
Apr 24 03:59:18 sf kernel:  unmap_page_range+0x3f7/0x920
Apr 24 03:59:18 sf kernel:  unmap_vmas+0x47/0xa0
Apr 24 03:59:18 sf kernel:  exit_mmap+0x86/0x170
Apr 24 03:59:18 sf kernel:  mmput+0x64/0x120
Apr 24 03:59:18 sf kernel:  do_exit+0x2a9/0xb90
Apr 24 03:59:18 sf kernel:  ? syscall_trace_enter+0x16d/0x2c0
Apr 24 03:59:18 sf kernel:  do_group_exit+0x2e/0xa0
Apr 24 03:59:18 sf kernel:  __x64_sys_exit_group+0xf/0x10
Apr 24 03:59:18 sf kernel:  do_syscall_64+0x4a/0xe0
Apr 24 03:59:18 sf kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 24 03:59:18 sf kernel: RIP: 0033:0x7f7a039dcb96
Apr 24 03:59:18 sf kernel: RSP: 002b:00007fffdfa09d08 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
Apr 24 03:59:18 sf kernel: RAX: ffffffffffffffda RBX: 00007f7a03ccc740 RCX: 00007f7a039dcb96
Apr 24 03:59:18 sf kernel: RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
Apr 24 03:59:18 sf kernel: RBP: 0000000000000000 R08: 00000000000000e7 R09: fffffffffffffe70
Apr 24 03:59:18 sf kernel: R10: 0000000000000008 R11: 0000000000000246 R12: 00007f7a03ccc740
Apr 24 03:59:18 sf kernel: R13: 0000000000000038 R14: 00007f7a03cd5608 R15: 0000000000000000
Apr 24 03:59:18 sf kernel: Disabling lock debugging due to kernel taint
Apr 24 03:59:18 sf kernel: BUG: Bad rss-counter state mm:000000004fac8a77 idx:2 val:-1

It’s not something that is easy to debug or reproduce.

Transparent Hugepages were a new thing at that time and I was using them systemwide via the CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y kernel option.

After those crashes I decided to switch back to CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y only. Crashes became rarer: once in 5-6 months.

Enabling more debugging facilities in the kernel did not change anything and I moved on.

A few years later I set up nightly builds on this machine to build and test packages in an automatic way. Things were running smoothly except for a few memory-hungry tests that crashed once in a while: firefox, rust and webkit builds every other night hit internal compiler errors in gcc.

Crashes were very hard to isolate or reproduce: every time SIGSEGV happened on a new source file being compiled. I tried to run the same failed gcc command in a loop for hours to try to reproduce the crash but never succeeded. It is usually a strong sign of flaky hardware. At that point I tried memtest86+-5.01 and memtester tools to validate RAM chips. Tools claimed RAM to be fine. My conclusion was that crashes are the result of an obscure software problem causing memory corruption (probably in the kernel). I had no idea how to debug that and kept on using this system. For day-to-day use it was perfectly stable.

A new clue

[years later]

Last year I joined Gentoo’s toolchain@ project and started caring a bit more about glibc and gcc. dilfridge@ did a fantastic job on making glibc testsuite work on amd64 (and also many other things not directly related to this post).

One day I made a major change in how CFLAGS are handled in the glibc ebuild and broke a few users with CFLAGS=-mno-sse4.2. That day I ran the glibc testsuite to check whether I had made things worse. There was only one test failing: string/test-memmove.

Of all the obscure things that glibc checks for only one simple memmove() test refused to work!

The failure occurred only on the 32-bit version of glibc and looked like this:

$ elf/ld.so --inhibit-cache --library-path . string/test-memmove
simple_memmove  __memmove_ssse3_rep     __memmove_ssse3 __memmove_sse2_unaligned        __memmove_ia32
string/test-memmove: Wrong result in function __memmove_sse2_unaligned dst "0x70000084" src "0x70000000" offset "43297733"

This command runs string/test-memmove binary using ./libc.so.6 and elf/ld.so as a loader.

The good thing is that I was somewhat able to reproduce the failure: every few runs the error popped up. The test was not failing deterministically. Every time the test failed it was always __memmove_sse2_unaligned, but the offset was different.

Here is the test source code. The test basically runs memmove() and checks if all memory was moved as expected. Originally the test was written to check how memmove() handles memory ranges that span the signed/unsigned address boundary around address 0x80000000. Hence the unusual mmap(addr=0x70000000, size=0x20000000) as a way to allocate memory.

Now the fun thing: the error disappeared as soon as I rebooted the machine. And came back one day later (after the usual nightly tests run). To explore the breakage and make a fix I had to find a faster way to reproduce the failure.

At that point the fastest way to make the test fail again was to run the firefox build process first. It took “only” 40 minutes to get the machine into a state where I could reproduce the failure.

Once in that state I started shrinking down the __memmove_sse2_unaligned implementation to check where exactly data gets transferred incorrectly. 600 lines of straightforward code is not that much.

Note: memcpy()’s behaviour depends on CPU cache size. When the block of copied memory is small (less than CPU cache size, 8MB in my case) memcpy() does not do anything special. Otherwise memcpy() tries to avoid cache pollution and uses non-temporal variant of store instruction: movntdq instead of usual movaps.

While I was poking at this code I found a reliable workaround to make memcpy() never fail on my machine: change movntdq to movdqa:

I was pondering whether to patch binutils locally to avoid the movntdq instruction entirely, but eventually discarded the idea and focused on finding the broken component instead. Who knows what else might be lurking there.

I was so close!

A minimal reproducer

I attempted to craft a testcase that does not depend on glibc’s memcpy() and got this:

This code assumes quite a few things from the caller:

  • dest > src as copying happens right-to-left
  • dest has to be 16-byte aligned
  • block size must be a multiple of 16-bytes.

Here is what C code compiles to with -O2 -m32 -msse2:

And with -O2 -m64 -mavx2:

Surprisingly (or not so surprisingly) both -m32/-m64 tests started failing on my machine.

It was always the second bit of a 128-bit value that was corrupted.

On 128MB blocks this test usually caused one incorrect bit to be copied once in a few runs. I tried to run exactly the same test on other hardware I have access to. None of it failed.

I started to suspect the kernel of corrupting the SSE CPU context on context switch. But why would only the non-temporal instruction be affected? And why only a single bit and not a full 128-bit chunk? Could it be that the kernel forgot to issue mfence on context switch and all in-flight non-temporal instructions stored garbage? That would be a sad race condition. But the single bit flip did not line up with that theory.

Sounds more like the kernel would arbitrarily flip one bit in userspace. But why only when movntdq is involved?

I suspected a CPU bug, upgraded the CPU firmware and switched the machine from BIOS-compatible mode to native UEFI, hoping to fix it. Nope. Nothing changed. The same failure persisted: single bit corruption after a heavy load on the machine.

I started thinking about how to speed my test up to avoid firefox compilation as a trigger.

Back to square one

My suspect was bad RAM again. I modified my test to go through all RAM by allocating 128MB chunks at a time and running memmove() on the newly allocated memory, so as to cover all available pages. The test would either find bad memory or fail with OOM.

And bingo! It took only 30 seconds to reproduce the failure. The test usually started reporting the first problem when it got to 17GB of RAM usage.

I have 4x8GB DDR3-DIMMs. I started brute-forcing various configurations of DIMM order on motherboard slots:

A      B      A      B
DIMM-1 -      -      -      : works
DIMM-2 -      -      -      : works
DIMM-3 -      -      -      : works
DIMM-4 -      -      -      : works
DIMM-1 -      DIMM-3 -      : fails (dual channel mode)
DIMM-1 DIMM-3 -      -      : works (single channel mode)
-      DIMM-2 -      DIMM-4 : works (dual channel mode)
DIMM-3 -      DIMM-1 -      : fails (dual channel mode)
-      DIMM-3 -      DIMM-1 : fails (dual channel mode)
-      DIMM-1 -      DIMM-3 : fails (dual channel mode)
-      DIMM-2 -      DIMM-3 : fails (dual channel mode)

And many other combinations of DIMM-3 with others.

It was obvious DIMM-3 did not like teamwork. I booted from a livecd to double-check it was not my kernel causing all of this. The error was still there.

I bought and plugged in a new pair of RAM modules in place of DIMM-1 and DIMM-3. And had no mysterious failures since!

Time to flip CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y back on :)

Speculations and open questions

It seems that dual-channel mode and cache coherency have something to do with it. A few thoughts:

  1. Single DDR3-DIMM can perform only 64-bit wide loads and stores.
  2. In dual-channel mode two 64-bit wide stores can happen at a time and require presence of two DIMMs.
  3. movntdq stores directly into RAM, possibly evicting the existing value from cache. That can cause a further writeback to RAM to free the dirty cache line.
  4. movdqa stores to cache. But eventually cache pressure will also trigger a store back to RAM, in chunks of the cache line size of the Last Level Cache (64 bytes = 512 bits for me). Why do we not see corruption happening in this case?

It feels like there should not be much difference between non-temporal and normal instructions in terms of the amount of data being written over the memory bus at a time. What likely changes is the sequence of physical addresses accessed under the two workloads. But I don’t know how to look into it in more detail.

Mystery!

Parting words

  • This crash took me 7 years to figure out :)
  • Fix didn’t require a single line of code :)
  • Bad RAM happens. Even if memtest86+-5.01 disagrees.
  • As I was running memtest86+ in qemu I found a bunch of unrelated bugs in the tianocore implementation of UEFI and in the memtest86+ gentoo ebuild: the hybrid ISO is not recognized as an ISO at all, and memtest86+ crashes at startup for a yet unknown reason (likely needs to be fixed against a newer toolchain).
  • non-temporal instructions are a thing and have their own memory I/O engine.
  • C-level wrappers around SSE and AVX instructions are easy to use!

Have fun!


July 09, 2018
Andreas K. Hüttel a.k.a. dilfridge (homepage, bugs)

Today's news is that we have submitted a manuscript for publication, describing Lab::Measurement and with it our approach towards fast, flexible, and platform-independent measuring with Perl! The manuscript mainly focuses on the new, Moose-based class hierarchy. We have uploaded it to arXiv as well; here is the (for now) full bibliographic information of the preprint:

 "Lab::Measurement - a portable and extensible framework for controlling lab equipment and conducting measurements"
S. Reinhardt, C. Butschkow, S. Geissler, A. Dirnaichner, F. Olbrich, C. Lane, D. Schröer, and A. K. Hüttel
submitted for publication; arXiv:1804.03321 (PDF, BibTeX entry)
If you're using Lab::Measurement in your lab, and this results in some nice publication, then we'd be very grateful for a citation of our work - for now the preprint, and later hopefully the accepted version.

July 06, 2018
Alexys Jacob a.k.a. ultrabug (homepage, bugs)
A botspot story (July 06, 2018, 14:50 UTC)

I felt like sharing a recent story that allowed us to identify a bot in a haystack thanks to Scylla.

 

The scenario

While working on loading 2B+ rows into Scylla from Hive (using Spark), we noticed a strange behaviour in the performance of one of our nodes:

 

So we started wondering why the server in blue was showing those load peaks and clearly diverging from the other two… As we obviously expect the three nodes to behave the same, there were two options on the table:

  1. hardware problem on the node
  2. bad data distribution (bad schema design? consistent hash problem?)

We shared this with our pals from ScyllaDB and started working on finding out what was going on.

The investigation

Hardware?

A hardware problem was ruled out pretty quickly: nothing showed up in the monitoring or in the kernel logs, and I/O queues and throughput were good:

Data distribution?

Avi Kivity (ScyllaDB’s CTO) quickly got the feeling that something was wrong with the data distribution and that we could be facing a hotspot situation. He quickly nailed it down to shard 44 thanks to the scylla-grafana-monitoring platform.

Data is distributed between shards that are stored on nodes (consistent hash ring). This distribution is done by hashing the primary key of your data which dictates the shard it belongs to (and thus the node(s) where the shard is stored).

If one of your keys is over-represented in your original data set, then the shard it belongs to can be overly populated and the related node overloaded. This is called a hotspot situation.
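
As a toy illustration of the mechanism (this is not Scylla's actual partitioner, just a sketch of why one over-represented key ends up hammering a single shard):

import hashlib
from collections import Counter

def shard_for(key, num_shards=42):  # made-up shard count
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % num_shards

# one key repeated 100K times next to 100K distinct keys: its shard takes the hit
keys = ['hot-key'] * 100_000 + ['key-%d' % i for i in range(100_000)]
print(Counter(shard_for(k) for k in keys).most_common(3))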

tracing queries

The first step was to trace queries in Scylla to try to get deeper into the hotspot analysis. So we enabled tracing using the following formula to get about 1 trace per second in the system_traces namespace.

tracing probability = 1 / expected requests per second throughput

In our case, we were doing between 90K req/s and 150K req/s so we settled for 100K req/s to be safe and enabled tracing on our nodes like this:

# nodetool settraceprobability 0.00001
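
Which is simply (a trivial sanity check of the value above, assuming the ~100K req/s figure):

expected_rps = 100_000
print(1 / expected_rps)  # 1e-05, i.e. the 0.00001 passed to nodetool above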

It turns out tracing didn’t help very much in our case because the traces do not include the query parameters in Scylla 2.1; that is only becoming available in the soon-to-be-released 2.2 version.

NOTE: traces expire on the tables, so make sure you TRUNCATE the events and sessions tables while iterating. Otherwise you will have to wait for the next gc_grace_period (10 days by default) before they are actually removed. If you do not do that and generate millions of traces like we did, querying the mentioned tables will likely time out because of the “tombstoned” rows, even if no traces are left in them anymore.
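
For reference, a small sketch of that cleanup using the Python cassandra-driver (which also speaks to Scylla); the contact point is made up:

from cassandra.cluster import Cluster

session = Cluster(['scylla-node-1']).connect()
# drop the accumulated traces between iterations so tombstones do not pile up
session.execute('TRUNCATE system_traces.events')
session.execute('TRUNCATE system_traces.sessions')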

looking at cfhistograms

Glauber Costa was also helping on the case and got us looking at the cfhistograms of the tables we were pushing data to. That clearly highlighted a hotspot problem:

histograms
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                             (micros)          (micros)           (bytes)                  
50%             0,00              6,00              0,00               258                 2
75%             0,00              6,00              0,00               535                 5
95%             0,00              8,00              0,00              1916                24
98%             0,00             11,72              0,00              3311                50
99%             0,00             28,46              0,00              5722                72
Min             0,00              2,00              0,00               104                 0
Max             0,00          45359,00              0,00          14530764            182785

What this basically means is that the 99th percentile of our partitions is small (~5KB) while the biggest is 14MB! That’s a huge difference and clearly shows that we have a hotspot on a partition somewhere.

So now we know for sure that we have an over-represented key in our data set, but what key is it and why?

The culprit

So we looked at the cardinality of our data set keys, which are SHA256 hashes, and found out that we indeed had one with more than 1M occurrences while the second highest was only around 100K!…
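
The check itself is a one-liner with Spark; a sketch assuming the keys are available as a DataFrame df with a (hypothetical) sha256 column:

df.groupBy('sha256').count().orderBy('count', ascending=False).show(2)
# the top hash showed up 1M+ times, the runner-up only around 100K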

Now that we had the main culprit hash, we turned to our data streaming pipeline to figure out what kind of event was generating the data associated to the given SHA256 hash… and surprise! It was a client’s quality assurance bot that was constantly browsing their own website with legitimate behaviour and identity credentials associated to it.

So we modified our pipeline to detect this bot and discard its events so that it stops polluting our databases with fake data. Then we cleaned up the millions of events’ worth of mess and traces we had stored about the bot.

The aftermath

Finally, we cleared out the data in Scylla and tried again from scratch. Needless to say, the curves got way better and are exactly what we would expect from a well-balanced cluster:

Thanks a lot to the ScyllaDB team for their thorough help and high spirited support!

I’ll quote them to conclude this quick blog post:

July 02, 2018
Luca Barbato a.k.a. lu_zero (homepage, bugs)
Altivec and VSX in Rust (part 1) (July 02, 2018, 11:05 UTC)

I’m involved in implementing the Altivec and VSX support in Rust’s stdsimd.

Supporting all the instructions in this language is a HUGE endeavor since for each instruction at least 2 tests have to be written, and making the functions type-generic gets you to the point of having a few pages of implementation (that luckily desugar to the single right instruction and nothing else).

Since I’m doing this mainly for my multimedia needs I have a short list of instructions I find handy to get some code written immediately and today I’ll talk a bit about some of them.

This post is inspired by what Luc did for neon, but I’m using rust instead.

If other people find it useful, I’ll try to write down the remaining instructions.

Permutations

Most if not all the SIMD ISAs have at least one or multiple instructions to shuffle vector elements within a vector or among two.

It is quite common to use those instructions to implement matrix transposes, but that isn’t their only use.

In my toolbox I put vec_perm and vec_xxpermdi since even if the portable stdsimd provides some shuffle support it is quite unwieldy compared to the Altivec native offering.

vec_perm: Vector Permute

Since its first iteration Altivec has had a quite amazing instruction called vec_perm or vperm:

    // illustrative pseudocode for the vperm semantics
    fn vec_perm(a: i8x16, b: i8x16, c: i8x16) -> i8x16 {
        let mut d = i8x16::splat(0);
        for i in 0..16 {
            let idx = (c[i] & 0xf) as usize;
            d[i] = if (c[i] & 0x10) == 0 {
                a[idx]
            } else {
                b[idx]
            };
        }
        d
    }

It is important to notice that the displacement map c is a vector and not a constant. That gives you quite a bit of flexibility in a number of situations.

This instruction is the building block you can use to implement a great deal of common patterns, including some that are also covered by stand-alone instructions, e.g.:
– packing/unpacking across lanes as long as you do not have to saturate: vec_pack, vec_unpackh/vec_unpackl
– interleave/merge two vectors: vec_mergel, vec_mergeh
– shift N bytes in a vector from another: vec_sld

It is important to keep this in mind, since you can always take two permutations and fold them into one: vec_perm itself is pretty fast, and replacing two or more instructions with a single permute can get you a pretty neat speed boost.

vec_xxpermdi: Vector Permute Doubleword Immediate

Among a good deal of improvements, VSX introduced a number of instructions that work on vectors of 64-bit elements; among those is a permute instruction which I found myself using a lot.

    #[rustc_args_required_const(2)]
    fn vec_xxpermdi(a: i64x2, b: i64x2, c: u8) -> i64x2 {
        match c & 0b11 {
            0b00 => i64x2::new(a[0], b[0]),
            0b01 => i64x2::new(a[1], b[0]),
            0b10 => i64x2::new(a[0], b[1]),
            _ => i64x2::new(a[1], b[1]),
        }
    }

This instruction is surely less flexible than the previous permute but it does not require an additional load.

When working on video codecs it is quite common to deal with blocks of pixels that go from 4×4 up to 64×64; before vec_xxpermdi the common pattern was:

    #[inline(always)]
    fn store8(dst: &mut [u8x16], v: &[u8x16], i: usize) {
        // TAKE_THE_FIRST_8 is a permute mask picking the first 8 bytes from v[i]
        // and keeping the remaining 8 bytes of the existing destination data
        let data = dst[i];
        dst[i] = vec_perm(v[i], data, TAKE_THE_FIRST_8);
    }

That implies loading the mask (as well as the destination) as often as needed.

Using vec_xxpermdi avoids the mask load, and that usually leads to quite a significant speedup when the actual function is tiny.

Mixed Arithmetics

By mixed arithmetic I mean all the instructions that perform multiple vector arithmetic operations in a single step.

The original Altivec has the following operations available for the integer types:
vec_madds
vec_mladd
vec_mradds
vec_msum
vec_msums
vec_sum2s
vec_sum4s
vec_sums

And the following two for the float type:
vec_madd
vec_nmsub

All of them are quite useful and they will all find their way in stdsimd pretty soon.

I’m describing today vec_sums, vec_msums and vec_madds.

They are quite representative and the other instructions are similar in spirit:
– vec_madds, vec_mladd and vec_mradds all compute a lane-wise product, take either the high-order or the low-order part of it and add a third vector, returning a vector of the same element size.
– vec_sums, vec_sum2s and vec_sum4s all combine an in-vector sum operation with a sum with another vector.
– vec_msum and vec_msums both compute a sum of products; the intermediates are added together and then added to a wider-element vector.

If there is enough interest and time I can extend this post to cover all of them, for today we’ll go with this approximation.

vec_sums: Vector Sum Saturated

Usually SIMD instructions work with two (or 3) vectors and execute the same operation for each vector element.
Sometimes you want to do operations within a single vector, and vec_sums is one of the few instructions that let you do that:

    fn vec_sums(a: i32x4, b: i32x4) -> i32x4 {
        let mut d = i32x4::new(0, 0, 0, 0);

        d[3] = b[3].saturating_add(a[0]).saturating_add(a[1]).saturating_add(a[2]).saturating_add(a[3]);

        d
    }

It returns, in the last element of the result, the sum of all the elements of a plus the last element of b.
It is pretty handy when you need to compute an average or similar operations.

It works only with 32bit signed element vectors.

vec_msums: Vector Multiply Sum Saturated

This instruction sums each 32-bit element of the third vector with the two products of the corresponding pairs of 16-bit elements of the first two vectors that overlap it.

It does quite a bit:

    fn vmsumshs(a: i16x8, b: i16x8, c: i32x4) -> i32x4 {
        let mut d = i32x4::new(0, 0, 0, 0);
        for i in 0..4 {
            let idx = 2 * i;
            let m0 = a[idx] as i32 * b[idx] as i32;
            let m1 = a[idx + 1] as i32 * b[idx + 1] as i32;
            d[i] = c[i].saturating_add(m0).saturating_add(m1);
        }
        d
    }

    fn vmsumuhs(a: u16x8, b: u16x8, c: u32x4) -> u32x4 {
        let mut d = u32x4::new(0, 0, 0, 0);
        for i in 0..4 {
            let idx = 2 * i;
            let m0 = a[idx] as u32 * b[idx] as u32;
            let m1 = a[idx + 1] as u32 * b[idx + 1] as u32;
            d[i] = c[i].saturating_add(m0).saturating_add(m1);
        }
        d
    }

    ...

    fn vec_msums<T, U>(a: T, b: T, c: U) -> U
    where T: sealed::VectorMultiplySumSaturate<U> {
        a.msums(b, c)
    }

It works only with 16-bit elements, signed or unsigned. In order to support that in Rust we have to use some creative traits.
It is quite neat if you have to implement some filters.

vec_madds: Vector Multiply Add Saturated

    fn vec_madds(a: i16x8, b: i16x8, c: i16x8) -> i16x8 {
        let mut d = i16x8::new(0, 0, 0, 0, 0, 0, 0, 0);
        for i in 0..8 {
            let v = (a[i] as i32 * b[i] as i32) >> 15;
            d[i] = (v as i16).saturating_add(c[i]);
        }
        d
    }

It takes the high-order 17 bits of the lane-wise product of the first two vectors and adds them to a third one.

Coming next

Raptor Engineering kindly gave me access to a Power 9 through their Integricloud hosting.

We could run some extensive benchmarks and found some peculiar behaviour with the C compilers available on the machine, which left me, Luc and Alexandra a little puzzled.

Next time I’ll try to collect, in a little more organic way, what I randomly posted on my twitter as I noticed it.