
Contributors:
. Aaron W. Swenson
. Agostino Sarubbo
. Alexey Shvetsov
. Alexis Ballier
. Alexys Jacob
. Alice Ferrazzi
. Andreas K. Hüttel
. Anthony Basile
. Arun Raghavan
. Bernard Cafarelli
. Christian Ruppert
. Christopher Díaz Riveros
. Chí-Thanh Christopher Nguyễn
. Denis Dupeyron
. Detlev Casanova
. Diego E. Pettenò
. Domen Kožar
. Doug Goldstein
. Eray Aslan
. Fabio Erculiani
. Gentoo Haskell Herd
. Gentoo Miniconf 2016
. Gentoo Monthly Newsletter
. Gentoo News
. Gilles Dartiguelongue
. Greg KH
. Göktürk Yüksek
. Hanno Böck
. Hans de Graaff
. Ian Whyman
. Jason A. Donenfeld
. Jeffrey Gardner
. Joachim Bartosik
. Johannes Huber
. Jonathan Callen
. Jorge Manuel B. S. Vicetto
. Kristian Fiskerstrand
. Lance Albertson
. Liam McLoughlin
. Luca Barbato
. Marek Szuba
. Mart Raudsepp
. Matt Turner
. Matthew Thode
. Michael Palimaka
. Michal Hrusecky
. Michał Górny
. Mike Doty
. Mike Gilbert
. Mike Pagano
. Nathan Zachary
. Pacho Ramos
. Patrick Kursawe
. Patrick Lauer
. Patrick McLean
. Paweł Hajdan, Jr.
. Piotr Jaroszyński
. Rafael G. Martins
. Remi Cardona
. Richard Freeman
. Robin Johnson
. Ryan Hill
. Sean Amoss
. Sebastian Pipping
. Sergei Trofimovich
. Steev Klimaszewski
. Stratos Psomadakis
. Sven Vermeulen
. Sven Wegener
. Thomas Raschbacher
. Yury German
. Zack Medico

Last updated:
June 22, 2018, 09:05 UTC

Disclaimer:
Views expressed in the content published here do not necessarily represent the views of Gentoo Linux or the Gentoo Foundation.


Bugs? Comments? Suggestions? Contact us!

Powered by:
Planet Venus

Welcome to Gentoo Universe, an aggregation of weblog articles on all topics written by Gentoo developers. For a more refined aggregation of Gentoo-related topics only, you might be interested in Planet Gentoo.

June 16, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)

The recent GitHub craze got a number of Free Software fundamentalists to hurry away from GitHub towards other hosting solutions.

Whether it was GitLab (a fairly natural choice given the nature of the two services), BitBucket, or SourceForge (which is trying to rebuild a reputation as a Free Software friendly hosting company), there are a number of new SaaS providers to choose from.

At the same time, a number of projects have been boasting (and maybe a bit too smugly, in my opinion) that they self-host their own GitLab or similar software, and suggested that other projects do the same to be “really free”.

A lot of the discourse appears to be missing nuance on the compromises involved in using SaaS hosting providers, self-hosting for communities, and self-hosting for single projects, so I thought I would gather my thoughts around this in one single post.

First of all, you probably remember my thoughts on self-hosting in general. Any solution that involves self-hosting will require a significant amount of ongoing work. You need to make sure your services keep working, and keep safe and secure. Particularly for FLOSS source code hosting, it’s of primary importance that the integrity and safety of the source code is maintained.

As I already said in the previous post, this style of hosting works well for projects that have a community, in which one or more dedicated people can look after the services. And in particular for bigger communities, such as KDE, GNOME, FreeDesktop, and so on, this is a very effective way to keep stewardship of code and community.

But for one-person projects, such as unpaper or glucometerutils, self-hosting would be quite bad. Even for xine with a single person maintaining just site+bugzilla it got fairly bad. I’m trying to convince the remaining active maintainers to migrate this to VideoLAN, which is now probably the biggest Free Software multimedia project and community.

This is not a new problem. Indeed, before people rushed to GitHub (or Gitorious), they rushed to other services that provided similar integrated environments. When I became a FLOSS developer, the biggest of them was SourceForge — which, as I noted earlier, was recently bought by a company trying to rebuild its reputation after a significant loss of trust. These environments don’t only include SCM services, but also issue (bug) trackers, contact email and so on and so forth.

Using one of these services is always a compromise: not only do they require an account on each service to be able to interact with them, but they also have a level of lock-in, simply because of the nature of URLs. Indeed, as I wrote last year, just going through my old blog posts to identify those referencing dead links reminded me of just how many project hosting services shut down, sometimes dragging along (Berlios) and sometimes abruptly (RubyForge).

This is a problem that does not only involve services provided by for-profit companies. Sunsite, RubyForge and Berlios didn’t really have companies behind them, and that last one is probably one of the closest things to a Free Software co-operative that I’ve seen outside of the FSF and friends.

There is of course Savannah, FSF’s own Forge-lookalike system. Unfortunately for one reason or another it has always lagged behind the featureset (particularly around security) of other project management SaaS. My personal guess is that it is due to the political nature of hosting any project over on FSF’s infrastructure, even outside of the GNU project.

So what we need would be a politically-neutral, project-agnostic hosting platform that is a co-operative effort. Unfortunately, I don’t see that happening any time soon. The main problem is that project hosting is expensive, whether you use dedicated servers or cloud providers. And it takes full-time people working as system administrators to keep it running smoothly and securely. You need professionals, too — or you may end up like lkml.org, being down when its one maintainer goes on vacation and something happens.

While there are projects that receive enough donations that they would be able to sustain these costs (see KDE, GNOME, VideoLAN), I’d be skeptical that an unfocused co-operative would be able to take care of this. Particularly if it does not restrict creation of new projects and repositories, as that requires particular attention to abuse, and good guidelines on which content is welcome and which isn’t.

If you think that’s an easy task, consider that even SourceForge, whose review process used to take a significant amount of time, managed to let joke projects use their service and run on their credentials.

A few years ago, I would have said that SFLC, SFC and SPI would be the right actors to set up something like this. Nowadays? Given their infighting, I don’t expect them to be of any use.

June 14, 2018
Andreas K. Hüttel a.k.a. dilfridge (homepage, bugs)

Today's great news is that our manuscript "Nanomechanical characterization of the Kondo charge dynamics in a carbon nanotube" has been accepted for publication by Physical Review Letters.

The Kondo effect is a many-body phenomenon at low temperature that results from a quantum state degeneracy, as, e.g., that of spin states in the absence of a magnetic field. In its simplest case, it makes a quantum dot, in our case a carbon nanotube with some trapped electrons on it, behave very differently for an even and an odd number of electrons. At an even number of trapped electrons, no current can flow through the nanotube, since temperature and applied bias voltage are too low to charge it with one more elementary charge; this phenomenon is called Coulomb blockade. Strikingly, at odd electron number, when two degenerate quantum states in the nanotube are available, Coulomb blockade seems not to matter, and a large current can flow. Theory explains this by assuming that a localized electron couples to electrons in the contacts, forming a combined, delocalized singlet quantum state.
What carries the Kondo-enhanced current, and how does the electric charge now accumulate in the carbon nanotube? We use the vibration of the macromolecule to measure this. As also in the case of, e.g., a guitar string, the resonance frequency of a nanotube changes when you pull on it; in the case of the carbon nanotube this is sensitive enough to resolve fractions of the force caused by a single elementary charge. From the vibration frequency, as function of the electrostatic potential, we calculate the average number of electrons on the nanotube, and can then compare the odd and even number cases.
A surprising result of our evaluation is that the charge trapped on the nanotube behaves the same way in the even and odd occupation case, even though the current through it is completely different. Sequential tunneling of electrons can model the charge accumulation, and with it the mechanical behaviour. The large Kondo current is carried by virtual occupation of the nanotube alone, i.e., electrons tunneling on and immediately off again so they do not contribute to the charge on it.

"Nanomechanical Characterization of the Kondo Charge Dynamics in a Carbon Nanotube"
K. J. G. Götz, D. R. Schmid, F. J. Schupp, P. L. Stiller, Ch. Strunk, and A. K. Hüttel
Physical Review Letters 120, 246802 (2018); arXiv:1802.00522 (PDF, HTML, supplementary information)

Luca Barbato a.k.a. lu_zero (homepage, bugs)
Video Compression Bounty Hunters (June 14, 2018, 18:43 UTC)

In this post, we (Luca Barbato and Luc Trudeau) joined forces to talk about the awesome work we’ve been doing on Altivec/VSX optimizations for the libvpx library. You can read it here or on Luc’s Medium.

Both of us were in Brussels for FOSDEM 2018: Luca presented his work on rust-av, and Luc was there to hack on rav1e – an experimental AV1 video encoder in Rust.

Luca joined the rav1e team and helped give hints about how to effectively leverage Rust. Together, we worked on the AV1 intra prediction code, among other things.

Luc Trudeau: I was finishing up my work on Chroma from Luma in AV1, and wanted to stay involved in royalty free open source video codecs. When Luca talked to me about libvpx bounties on Bountysource, I was immediately intrigued.

Luca Barbato: Luc just finished implementing the Neon version of his CfL work and I wondered how that code could work using VSX. I prepared some of the machinery that was missing in libaom and Luc tried his hand on Altivec. We still had some pending libvpx work sponsored by IBM and I asked him if he wanted to join in.

What’s libvpx?

For those less familiar, libvpx is the official Google implementation of the VP9 video format. VP9 is most notably used by YouTube and Netflix. VP9 playback is available on some browsers including Chrome, Edge and Firefox, and also on Android devices, covering 75.31% of the global user base.

Ref: caniuse.com VP9 support in browsers.

Why use VP9, when the de facto video format is H.264/AVC?

Because VP9 is royalty free and the bandwidth savings are substantial when compared to H.264 when playback is available (an estimated 3.3B devices support VP9). In other words, having VP9 as a secondary codec can pay for itself in bandwidth savings by not having to send H.264 to most users.

Ref: Netflix VP9 compression analysis.

Why care about libvpx on Power?

Dynamic adaptive streaming formats like HLS and MPEG DASH have completely changed the game of streaming video over the internet. Streaming hardware and custom multimedia servers are being replaced by web servers.

From the servers’ perspective streaming video is akin to serving small video files; lots of small video files! To cover all clients and most network conditions a considerable amount of video files must be encoded, stored and distributed.

Things are changing fast and while the total cost of ownership of video content for previous generation video formats, like H.264, was mostly made up of bandwidth and hosting, encoding costs are growing with more complex video formats like HEVC and VP9.

This complexity is reported to have grown exponentially with the upcoming AV1 video format, a video format built on the libvpx code base by the Alliance for Open Media, of which IBM is a founding member.

Ref: Facebook’s AV1 complexity analysis

At the same time, IBM and its partners in the OpenPower Foundation are releasing some very impressive hardware with the new Power9 processor line up. Big Iron Power9 systems, like the Talos II from Raptor Computing Systems and the collaboration between Google and Rackspace on Zaius/Barreleye servers, are ideal solutions to tackle the growing complexity of video format encoding.

However, these awesome machines are currently at a disadvantage when encoding video. Without the platform-specific optimizations that their competitors enjoy, the Power9 architecture can’t be fully utilized. This is clearly illustrated in the x264 benchmark released in a recent Phoronix article.

Ref: Phoronix x264 server benchmark.

Thanks to the optimization bounties sponsored by IBM, we are hard at work bridging the gap in libvpx.

Optimization bounties?

Just like bug bounty programs, optimizations make for great bounties. Companies that see benefit in platform-specific optimizations for video codecs can sponsor our bounties on the Bountysource platform.

Multiple companies can sponsor the same bounty, thus sharing the cost of more important bounties. Furthermore, bounties are a minimal-risk investment for sponsors, as they are only paid out when the work is completed (and peer reviewed by libvpx maintainers).

Not only is the Bountysource platform a win for companies that directly benefit from the bounties they are sponsoring, it’s also a win for developers (like us) who can get paid to work on free and open source projects that we are passionate about. Optimization bounties are a source of sustainability in the free and open source software ecosystem.

How do you choose bounties?

Since we’re a small team of bounty hunters (Luca Barbato, Alexandra Hájková, Rafael de Lucena Valle and Luc Trudeau), we need to play it smart and maximize the impact of our work. We’ve identified two common use cases related to streaming on the Power architecture: YouTube-like encodes and real time (a.k.a. low latency) encodes.

By profiling libvpx under these conditions, we can determine the key functions to optimize. The following charts show the percentage of time spent in the top 20 functions of the libvpx encoder (without Altivec/VSX optimisations) on a Power8 system, for both YouTube-like and real time settings.

It’s interesting to see that the top 20 functions make up about 80% of the encoding time. That’s similar in spirit to the Pareto principle, in that we don’t have to optimize the whole encoder to make the Power architecture competitive for video encoding.

We see a similar distribution between YouTube-like encoding settings and real time video encoding. In other words, optimization bounties for libvpx benefit both Video on Demand (VOD) and live broadcast services.

We add bounties on the Bountysource platform around common themed functions like: convolution, sum of absolute differences (SAD), variance, etc. Companies interested in libvpx optimization can go and fund these bounties.

What’s the impact of this project so far?

So far, we delivered multiple libvpx bounties including:

  • Convolution
  • Sum of absolute differences (SAD)
  • Quantization
  • Inverse transforms
  • Intra prediction
  • etc.

To see the benefit of our work, we compiled the latest version of libvpx with and without VSX optimizations and ran it on a Power8 machine. Note that the C compiled versions can produce Altivec/VSX code via auto vectorization. The results, in frames per minute, are shown below for both YouTube-like encoding and real time encoding.

Our current VSX optimizations give approximately a 40% and 30% boost in encoding speed for YouTube-like and real time encoding respectively. Encoding speed increases in the range of 10 to 14 frames per minute can considerably reduce cloud encoding costs for Power architecture users.

In the context of real time encoding, the time saved by the platform optimization can be put to good use to improve compression efficiency. Concretely, a real time encoder will still encode at real time speed, but speeding up the encoder allows operators to enable more coding tools, resulting in better quality for the viewers and bandwidth savings for operators.

What’s next?

We’re energized by the impact that our small team of bounty hunters is having on libvpx performance for the Power architecture and we wanted to share it in this blog post. We look forward to getting even more performance from libvpx on the Power architecture. Expect considerable performance improvement for the Power architecture in the next libvpx release (1.8).

As IBM targets its Power9 line of systems at heavy cloud computations, it seems natural to also aim all that power at tackling the growing costs of AV1 encodes. This won’t happen without platform specific optimizations and the time to start is now; as the AV1 format is being finalized, everyone is still in the early phases of optimization. We are currently working with our sponsors to set up AV1 bounties, so stay tuned for an upcoming post.

June 13, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

So .. I just spent way too much time figuring out why my django-rest-framework API did not work with authentication enabled .. As usual it turns out it was simple: the culprit was mod_wsgi and its default settings:

All I had to do was set this option in my apache config:

WSGIPassAuthorization On

before that I kept getting this error: {"detail":"Authentication credentials were not provided."}

Who would've thought that was Off by default .. Also it seems to be a good idea to pass --basic along with --user <user>:<pwd> to curl to make sure it uses basic auth.
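
For completeness, here's a minimal sketch of the same check done from Python with the requests library (the endpoint URL and credentials are just placeholders, not from my actual setup):

import requests

# hypothetical endpoint and credentials - adjust to your own API
resp = requests.get('http://localhost/api/bla/', auth=('user', 'pwd'))  # auth=(...) sends HTTP basic auth
print(resp.status_code, resp.json())
# without WSGIPassAuthorization On this keeps returning:
# {"detail": "Authentication credentials were not provided."}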

On the plus side I did also look into token-based auth for the django rest framework, which I find more useful than basic auth when using it from RESTEasy.

One thing about RESTEasy & token-based auth is that one needs to set the header manually (instead of providing auth=('user','pwd') to resteasy.RESTEasy(...)).

so it goes something like this:

from resteasy import RESTEasy, json


api = RESTEasy(base_url=<baseurl>, encoder=json.dumps, decoder=json.loads, <other options as needed>)
api.session.headers['Authorization'] = 'Token <token>'

data = api.route('bla/')
...

June 09, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)

I have written before about the CRM I wrote for a pizzeria and I am happy to see that even FSFE has started looking into Free Software for SMEs. I also noted the need for teams to develop healthy projects. Today I want to give an example of why I think these things are not as easy as most people expect them to be, and how many different moving parts need to align to make Free Software work for SMEs.

As I’m no longer self-employed, and I have no intention of going back to being an MSP in my lifetime, what I’m writing here is more of a set of “homework pointers”, should a community of SME-targeted Free Software projects ever be formed.

I decided to focus my thoughts on the needs of a brick and mortar store (or high street store if you prefer), mostly because it has a subset of the requirements that I could think of, compared to a restaurant like the pizza place I actually worked with.

These notes are also probably a lot more scattered and incomplete than I would like, because I have only worked retail for a short while, between high school and the two miserable weeks of university, nearly fifteen years ago, in a bookstore to be precise.

For most of the people who have not worked retail, it might seem like the most important piece of software/hardware for a store is the till, because that is what they interact with most of the time. While the till systems (also called POS) are fairly important, as those are in direct contact with the customer, they are only the tip of the iceberg.

But let’s start with the POS: whether you plan on integrating them directly with a credit card terminal or not, right now there are a number of integrated hardware/software solutions for these, which include a touchscreen to input the receipt components and a (usually thermal) printer for the receipts to be printed on, while sometimes allowing the client to be emailed the receipt instead. As far as I know, there’s no Free Software system for this. I do see an increasing number of Clover tills in Europe, and Square in the United States (but these are not the only ones).

The till software is more complicated than one would think, because in addition to the effects that the customers can see (select line items, print receipt, eventually take payment), it has to be able to keep track of the cash flow, whether it is in form of actual cash, or in the form of card payments. Knowing the cash flow is a requisite for any business, as without that information you cannot plan your budgets.

In bigger operations, this would feed into a dedicated ERP system, which would often include an inventory management software — because you need to know how much stock you have and how fast it is moving, to know when to order new stock.

There is also the need to handle invoices, which usually don’t get printed by the till (you don’t want an invoice printed on thermal paper, particularly in countries like Italy, where you’re meant to keep the original of an invoice for over ten years).

And then there is the filing of payable invoices and, well, their payment. This is part of the accounting procedures, and I know of very few systems that allow integration with a bank to the point of automating this part. PSD2 is meant to require financial institutions to provide APIs to make this possible, at least in Europe, but it has barely been implemented yet, and we’ll have to see what the solution will be.

Different industries have different expected standards, too. When I worked in the bookstore, there was a standard piece of software that was used to consult the online stock of books from various depots, which was required to handle orders of books for people looking for something that was not in the store. While Amazon and other online services have for the most part removed the need for many to custom order books in a store, I still know a few people who do so, simply to make sure the bookstore stays up. And I assume that very similar, yet different, software and systems exist for most other fields of endeavour, such as computer components, watches, and shoes.

Depending on the size of the store, the number of employees, and in general the hours of operation, there may also be need for roster management software, so that the different workers have fair (and legal) shifts, while still being able to manage days off. I don’t know how well solutions like Workday work for small businesses, but in general I feel this is likely going to be one area in which Free Software won’t make an easy dent: following all the possible legal frameworks to actually be compliant with the law is the kind of work that requires a full-time staff of people, and unless something changes drastically, I don’t expect any FLOSS project to keep up with that.

You can say that this post is not giving any answer and is just adding more questions. And that’s the case, actually. I don’t have the time or energy of working on this myself, and my job does not involve working with retailers, or even developing user-focused software. I wanted to write this as a starting point of a project if someone is interested in doing so.

In particular, I think that this would be prime territory for a multi-disciplinary university project, starting from asking questions to store owners of their need, and understanding the whole user journey. Which seems to be something that FSFE is now looking into fostering, which I’m very happy about.

Please, help the answer to the question “Can you run a brick and mortar store on Free Software?” be Yes!

June 04, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)

I was not planning on posting on the blog until next week, trying to stick on a weekly schedule, but today’s announcement of Microsoft acquiring GitHub is forcing my hand a bit.

So, Microsoft is acquiring GitHub, and a number of Open Source developers are losing their mind, in all possible ways. A significant proportion of comments on this that I have seen on my social media is sounding doomsday, as if this spells the end of GitHub, because Microsoft is going to ruin it all for them.

Myself, I think that if it spells the end of anything, it is the end of the one-stop shop for working on any project out there, not because of anything Microsoft did or is going to do, but because a number of developers are now leaving the platform in protest (protest of what? One company buying another?).

Most likely, it’ll be the fundamentalists that will move their projects away from GitHub. And depending on what they decide to do with their projects, it might not even show up on anybody’s radar. A lot of people are pushing for GitLab, which is both an open-core self-hosted platform, and a PaaS offering.

That is not bad. Self-hosted GitLab instances already exist for VideoLAN and GNOME. Big, strong communities are in my opinion in the perfect position to dedicate people to support core infrastructure to make open source software development easier. In particular because it’s easier for a community of dozens, if not hundreds of people, to find dedicated people to work on it. For one-person projects, that’s overhead, distracting, and destructive as well, as fragmenting into micro-instances will make forking projects painful — and at the same time, allowing any user who just registered to fork the code in any instance is prone to abuse and a recipe for disaster…

But this is all going to be a topic for another time. Let me try to go back to my personal opinions on the matter (to be perfectly clear that these are not the opinions of my employer and yadda yadda).

As of today, what we know is that Microsoft acquired GitHub, and they are putting Nat Friedman of Xamarin fame (the company that stood behind the Mono project after Novell) in charge of it. This choice makes me particularly optimistic about the future, because Nat’s a good guy and I have the utmost respect for him.

This means I have no intention to move any of my public repositories away from GitHub, except if doing so would bring a substantial advantage. For instance, if there was a strong community built around medical devices software, I would consider moving glucometerutils. But this is not the case right now.

And because I still root most of my projects around my own domain, if I did move that, the canonical URL would still be valid. This is a scheme I devised after getting tired of fixing up where unieject ended up.

Microsoft has not done anything wrong with GitHub yet. I will give them the benefit of the doubt, and not rush out of the door. It would and will be different if they were to change their policies.

Rob’s point is valid, and it would be a disgrace if various governments were to push Microsoft into a corner, requiring it to purge content that the smaller, independent GitHub would have left alone. But unless that happens, we’re debating hypotheticals at the same level as “If I was elected supreme leader of Italy”.

So, as of today, 2018-06-04, I have no intention of moving any of my repositories to other services. I’ll also use a link to this blog with no accompanying comment to anyone who will suggest I should do so without any benefit for my projects.

June 03, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)
The importance of teams, and teamwork (June 03, 2018, 21:04 UTC)

Today, on Twitter, I received a reply with a phrase that, taken on its own and without connecting back to the original topic of the thread, I found representative of the dread I feel with working as a developer, particularly in many opensource communities nowadays.

Most things don’t work the way I think they work. That’s why I’m a programmer, so I can make them work the way I think they should work.

I’m not going to link back to the tweet, or name the author of the phrase. This is not about them in particular, and more about the feeling expressed in this phrase, which I would have agreed with many years ago, but now feels so much off key.

What I feel now is that programmers don’t make things work the way they think they should. And this is not intended as a nod to the various jokes about how bad programming actually is, given APIs and constraints. This is about something that becomes clear when you spend your time trying to change the world, or make a living alone (by running your own company): everybody needs help, in the form of a team.

A lone programmer may be able to write a whole operating system (cough Emacs), but that does not make it a success in and of itself. If you plan on changing the world, and possibly changing it for the better, you need a team that includes not only programmers, but experts in quite a lot of different things.

Whether it is a Free Software project, or a commercial product, if you want to have users, you need to know what they want — and a programmer is not always the most suitable person to go through user stories. Hands up all of us who have, at one point or another, facepalmed at an acquaintance taking a screenshot of a web page to paste it into Word, and tried to teach them how to print the page to PDF. While changing workflows so that they make sense may sound the easiest solution to most tech people, that’s not what people who are trying to just do their job care about. Particularly not if you’re trying to sell them (literally or figuratively) a new product.

And similarly to what users want to do, you need to know what the users need to do. While effectively all of Free Software comes with no warranty attached, even for it (and most definitely for commercial products), it’s important to consider the legal framework the software has to be used on. Except for the more anarchists of the developers out there, I don’t think anyone would feel particularly interested in breaching laws for the sake of breaching them, for instance by providing a ledger product that allows “black book accounting” as an encrypted parallel file. Or, to reprise my recent example, to provide a software solution that does not comply with GDPR.

This is not just about pure software products. You may remember, from last year, the teardown of Juicero. In this case the problems appeared to stem from the lack of control over the BOM. While electronics is by far not my speciality, I have heard more expert friends and colleagues cringe at seeing the specs of projects that tried to actually become mainstream, with a BOM easily twice as expensive as the minimum.

Aside here, before someone starts shouting about that. Minimising the BOM for an electronic project may not always be the main target. If it’s a DIY project, making it easier to assemble could be an objective, so choosing more bulky, more expensive parts might be warranted. Similarly if it’s being done for prototyping, using more expensive but widely available components is generally a win too. I have worked on devices that used multi-GB SSDs for a firmware of less than 64MB — but asking for on-board flash for the firmware would have cost more than the extremely overprovisioned SSDs.

And in my opinion, if you want to have your own company, and are in for the long run (i.e. not with startup mentality of getting VC capital and get acquired before even shipping), you definitely need someone to follow up the business plan and the accounting.

So no, I don’t think that any one programmer, or a group of sole programmers, can change the world. There’s a lot more than writing code, to build software. And a lot more than building software, to change society.

Consider this the reason why I will plonk-file any recruitment email that is looking for “rockstars” or “ninjas”. Not that I’m looking for a new gig as I type this, but I would at least give thought if someone was looking for a software mechanic (h/t @sysadmin1138).

June 01, 2018
Domen Kožar a.k.a. domen (homepage, bugs)

In the last 6 years working with Nix and mostly in last two years full-time, I've noticed a few patterns.

These are mostly a direct or indirect result of not having a "good enough" infrastructure to support how much Nix has grown (1600+ contributors, 1500 pull requests per month).

Without further ado, I am announcing https://cachix.org - Binary Cache as a Service that is ready to be used after two months of work.

What problem(s) does cachix solve?

The main motivation is to save you time and compute resources waiting for your packages to build. By using a shared cache of already built packages, you'll only have to build your project once.

This should also speed up CI builds, as Nix can take advantage of granular caching of each package, rather than caching the whole build.

Another one (which I personally consider even more important) is decentralization of work produced by Nix developers. Up until today, most devs pushed their software updates into the nixpkgs repository, which has the global binary cache at https://cache.nixos.org.

But as the community grew, fitting different ideologies into one global namespace became impossible. I consider the nixpkgs community to be mature, but sometimes clashes of ideologies with rational backing occur. Some want packages to be featureful by default, some prefer them to be minimalist. Some might prefer lots of configuration knobs available (for example cross-compilation support or musl/glibc swapping), some might prefer the build system to do just one thing, as it's easier to maintain.

These are not right or wrong opinions, but rather a specific view of use cases that software might or might not cover.

There are also many projects that don't fit into nixpkgs because their releases are too frequent, they are not available under a permissive license, they are simpler to manage with complete control, or the maintainers simply disagree with the requirements that nixpkgs developers impose on contributors.

And that's fine. What we've learned in the past is not to fight these ideas, but allow them to co-exist in different domains.

If you're interested:

Domen (domen@enlambda.com)

May 28, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)

I grew up as a huge fan of comic books. Not only Italian Disney comics, which are something in themselves, but also US comics from Marvel. You could say that I grew up on Spider-Man and Duck Avenger. Unfortunately actually holding physical comic books nowadays is getting harder, simply because I’m travelling all the time, and I also try to keep as little physical media as I can, given the space constraints of my apartment.

Digital comics are, thus, a big incentive for me to keep reading. And in particular, a few years ago I started buying my comics from Comixology, which was later bought by Amazon. The reason why I chose this particular service over others is that it allowed me to buy, and read, through a single service, the comics from Marvel, Dark Horse, Viz and a number of independent publishers. All of this sounded good to me.

I have not been reading a lot over the past few years, but as I moved to London, I found that the tube rides have the perfect span of time to catch up on the latest Spider-Man or finish up those Dresden Files graphic novels. So at some point last year I decided to get myself a second tablet, one that is easier to bring on the Tube than my neat but massive Samsung Tab A.

While Comixology is available for the Fire Tablet (being an Amazon property), I settled for the Lenovo Tab 4 8 Plus (what a mouthful!), which is a pretty neat “stock” Android tablet. Indeed, Lenovo customization of the tablet is fairly limited, and beside some clearly broken settings in the base firmware (which insisted on setting up Hangouts as SMS app, despite the tablet not having a baseband), it works quite neatly, and it has a positively long lasting battery.

The only real problem with that device is that it has very limited storage. It’s advertised as a 16GB device, but the truth is that only about half of it is available to the user. And that’s only after effectively uninstalling most of the pre-installed apps, most of which are thankfully not marked as system apps (which means you can fully uninstall them, instead of just keeping them disabled). Indeed, with each firmware update, fewer apps seem to be marked as system apps — on my tablet the only three apps currently disabled are the File Manager, Gmail and Hangouts (this is a reading device, not a communication device). I can (and should) probably disable Maps, Calendar, and Photos as well, but that’s going to be for later.

Thankfully, this is not a big problem nowadays, as Android 6 introduced adoptable storage, which allows you to use an additional SD card for storage, transparently for both the system and the apps. It can be a bit slow depending on the card and the usage you make of the device, but as a reading device it works just great.

You were able to move apps to the SD card in older Android versions too, but in those cases you would end up with non-encrypted apps that would still store their data on the device’s main storage. For those cases, a number of applications, including for instance Audible (also an Amazon offering) allow you to select an external storage device to store their data files.

When I bought the tablet, SD card and installed Comixology on it, I didn’t know much about this part of Android to be honest. Indeed, I only checked if Comixology allowed storing the comics on the SD card, and since I found that was the case, I was all happy. I had adopted the SD card without realizing what that actually meant, though, and that was the first problem. Because then the documentation from Comixology didn’t actually match my experience: the setting to choose the SD card for storage didn’t appear, and I contacted tech support, who kept asking me questions about the device and what I was trying to do, but provided me no solution.

Instead, I noticed that everything was alright: as I adopted the SD card before installing the app, it got automatically installed on it, and it started using the card itself for storage, which allowed me to download as many comicbooks as I wanted, and not bother me at all.

That was the case until some time earlier this year, when I couldn’t update the app anymore. It kept failing with a strange Play Store error. So I decided to uninstall and reinstall it… at which point I had no way to move it back to the SD card! They had disabled the option to allow the application to be moved in their manifest, and that’s why the Play Store was unable to update it.

Over a month ago I contacted Comixology tech support, telling them what was going on, assuming that this was an oversight. Instead I kept getting stubborn responses that moving the app to the SD card didn’t move the comics (wrong), or insinuating I was using a rooted device (also wrong). I still haven’t managed to get them to reintroduce the movable app, even though the Kindle app, also from Amazon, moves to the SD card just fine. Ironically, you can read comics bought on Kindle Store with the Comixology app but, for some reason, not vice-versa. If I could just use the Kindle app I wouldn’t even bother with installing the Comixology app.

Now I cancelled my Comixology Unlimited subscription, cancelled my subscription to new issues of Spider-Man, Bleach, and a few other series, and am pondering what’s the best solution to my problems. I could use non-adopted storage for the tablet if I effectively dedicate it to Comixology — unfortunately in that case I won’t be able to download Google Play Books or Kindle content to the SD card as they don’t support the external storage mode. I could just read a few issues at a time, using the ~7GB storage space that I have available on the internal storage, but that’s also fairly annoying. More likely I’ll start buying the comics from another service that has a better understanding of the Android ecosystem.

Of course the issue remains that I have a lot of content on Comixology, and just a very limited subset of comics are DRM-free. This is not strictly speaking Comixology’s fault: the publishers are the ones deciding whether to DRM their content or not. But it definitely shows an issue that many publishers don’t seem to grasp: in the face of technical problems like this, the consumer would have better “protection” if they had just pirated the comics!

For the moment, I can only hope that someone reading this post happens to work for, or know someone working for, Comixology or Amazon (in the product teams — I know a number of people in the Amazon production environment, but I know they are far away from the people who would be able to fix this), and they can update the Comixology app to be able to work with modern Android, so that I can resume reading all my comics easily.

Or if Amazon feels like that, I’d be okay with them giving me a Fire tablet to use in place of the Lenovo. Though I somewhat doubt that’s something they would be happy on doing.

May 25, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)
The story of Gentoo management (May 25, 2018, 15:43 UTC)

I have recently made a tabular summary of (probably) all Council members and Trustees in the history of Gentoo. I think that this table provides a very succinct way of expressing the changes within management of Gentoo. While it can’t express the complete history of Gentoo, it can serve as a useful tool of reference.

What questions can it answer? For example, it provides an easy way to see how many terms individuals have served, or how long Trustee terms were. You can clearly see who served both on the Council and on the Board and when those two bodies had common members. Most notably, it collects a fair amount of hard-to-find data in a single table.

Can you trust it? I’ve put effort into making the developer lists correct, but given the bad quality of the data (see below), I can’t guarantee complete correctness. The Trustee term dates are approximate at best, and oriented around elections rather than actual terms (which are hard to find). Finally, I’ve merged a few short-time changes such as empty seats between a resignation and the appointment of a replacement, as expressing them one by one made little sense and would cause the tables to grow even longer.

This article aims to be the text counterpart to the table. I would like to tell the history of the presented management bodies, explain the sources that I’ve used to get the data and the problems that I’ve found while working on it.

As you could suspect, the further back I had to go, the less reliable the data I was able to find. The problems included the limited scope of our archives and some apparent secrecy of decision-making processes in the early days (judging by some cross-posts, the traffic on the -core mailing list was significant, and it was not archived before late 2004). Both due to lack of data, and due to specific interest in developer self-government, this article starts in mid-2003.

Continue reading

Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

So I needed some quick way to verify email newsletter subscriptions .. (new European data protection laws ..)

I decided I didn't want PHP or a DB, so - since it is installed anyway - I used mod_wsgi again.

Here are the relevant parts of the Apache config (from within the VirtualHost):

#newsletter verification wsgi app

WSGIDaemonProcess myapp home=/path_to/newsletter_verify/myenv processes=1 threads=5 display-name=[wsgi-myapp]httpd python-path=/path_to/newsletter_verify:/path_to/newsletter_verify/myenv/lib64/python3.6/site-packages

WSGIProcessGroup myapp
WSGIApplicationGroup %{GLOBAL}

WSGIScriptAlias /newsletter /path_to/newsletter_verify/apache.wsgi

Simple config, I just added the parts for the virtualenv stuff (not really needed in the end, but I needed it while I was testing).

here's the apache.wsgi:

import newsletter

def application(environ, start_response):
    return newsletter.handle_request(environ, start_response)

then the newsletter.py file:


import sys
import csv
import datetime
from urllib.parse import urlparse, parse_qs

sys.stdout = sys.stderr
DEBUG = False


def handle_request(environ, start_response):
    # standard values
    status = '200 OK'
    output = b'<html><head><title>Hello World</title></head><body><h1>hello world!</h1></body></html>'
    # use urllib instead of separate environ stuff
    url = urlparse(environ['REQUEST_URI'])
    qs = parse_qs(url.query)
    if url.path == '/newsletter' and 'confirm' in qs:
        myuuid = qs['confirm'][0]
        contact = None
        with open('/path_to/newsletter_verify/uuids.csv', 'r') as uuid_file:
            uuids = [line.rstrip() for line in uuid_file]
        # write results to csv files
        if myuuid in uuids:
            contact = myuuid
            with open('/path_to/newsletter_verify/uuids_confirmed.csv', 'a') as csvfile:
                csvwriter = csv.writer(csvfile)
                csvwriter.writerow([myuuid,
                                    str(datetime.datetime.now()),
                                    environ['REMOTE_ADDR'] + ':' + environ['REMOTE_PORT'],
                                    environ['HTTP_USER_AGENT']])
        else:
            # uuid not found .. log the error though
            with open('/path_to/newsletter_verify/uuids_error.csv', 'a') as csvfile:
                csvwriter = csv.writer(csvfile)
                csvwriter.writerow([myuuid,
                                    str(datetime.datetime.now()),
                                    environ['REMOTE_ADDR'] + ':' + environ['REMOTE_PORT'],
                                    environ['HTTP_USER_AGENT']])

        if contact:
            output = bytes('''<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<title>Infomail...</title>
</head>
<body>
<h1>thanks for letting us send you emails.</h1>
</body>
</html>
''', 'utf-8')
        else:
            output = bytes('''<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<title>Infomail...</title>
</head>
<body>
<h1>We did not find your address .. please try again</h1>
</body>
</html>
''', 'utf-8')
    else:
        status = '404 NOT FOUND'
        output = b'<html><head><title>404 Page not found</title></head><body><h1>404 Page not found</h1><p>please check the link</p></body></html>'

    response_headers = [('Content-type', 'text/html'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)

    return [output]

The logic is simple: check the request; if it is correct, extract the uuid, compare it to a list from a csv, and then write to output files who clicked the link, plus IP and user agent. The rest is just plain static html and manual handling of .. well, everything.

Python Dataset Library -- CSV to SQLite ;) (May 25, 2018, 09:24 UTC)

Out of frustration at just needing to handle some (temporary) data properly and save it (csv and json are nice, but I wanted sqlite in this case to query it more easily) I found the dataset library.

This makes handling databases (supposedly) as easy as json .. and I have to admit it really makes things where one doesn't want full control over table creation really simple and fast.

Here's one really quick'n dirty csv -> sql (sqlite in my case) code:

import csv, uuid, dataset


def csv2dataset(fin, fout):
    with open(fin, 'r') as finp:
        inpdata = csv.DictReader(finp)
        db = dataset.connect('sqlite:///' + fout)
        table = db['contact']
        for contact in inpdata:
            contact['uuid'] = uuid.uuid4().hex
            contact['verified'] = False
            table.insert(contact)


if __name__ == '__main__':
    csv2dataset('addresses.csv', 'newsletterdata.sqlite')

All it does is open the csv file and use DictReader to get the data.

Then we "connect" to the SQLite db and get/create a table called 'contact' (this syntax seems to be the same as db.get_table, which creates a table if it doesn't exist .. there is also db.load_table, which only fetches existing ones).

Loop through the contact data from the csv, add a column with a UUID (I don't like verification URLs that include the email, so I generated UUIDs) and a boolean column. The rest of the column names come from the source data (the column names were in the first row of the csv file, which is what DictReader used).

The schema created from this is here:

CREATE TABLE contact (
    id INTEGER NOT NULL,
    first_name TEXT,
    last_name TEXT,
    display_name TEXT,
    email TEXT,
    organization TEXT,
    uuid TEXT,
    verified BOOLEAN,
    PRIMARY KEY (id),
    CHECK (verified IN (0, 1))
);

if you want to query the DB it is just about the same (import, .. and error handling / checks omitted):

contacts = db.load_table('contact')
# find one only - this is None if nothing is returned
contact = contacts.find_one(uuid=myuuid)
# find all occurrences - returns an iterable object even if there are no matches
contact = contacts.find(uuid=myuuid)
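
Updating works in a similar way; here is a minimal sketch (assuming dataset's table.update(row, keys) API) for flipping the verified flag once a uuid has been confirmed:

import dataset

db = dataset.connect('sqlite:///newsletterdata.sqlite')
contacts = db.load_table('contact')
myuuid = '...'  # the uuid extracted from the confirmation link
# 'uuid' is the lookup key; the remaining columns in the dict get updated
contacts.update(dict(uuid=myuuid, verified=True), ['uuid'])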

For the rest please consult the documentation on the dataset webpage

May 20, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

I needed a way to send html + plain text alternative emails (based on templates ultimately) - preferably using Python. And since I always liked Twisted, I turned to that again.

Here's what I came up with after reading some other tutorials and manuals:

from twisted.mail.smtp import sendmail
from twisted.internet.task import react

from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
# https://stackoverflow.com/questions/882712/sending-html-email-using-python <- multipart example

def main(reactor):
    me = 'foo@bar.com'
    to = 'foobar@baz.com'

    message = MIMEMultipart('alternative')
    message['Subject'] = 'test email sent from twisted'
    message['From'] = me
    message['To'] = to
    html = '<html><head /><body><p style="color: red;">This is an awesome email sent with twisted!</p></body></html>'
    text = 'This is an awesome email sent with twisted! (plain text version)'

    # according to RFC2046 the last part is preferred
    message.attach(MIMEText(text, 'plain'))
    message.attach(MIMEText(html, 'html'))
    print(message)

    # more info on parameters (auth, forced tls, ..) see twisted api docs
    d = sendmail('localhost', me, to, message, port=25)
    d.addBoth(print)
    return d


react(main)

Pretty self-explanatory I think: first we create a multipart MIME object, add some headers, define our html text and plain text, and then attach the two texts in the correct order. Then we use twisted's sendmail method and add print as a callback for the deferred so the results just get printed out.

then start the reactor ;)
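
Since the eventual goal is template-based emails, here is a minimal sketch (just an illustration, not necessarily how my final setup will look) of building the html and plain text bodies from templates with Python's built-in string.Template before attaching them as above:

from string import Template

# hypothetical templates - the real ones would live in separate files
html_tpl = Template('<html><head /><body><p style="color: red;">Hello $name, this was sent with twisted!</p></body></html>')
text_tpl = Template('Hello $name, this was sent with twisted! (plain text version)')

html = html_tpl.substitute(name='Foo Bar')
text = text_tpl.substitute(name='Foo Bar')
# then attach as before: MIMEText(text, 'plain') and MIMEText(html, 'html')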

Michał Górny a.k.a. mgorny (homepage, bugs)

There seems to be some serious confusion around the way directories are installed in Gentoo. In this post, I would like to shortly explain the differences between different methods of creating directories in ebuilds, and instruct how to handle the issues related to installing empty directories and volatile locations.

Empty directories are not guaranteed to be installed

First things first. The standards are pretty clear here:

Behaviour upon encountering an empty directory is undefined. Ebuilds must not attempt to install an empty directory.

PMS 13.2.2 Empty directories (EAPI 7 version)

What does that mean in practice? It means that if an empty directory is found in the installation image, it may or may not be installed. Or it may be installed, and incidentally removed later (that’s the historical Portage behavior!). In any case, you can’t rely on either behavior. If you really need a directory to exist once the package is installed, you need to make it non-empty (see: keepdir below). If you really need a directory not to exist, you need to rmdir it from the image.

That said, this behavior does make sense. It guarantees that the Gentoo installation is secured against empty directory pruning tools.

*into

The *into family of functions is used to control install destination for other ebuild helpers. By design, either they or the respective helpers create the install directories as necessary. In other words, you do not need to call dodir when using *into.

dodir

dodir is not really special in any way. It is just a convenient wrapper for install -d that prepends ${ED} to the path. It creates an empty directory the same way the upstream build system would have created it, and if the directory is left empty, it is not guaranteed to be preserved.

So when do you use it? You use it when you need to create a directory that will not be created otherwise and that will become non-empty at the end of the build process. Example use cases are working around broken build systems (that fail due to non-existing directories but do not create them), and creating directories when you want to manually write to a file there.

src_install() {
    # build system is broken and fails
    # if ${D}/usr/bin does not exist
    dodir /usr/bin
    default

    dodir /etc/foo
    sed -e "s:@libdir@:$(get_libdir):" \
        "${FILESDIR}"/foo.conf.in \
        > "${ED}"/etc/foo/foo.conf || die
}

keepdir

keepdir is the function specifically meant for installing empty directories. It creates the directory, and a keep-file inside it. The directory becomes non-empty, and therefore guaranteed to be installed and preserved. When using keepdir, you do not call dodir as well.

Note that actually preserving the empty directories is not always necessary. Sometimes packages are perfectly capable of recreating the directories themselves. However, make sure to verify that the permissions are correct afterwards.

src_install() {
    default

    # install empty directory
    keepdir /var/lib/foo
}

Volatile locations

The keepdir method works fine for persistent locations. However, it will not work correctly in directories such as /run that are volatile or /var/cache that may be subject to wiping by user. On Gentoo, this also includes /var/run (which OpenRC maintainers unilaterally decided to turn into a /run symlink), and /var/lock.

Since the package manager does not handle recreating those directories e.g. after a reboot, something else needs to. There are three common approaches to it, most preferred first:

  1. Application creates all necessary directories at startup.
  2. tmpfiles.d file is installed to create the files at boot.
  3. Init script creates the directories before starting the service (checkpath).

The preferred approach is for applications to create those directories themselves. However, not all applications do that, and not all actually can. For example, applications that are running unprivileged generally can’t create those directories.

The second approach is to install a tmpfiles.d file to create (and maintain) the directory. Those files work both for systemd and OpenRC users (via opentmpfiles) out of the box. The directories are (re-)created at boot, and optionally cleaned up periodically. The ebuild should also use tmpfiles.eclass to trigger directory creation after installing the package.

The third approach is to make the init script create the directory. This was the traditional way but nowadays it is generally discouraged as it causes duplication between different init systems, and the directories are not created when the application is started directly by the user.

Summary

To summarize:

  1. when you install files via *into, installation directories are automatically created for you;
  2. when you need to create a directory into which files are installed in other way than ebuild helpers, use dodir;
  3. when you need to install an empty directory in a non-volatile location (and application can’t just create it on start), use keepdir;
  4. when you need to install a directory into a volatile location (and application can’t just create it on start), use tmpfiles.d.

May 14, 2018
Michal Hrusecky a.k.a. miska (homepage, bugs)

Let’s start this with a little background. I work at CZ.NIC on the Turris project. So I’m definitely biased. But this post is my own, written in my free time, and it expresses just my own opinions; it explicitly doesn’t represent the opinions of the company I work for unless by chance.

So now you know the background, let’s take a look at what Turris MOX actually is. It is marketed as a modular open source router. Well, the Turris project is about secure routers, so that makes sense. But what I like about it is that it is actually quite a nice and modular single board computer. If you are wondering what single board computers are, check Wikipedia or think Raspberry Pi, which is the most well known example and probably one of the worst options you have.

Competitors

If you have heard only about the Raspberry Pi, you have heard about nothing interesting. What is wrong with the Raspberry? Well, it is not that bad, but it has a weird concept: it is mainly a GPU with the CPU as kind of an afterthought, and although it has USB ports and a network card, it is in fact just a USB hub and a USB-attached network card connected to one USB port. And it is notoriously known for instabilities. Are there any better alternatives? Yes, definitely, plenty of them. I personally like Pine64, there is also Orange Pi, ODroid and plenty of others. They have various peripherals and various pricing, so you can pick whatever HW fits your needs best. One of those options might be Turris MOX nowadays.

One disadvantage those boards usually have in common is varying software support. Old Allwinners are mostly fine nowadays (and are going to get even better with Free Electrons working on the video decoder); the newer you get, the more trouble you face. Those devices mostly come with an old, heavily patched Android kernel. If it is a popular board, it has a few ready-made distribution images created by the community. And if it is a popular SoC, over time some parts will gain support in the mainline kernel. So read a lot before you buy, otherwise you might have to choose between being stuck with an old kernel and having half of the hardware not working. Also, with those half-baked old kernels you have to pick the right set of features, otherwise it will blow up.

Turris MOX selling points

I already have a Turris Omnia as my main router, so I will disregard the main selling point – using it as a router. That is what people seem to want the most, judging by the units sold on Indiegogo. People probably know what to expect there, so there is no point in talking about that.

What if you don’t need a router? There are cool selling points from both the HW and the SW point of view. HW-wise, the base module already has gigabit network and one USB 3.0 port. So just the Start module is a perfect match for a small home server. ARMv8 also has some crypto acceleration inside, so MOX does quite well with AES, which means that encryption wouldn’t slow down your rotating disks if you want a NAS. Modularity is nice; from the home server point of view, probably the most interesting extension is more powered USB 3.0 ports. This way you can easily add a bunch of additional drives. Yes, you could do it with a USB hub, but then there is a tangle of cables, an additional power source, etc. Also, cheap hubs are often crap that creates errors on the bus and dies quickly. Apart from that, PCIe might come in handy for various other devices; what I mostly think of is a SATA controller. But some IoT stuff also comes as cards for either USB or mPCIe.

So the HW is interesting, but what about the software? It comes with our own OpenWRT fork with a nice web UI, which is great if your grandma wants to use it as a router. But I wouldn’t be running that on my MOX. My MOX will be a pure home server, so I will be running my favorite distribution (openSUSE) on it. So why should I care about the SW it comes with? It comes with the latest OpenWRT with a 4.14 LTS kernel and 2018.3 u-Boot (if not newer), and it will be fully supported there, as the software is part of the deal. Even if I’m not going to use it, there will be a source of up-to-date kernels and u-Boots. For Turris 1.X we migrated from 3.10 to 3.18 to 4.4, and 4.14 is work in progress. That kind of proves that there will be newer kernels provided by CZ.NIC, which itself is nice. What is even better is that there is work in progress to upstream all the bits and pieces. It’s not there yet with the Omnia, but I know that the main obstacles are a few unusual bits of HW where the kernel abstraction is not ready yet. In the MOX case there is no such thing; those were all avoided, so there should be nothing stopping my colleagues from upstreaming everything, and it is actually already being done. Therefore it shouldn’t give you any headaches to get it running.

Common complaints

People often complained that it had just 512M of RAM. Yep, it was a limitation, but nowadays you can get a one-gig upgrade. People are still asking for more – 4G or 8G or … Something like that is IMHO nonsense. I have a VPS that runs Nextcloud, a DNS server, a mail server and a few other little things, and 1G of RAM is fine for it. People need to realize that they are not going to run Xorg on it, and they are not going to browse the modern web on their MOX. So there is no need for that much RAM. For a NAS it is still perfect with just 512M, unless you decide to run Nextcloud on it. Then you would have to tune it down a little to fit, or preferably buy the RAM upgrade.

It also doesn’t have a GPU – surprise surprise, it’s a router! And at least it doesn’t eat your precious RAM. It could be better with SATA, more USB 3.0 ports etc. directly on the CPU board. But that would make the base price higher, not everybody needs it, and you can add those via mPCIe, so it’s still an option when you decide to enhance it later. One can also always complain about the price, but in general, that is the trade-off you have to make – is the modularity, extensibility and support worth those few extra bucks compared to an all-built-in solution that best matches your needs? For me it is, at least in some use cases.

Conclusion

I believe that the device makes sense, is open and is well supported. Support is something that you don’t see when browsing hardware stores. Yep, you can get that nice Chinese tablet for 99.99 and it comes with Android! But with Android 4.0, the company is a GPL violator, and you will never get anything newer than a 3.10 kernel on it unless you spend a huge amount of work on it. I think MOX is on the other side of the spectrum. It’s not dirt cheap (but also not that expensive), but the support and the things you can do with it are going to be much better. So if you like it, support it on Indiegogo before it is too late.

May 13, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)
A short history of Gentoo copyright (May 13, 2018, 19:04 UTC)

As part of the recent effort into forming a new copyright policy for Gentoo, research into its historical status has been conducted. We’ve tried to establish all the key events regarding the topic, as well as the reasoning behind the existing policy. I would like to briefly note the history based on the evidence discovered by Robin H. Johnson, Ulrich Müller and myself.

Continue reading

May 12, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)
On OpenPGP (GnuPG) key management (May 12, 2018, 06:40 UTC)

Over time, a number of developers have had problems following the Gentoo OpenPGP key policy (GLEP 63). In particular, the key expiration requirements have resulted in many developers wanting to replace their key unnecessarily. I’ve been asked to write some instructions on managing your OpenPGP key, and I’ve decided to go for a full blog post with some less-known tips. I won’t be getting into detailed explanations of how to use GnuPG though — you may still need to read the documentation after all.

Primary key and subkeys

An OpenPGP key actually consists of one or more pairs of public and private keys — the primary key (or root key, in GLEP 63 naming), and zero or more subkeys. Ideally, the primary key is only used to create subkeys, UIDs, manipulate them and sign other people’s keys. All ‘non-key’ cryptographic operations are done using subkeys. This reduces the wear of the primary key, and the risk of it being compromised.

If you don’t use a smartcard, then a good idea would be to move the private part of the primary key off-site, since you don’t need it for normal operation. However, before doing that please remember to always have a revocation certificate around. You will need it to revoke the primary key if you lose it. With GnuPG 2.1, removing private keys is trivial. First, list all keys with keygrips:

$ gpg --list-secret --with-keygrip
/home/you/.gnupg/pubring.kbx
-------------------------------
sec   rsa2048/0xBBC7E6E002FE74E8 2018-05-12 [SC] [expires: 2020-05-11]
      55642983197252C35550375FBBC7E6E002FE74E8
      Keygrip = B51708C7209017A162BDA515A9803D3089B993F0
uid                   [ultimate] Example key <example@example.com>
ssb   rsa2048/0xB7BA421CDCD4AF16 2018-05-12 [E] [expires: 2020-05-11]
      Keygrip = 92230550DA684B506FC277B005CD3296CB70463C

Note that the output may differ depending on your settings. The sec entry indicates a primary key. Once you find the correct key, just look for a file named after its Keygrip in ~/.gnupg/private-keys-v1.d (e.g. B51708C7209017A162BDA515A9803D3089B993F0.key here). Move that file off-site and voilà!

In fact, you can go even further and use a dedicated off-line system to create and manage keys, and only transfer the appropriate private keys (and public keyring updates) to your online hosts. You can transfer and remove any other private key the same way, and use --export to transfer the public keys.
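
Putting the pieces together, a hypothetical sketch reusing the example key and keygrip from the listing above (the off-site location is, of course, made up):

$ # generate a revocation certificate and store it somewhere safe
$ # (GnuPG 2.1 also pre-generates one under ~/.gnupg/openpgp-revocs.d/)
$ gpg --gen-revoke 0xBBC7E6E002FE74E8 > revoke-0xBBC7E6E002FE74E8.asc
$ # move the primary key's private part (the keygrip file) to offline storage
$ mv ~/.gnupg/private-keys-v1.d/B51708C7209017A162BDA515A9803D3089B993F0.key /mnt/offline-usb/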

How many subkeys to use?

Create at least one signing subkey and exactly one encryption subkey.

Signing keys are used to sign data, i.e. to prove its integrity and authenticity. Using multiple signing subkeys is rather trivial — you can explicitly specify the key to use while creating a signature (note that you need to append ! to the key ID to force a non-default subkey), and GnuPG will automatically use the correct subkey when verifying the signature. To reduce the wear of your main signing subkey, you can create a separate signing subkey for Gentoo commits. Or you can go even further and have a separate signing subkey for each machine you’re using (and keep only the appropriate key on each machine).
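
For example, a detached signature made with a specific signing subkey could look like this; the subkey ID below is hypothetical, and the trailing ! is what forces GnuPG to use exactly that subkey:

$ gpg --armor --detach-sign --local-user '0x0123456789ABCDEF!' Manifest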

Encryption keys are used to encrypt messages. While technically it is possible to have multiple encryption subkeys, GnuPG does not make that meaningful. When someone tries to encrypt a message to you, GnuPG will insist on using the newest key even if multiple keys are valid. Therefore, use only one encryption key to avoid confusion.

There is also a third key class: authentication keys that can be used in place of SSH keys. If you intend to use them, I suggest the same rule as for SSH keys, that is one key for each host holding the keyring. More on using GnuPG for SSH below.

To summarize: use one encryption subkey, and as many signing and authentication subkeys as you need. Using more subkeys reduces individual wear of each key, and makes it easier to assess the damage if one of them gets compromised.

When to create a new key?

One of the common misconceptions is that you need to create a new key when the current one expires. This is not really the purpose of key expiration — we use it mostly to automatically rule out dead keys. There are generally three cases when you want to create a new key:

  1. if the key is compromised,
  2. if the primary key is irrecoverably lost,
  3. if the key uses a really weak algorithm (e.g. a short DSA key).

Most of the time, you will just decide to prolong the primary key and subkeys, i.e. use the --edit-key option to update their expiration dates. Note that GnuPG is not very user-friendly there. To prolong the primary key, use the expire command with no subkeys selected. To prolong one or more subkeys, select them using key and then use expire. Normally, you will want to do this periodically, before the expiration date, to give people some time to refresh. Add it to your calendar as a periodic event.

$ gpg --edit-key 0xBBC7E6E002FE74E8
Secret key is available.

sec  rsa2048/0xBBC7E6E002FE74E8
     created: 2018-05-12  expires: 2020-05-11  usage: SC  
     trust: ultimate      validity: ultimate
ssb  rsa2048/0xB7BA421CDCD4AF16
     created: 2018-05-12  expires: 2020-05-11  usage: E   
[ultimate] (1). Example key <example@example.com>

gpg> expire
Changing expiration time for the primary key.
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0) 3y
Key expires at Tue May 11 12:32:35 2021 CEST
Is this correct? (y/N) y

sec  rsa2048/0xBBC7E6E002FE74E8
     created: 2018-05-12  expires: 2021-05-11  usage: SC  
     trust: ultimate      validity: ultimate
ssb  rsa2048/0xB7BA421CDCD4AF16
     created: 2018-05-12  expires: 2020-05-11  usage: E   
[ultimate] (1). Example key <example@example.com>

gpg> key 1

sec  rsa2048/0xBBC7E6E002FE74E8
     created: 2018-05-12  expires: 2021-05-11  usage: SC  
     trust: ultimate      validity: ultimate
ssb* rsa2048/0xB7BA421CDCD4AF16
     created: 2018-05-12  expires: 2020-05-11  usage: E   
[ultimate] (1). Example key <example@example.com>

gpg> expire
Changing expiration time for a subkey.
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0) 1y
Key expires at Sun May 12 12:32:47 2019 CEST
Is this correct? (y/N) y

sec  rsa2048/0xBBC7E6E002FE74E8
     created: 2018-05-12  expires: 2021-05-11  usage: SC  
     trust: ultimate      validity: ultimate
ssb* rsa2048/0xB7BA421CDCD4AF16
     created: 2018-05-12  expires: 2019-05-12  usage: E   
[ultimate] (1). Example key <example@example.com>

If one of the conditions above applies to one of your subkeys, or you think that it has reached a very high wear, you will want to replace the subkey. While at it, make sure that the old key is either expired or revoked (but don’t revoke the whole key accidentally!). If one of those conditions applies to your primary key, revoke it and start propagating your new key.

Please remember to upload your key to key servers after each change (using --send-keys).
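
For instance, with the example key above and whatever keyserver you have configured as the default:

$ gpg --send-keys 0xBBC7E6E002FE74E8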

To summarize: prolong your keys periodically, rotate subkeys whenever you consider that beneficial but avoid replacing the primary key unless really necessary.

Using gpg-agent for SSH authentication

If you already have to set up a secure store for OpenPGP keys, why not use it for SSH keys as well? GnuPG provides ssh-agent emulation which lets you use an OpenPGP subkey to authenticate via SSH.

Firstly, you need to create a new key. You need to use the --expert option to access additional options. Use addkey to create a new key, choose one of the options with custom capabilities and toggle them from the default sign+encrypt to authenticate:

$ gpg --expert --edit-key 0xBBC7E6E002FE74E8
Secret key is available.

sec  rsa2048/0xBBC7E6E002FE74E8
     created: 2018-05-12  expires: 2020-05-11  usage: SC  
     trust: ultimate      validity: ultimate
ssb  rsa2048/0xB7BA421CDCD4AF16
     created: 2018-05-12  expires: 2020-05-11  usage: E   
[ultimate] (1). Example key <example@example.com>

gpg> addkey
Please select what kind of key you want:
   (3) DSA (sign only)
   (4) RSA (sign only)
   (5) Elgamal (encrypt only)
   (6) RSA (encrypt only)
   (7) DSA (set your own capabilities)
   (8) RSA (set your own capabilities)
  (10) ECC (sign only)
  (11) ECC (set your own capabilities)
  (12) ECC (encrypt only)
  (13) Existing key
Your selection? 8

Possible actions for a RSA key: Sign Encrypt Authenticate 
Current allowed actions: Sign Encrypt 

   (S) Toggle the sign capability
   (E) Toggle the encrypt capability
   (A) Toggle the authenticate capability
   (Q) Finished

Your selection? s

Possible actions for a RSA key: Sign Encrypt Authenticate 
Current allowed actions: Encrypt 

   (S) Toggle the sign capability
   (E) Toggle the encrypt capability
   (A) Toggle the authenticate capability
   (Q) Finished

Your selection? e

Possible actions for a RSA key: Sign Encrypt Authenticate 
Current allowed actions: 

   (S) Toggle the sign capability
   (E) Toggle the encrypt capability
   (A) Toggle the authenticate capability
   (Q) Finished

Your selection? a

Possible actions for a RSA key: Sign Encrypt Authenticate 
Current allowed actions: Authenticate 

   (S) Toggle the sign capability
   (E) Toggle the encrypt capability
   (A) Toggle the authenticate capability
   (Q) Finished

Your selection? q
[...]

Once the key is created, find its keygrip:

$ gpg --list-secret --with-keygrip
/home/mgorny/.gnupg/pubring.kbx
-------------------------------
sec   rsa2048/0xBBC7E6E002FE74E8 2018-05-12 [SC] [expires: 2020-05-11]
      55642983197252C35550375FBBC7E6E002FE74E8
      Keygrip = B51708C7209017A162BDA515A9803D3089B993F0
uid                   [ultimate] Example key <example@example.com>
ssb   rsa2048/0xB7BA421CDCD4AF16 2018-05-12 [E] [expires: 2020-05-11]
      Keygrip = 92230550DA684B506FC277B005CD3296CB70463C
ssb   rsa2048/0x2BE2AF20C43617A0 2018-05-12 [A] [expires: 2018-05-13]
      Keygrip = 569A0C016AB264B0451309775FDCF06A2DE73473

This time we’re talking about the keygrip of the [A] key. Append that to ~/.gnupg/sshcontrol:

$ echo 569A0C016AB264B0451309775FDCF06A2DE73473 >> ~/.gnupg/sshcontrol

The final step is to have gpg-agent started with --enable-ssh-support. The exact procedure here depends on the environment used. In XFCE, it involves setting a hidden configuration option:

$ xfconf-query -c xfce4-session -p /startup/ssh-agent/type -n -t string -s gpg-agent
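
Outside of XFCE, one generic approach (a sketch assuming GnuPG 2.1 or newer; the exact startup files differ per environment) is to enable SSH support in the agent configuration and point SSH at the agent's socket from your shell profile:

# ~/.gnupg/gpg-agent.conf
enable-ssh-support

# e.g. in ~/.bash_profile: let ssh talk to gpg-agent instead of ssh-agent
unset SSH_AGENT_PID
export SSH_AUTH_SOCK="$(gpgconf --list-dirs agent-ssh-socket)"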

Further reading

May 11, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

I uploaded the photos I took @ Linuxwochen 2018 Vienna (3rd - 5th May) to my gallery:

https://gallery.lordvan.com/Events/LinuxWochen_Vienna_2018/

May 10, 2018
Sebastian Pipping a.k.a. sping (homepage, bugs)
Bash: Command output to array of lines (May 10, 2018, 12:49 UTC)

We had a case at work where the multi-line output of a command should be turned into an array of lines. Here's one way to do it. Two Bash features play a part in this approach:

  • $'....' syntax (a dollar right in front of a single-tick literal) activates interpolation of C-like escape sequences (see below)
  • Bash variable IFS — the internal field separator affecting the way Bash applies word splitting — is temporarily changed from default spaces-tabs-and-newlines to just newlines so that we get one array entry per line

Let me demo that:

# f() { echo $'one\ntwo  spaces' ; }

# f
one
two  spaces

# IFS=$'\n' lines=( $(f) )

# echo ${#lines[@]}
2

# echo "${lines[0]}"
one

# echo "${lines[1]}"
two  spaces
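
A small aside: since there is no command word in IFS=$'\n' lines=( $(f) ), the IFS assignment is not limited to that one statement and stays in effect in the current shell afterwards, and the unquoted $(f) is still subject to pathname expansion. On Bash 4 or newer, the mapfile (a.k.a. readarray) builtin avoids both issues:

# mapfile -t lines < <(f)

# echo ${#lines[@]}
2

# echo "${lines[1]}"
two  spaces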

May 08, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)
Copyright 101 for Gentoo contributors (May 08, 2018, 05:59 UTC)

While the work on the new Gentoo copyright policy is still in progress, I think it would be reasonable to write a short article on copyright in general, for the benefit of Gentoo developers and contributors (proxied maintainers, in particular). There are some common misconceptions regarding copyright, and I would like to specifically focus on correcting them. Hopefully, this will reduce the risk of users submitting ebuilds and other files in violation of copyrights of other parties.

First of all, I’d like to point out that IANAL. The following information is based on what I’ve gathered from various sources over the years. Some or all of it may be incorrect. I take no responsibility for that. When in doubt, please contact a lawyer.

Secondly, the copyright laws vary from country to country. In particular, I have no clue how they work across two countries with incompatible laws. I attempt to provide a baseline that should work both for US and EU, i.e. ‘stay on the safe side’. However, there is no guarantee that it will work everywhere.

Thirdly, you might argue that a particular case would not stand a chance in court. However, my goal here is to avoid the court in the first place.

The guidelines follow. While I’m referring to ‘code’ below, the same rules apply to any copyrightable material.

  1. Lack of clear copyright notice does not imply lack of copyright. When there is no license declaration clearly applicable to the file in question, it is implicitly all-rights-reserved. In other words, you can’t reuse that code in your project. You need to contact the copyright holder and ask him to give you rights to do so (i.e. add a permissive license).
  2. Copyright still holds even if the author did not list his name, made it anonymously or used a fake name. If it’s covered by an open source license, you can use it preserving the original copyright notice. If not, you need to reliably determine who the real copyright holder is.
  3. ‘Public domain’ dedication is not recognized globally (e.g. in the EU copyright is irrevocable). If you wish to release your work with no restrictions, please use an equivalent globally recognized license, e.g. CC0. If you wish to include a ‘public domain’ code in your project, please consider contacting its author to use a safer license option instead.
  4. Copyrights and licenses do not merge when combining code. Instead, each code fragment retains its original copyright. When you include code with different copyright, you should include the original copyright notice. If you modify such code fragment, you only hold copyright (and can enforce your own license) to your own changes.
  5. Copyright is only applicable to original work. It is generally agreed that e.g. a typo fix is not copyrightable (i.e. you can’t pursue copyright for doing that). However, with anything more complex than that the distinction is rather blurry.
  6. When a project uses code fragments with multiple different licenses, you need to conform to all of them.
  7. When a project specifies that you can choose between multiple licenses (e.g. BSD/GPL dual-licensing, ‘GPL-2 or newer’), you need to conform only to the terms of one of the specified licenses. However, in the context of a single use, you need to conform to all terms of the chosen license. You can’t freely combine incompatible terms of multiple licenses.
  8. Not all licenses can be combined within a single project. Before including code using a different license, please research license compatibility. Most of those rules are asymmetrical. For example:
    • you can’t include GPL code in BSD-licensed project (since GPL forbids creating derivative work with less restrictive licensing);
    • but you can include BSD-licensed code in GPL project (since BSD does not forbid using more restrictive license in derivative works);
    • also, you can include BSD/GPL dual-licensed code in BSD-licensed project (since dual-licensing allows you to choose either of the licenses).
  9. Relicensing a whole project can happen only if you obtain explicit permission from all people holding copyright to it. Otherwise, you can only relicense those fragments to which you had obtained permission (provided that the new license is compatible with the remaining licenses).
  10. Relicensing a project does not apply retroactively. The previous license still applies to the revisions of the project prior to the license change. However, this applies only to factual license changes. For example, if a MIT-licensed project included LGPL code snippet that lacked appropriate copyright notice (and added the necessary notice afterwards), you can’t use the snippet under (mistakenly attributed) MIT license.

May 04, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)

I’m an open source developer, because I think that open source makes for safer, better software for the whole community of users. I also think that, by making more software available to a wider audience, we improve the quality, safety and security of every user out there, and as such I will always push for more, and more open, software. This is why I support the Public Money, Public Code campaign by the FSFE for opening up the software developed explicitly for public administrations.

But there is one space that I have found quite lacking when it comes to open source: business-oriented software. The first obvious thing is the lack of good accounting software, as Jonathan has written extensively about, but there is more. When I was consulting as a roaming sysadmin (or, with a more buzzwordy and marketing-friendly term, a Managed Services Provider — MSP), a number of my customers relied heavily on nearly off-the-shelf software to actually run their business. And in at least a couple of cases, they commissioned custom-tailored software from me.

In a lot of cases, there isn’t really a good reason not to open-source this software: while it is required to run certain businesses, it is clearly not enough to run them. And yet there are very few examples of such software in the open, and that includes from me: my customers didn’t really like the idea of releasing the software to others, even after I offered a discount on the development price.

I want to show the details of an example of one such custom software, something that, to give a name to it, would be a CRM (Customer Relationship Manager), that I built for a pizzeria in Italy. I won’t be opening the source code for it (though I wish I could do so), and I won’t be showing screenshots or provide the name of the actual place, instead referring to it as Pizza Planet.

This CRM (although the name sounds more professional than what it really was) was custom-designed to suit the work environment of the pizzeria — that is to say, I did whatever they asked me, despite it disagreeing with my sense of aesthetics and engineering. The basic idea was very simple: when a customer called, they wanted to know who it was even before picking up the phone — effectively inspecting the caller ID and connecting it with the easiest database editing facility I could write, so that they could give each number a name and a freeform text box to write down addresses, notes, and preferences.

The reason why they called me to write this is that they originally bought a hardware PBX (for a single room pizzeria!) just so that a laptop could connect to it and use the Address Book functionality of the vendor. Except this functionality kept crashing, and after many weeks of back-and-forth with the headquarters in Japan, the integrator could not figure out how to get it to work.

As the pizzeria was wired with ISDN (legacy technology, heh) to be able to take at least two calls at the same time, the solution I came up with was to build a simple “industrial” PC with an ISDN line card and Asterisk, get them a standard SIP phone, and write the “CRM” so that it would initiate a SIP connection to the same Asterisk server (but never answer it). Once an inbound call arrived, it would look up whether there was an entry in a simple storage layer for the phone number and display it in very large fonts, to be easily readable while moving around the kitchen.

As things moved and changed, a second pizzeria was opened and it required a similar setup. Except that, as ISDN is legacy technology, the provider was going to charge through the nose for connecting a new line. We decided to set up a VoIP account instead, and rather than a PC on site running the software, Asterisk ran on a server (in close proximity to the VoIP provider). And since at that point the ISDN limit on concurrent calls no longer applied, the scope of the project expanded.

First of all, up to four calls could be queued, “your call is very important to us”-style. We briefly discussed allowing callers to reserve a spot and be called back, but at the time calls to mobile phones were still expensive enough that they wanted to avoid that. Instead the queued calls would get a simple message telling the caller to wait in line to reach the pizzeria. The CRM started showing the length of the queue (in a very clunky way), although it never showed the “next call” like the customer wanted (the relationship between the customer and the VoIP provider went south, and all of us ended up withdrawing from the engagement).

Another feature we ended up implementing was opening hours: when a call arrived outside of the advertised opening hours, an announcement would play (recorded by a paid friend, who used to act in theatre and thus had good pronunciation).

I’m fairly sure that none of this would actually comply with the new GDPR requirements. At the very least, the customers should be advised that their data (phone number, address) will be saved.

But why am I talking about this in the context of Open Source software? Well, while a lot of the components used in this set up were open source, or even Free Software, it still requires a lot of integration to become usable. There’s no “turnkey pizzeria setup” — you can build up the system from components, but you need not just an integrator, you need a full developer (or development team) to make sure all the components fit together.

I honestly wish I had open-sourced more of this. If I were to design this again right now, I would probably make sure that there was a direct, real-time API between Asterisk and a Web-based CRM. It would definitely make it easier to secure the data for GDPR compliance. But there is more than just that: having an actual integrated, isolated system where you can make configuration changes gives the user (customer) the ability to set things up without having to know how the configuration files are structured.

To set up the Asterisk, it took me a week or two reading through documentation, books on the topic, and a significant amount of experimentation with a VoIP number and a battery of testing SIM cards at home. To make the recordings work I had to fight with converting the files to G.729 beforehand, or the playback would use a significant amount of CPU.

But these are not unknown needs. There are plenty of restaurants (which don’t have to be pizza places) out there that probably need something like this. And indeed services such as Deliveroo appear to now provide a similar all-in-one solution… which is good for restaurants in cities big enough to sustain Deliveroo, but probably not great for the smaller restaurants in smaller cities, who probably would not have much of a chance of hiring developers to build such a system themselves.

So, rambling aside, I really wish we had more ready-to-install Open Source solutions for businesses (restaurants, hotels, … — I would like to add banks to that but I know regulatory compliance is hard). I think these would actually have a very good social impact on all those towns and cities that don’t have the critical mass of tech influence that comes with its own collection of mobile apps, for instance.

If you’re the kind of person who complains that startups only appear to want to solve problems in San Francisco, maybe think of what problems you can solve in and around your town or city.

May 03, 2018
Michał Górny a.k.a. mgorny (homepage, bugs)
The ultimate guide to EAPI 7 (May 03, 2018, 07:25 UTC)

Back when EAPI 6 was approved and ready for deployment, I wrote a blog post entitled the Ultimate Guide to EAPI 6. Now that EAPI 7 is ready, it is time to publish a similar guide to it.

Of all EAPIs approved so far, EAPI 7 brings the largest number of changes. It follows the path established by EAPI 6. It focuses on integrating features that are either commonly used or that can not be properly implemented in eclasses, and removing those that are either deemed unnecessary or too complex to support. However, the circumstances of its creation are entirely different.

EAPI 6 was more like a minor release. It was formed around the time when Portage development had practically stalled. It aimed to collect some old requests into an EAPI that would be easy to implement by people with little knowledge of the Portage codebase. Therefore, the majority of features oscillated around the bash parts of the package manager.

EAPI 7 is closer to a proper major release. It included some explicit planning ahead of specification, and the specification has been mostly completed even before the implementation work started. We did not initially skip features that were hard to implement, even though the hardest of them were eventually postponed.

I will attempt to explain all the changes in EAPI 7 in this guide, including the rationale and ebuild code examples.

Continue reading

April 25, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)
Some of my thoughts on comments in general (April 25, 2018, 11:04 UTC)

One of the points that is the hardest for me to make when I talk to people about my blog is how important comments are for me. I don’t mean comments in source code as documentation, but comments on the posts themselves.

You may remember that one of the less appealing compromises I made when I moved to Hugo was accepting to host the comments on Disqus. A few people complained when I did that because Disqus is a vendor lock-in. That’s true in more ways than one may imagine.

It’s not just that you are tied into a platform with difficulty of moving out of it — it’s that there is no way to move out of it, as it is. Disqus does provide you the ability to download a copy of all the comments from your site, but they don’t guarantee that’s going to be available: if you have too many, they may just refuse to let you download them.

And even if you manage to download the comments, you’ll have a fun time trying to do anything useful with them: Disqus does not let you re-import them, say into a different account, as they explicitly don’t allow that format to be imported. Nor does WordPress: when I moved my blog I had to hack up a script that took the Disqus export format and a WRX dump of the blog (which is just a beefed-up RSS feed), and produced a third file, attaching the Disqus comments to the WRX as WordPress would have exported them. This was tricky, but it resolved the problem, and now all the comments are on the WordPress platform, allowing me to move them as needed.

Many people pointed out that there are at least a couple of open-source replacements for Disqus — but when I looked into them I was seriously afraid they wouldn’t really scale that well for my blog. Even WordPress itself appears sometimes not to know how to deal with a blog of more than 2400 entries. The WRX file is, by itself, bigger than the maximum accepted by the native WordPress import tool — luckily the Automattic service has higher limits.

One of the other advantages of having moved away from Disqus is that the comments now render without needing any JavaScript or third-party service, which makes them searchable by search engines and, most importantly, preserves them in the Internet Archive!

But Disqus is not the only thing that disappoints me. I have a personal dislike for the design, and business model, of Hacker News and Reddit. It may be a bit of a situation of “old man yells at cloud”, but I find that these two websites, much more than Facebook, LinkedIn and other social media, are designed to take the conversation away from the authors.

Let me explain with an example. When I posted about Telegram and IPv6 last year, the post was sent to Reddit, which I found out because I have a self-stalking recipe for IFTTT that informs me if any link to my sites gets posted there. And people commented on that — some missing the point and some providing useful information.

But if you read my blog post you won’t know about that at all, because the comments are locked into Reddit, and if Reddit were to disappear the day after tomorrow there would be no history of those comments at all. And this is without going into the issue of the “karma” going to the reposter (whom I know in this case), rather than the author — who is actually discouraged in most communities from submitting their own writings!

This applies in the same or similar fashion to other websites, such as Hacker News, Slashdot, and… is Digg still around? I lost track.

I also find that moving the comments off-post makes people nastier: instead of asking questions, ready to understand and talk things through with the author, they assume the post exists in isolation, and that the author knows nothing of what they are talking about. And I’m sure that at least a good chunk of that is because they don’t expect the author to be reading them — they know full well they are “talking behind their back”.

I have had the pleasure of meeting a lot of people on the Internet over time, mostly through comments on my or other blogs. I have learnt new things and been given suggestions, solutions, or simply new ideas of what to poke at. I treasure the comments and the conversation they foster. I hope that we’ll have more rather than fewer of them in the future.

April 21, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)
Mobile Web, Internet of Things, and the Geeks (April 21, 2018, 08:04 UTC)

I’m a geek. It’s not just the domain I used to use, but the truth at the core of who I am. I’m also a gadgeteer: if there’s a new gadget that may do something I’m interested in, and I can afford it, I’ll have it (sometimes even if I can barely afford it). I love “toys” and novelties, and I don’t mind if they are a bit on the rough side, “some assembly required”.

All of this, though, is sometimes hard to reconcile with the absolute vitriol I see online, among the communities that include geeks, free software activists, privacy activists and so on.

I sometimes still hear a lot of people complaining about websites optimizing for mobile, sometimes to the disadvantage of 32″ 4K HiDPI monitors — despite the fact that the latter are definitely in a minority of use cases, while the former is the new reality of web access. I do understand that sometimes it’s bothersome just how messy some websites become when they decide to focus primarily on mobile, but there are plenty of cases in which a “mobile-first” point of view is just what people are more likely to need, and ignoring this can be actively harmful.

Let me try to build up an example, which may sound a bit contrived but I would expect to be very realistic.

As you now know, I now live in London, and things here are different than in Ireland. In particular, I can no longer just drop by the pharmacy every other week and go “Just refill me in on this stuff please”. Instead I need to order the prescription to the pharmacy by going online, to the portal of the service provider my surgery contracted, and fill in the form. Then I need to note down when the stuff will be available and go pick it up.

The service provider that my surgery is using did not do a particularly good job in the UI/UX of their product. The website is definitely not mobile optimised, it does not integrate with anything and does not send email reminders for anything, let alone ICS attachments. And when I spoke about that with friends and colleagues, reactions were mixed between the «Why would they spend time on mobile? It’s just fancy stuff» and «Only geeks would care about receiving ICS attachments».

I disagree because the thing is, I can definitely see myself taking the last pills from the blister while on vacation and remembering I need to order more — but I probably don’t have my computer at hand. Being able to just go on the mobile website (or app) and ordering them on the fly can easily be a lifesaver, particularly for people who don’t usually travel with their laptop at all.

And similarly, if I were to ask people about the ICS attachments themselves, they would probably wonder what the heck I am talking about; but ask people if they’d appreciate their calendar showing when they are meant to pick up their prescription, or when they have an appointment with their GP, and they would probably go “Yes, please!”

Let me take another example: the Internet of Things. Of course it’s a buzzword nowadays, but it does not come out of nowhere. The concept of home automation (which in Italian has gone by the word “domotica” for well over 20 years) is not new, and it’s not just a matter of being the trend of the year.

While there indeed are a number of “connected things” ideas that make me raise an eyebrow or frown with a “what the heck were they thinking?”, dissing the ideas tout-court just because they are, well, “connected things” is, in my opinion, short-sighted.

I don’t remember if it was Samsung, LG, or whoever else, that proposed first on the market a fridge with an Internet-connected webcam, so that you can check on what you have inside. I heard people complain that it’s just a gimmick and for the lazy — but I could definitely see myself using it. See something on sale at the supermarket, which you didn’t put on the list? Do you remember if you have enough space to put it in the fridge, or if it would be wasted?

Plenty of the solutions that revolve around the Internet of Things are indeed easy to dismiss as “lazy” – I would love to have a washing machine that could be started while I’m on the bus because I forgot to do so before leaving the apartment – but at the same time, they are very valuable for people who do have real problems with remembering about things later on. It does not strictly have to be available from the phone in the middle of London — if my phone could, once I get home, remind me “The dishwasher is done. The washing machine is done. You’re out of milk. You need to take out the trash”, that would make my day.

But instead of saying “Hey folks, we need better, safer products!”, I see lots of geeks just going “That’s the Internet of Shit for you, why would you want your coffee machine connected to the Internet?” — like this was never dreamed of by geeks. Or insisting that, since “Internet of Things” is a marketing term, it is cursed and everything that relates to it is “ungeek”.

From my point of view, a lot of these people are those that are now looking down on iPhone users, but were sending email instead of text messages back when you had to use WAP to access anything mobile.

Stop blaming the users. Accept that you may not like or have a need for something but someone else might want it anyway. And if you really want to help, start figuring out how we can make things more secure by default instead of making fun of those that get burnt by the latest vulnerability.

April 18, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)
Updates on Silicon Labs CP2110 (April 18, 2018, 23:04 UTC)

One month ago I started the yak shave of supporting the Silicon Labs CP2110 with a fully opensource stack, that I can even re-use for glucometerutils.

The first step was deciding how to implement this. While the device itself supports quite a wide range of interfaces, including a GPIO one, I decided that since I’m only going to be able to test and use practically the serial interface, I would at least start with just that. So you’ll probably see the first output as a module for pyserial that implements access to CP2110 devices.

The second step was to find an easy way to test this in a more generic way. Thankfully, Martin Holzhauer, who commented on the original post, linked to an adapter by MakerSpot that uses that chip (the link to the product was lost in the migration to WordPress, sigh), which I then ordered and received a number of weeks later, since it had to come to the US and clear customs through Amazon.

All of this was the easy part, the next part was actually implementing enough of the protocol described in the specification, so that I could actually send and receive data — and that also made it clear that despite the protocol being documented, it’s not as obvious as it might sound — for instance, the specification says that the reports 0x01 to 0x3F are used to send and receive data, but it does not say why there are so many reports… except that it turns out they are actually used to specify the length of the buffer: if you send two bytes, you’ll have to use the 0x02 report, for ten bytes 0x0A, and so on, until the maximum of 63 bytes as 0x3F. This became very clear when I tried sending a long string and the output was impossible to decode.

Speaking of decoding, my original intention was to just loop together the CP2110 device with a CH341 I bought a few years ago, and have them loop data among each other to validate that they work. Somehow this plan failed: I can get data from the CH341 into the CP2110 and it decodes fine (using picocom for the CH341, and Silicon Labs own binary for the CP2110), but I can’t seem to get the CH341 to pick up the data sent through the CP2110. I thought it was a bad adapter, but then I connected the output to my Saleae Logic16 and it showed the data fine, so… no idea.

The current status is:

  • I know the CH341 sends out a good signal;
  • I know the CP2110 can receive a good signal from the CH341, with the Silicon Labs software;
  • I know the CP2110 can send a good signal to the Saleae Logic16, both with the Silicon Labs software and my tiny script;
  • I can’t get the CH341 to receive data from the CP2110.

Right now the state is still very much up in the air, and since I’ll be travelling quite a bit without a chance to bring with me the devices, there probably won’t be any news about this for another month or two.

Oh and before I forget, Rich Felker gave me another interesting idea: CUSE (Character Devices in User Space) is a kernel-supported way to “emulate” in user space devices that would usually be implemented in the kernel. And that would be another perfect application for this: if you just need to use a CP2110 as an adapter for something that needs to speak with a serial port, then you can just have a userspace daemon that implements CUSE, and provide a ttyUSB-compatible device, while not requiring short-circuiting the HID and USB-Serial subsystems.

Zack Medico a.k.a. zmedico (homepage, bugs)

In portage-2.3.30, portage’s python API provides an asyncio event loop policy via a DefaultEventLoopPolicy class. For example, here’s a little program that uses portage’s DefaultEventLoopPolicy to do the same thing as emerge --regen, using an async_iter_completed function to implement the --jobs and --load-average options:

#!/usr/bin/env python

from __future__ import print_function

import argparse
import functools
import multiprocessing
import operator

import portage
from portage.util.futures.iter_completed import (
    async_iter_completed,
)
from portage.util.futures.unix_events import (
    DefaultEventLoopPolicy,
)


def handle_result(cpv, future):
    metadata = dict(zip(portage.auxdbkeys, future.result()))
    print(cpv)
    for k, v in sorted(metadata.items(),
        key=operator.itemgetter(0)):
        if v:
            print('\t{}: {}'.format(k, v))
    print()


def future_generator(repo_location, loop=None):

    portdb = portage.portdb

    for cp in portdb.cp_all(trees=[repo_location]):
        for cpv in portdb.cp_list(cp, mytree=repo_location):
            future = portdb.async_aux_get(
                cpv,
                portage.auxdbkeys,
                mytree=repo_location,
                loop=loop,
            )

            future.add_done_callback(
                functools.partial(handle_result, cpv))

            yield future


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--repo',
        action='store',
        default='gentoo',
    )
    parser.add_argument(
        '--jobs',
        action='store',
        type=int,
        default=multiprocessing.cpu_count(),
    )
    parser.add_argument(
        '--load-average',
        action='store',
        type=float,
        default=multiprocessing.cpu_count(),
    )
    args = parser.parse_args()

    try:
        repo_location = portage.settings.repositories.\
            get_location_for_name(args.repo)
    except KeyError:
        parser.error('unknown repo: {}\navailable repos: {}'.\
            format(args.repo, ' '.join(sorted(
            repo.name for repo in
            portage.settings.repositories))))

    policy = DefaultEventLoopPolicy()
    loop = policy.get_event_loop()

    try:
        for future_done_set in async_iter_completed(
            future_generator(repo_location, loop=loop),
            max_jobs=args.jobs,
            max_load=args.load_average,
            loop=loop):
            loop.run_until_complete(future_done_set)
    finally:
        loop.close()



if __name__ == '__main__':
    main()
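
Assuming the script above is saved as regen.py (a made-up name), it could be invoked roughly like this on a Gentoo host with portage installed; the flags map directly to the argparse options defined in main():

$ python regen.py --repo gentoo --jobs 8 --load-average 8.0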

April 15, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)
A review of the Curve debit card (April 15, 2018, 23:04 UTC)

Somehow, I end up spending a significant amount of my time thinking, testing and playing with financial services, both old school banks and fintech startups.

One of the most recent ones I have been playing with is Curve. The premise of the service is to allow you to use a single card for all transactions, having the ability to then charge any card underneath it as it is convenient. This was a curious enough idea, so I asked the friend who was telling me about it to give me his referral code to sign up. If you want to sign up, my code is BG2G3.

Signing up and getting the card is quite easy, even though they have (or had when I signed up) a “waitlist” — so after you sign up it takes a few days for you to be able to actually order the card and get it in your hands. They suggest you get other people to sign up as well to lower your time on the waitlist, but that didn’t seem to be a requirement for me. The card arrived, to my recollection, no more than two days after they said they shipped it, probably because it was coming from London itself, and that’s all there is to receive.

So how does this all work? You need to connect your existing cards to Curve, and verify them to be able to charge them — verification can be either through a 3Dsecure/Verified by Visa login, or through the usual charge-code-reverse dance that Google, PayPal and the others all use. Once you connect the cards, and select the currently-charged card, you can start using the Curve card to pay, and it acts as a “proxy” for the other card, charging it for the same amount, with some caveats.

The main advantage that my friend suggested for this setup is that if you have a corporate card (I do), you can add that one to Curve too, and rely on that to avoid the payback process at work if you make a mistake paying for something. As this happened to me a few times before, mostly out of selecting the wrong payment profile in apps such as Uber or Hailo, or going over the daily allowance for meals as I changed plans, it sounded interesting enough. This can work both by making sure to select the corporate card before making the purchase (for instance by defaulting to it during a work trip), or by “turning back time” on an already charged transaction. Cool.

I had also hoped that the card could be attached to Google Pay, but that’s not the case. Nor do they implement their own NFC payment application, which is a bit disappointing.

Besides the “turn back time” feature, the app also has some additional features, such as integration with the accounting software Xero, including the ability to attach a receipt image to the expense (if this were present for Concur, I’d be a real believer, but until then it’s not really that useful to me), and to receive email “receipts” (more like credit card slips) for purchases made on a certain card (not sure why that’s not just a global setting, but meh).

Not much else is available in the app to make it particularly useful or interesting to me, honestly. There’s some category system for expenses, very similar to the one for Revolut, but that’s about it.

On the more practical side of things, Curve does not apply any surcharges as long as the transaction is in the same currency as the card, and that includes the transactions on which you turned back time. This is handy if you don’t know in advance which currency you’ll be charged in, though that does not really happen often.

What I found particularly useful is that the card itself looks like a proper “British” card — with my apartment as the verified address on it. But then I can charge one of my cards in Euro, US Dollars, or Revolut itself… although I could just charge Revolut in those cases. The main difference between the two approaches is that I can put the Euro purchases onto a Euro credit card, instead of a debit one… except that the only Euro credit card I have left also has my apartment as its verifiable billing address, so… I’d say I’m not the target audience for this feature.

For “foreign transactions” (that is, where the charged currency and the card currency disagree), Curve charges a 1% foreign transaction fee. This is pointless for me thanks to Revolut, but it still is convenient if you only have accounts with traditional banks, particularly in the UK where most banks apply a 3% foreign transaction fee instead.

In addition to the free card, they also offer a £50 (a year, I guess — it’s not clear!) “black” card that offers 1% cashback at selected retailers. You can actually get 90 days cashback for three retailers of your choice on the free card as well, but to be honest, American Express is much more widely accepted in chains, and its rewards are better. I ended up choosing to do the cashback with TfL, Costa (because they don’t take Amex contactless anyway), and Sainsbury’s just to select something I use.

In all of this, I have to say I fail to see where the business makes money. Okay, financial services are not my area of expertise, but if you’re just proxying payments, without even taking deposits (the way Revolut does), and not charging additional foreign transaction fees, and even giving cashback… where’s the profit?

I guess there is some money to be made by profiling users and selling the data to advertisers — but between GDPR and the fact that most consumers don’t like the idea of being made into products with no “kick back”, I doubt that goes very far. I guess if it were me I would be charging 1% on the “turn back time” feature, but that might make the whole point of the service moot. I don’t know.

At the end of the day, I’m also not sure how useful this card is going to be for me, on the day to day. The ability to have a single entry in those systems that are used “promiscuously” for business and personal usage sounds good, but even in that case, it means ignoring the advantages of having a credit card, and particularly a rewards card like my Amex. So, yeah, don’t really see much use for it myself.

It also complicates things when it comes to risk engines for fraud protection: your actual bank will see all the transactions as coming from a single vendor, with the minimum amount of information attached to it. This will likely defeat all the fraud checks by the bank, and will likely rely on Curve’s implementation of fraud checks — which I have no idea how they work, since I have not yet managed to catch them.

Also as far as I could tell, Curve (like Revolut) does not implement 3DSecure (the “second factor” authentication used by a number of merchants to validate e-commerce transactions), making it less secure than any of the other cards I have — a cloned/stolen card can only be disabled after the fact, and replaced. Revolut at least allows me to separate the physical card from my e-commerce transactions, which is neat (and now even supports one time credit card numbers).

There is also another thing that is worth considering, that shows the different point of views (and threat models) of the two services: Curve becomes your single card (single point of failure, too) for all your activities: it makes your life easy by making sure you only ever need to use one card, even if you have multiple bank accounts in multiple countries, and you can switch between them at the tap of a finger. Revolut on the other hand allows you to give each merchant its own credit card number (Premium accounts get unlimited virtual cards) — or even have “burner” cards that change numbers after use.

All in all, I guess it depends what you want to achieve. Between the two, I think I’ll mostly stick to Revolut, and my usage of Curve will taper off once the 90 days cashback offer is completed — although it’s still nice to have for a few websites that gave me trouble with Revolut, as long as I’m charging my Revolut account again, and the charge is in Sterling.

If you do want to sign up, feel free to use BG2G3 as the referral code, it would give a £5 credit for both you and me, under circumstances that are not quite clear to me, but who knows.

April 14, 2018
Sergei Trofimovich a.k.a. slyfox (homepage, bugs)
crossdev and GNU Hurd (April 14, 2018, 00:00 UTC)

Tl;DR: crossdev is a tool to generate a cross-compiler for you in gentoo and with some hacks (see below) you can even cross-compile to hurd!

The FOSDEM 2018 conference happened recently and a lot of cool talks took place there. The full list counts 689 events!

Hurd’s PCI arbiter was a nice one. I had never actually tried hurd before and thought I’d give it a try in a VM.

Debian already provides a full hurd installer (installation manual) and I picked it. Hurd works surprisingly well for such an understaffed project! The installation process is very simple: it’s a typical debian CD which asks you for a few details about the final system (same as for linux) and you get your OS booted.

Hurd has a ton of debian software already built and working (like 80% of the whole repo). Even GHC is ported there. While at it I grabbed all the tiny GHC patches related to hurd from Debian and pushed them upstream:

Now plain ./configure && make && make test just works.

Hurd supports only 32-bit x86 CPUs and does not support SMP (only one CPU is available). That makes building heavyweight stuff (like GHC) in a virtual machine a slow process.

To speed things up a bit I decided to build a cross-compiler from gentoo linux to hurd with the help of crossdev. What does it take to support bootstrap like that? Let’s see!

To get the idea of how to cross-compile to another OS let’s check how typical linux to linux case looks like.

Normally aiming gcc at another linux-glibc target takes the following steps:

- install cross-binutils
- install system headers (kernel headers and glibc headers):
- install minimal gcc without glibc support (not able to link final executables yet)
- install complete glibc (gcc will need crt.o files)
- install full gcc (able to link final binaries for C and C++)

In gentoo crossdev does all the above automatically by running emerge a few times for you. I wrote a more up-to-date crossdev README to describe a few details of what is happening when you run crossdev -t <target>.

The hurd-glibc case is not fundamentally different from the linux-glibc one. Only a few packages need to change their names, namely:

- install cross-binutils
- install gnumach-headers (kernel headers part 1)
- [NEW] install cross-mig tool (Mach Interface Generator, a flavour of IDL compiler)
- install hurd-headers and glibc-headers (kernel headers part 2 and libc headers)
- install minimal gcc without libc support (not able to link final executables yet)
- install complete libc (gcc will need crt.o files)
- install full gcc (able to link final binaries for C and C++)

The only change from linux is the cross-mig tool. I’ve collected ebuilds needed in gentoo-hurd overlay.

Here is how one gets hurd cross-compiler with today’s crossdev-99999999:

git clone https://github.com/trofi/gentoo-hurd.git
HURD_OVERLAY=$(pwd)/gentoo-hurd
CROSS_OVERLAY=$(portageq get_repo_path / crossdev)
TARGET_TUPLE=i586-pc-gnu
# this will fail around glibc, it's ok, we'll take over manually from there
crossdev --l 9999 -t ${TARGET_TUPLE}
ln -s "${HURD_OVERLAY}"/sys-kernel/gnumach ${CROSS_OVERLAY}/cross-${TARGET_TUPLE}/
ln -s "${HURD_OVERLAY}"/dev-util/mig ${CROSS_OVERLAY}/cross-${TARGET_TUPLE}/
ln -s "${HURD_OVERLAY}"/sys-kernel/hurd ${CROSS_OVERLAY}/cross-${TARGET_TUPLE}/
emerge -C cross-${TARGET_TUPLE}/linux-headers
ACCEPT_KEYWORDS='**' USE=headers-only emerge -v1 cross-${TARGET_TUPLE}/gnumach
ACCEPT_KEYWORDS='**' USE=headers-only emerge -v1 cross-${TARGET_TUPLE}/mig
ACCEPT_KEYWORDS='**' USE=headers-only emerge -v1 cross-${TARGET_TUPLE}/hurd
ACCEPT_KEYWORDS='**' USE=headers-only emerge -v1 cross-${TARGET_TUPLE}/glibc
USE="-*" emerge -v1 cross-${TARGET_TUPLE}/gcc
ACCEPT_KEYWORDS='**' USE=-headers-only emerge -v1 cross-${TARGET_TUPLE}/glibc
USE="-sanitize" emerge -v1 cross-${TARGET_TUPLE}/gcc

Done!

A few things to note here:

  • gentoo-hurd overlay is used for new gnumach, mig and hurd packages
  • symlinks to new packages are created manually (crossdev fix pending, wait on packages to get into ::gentoo)
  • uninstall linux-headers as our target is not linux (crossdev fix pending)
  • use ACCEPT_KEYWORDS='**' for many packages (need to decide on final keywords, maybe x86-hurd)
  • all crossdev phases are run manually
  • only the glibc git version works, as the needed changes were merged upstream only recently
  • USE="sanitize" is disabled in the final gcc because it's broken for hurd

Now you can go to /usr/${TARGET_TUPLE}/etc/portage/ and tweak the defaults for ELIBC, KERNEL and other things.

Basic sanity check for a toolchain:

$ i586-pc-gnu-gcc main.c -o main
$ file main
main: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld.so, for GNU/Hurd 0.0.0, with debug_info, not stripped

Copying it to the target hurd VM and running it there also works as expected.

I use crossdev -t x86_64-HEAD-linux-gnu to have GHC built against HEAD in parallel to the system's GHC. Let's use that for a more heavyweight test and build a GHC cross-compiler to hurd:

$ EXTRA_ECONF=--with-ghc=x86_64-HEAD-linux-gnu-ghc emerge -v1 cross-i586-pc-gnu/ghc --quiet-build=n

This fails with:

rts/posix/Signals.c:398:28: error:
     note: each undeclared identifier is reported only once for each function it appears in
    |
398 |         action.sa_flags |= SA_SIGINFO;
    |                            ^

Which hints at the lack of SA_SIGINFO support in upstream glibc.git. Debian has an out-of-tree tg-hurdsig-SA_SIGINFO.diff patch to provide these defines (at least it's not our local toolchain breakage). The outcome is positive: we have got very far into cross-compiling and hit real portability issues. Woohoo!

Final words

As long as the underlying toolchains are not too complicated, building cross-compilers in gentoo is trivial. The next tiny step is to cross-build the hurd kernel itself and run it in qemu. Ebuilds in gentoo-hurd are not yet ready for it but tweaking them should be easy.

Have fun!


April 11, 2018
Andreas K. Hüttel a.k.a. dilfridge (homepage, bugs)

Today's news is that we have submitted a manuscript for publication, describing Lab::Measurement and with it our approach towards fast, flexible, and platform-independent measuring with Perl! The manuscript mainly focuses on the new, Moose-based class hierarchy. We have uploaded it to arXiv as well; here is the (for now) full bibliographic information of the preprint:

 "Lab::Measurement - a portable and extensible framework for controlling lab equipment and conducting measurements"
S. Reinhardt, C. Butschkow, S. Geissler, A. Dirnaichner, F. Olbrich, C. Lane, D. Schröer, and A. K. Hüttel
submitted for publication; arXiv:1804.03321 (PDF, HTML, BibTeX entry)
If you're using Lab::Measurement in your lab, and this results in some nice publication, then we'd be very grateful for a citation of our work - for now the preprint, and later hopefully the accepted version.

Hanno Böck a.k.a. hanno (homepage, bugs)

Snallygaster

A few days ago I figured out that several blogs operated by T-Mobile Austria had a Git repository exposed which included their wordpress configuration file. Due to the fact that a phpMyAdmin installation was also accessible, this would have allowed me to change or delete their database and subsequently take over their blogs.

Git Repositories, Private Keys, Core Dumps

Last year I discovered that the German postal service exposed a database with 200,000 addresses on their webpage, because it was simply named dump.sql (which is the default filename for database exports in the documentation example of mysql). An Australian online pharmacy exposed a database under the filename xaa, which is the output of the "split" tool on Unix systems.

It also turns out that plenty of people store their private keys for TLS certificates on their servers - or their SSH keys. Crashing web applications can leave behind coredumps that may expose application memory.

For a while now I have been interested in this class of surprisingly trivial vulnerabilities: people leave files accessible on their web servers that shouldn't be public. I've given talks about this at a couple of conferences (recordings available from Bornhack, SEC-T, Driving IT). I scanned for these issues with a python script that I extended with more and more such checks.

Scan your Web Pages with snallygaster

It's taken a bit longer than intended, but I finally released it: It's called Snallygaster and is available on Github and PyPi.

Apart from many checks for secret files it also contains some checks for related issues, like checking for invalid src references which can lead to domain takeover vulnerabilities, for the Optionsbleed vulnerability which I discovered during this work, and for a couple of other vulnerabilities I found interesting and easily testable.

Some may ask why I wrote my own tool instead of extending an existing project. I thought about it, but I didn't really find any existing free software vulnerability scanner that I found suitable. The tool that comes closest is probably Nikto, but testing it I felt it comes with a lot of checks - thus it's slow - and few results. I wanted a tool with a relatively high impact that doesn't take forever to run. Another commonly mentioned free vulnerability scanner is OpenVAS - a fork from Nessus back when that was free software - but I found that always very annoying to use and overengineered. It's not a tool you can "just run". So I ended up creating my own tool.

A Dragon Legend in US Maryland

Finally, you may wonder what the name means. The Snallygaster is a dragon that, according to some legends, was seen in Maryland and other parts of the US. Why that name? There's no particular reason; I just searched for a suitable name and thought a mythical creature might make a good one. So I searched Wikipedia for potential names and checked for name collisions. This one had none and also sounded funny and interesting enough.

I hope snallygaster turns out to be useful for administrators and pentesters and helps exposing this class of trivial, but often powerful, vulnerabilities. Obviously I welcome new ideas of further tests that could be added to snallygaster.

April 09, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)
Tree pic ;) (April 09, 2018, 09:32 UTC)

April 03, 2018
Alexys Jacob a.k.a. ultrabug (homepage, bugs)
py3status v3.8 (April 03, 2018, 12:06 UTC)

Another long awaited release has come true thanks to our community!

The changelog is so huge that I had to open an issue and cry for help to make it happen… thanks again @lasers for stepping up once again 🙂

Highlights

  • gevent support (-g option) to switch from threads scheduling to greenlets and reduce resources consumption
  • environment variables support in i3status.conf to remove sensitive information from your config
  • modules can now leverage a persistent data store
  • hundreds of improvements for various modules
  • we now have an official debian package
  • we reached 500 stars on github #vanity

Milestone 3.9

  • try to release a version faster than every 4 months (j/k) 😉

The next release will focus on bugs and modules improvements / standardization.

Thanks contributors!

This release is their work, thanks a lot guys!

  • alex o’neill
  • anubiann00b
  • cypher1
  • daniel foerster
  • daniel schaefer
  • girst
  • igor grebenkov
  • james curtis
  • lasers
  • maxim baz
  • nollain
  • raspbeguy
  • regnat
  • robert ricci
  • sébastien delafond
  • themistokle benetatos
  • tobes
  • woland

April 02, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)

Thanks to the enlightenment devs for fixing this ;) no lock screen sucks :D

https://www.enlightenment.org/news/e0.22.3_release

also it is in my Gentoo dev overlay as of now.

So a while ago I cleaned out my dev overlay and added dev-libs/efl-1.20.7 and x11-wm/enlightenment-0.22.1 (and 0.22.2)

Works for me at the moment (except the screen (un-)lock), but not sure if that has to do with my box. Any testers welcome!

Here's the link: https://gitweb.gentoo.org/dev/lordvan.git/

Oh and I added it to layman's repo list again, so gentoo users can easily just "layman -a lordvan" to test it.

On a side note: 0.22.1 gave me trouble with a 2nd screen plugged in, which seems fixed in 0.22.2, but that has (pam related) problems with the lock screen ..

March 30, 2018
Sebastian Pipping a.k.a. sping (homepage, bugs)

I recently dockerized a small Django application. I built the Dockerfile in a way that the resulting image would allow running the container as if it were plain manage.py, e.g. that besides docker-compose up I could also do:

# For a psql session into the database:
docker-compose run <image_name> dbshell

# Or, to run the test suite:
docker-compose run <image_name> test

To make that work, I made this Docker entrypoint script:

#! /bin/bash
# Copyright (C) 2018 Sebastian Pipping <sebastian@pipping.org>
# Licensed under CC0 1.0 Public Domain Dedication.
# https://creativecommons.org/publicdomain/zero/1.0/

set -e
set -u

RUN() {
    ( PS4='# ' && set -x && "$@" )
}

RUN wait-for-it "${POSTGRES_HOST}:${POSTGRES_PORT}" -t 30

cd /app

if [[ $# -gt 0 ]]; then
    RUN ./manage.py "$@"
else
    RUN ./manage.py makemigrations
    RUN ./manage.py migrate
    RUN ./manage.py createcustomsuperuser  # self-made

    RUN ./manage.py runserver 0.0.0.0:${APP_PORT}
fi

Management command createcustomsuperuser is something simple that I built myself for this very purpose: create a super user, support scripting, accept passwords as bad as "password" or "demo" without complaints, and be okay if the user already exists with the same credentials (idempotency). I uploaded createcustomsuperuser.py as a Gist to GitHub as it's a few lines more.

Back to the entrypoint script. For the RUN ./manage.py "$@" part to work, in the Dockerfile both ENTRYPOINT and CMD need to use the [..] syntax, e.g.:

ENTRYPOINT ["/app/docker-entrypoint.sh"]
CMD []

For more details on ENTRYPOINT quirks like that I recommend John Zaccone's well-written article "ENTRYPOINT vs CMD: Back to Basics".
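Going back to the createcustomsuperuser command mentioned above: for readers who don't want to chase the Gist, here is a minimal, hypothetical sketch of what such a management command could look like. The environment variable names and the exact idempotency check are my assumptions, not necessarily what the actual Gist does:

# Hypothetical sketch (not the author's actual Gist): create a Django superuser
# non-interactively, reading credentials from environment variables so that the
# entrypoint can simply call "./manage.py createcustomsuperuser".
import os

from django.contrib.auth import get_user_model
from django.core.management.base import BaseCommand, CommandError


class Command(BaseCommand):
    help = "Create a superuser from environment variables, idempotently"

    def handle(self, *args, **options):
        username = os.environ.get('DJANGO_SUPERUSER_USERNAME', 'admin')
        password = os.environ.get('DJANGO_SUPERUSER_PASSWORD', 'password')
        email = os.environ.get('DJANGO_SUPERUSER_EMAIL', '')

        User = get_user_model()
        try:
            user = User.objects.get(username=username)
        except User.DoesNotExist:
            # No such user yet: create it, accepting weak passwords on purpose
            User.objects.create_superuser(username, email, password)
            self.stdout.write('Superuser %s created' % username)
            return

        # User already exists: only complain if the credentials differ
        if user.is_superuser and user.check_password(password):
            self.stdout.write('Superuser %s already present, nothing to do' % username)
        else:
            raise CommandError('User %s exists with different credentials' % username)

Reading the credentials from the environment keeps the entrypoint call argument-free, which matches how the script above invokes it.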

March 18, 2018
Andreas K. Hüttel a.k.a. dilfridge (homepage, bugs)

https://www.labmeasurement.de/images/2018-03-simon-berlin.jpg
The 2018 spring meeting of the DPG condensed matter physics section in Berlin is over, and we've all listened to interesting talks and seen exciting physics. And we've also presented the Lab::Measurement poster! Here's the photo proof of Simon explaining our software... click on the image or the link for a larger version!

March 17, 2018
Sebastian Pipping a.k.a. sping (homepage, bugs)
Holy cow! Larry the cow Gentoo tattoo (March 17, 2018, 14:53 UTC)

Probably not new but was new to me: Just ran into this Larry the Cow tattoo online: http://www.geekytattoos.com/larry-the-gender-challenged-cow/

March 12, 2018
Thomas Raschbacher a.k.a. lordvan (homepage, bugs)
Sunshine blend (March 12, 2018, 14:23 UTC)

The 2nd one is also from Irvin's house of flavour: the sunshine blend.

See the label on the picture for details ;)

This one has a similar colour to the sen cha cherry, but not quite the same (not sure if that is because it is chinese sen cha vs japanese, or the rest of the blend).

As with most sen cha blends the flavour is more prominent in the smell than the taste itself, but this one too tastes great. If you read the ingredients, you can guess why I chose this one to drink when I am just recovering from a (week long) cold.

Japanese Sen Cha with Cherry (March 12, 2018, 14:19 UTC)

The first one shall be a long time favourite of mine: Japanese Sen Cha with Cherry.

As most of my teas this one is from Irvin's House of Flavour (Wellingborough, Northants, UK).

It is one of the blends I sometimes take to training with me (in a flask).

The tea gets a lovely colour like most of my sen cha blends, and the smell is very pleasant too. It's not a very strong flavour, but definitely there and it keeps making me get more when I am about to run out ;)

March 11, 2018
Greg KH a.k.a. gregkh (homepage, bugs)

As many people know, last week there was a court hearing in the Geniatech vs. McHardy case. This was a case brought claiming a license violation of the Linux kernel in Geniatech devices in the German court of OLG Cologne.

Harald Welte has written up a wonderful summary of the hearing, I strongly recommend that everyone go read that first.

In Harald’s summary, he refers to an affidavit that I provided to the court. Because the case was withdrawn by McHardy, my affidavit was not entered into the public record. I had always assumed that my affidavit would be made public, and since I have had a number of people ask me about what it contained, I figured it was good to just publish it for everyone to be able to see it.

There are some minor edits from what was exactly submitted to the court, such as the side-by-side German translation of the English text, and some reformatting around some footnotes in the text, because I don't know how to do that directly here, and they really were not all that relevant for anyone who reads this blog. Exhibit A is also not reproduced as it's just a huge list of all of the kernel releases for which I felt there was no evidence of any contribution by Patrick McHardy.

AFFIDAVIT

I, the undersigned, Greg Kroah-Hartman,
declare in lieu of an oath and in the
knowledge that a wrong declaration in
lieu of an oath is punishable, to be
submitted before the Court:

I. With regard to me personally:

1. I have been an active contributor to
   the Linux Kernel since 1999.

2. Since February 1, 2012 I have been a
   Linux Foundation Fellow.  I am currently
   one of five Linux Foundation Fellows
   devoted to full time maintenance and
   advancement of Linux. In particular, I am
   the current Linux stable Kernel maintainer
   and manage the stable Kernel releases. I
   am also the maintainer for a variety of
   different subsystems that include USB,
   staging, driver core, tty, and sysfs,
   among others.

3. I have been a member of the Linux
   Technical Advisory Board since 2005.

4. I have authored two books on Linux Kernel
   development including Linux Kernel in a
   Nutshell (2006) and Linux Device Drivers
   (co-authored Third Edition in 2009.)

5. I have been a contributing editor to Linux
   Journal from 2003 - 2006.

6. I am a co-author of every Linux Kernel
   Development Report. The first report was
   based on my Ottawa Linux Symposium keynote
   in 2006, and the report has been published
   every few years since then. I have been
   one of the co-author on all of them. This
   report includes a periodic in-depth
   analysis of who is currently contributing
   to Linux. Because of this work, I have an
   in-depth knowledge of the various records
   of contributions that have been maintained
   over the course of the Linux Kernel
   project.

   For many years, Linus Torvalds compiled a
   list of contributors to the Linux kernel
   with each release. There are also usenet
   and email records of contributions made
   prior to 2005. In April of 2005, Linus
   Torvalds created a program now known as
   “Git” which is a version control system
   for tracking changes in computer files and
   coordinating work on those files among
   multiple people. Every Git directory on
   every computer contains an accurate
   repository with complete history and full
   version tracking abilities.  Every Git
   directory captures the identity of
   contributors.  Development of the Linux
   kernel has been tracked and managed using
   Git since April of 2005.

   One of the findings in the report is that
   since the 2.6.11 release in 2005, a total
   of 15,637 developers have contributed to
   the Linux Kernel.

7. I have been an advisor on the Cregit
   project and compared its results to other
   methods that have been used to identify
   contributors and contributions to the
   Linux Kernel, such as a tool known as “git
   blame” that is used by developers to
   identify contributions to a git repository
   such as the repositories used by the Linux
   Kernel project.

8. I have been shown documents related to
   court actions by Patrick McHardy to
   enforce copyright claims regarding the
   Linux Kernel. I have heard many people
   familiar with the court actions discuss
   the cases and the threats of injunction
   McHardy leverages to obtain financial
   settlements. I have not otherwise been
   involved in any of the previous court
   actions.

II. With regard to the facts:

1. The Linux Kernel project started in 1991
   with a release of code authored entirely
   by Linus Torvalds (who is also currently a
   Linux Foundation Fellow).  Since that time
   there have been a variety of ways in which
   contributions and contributors to the
   Linux Kernel have been tracked and
   identified. I am familiar with these
   records.

2. The first record of any contribution
   explicitly attributed to Patrick McHardy
   to the Linux kernel is April 23, 2002.
   McHardy’s last contribution to the Linux
   Kernel was made on November 24, 2015.

3. The Linux Kernel 2.5.12 was released by
   Linus Torvalds on April 30, 2002.

4. After review of the relevant records, I
   conclude that there is no evidence in the
   records that the Kernel community relies
   upon to identify contributions and
   contributors that Patrick McHardy made any
   code contributions to versions of the
   Linux Kernel earlier than 2.4.18 and
   2.5.12. Attached as Exhibit A is a list of
   Kernel releases which have no evidence in
   the relevant records of any contribution
   by Patrick McHardy.

March 03, 2018
Sven Vermeulen a.k.a. swift (homepage, bugs)
Automating compliance checks (March 03, 2018, 12:20 UTC)

With the configuration baseline for a technical service being described fully (see the first, second and third post in this series), it is time to consider the validation of the settings in an automated manner. The preferred method for this is to use Open Vulnerability and Assessment Language (OVAL), which is nowadays managed by the Center for Internet Security, abbreviated as CISecurity. Previously, OVAL was maintained and managed by Mitre under NIST supervision, and Google searches will often still point to the old sites. However, documentation is now maintained on CISecurity's github repositories.

But I digress...

Read-only compliance validation

One of the main ideas with OVAL is to have a language (XML-based) that represents state information (what something should be) which can be verified in a read-only fashion. Even more, from an operational perspective, it is very important that compliance checks do not alter anything, but only report.

Within its design, OVAL engineering has considered how to properly manage huge sets of assessment rules, and how to document this in an unambiguous manner. In the previous blog posts, ambiguity was resolved through writing style, and not much through actual, enforced definitions.

OVAL enforces this. You can't write a generic or ambiguous rule in OVAL. It is very specific, but that also means that it is daunting to implement the first few times. I've written many OVAL sets, and I still struggle with it (although that's because I don't do it enough in a short time-frame, and need to reread my own documentation regularly).

The capability to perform read-only validation with OVAL leads to a number of possible use cases. In the 5.10 specification a number of use cases are provided. Basically, it boils down to:

  • vulnerability discovery (is a system vulnerable or not)
  • patch management (is the system patched accordingly or not)
  • configuration management (are the settings according to the rules or not)
  • inventory management (detect what is installed on the system or what the systems' assets are)
  • malware and threat indicator (detect if a system has been compromised or particular malware is active)
  • policy enforcement (verify if a client system adheres to particular rules before it is granted access to a network)
  • change tracking (regularly validating the state of a system and keeping track of changes)
  • security information management (centralizing results of an entire organization or environment and doing standard analytics on it)

In this blog post series, I'm focusing on configuration management.

OVAL structure

Although the OVAL standard (just like the XCCDF standard actually) entails a number of major components, I'm going to focus on the OVAL definitions. Be aware though that the results of an OVAL scan are also standardized format, as are results of XCCDF scans for instance.

OVAL definitions have 4 to 5 blocks in them:

  • the definition itself, which describes what is being validated and how. It refers to one or more tests that are to be executed or validated for the definition result to be calculated
  • the test or tests, which are referred to by the definition. In each test, there is at least a reference to an object (what is being tested) and optionally to a state (what should the object look like)
  • the object, which is a unique representation of a resource or resources on the system (a file, a process, a mount point, a kernel parameter, etc.). Object definitions can refer to multiple resources, depending on the definition
  • the state, which is a sort-of value mapping or validation that needs to be applied to an object to see if it is configured correctly
  • the variable, an optional definition which is what it sounds like: a variable that substitutes an abstract definition with an actual definition, allowing to write more reusable tests

Let's get an example going, but without the XML structure, so in human language. We want to define that the Kerberos definition on a Linux system should allow forwardable tickets by default. This is accomplished by ensuring that, inside the /etc/krb5.conf file (which is an INI-style configuration file), the value of the forwardable key inside the [libdefaults] section is set to true.

In OVAL, the definition itself will document the above in human readable text, assign it a unique ID (like oval:com.example.oval:def:1) and mark it as being a definition for configuration validation (compliance). Then, it defines the criteria that need to be checked in order to properly validate if the rule is applicable or not. These criteria include validation if the OVAL statement is actually being run on a Linux system (as it makes no sense to run it against a Cisco router) which is Kerberos enabled, and then the criteria of the file check itself. Each criteria links to a test.

The test of the file itself links to an object and a state. There are a number of ways how we can check for this specific case. One is that the object is the forwardable key in the [libdefaults] section of the /etc/krb5.conf file, and the state is the value true. In this case, the state will point to those two entries (through their unique IDs) and define that the object must exist, and all matches must have a matching state. The "all matches" here is not that important, because there will generally only be one such definition in the /etc/krb5.conf file.

Note however that a different approach to the test can be declared as well. We could state that the object is the [libdefaults] section inside the /etc/krb5.conf file, and the state is the value true for the forwardable key. In this case, the test declares that multiple objects must exist, and (at least) one must match the state.

As you can see, the OVAL language tries to map rules to unambiguous definitions. So, what does this look like in OVAL XML?

The OVAL XML structure

The full example contains a few more entries than those we declare next, in order to be complete. The most important definitions though are documented below.

Let's start with the definition. As stated, it will refer to tests that need to match for the definition to be valid.

<definitions>
  <definition id="oval:com.example.oval:def:1" version="1" class="compliance">
    <metadata>
      <title>libdefaults.forwardable in /etc/krb5.conf must be set to true</title>
      <affected family="unix">
        <platform>Red Hat Enterprise Linux 7</platform>
      </affected>
      <description>
        By default, tickets obtained from the Kerberos environment must be forwardable.
      </description>
    </metadata>
    <criteria operator="AND">
      <criterion test_ref="oval:com.example.oval:tst:1" comment="Red Hat Enterprise Linux is installed"/>
      <criterion test_ref="oval:com.example.oval:tst:2" comment="/etc/krb5.conf's libdefaults.forwardable is set to true"/>
    </criteria>
  </definition>
</definitions>

The first thing to keep in mind is the (weird) identification structure. Just like with XCCDF, it is not sufficient to have your own id convention. You need to start an id with oval: followed by the reverse domain definition (here com.example.oval), followed by the type (def for definition) and a sequence number.

Also, take a look at the criteria. Here, two tests need to be compliant (hence the AND operator). However, more complex operations can be done as well. It is even allowed to nest multiple criteria, and refer to previous definitions, like so (taken from the ssg-rhel6-oval.xml file):

<criteria comment="package hal removed or service haldaemon is not configured to start" operator="OR">
  <extend_definition comment="hal removed" definition_ref="oval:ssg:def:211"/>
  <criteria operator="AND" comment="service haldaemon is not configured to start">
    <criterion comment="haldaemon runlevel 0" test_ref="oval:ssg:tst:212"/>
    <criterion comment="haldaemon runlevel 1" test_ref="oval:ssg:tst:213"/>
    <criterion comment="haldaemon runlevel 2" test_ref="oval:ssg:tst:214"/>
    <criterion comment="haldaemon runlevel 3" test_ref="oval:ssg:tst:215"/>
    <criterion comment="haldaemon runlevel 4" test_ref="oval:ssg:tst:216"/>
    <criterion comment="haldaemon runlevel 5" test_ref="oval:ssg:tst:217"/>
    <criterion comment="haldaemon runlevel 6" test_ref="oval:ssg:tst:218"/>
  </criteria>
</criteria>

Next, let's look at the tests.

<tests>
  <unix:file_test id="oval:com.example.oval:tst:1" version="1" check_existence="all_exist" check="all" comment="/etc/redhat-release exists">
    <unix:object object_ref="oval:com.example.oval:obj:1" />
  </unix:file_test>
  <ind:textfilecontent54_test id="oval:com.example.oval:tst:2" check="all" check_existence="all_exist" version="1" comment="The value of forwardable in /etc/krb5.conf">
    <ind:object object_ref="oval:com.example.oval:obj:2" />
    <ind:state state_ref="oval:com.example.oval:ste:2" />
  </ind:textfilecontent54_test>
</tests>

There are two tests defined here. The first test just checks if /etc/redhat-release exists. If not, then the test will fail and the definition itself will result to false (as in, not compliant). This isn't actually a proper definition, because you want the test to not run when it is on a different platform, but for the sake of example and simplicity, let's keep it as is.

The second test will check for the value of the forwardable key in /etc/krb5.conf. For it, it refers to an object and a state. The test states that all objects must exist (check_existence="all_exist") and that all objects must match the state (check="all").

The object definition looks like so:

<objects>
  <unix:file_object id="oval:com.example.oval:obj:1" comment="The /etc/redhat-release file" version="1">
    <unix:filepath>/etc/redhat-release</unix:filepath>
  </unix:file_object>
  <ind:textfilecontent54_object id="oval:com.example.oval:obj:2" comment="The forwardable key" version="1">
    <ind:filepath>/etc/krb5.conf</ind:filepath>
    <ind:pattern operation="pattern match">^\s*forwardable\s*=\s*((true|false))\w*</ind:pattern>
    <ind:instance datatype="int" operation="equals">1</ind:instance>
  </ind:textfilecontent54_object>
</objects>

The first object is a simple file reference. The second is a text file content object. More specifically, it matches the line inside /etc/krb5.conf which has forwardable = true or forwardable = false in it. A subexpression (the true/false part) is captured, so that we can refer to it as part of the test.

The state referenced by this test looks like so:

<states>
  <ind:textfilecontent54_state id="oval:com.example.oval:ste:2" version="1">
    <ind:subexpression datatype="string">true</ind:subexpression>
  </ind:textfilecontent54_state>
</states>

This state refers to the subexpression, and wants it to be true.
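To see what the pattern and its subexpression actually capture, here is a quick check in Python; this is not part of OVAL or Open-SCAP, just the same regular expression exercised by hand:

import re

# The same pattern used in the textfilecontent54 object above
pattern = re.compile(r'^\s*forwardable\s*=\s*((true|false))\w*')

for line in ["forwardable = true", "  forwardable=false", "default_realm = EXAMPLE.COM"]:
    match = pattern.match(line)
    # group(1) is the subexpression that the OVAL state compares against "true"
    print(line, '->', match.group(1) if match else 'no match')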

Testing the checks with Open-SCAP

The Open-SCAP tool is able to test OVAL statements directly. For instance, with the above definition in a file called oval.xml:

~$ oscap oval eval --results oval-results.xml oval.xml
Definition oval:com.example.oval:def:1: true
Evaluation done.

The output of the command shows that the definition was evaluated successfully. If you want more information, open up the oval-results.xml file which contains all the details about the test. This results file is also very useful while developing OVAL as it shows the entire result of objects, tests and so forth.
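If you want to post-process such a results file programmatically (say, to feed the verdicts into a report), a small helper along these lines can pull out the per-definition results. This is just an illustrative sketch, not an Open-SCAP feature; it matches elements by local tag name so it does not depend on the exact namespace prefixes:

# Illustrative sketch: extract per-definition verdicts from an oscap OVAL results file
import xml.etree.ElementTree as ET

def oval_definition_results(path):
    results = {}
    for elem in ET.parse(path).iter():
        # Result elements carry a definition_id attribute, unlike the embedded definitions
        if elem.tag.split('}')[-1] == 'definition' and 'definition_id' in elem.attrib:
            results[elem.attrib['definition_id']] = elem.attrib.get('result')
    return results

# Against the file generated above, this should print something like
# {'oval:com.example.oval:def:1': 'true'}
print(oval_definition_results('oval-results.xml'))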

For instance, the /etc/redhat-release file was only checked to see if it exists, but the results file shows what other parameters can be verified with it as well:

<unix-sys:file_item id="1233781" status="exists">
  <unix-sys:filepath>/etc/redhat-release</unix-sys:filepath>
  <unix-sys:path>/etc</unix-sys:path>
  <unix-sys:filename>redhat-release</unix-sys:filename>
  <unix-sys:type>regular</unix-sys:type>
  <unix-sys:group_id datatype="int">0</unix-sys:group_id>
  <unix-sys:user_id datatype="int">0</unix-sys:user_id>
  <unix-sys:a_time datatype="int">1515186666</unix-sys:a_time>
  <unix-sys:c_time datatype="int">1514927465</unix-sys:c_time>
  <unix-sys:m_time datatype="int">1498674992</unix-sys:m_time>
  <unix-sys:size datatype="int">52</unix-sys:size>
  <unix-sys:suid datatype="boolean">false</unix-sys:suid>
  <unix-sys:sgid datatype="boolean">false</unix-sys:sgid>
  <unix-sys:sticky datatype="boolean">false</unix-sys:sticky>
  <unix-sys:uread datatype="boolean">true</unix-sys:uread>
  <unix-sys:uwrite datatype="boolean">true</unix-sys:uwrite>
  <unix-sys:uexec datatype="boolean">false</unix-sys:uexec>
  <unix-sys:gread datatype="boolean">true</unix-sys:gread>
  <unix-sys:gwrite datatype="boolean">false</unix-sys:gwrite>
  <unix-sys:gexec datatype="boolean">false</unix-sys:gexec>
  <unix-sys:oread datatype="boolean">true</unix-sys:oread>
  <unix-sys:owrite datatype="boolean">false</unix-sys:owrite>
  <unix-sys:oexec datatype="boolean">false</unix-sys:oexec>
  <unix-sys:has_extended_acl datatype="boolean">false</unix-sys:has_extended_acl>
</unix-sys:file_item>

Now, this is just on the OVAL level. The final step is to link it in the XCCDF file.

Referring to OVAL in XCCDF

The XCCDF Rule entry allows for a check element, which refers to an automated check for compliance.

For instance, the above rule could be referred to like so:

<Rule id="xccdf_com.example_rule_krb5-forwardable-true">
  <title>Enable forwardable tickets on RHEL systems</title>
  ...
  <check system="http://oval.mitre.org/XMLSchema/oval-definitions-5">
    <check-content-ref href="oval.xml" name="oval:com.example.oval:def:1" />
  </check>
</Rule>

With this set in the Rule, Open-SCAP can validate it while checking the configuration baseline:

~$ oscap xccdf eval --oval-results --results xccdf-results.xml xccdf.xml
...
Title   Enable forwardable kerberos tickets in krb5.conf libdefaults
Rule    xccdf_com.example_rule_krb5-forwardable-tickets
Ident   RHEL7-01007
Result  pass

A huge advantage here is that, alongside the detailed results of the run, there is also better human readable output as it shows the title of the Rule being checked.

The detailed capabilities of OVAL

In the above example I've used two examples: a file validation (against /etc/redhat-release) and a file content one (against /etc/krb5.conf). However, OVAL has many more checks and support for it, and also has constraints that you need to be aware of.

In the OVAL Project github account, the Language repository keeps track of the current documentation. By browsing through it, you'll notice that the OVAL capabilities are structured based on the target technology that you can check. Right now, this is AIX, Android, Apple iOS, Cisco ASA, Cisco CatOS, VMWare ESX, FreeBSD, HP-UX, Cisco iOS and iOS-XE, Juniper JunOS, Linux, MacOS, NETCONF, Cisco PIX, Microsoft SharePoint, Unix (generic), Microsoft Windows, and independent.

The independent one contains tests and support for resources that are often reusable toward different platforms (as long as your OVAL and XCCDF supporting tools can run it on those platforms). A few notable supporting tests are:

  • filehash58_test which can check for a number of common hashes (such as SHA-512 and MD5). This is useful when you want to make sure that a particular (binary or otherwise) file is available on the system. In enterprises, this could be useful for license files, or specific library files.
  • textfilecontent54_test which can check the content of a file, with support for regular expressions.
  • xmlfilecontent_test which is a specialized test toward XML files

Keep in mind though that, as we have seen above, INI files specifically have no specialization available. It would be nice if CISecurity would develop support for common textual data formats, such as CSV (although that one is easily interpretable with the existing ones), JSON, YAML and INI.

The unix one contains tests specific to Unix and Unix-like operating systems (so yes, it is also useful for Linux), and together with the linux one a wide range of configurations can be checked. This includes support for generic extended attributes (fileextendedattribute_test) as well as SELinux specific rules (selinuxboolean_test and selinuxsecuritycontext_test), network interface settings (interface_test), runtime processes (process58_test), kernel parameters (sysctl_test), installed software tests (such as rpminfo_test for RHEL and other RPM enabled operating systems) and more.

March 01, 2018
Sebastian Pipping a.k.a. sping (homepage, bugs)

When system-wide pip turns out too old (e.g. for lacking support for pip check), one may end up trying to update pip using a command like:

sudo pip install --upgrade pip

That's likely to end up with this message:

Not uninstalling pip at /usr/lib/python2.7/dist-packages, owned by OS

That non-error and the confusion that easily happens right after is why I'm writing this post. So let's look at the whole thing in a bit more context on a shell, a Debian jessie one in this case:

# cat /etc/debian_version 
8.10

# pip install --upgrade pip ; echo $?
Downloading/unpacking pip from https://pypi.python.org/packages/b6[..]44
  Downloading pip-9.0.1-py2.py3-none-any.whl (1.3MB): 1.3MB downloaded
Installing collected packages: pip
  Found existing installation: pip 1.5.6
    Not uninstalling pip at /usr/lib/python2.7/dist-packages, owned by OS
Successfully installed pip
Cleaning up...
0

# pip --version
pip 1.5.6 from /usr/lib/python2.7/dist-packages (python 2.7)

Now the interesting part is that it looks like pip would not have been updated. That impression is false: the latest pip has been installed successfully (to /usr/local/bin). One of two things is going on here:

a) Unexpected path resolution order

You have /usr/bin/ before /usr/local/bin/ in $PATH, e.g. as with root of Debian jessie, so that the new pip has no chance of winning the race of path resolution for pip. For example:

# sed 's,:,\n,g' <<<"$PATH"
/bin
/sbin
/usr/bin
/usr/sbin
/usr/local/bin
/usr/local/sbin
/opt/bin
/usr/lib/llvm/5/bin
/usr/lib/llvm/4/bin

b) Location hashing at shell level

Your shell has hashed the old location of pip (as Bash would do) and "hides" the new version from you in the current shell session. To see that in action, we utilize Bash builtins type and hash:

# type pip
pip is hashed (/usr/bin/pip)

# pip --version
pip 1.5.6 from /usr/lib/python2.7/dist-packages (python 2.7)

# hash -d pip

# type pip
pip is /usr/local/bin/pip

# pip --version
pip 9.0.1 from /usr/local/lib/python2.7/dist-packages (python 2.7)

So in either case you can run a recent pip from /usr/local/bin/pip right after pip install --upgrade pip, no need to resort to get-pip.py or so, in fact.

This "crazy" vulnerability in LibreOffice only came to my attention recently:

LibreOffice < 6.0.1 - '=WEBSERVICE' Remote Arbitrary File Disclosure (exploit-db.com)

Please make sure your peers update in time.

February 28, 2018
Alexys Jacob a.k.a. ultrabug (homepage, bugs)
Evaluating ScyllaDB for production 2/2 (February 28, 2018, 10:32 UTC)

In my previous blog post, I shared 7 lessons on our experience in evaluating Scylla for production.

Those lessons were focused on the setup and execution of the POC and I promised a more technical blog post with details and lessons learned from the POC, so here it is!

Before you read on, be mindful that our POC was set up to test workloads and workflows, not to benchmark technologies. So even if the Scylla figures are great, they have not been the main drivers of the actual conclusion of the POC.

Business context

As a data driven company working in the Marketing and Advertising industry, we help our clients make sense of multiple sources of data to build and improve their relationship with their customers and prospects.

Dealing with multiple sources of data is nothing new but their volume has dramatically changed during the past decade. I will spare you the Big-Data-means-nothing term and the technical challenges that come with it, as you have already heard enough of it.

Still, it is clear that our line of business is tied to our capacity at mixing and correlating a massive amount of different types of events (data sources/types) coming from various sources which all have their own identifiers (think primary keys):

  • Web navigation tracking: identifier is a cookie that’s tied to the tracking domain (we have our own)
  • CRM databases: usually the email address or an internal account ID serve as an identifier
  • Partners’ digital platform: identifier is also a cookie tied to their tracking domain

To try to make things simple, let’s take a concrete example:

You work for UNICEF and want to optimize their banner ads budget by targeting the donors of their last fundraising campaign.

  • Your reference user database is composed of the donors who registered with their email address on the last campaign: main identifier is the email address.
  • To buy web display ads, you use an Ad Exchange partner such as AppNexus or DoubleClick (Google). From their point of view, users are seen as cookie IDs which are tied to their own domain.

So you basically need to be able to translate an email address to a cookie ID for every partner you work with.

Use case: ID matching tables

We operate and maintain huge ID matching tables for every partner and a great deal of our time is spent translating those IDs from one to another. In SQL terms, we are basically doing JOINs between a dataset and those ID matching tables.

  • You select your reference population
  • You JOIN it with the corresponding ID matching table
  • You get a matched population that your partner can recognize and interact with

Those ID matching tables have a pretty high read AND write throughput because they’re updated and queried all the time.

Usual figures are JOINs between a 10+ million row dataset and 1.5+ billion row ID matching tables.

The reference query basically looks like this:

SELECT count(m.partnerid)
FROM population_10M_rows AS p JOIN partner_id_match_400M_rows AS m
ON p.id = m.id

 Current implementations

We operate a lambda architecture where we handle real time ID matching using MongoDB and batch ones using Hive (Apache Hadoop).

The first downside to note is that it requires us to maintain two copies of every ID matching table. We also couldn’t choose one over the other because neither MongoDB nor Hive can sustain both the read/write lookup/update ratio while performing within the low latencies that we need.

This is an operational burden and requires quite a bunch of engineering to ensure data consistency between different data stores.

Production hardware overview:

  • MongoDB is running on a 15 nodes (5 shards) cluster
    • 64GB RAM, 2 sockets, RAID10 SAS spinning disks, 10Gbps dual NIC
  • Hive is running on 50+ YARN NodeManager instances
    • 128GB RAM, 2 sockets, JBOD SAS spinning disks, 10Gbps dual NIC

Target implementation

The key question is simple: is there a technology out there that can sustain our ID matching tables workloads while maintaining consistently low upsert/write and lookup/read latencies?

Having one technology to handle both use cases would allow:

  • Simpler data consistency
  • Operational simplicity and efficiency
  • Reduced costs

POC hardware overview:

So we decided to find out if Scylla could be that technology. For this, we used three decommissioned machines that we had in the basement of our Paris office.

  • 2 DELL R510
    • 19GB RAM, 2 socket 8 cores, RAID0 SAS spinning disks, 1Gbps NIC
  • 1 DELL R710
    • 19GB RAM, 2 socket 4 cores, RAID0 SAS spinning disks, 1Gbps NIC

I know, these are not glamorous machines and they are even inconsistent in specs, but we still set up a 3 node Scylla cluster running Gentoo Linux with them.

Our take? If those three lousy machines can challenge or beat the production machines on our current workloads, then Scylla can seriously be considered for production.

Step 1: Validate a schema model

Once the POC document was complete and the ScyllaDB team understood what we were trying to do, we started iterating on the schema model using a query based modeling strategy.

So we wrote down and rated the questions that our model(s) should answer; they included things like:

  • What are all our cookie IDs associated to the given partner ID?
  • What are all the cookie IDs associated to the given partner ID over the last N months?
  • What is the last cookie ID/date for the given partner ID?
  • What is the last date we have seen the given cookie ID / partner ID couple?

As you can imagine, the reverse questions are also to be answered so ID translations can be done both ways (ouch!).

Prototyping

It is no news that I'm a Python addict, so I did all my prototyping using Python and the cassandra-driver.

I ended up using a test-driven data modelling strategy using pytest. I wrote tests on my dataset so I could concentrate on the model while making sure that all my questions were being answered correctly and consistently.

Schema

In our case, we ended up with three denormalized tables to answer all the questions we had. To answer the first three questions above, you could use the schema below:

CREATE TABLE IF NOT EXISTS ids_by_partnerid(
 partnerid text,
 id text,
 date timestamp,
 PRIMARY KEY ((partnerid), date, id)
 )
 WITH CLUSTERING ORDER BY (date DESC)

Note on clustering key ordering

One important thing I learned while validating the model is about the internals of Cassandra's file format; it resulted in the choice of a descending (DESC) order on the date clustering key, as you can see above.

If your main use case of querying is to look for the latest value of an history-like table design like ours, then make sure to change the default ASC order of your clustering key to DESC. This will ensure that the latest values (rows) are stored at the beginning of the sstable file effectively reducing the read latency when the row is not in cache!

Let me quote Glauber Costa’s detailed explanation on this:

Basically in Cassandra’s file format, the index points to an entire partition (for very large partitions there is a hack to avoid that, but the logic is mostly the same). So if you want to read the first row, that’s easy you get the index to the partition and read the first row. If you want to read the last row, then you get the index to the partition and do a linear scan to the next.

This is the kind of learning you can only get from experts like Glauber and that can justify the whole POC on its own!
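To make the payoff concrete, here is a small illustrative read using the Python cassandra-driver: with the DESC clustering order above, the "last cookie ID/date for a given partner ID" question becomes a cheap LIMIT 1 query. The contact point and the partner ID value are made-up placeholders:

from cassandra.cluster import Cluster

cluster = Cluster(['scylla-node1'])          # assumed contact point
session = cluster.connect('test_keyspace')

# Thanks to CLUSTERING ORDER BY (date DESC), the first row is the latest one
row = session.execute(
    "SELECT id, date FROM ids_by_partnerid WHERE partnerid = %s LIMIT 1",
    ('some-partner-id',)
).one()

if row:
    print(row.id, row.date)

cluster.shutdown()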

Step 2: Set up scylla-grafana-monitoring

As I said before, make sure to set up and run the scylla-grafana-monitoring project before running your test workloads. This easy-to-run solution will be of great help to understand the performance of your cluster and to tune your workload for optimal performance.

If you can, also discuss with the ScyllaDB team to allow them to access the Grafana dashboard. This will be very valuable since they know where to look better than we usually do… I gained a lot of understanding thanks to this!

Note on scrape interval

I advise you to lower the Prometheus scrape interval to have a shorter and finer sampling of your metrics. This will allow your dashboard to be more reactive when you start your test workloads.

For this, change the prometheus/prometheus.yml file like this:

scrape_interval: 2s # Scrape targets every 2 seconds (5s default)
scrape_timeout: 1s # Timeout before trying to scrape a target again (4s default)

Test your monitoring

Before going any further, I strongly advise you to run a stress test on your POC cluster using the cassandra-stress tool and share the results and their monitoring graphs with the ScyllaDB team.

This will give you a common understanding of the general performances of your cluster as well as help in outlining any obvious misconfiguration or hardware problem.

Key graphs to look at

There are a lot of interesting graphs so I’d like to share the ones that I have been mainly looking at. Remember that depending on your test workloads, some other graphs may be more relevant for you.

  • number of open connections

You’ll want to see a steady and high enough number of open connections, which will prove that your clients are pushed to their maximum (at the time of testing this graph was not on Grafana and you had to add it yourself)

  • cache hits / misses

Depending on your reference dataset, you’ll obviously see that cache hits and misses will have a direct correlation with disk I/O and overall performances. Running your test workloads multiple times should trigger higher cache hits if your RAM is big enough.

  • per shard/node distribution

The Requests Served per shard graph should display a nicely distributed load between your shards and nodes so that you’re sure that you’re getting the best out of your cluster.

The same is true for almost every other “per shard/node” graph: you’re looking for evenly distributed load.

  • sstable reads

Directly linked with your disk performances, you’ll be trying to make sure that you have almost no queued sstable reads.

Step 3: Get your reference data and metrics

We obviously need to have some reference metrics on our current production stack so we can compare them with the results on our POC Scylla cluster.

Whether you choose to use your current production machines or set up a similar stack on the side to run your test workloads is up to you. We chose to run the vast majority of our tests on our current production machines to be as close to our real workloads as possible.

Prepare a reference dataset

During your work on the POC document, you should have detailed the usual data cardinality and volume you work with. Use this information to set up a reference dataset that you can use on all of the platforms that you plan to compare.

In our case, we chose a 10 Million reference dataset that we JOINed with a 400+ Million extract of an ID matching table. Those volumes seemed easy enough to work with and allowed some nice ratio for memory bound workloads.

Measure on your current stack

Then it’s time to load this reference datasets on your current platforms.

  • If you run a MongoDB cluster like we do, make sure to shard and index the dataset just like you do on the production collections.
  • On Hive, make sure to respect the storage file format of your current implementations as well as their partitioning.

If you chose to run your test workloads on your production machines, make sure to run them multiple times and at different hours of the day and night so you can correlate the measures with the load on the cluster at the time of the tests.

Reference metrics

For the sake of simplicity I’ll focus on the Hive-only batch workloads. I performed a count on the JOIN of the dataset and the ID matching table using Spark 2 and then I also ran the JOIN using a simple Hive query through Beeline.

I gave the following definitions on the reference load:

  • IDLE: YARN available containers and free resources are optimal, parallelism is very limited
  • NORMAL: YARN sustains some casual load, parallelism exists but we are not bound by anything still
  • HIGH: YARN has pending containers, parallelism is high and applications have to wait for containers before executing

There’s always an error margin on the results you get, and I found that there were no significant enough differences between the results using Spark 2 and Beeline, so I stuck with a simple set of results:

  • IDLE: 2 minutes, 15 seconds
  • NORMAL: 4 minutes
  • HIGH: 15 minutes

Step 4: Get Scylla in the mix

It’s finally time to do your best to break Scylla, or at least to push it to its limits on your hardware… But most importantly, you’ll be looking to understand what those limits are depending on your test workloads, as well as outlining all the tuning that you will need to do on the client side to reach those limits.

Speaking about the results, we will have to differentiate two cases:

  1. The Scylla cluster is fresh and its cache is empty (cold start): performance is mostly Disk I/O bound
  2. The Scylla cluster has been running some test workload already and its cache is hot: performance is mostly Memory bound with some Disk I/O depending on the size of your RAM

Spark 2 / Scala test workload

Here I used Scala (yes, I did) and DataStax’s spark-cassandra-connector so I could use the magic joinWithCassandraTable function.

  • spark-cassandra-connector-2.0.1-s_2.11.jar
  • Java 7

I had to stick with the 2.0.1 version of the spark-cassandra-connector because newer versions (2.0.5 at the time of testing) were performing badly for no apparent reason. The ScyllaDB team couldn’t help on this.

You can interact with your test workload using the spark2-shell:

spark2-shell --jars jars/commons-beanutils_commons-beanutils-1.9.3.jar,jars/com.twitter_jsr166e-1.1.0.jar,jars/io.netty_netty-all-4.0.33.Final.jar,jars/org.joda_joda-convert-1.2.jar,jars/commons-collections_commons-collections-3.2.2.jar,jars/joda-time_joda-time-2.3.jar,jars/org.scala-lang_scala-reflect-2.11.8.jar,jars/spark-cassandra-connector-2.0.1-s_2.11.jar

Then use the following Scala imports:

// main connector import
import com.datastax.spark.connector._

// the joinWithCassandraTable failed without this (dunno why, I'm no Scala guy)
import com.datastax.spark.connector.writer._
implicit val rowWriter = SqlRowWriter.Factory

Finally I could run my test workload to select the data from Hive and JOIN it with Scylla easily:

val df_population = spark.sql("SELECT id FROM population_10M_rows")
val join_rdd = df_population.rdd.repartitionByCassandraReplica("test_keyspace", "partner_id_match_400M_rows").joinWithCassandraTable("test_keyspace", "partner_id_match_400M_rows")
val joined_count = join_rdd.count()

Notes on tuning spark-cassandra-connector

I experienced pretty crappy performance at first. Thanks to the easy Grafana monitoring, I could see that Scylla was not the bottleneck at all and that I instead had trouble getting some real load on it.

So I engaged in a thorough tuning of the spark-cassandra-connector with the help of Glauber… and it was pretty painful but we finally made it and got the best parameters to get the load on the Scylla cluster close to 100% when running the test workloads.

This tuning was done in the spark-defaults.conf file:

  • have a fixed set of executors and boost their overhead memory

This will increase test results reliability by making sure you always have a reserved number of available workers at your disposal.

spark.dynamicAllocation.enabled=false
spark.executor.instances=30
spark.yarn.executor.memoryOverhead=1024
  • set the split size to 1MB

Default is 8MB but Scylla uses a split size of 1MB, so you’ll see a great boost of performance and stability by setting this to the right number.

spark.cassandra.input.split.size_in_mb=1
  • align driver timeouts with server timeouts

It is advised to make sure that your read request timeouts are the same on the driver and the server, so you do not get stalled states waiting for a timeout to happen on one side. You can do the same with write timeouts if your test workloads are write intensive.

/etc/scylla/scylla.yaml

read_request_timeout_in_ms: 150000

spark-defaults.conf

spark.cassandra.connection.timeout_ms=150000
spark.cassandra.read.timeout_ms=150000

// optional if you want to fail / retry faster for HA scenarios
spark.cassandra.connection.reconnection_delay_ms.max=5000
spark.cassandra.connection.reconnection_delay_ms.min=1000
spark.cassandra.query.retry.count=100
  • adjust your reads per second rate

Last but surely not least, you will need to experiment to find the best value for this setting yourself, since it has a direct impact on the load on your Scylla cluster. You will be looking at pushing your POC cluster to almost 100% load.

spark.cassandra.input.reads_per_sec=6666

As I said before, I could only get this to work perfectly using the 2.0.1 version of the spark-cassandra-connector driver. But then it worked very well and with great speed.

Spark 2 results

Once tuned, the best results I was able to reach on this hardware are listed below. It’s interesting to see that with spinning disks, the cold start result can compete with the results of a heavily loaded Hadoop cluster, where pending containers and parallelism are knocking down its performance.

  • hot cache: 2min
  • cold cache: 12min

Wow! Those three refurbished machines can compete with our current production machines and implementations; they can even match an idle, medium-sized Hive cluster!

Python test workload

I couldn’t conclude on a Scala/Spark 2 only test workload. So I obviously went back to my language of choice, Python, only to discover to my disappointment that there is no joinWithCassandraTable equivalent available in pyspark.

I tried some projects claiming otherwise, with no success, until I changed my mind and decided that I probably didn’t need Spark 2 at all. So I embarked on the crazy quest of beating Spark 2’s performance using a pure Python implementation.

This basically means that instead of having a JOIN-like helper, I had to do a massive amount of single “id -> partnerid” lookups. Simple but terribly inefficient, you say? Really?

When I broke down the pieces, I was left with the following steps to implement and optimize:

  • Load the 10M rows worth of population data from Hive
  • For every row, lookup the corresponding partnerid in the ID matching table from Scylla
  • Count the resulting number of matches

The main problem in competing with Spark 2 is that it is a distributed framework while Python by itself is not, so you cannot hope to outperform Spark 2 with a single machine.

However, let’s remember that Spark 2 is shipped and run on executors using YARN, so we are firing up JVMs and dispatching containers all the time. This is quite an expensive process that we have a chance to avoid using Python!

So what I needed was a distributed computation framework that would allow me to load data in a partitioned way and run the lookups on all the partitions in parallel before merging the results. In Python, this framework exists and is named Dask!

You will obviously need to deploy a Dask topology (that’s easy and well documented) so that you have a number of Dask workers comparable to the number of Spark 2 executors (30 in my case).

The corresponding Python code samples are here.

Hive + Scylla results

Reading the population ids from Hive, the workload can be split and executed concurrently on multiple Dask workers (a sketch of the approach follows the list below):

  • read the 10M population rows from Hive in a partitioned manner
  • for each partition (slice of 10M), query Scylla to lookup the possibly matching partnerid
  • create a dataframe from the resulting matches
  • gather back all the dataframes and merge them
  • count the number of matches
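
As an illustration of the idea (not the exact code used for the tests, see the linked samples for that), a minimal Dask sketch could look like the one below. The scheduler address, host names and the hash-modulo split of the Hive read are assumptions, and the final merge is simplified to a plain count.

# Hedged sketch: distributing the Hive read and the Scylla lookups with Dask.
# Hosts, ports and the partitioning scheme are illustrative assumptions.
from dask.distributed import Client
from pyhive import hive
from cassandra.cluster import Cluster
from cassandra.concurrent import execute_concurrent_with_args

N_PARTITIONS = 30  # match the number of Dask workers / Spark 2 executors

def read_population_partition(p):
    # read one slice of the 10M population ids from Hive
    conn = hive.Connection(host='hive-server', port=10000)
    cur = conn.cursor()
    cur.execute("SELECT id FROM population_10M_rows "
                "WHERE pmod(hash(id), %d) = %d" % (N_PARTITIONS, p))
    ids = [row[0] for row in cur.fetchall()]
    conn.close()
    return ids

def lookup_partition(ids):
    # each task opens its own Scylla session (driver sessions cannot be pickled)
    cluster = Cluster(['scylla-node1', 'scylla-node2', 'scylla-node3'])
    session = cluster.connect('test_keyspace')
    stmt = session.prepare(
        "SELECT partnerid FROM partner_id_match_400M_rows WHERE id = ?")
    results = execute_concurrent_with_args(
        session, stmt, [(i,) for i in ids], concurrency=512)
    matches = sum(1 for ok, rs in results if ok and rs.current_rows)
    cluster.shutdown()
    return matches

client = Client('dask-scheduler:8786')
ids_futures = client.map(read_population_partition, range(N_PARTITIONS))
match_futures = client.map(lookup_partition, ids_futures)
print(sum(client.gather(match_futures)))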

The results showed that it is possible to compete with Spark 2 with Dask:

  • hot cache: 2min (rounded up)
  • cold cache: 6min

Interestingly, those almost two minutes can be broken down like this:

  • distributed read data from Hive: 50s
  • distributed lookup from Scylla: 60s
  • merge + count: 10s

This meant that if I could cut down the reading of data from Hive I could go even faster!

Parquet + Scylla results

Following up on my previous remark, I decided to get rid of Hive and put the 10M-row population data in a parquet file instead. I then set out to find the most efficient way to read and load a parquet file from HDFS.

My conclusion so far is that you can’t beat the amazing libhdfs3 + pyarrow combo. It is faster to load everything on a single machine than to load from Hive on multiple ones!
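
For illustration, a minimal sketch of that combo might look like the following; the namenode address and the parquet path are assumptions, while the single id column matches the population table used above.

# Hedged sketch of reading a single-column parquet file from HDFS with
# libhdfs3 + pyarrow; host, port and path are illustrative assumptions.
import pyarrow.parquet as pq
from hdfs3 import HDFileSystem

hdfs = HDFileSystem(host='namenode', port=8020)       # backed by libhdfs3
with hdfs.open('/data/population_10M_rows.parquet') as f:
    table = pq.read_table(f)                          # load the whole file in memory
ids = table.to_pandas()['id']                         # the 10M population ids
print(len(ids))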

The results showed that I could almost get rid of a whole minute in the total process, effectively and easily beating Spark 2!

  • hot cache: 1min 5s
  • cold cache: 5min

Notes on the Python cassandra-driver

Tests using Python showed robust queries with far fewer failures than the spark-cassandra-connector, especially during the cold start scenario.

  • The usage of execute_concurrent() provides a clean and linear interface to submit a large number of queries while retaining some level of concurrency control (see the sketch after this list)
  • Increasing the concurrency parameter from 100 to 512 provided additional throughput, but increasing it further showed no benefit
  • Protocol version 4 leaves the tuning of the number of connections and requests to some sort of auto configuration; all attempts to hand-tune it (by lowering the protocol version to 2) failed to achieve higher throughput
  • Installing libev on the system allows the cassandra-driver to use it instead of asyncore to handle concurrency, with a somewhat lower load footprint on the worker node but no noticeable change in throughput
  • When reading a parquet file stored on HDFS, the hdfs3 + pyarrow combo provides an insane speed (less than 10s to fully load 10M rows of a single column)
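
To make the points above more concrete, here is a small hedged sketch of how those driver-side knobs fit together; the contact point is an assumption, and the keyspace and table names are reused from the earlier examples.

# Hedged sketch of the cassandra-driver knobs discussed in the list above.
from cassandra.cluster import Cluster
from cassandra.concurrent import execute_concurrent_with_args
from cassandra.io.libevreactor import LibevConnection   # requires libev on the system

cluster = Cluster(['scylla-node1'],
                  protocol_version=4,                    # connection tuning is left to the driver
                  connection_class=LibevConnection)      # lighter than the default asyncore loop
session = cluster.connect('test_keyspace')

lookup = session.prepare(
    "SELECT partnerid FROM partner_id_match_400M_rows WHERE id = ?")
some_ids = range(10000)                                  # a batch of population ids
results = execute_concurrent_with_args(
    session, lookup, [(i,) for i in some_ids], concurrency=512)
print(sum(1 for ok, rs in results if ok and rs.current_rows))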

Step 5: Play with High Availability

I was quite disappointed and surprised by the lack of maturity of the Cassandra community on this critical topic. Maybe the main reason is that the cassandra-driver allows for too many levels of configuration and strategies.

I wrote the simple bash script below to allow me to simulate node failures. Then I could play with handling those failures and retries in the Python client code (a minimal illustration follows the script).

#!/bin/bash
# Simulate a Scylla node failure: blackhole the Scylla-related TCP ports on
# this node, show the firewall counters until interrupted (Ctrl+C), then
# restore normal traffic.

# start from a clean filter table
iptables -t filter -X
iptables -t filter -F

ip="0.0.0.0/0"
# drop all incoming and outgoing traffic on the Scylla service ports
for port in 9042 9160 9180 10000 7000; do
	iptables -t filter -A INPUT -p tcp --dport ${port} -s ${ip} -j DROP
	iptables -t filter -A OUTPUT -p tcp --sport ${port} -d ${ip} -j DROP
done

# display the rule counters every second until interrupted
while true; do
	trap break INT
	clear
	iptables -t filter -vnL
	sleep 1
done

# flush the rules to bring the node back and show the final state
iptables -t filter -X
iptables -t filter -F
iptables -t filter -vnL
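
On the client side, to give an idea of the knobs involved, here is a minimal hedged sketch using the retry and reconnection policies exposed by the Python cassandra-driver; the specific policies and values are illustrative examples, not what we ended up using.

# Hedged illustration of client-side failure handling with the cassandra-driver;
# contact points, policies and values are examples only.
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import (ConstantReconnectionPolicy,
                                DowngradingConsistencyRetryPolicy)

profile = ExecutionProfile(
    retry_policy=DowngradingConsistencyRetryPolicy(),   # degrade consistency instead of failing
    request_timeout=150)                                # align with the server-side timeouts (seconds)

cluster = Cluster(
    ['scylla-node1', 'scylla-node2', 'scylla-node3'],
    execution_profiles={EXEC_PROFILE_DEFAULT: profile},
    reconnection_policy=ConstantReconnectionPolicy(delay=1.0, max_attempts=100))

session = cluster.connect('test_keyspace')
# while the script above blackholes one node, requests keep being retried on
# the remaining replicas according to these policies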

This topic is worth covering in more detail in a dedicated blog post, with code samples, which I shall write later on.

Concluding the evaluation

I’m happy to say that Scylla passed our production evaluation and will soon go live on our infrastructure!

As I said at the beginning of this post, the conclusion of the evaluation was not driven by the good figures we got out of our test workloads. Those are not benchmarks and never pretended to be, but they were still enough to prove that performance was solid enough not to be a blocker to the adoption of Scylla.

Instead, we based our decision on the following points of interest (in no particular order):

  • data consistency
  • production reliability
  • datacenter awareness
  • ease of operation
  • infrastructure rationalisation
  • developer friendliness
  • costs

On the side, I tried Scylla on two other use cases which will be interesting to follow up on later, to displace MongoDB again…

Moving to production

Since our relationship was great, we also decided to partner with ScyllaDB and support them by subscribing to their enterprise offerings. They also agreed to support us running Gentoo Linux!

We are starting with a three-node heavy-duty cluster:

  • DELL R640
    • dual socket 2.6GHz 14-core CPUs, 512GB RAM, Samsung 17xxx NVMe 3.2TB

I’m eager to watch ScyllaDB grow and will continue to help with my modest contributions. Thanks again to the ScyllaDB team for their patience and support during the POC!

February 23, 2018
Michal Hrusecky a.k.a. miska (homepage, bugs)
Honeypot as a service (February 23, 2018, 15:00 UTC)

I’m currently working at CZ.NIC, the Czech domain registry, on project Turris, which makes awesome open source WiFi (or WiFi-free) routers. For those we developed quite a few interesting features. One of them is a honeypot that you don’t run on your own hardware (what if somebody managed to escape?); instead you basically perform a man-in-the-middle on the attacker and forward him to the honeypot we are running behind many firewalls. We have had this option on our routers for quite some time. But because plenty of people around the world found the idea really interesting and wanted to join, this part of our project got separated, has its own team of developers and maintainers, and you can now join with your own server as well! And to make it super easy, packages are already available in Tumbleweed and also in the security repo, where they are being built for Leap as well.

How do you get started, how does it work, and what will you get when you join? The first step is to register on the HaaS website, where you can also find an explanation of what HaaS actually is. When you log in, you can create a new computer and generate a token for it. Once you have a token, it’s time to set up the software on your server.

The second step is obviously to install the software. Given that you are using the cool Linux distribution openSUSE Tumbleweed, it is pretty easy: just zypper in haas-proxy.

The last step is configuration. You need to either disable your real SSH daemon or move it to a different port. You can do so easily in /etc/ssh/sshd_config: look for the Port option and change it from 22 to some other fancy number. Don’t forget to open that port on the firewall as well. After calling systemctl restart sshd you should be able to ssh in on the new port, and your port 22 should be free.

Now, do you still remember the token you generated on the HaaS website? You need to enter it into /etc/haas-proxy, option TOKEN. And that’s all: call systemctl enable haas-proxy and systemctl start haas-proxy, and the trap is set; all you need to do is wait for your victims to fall in.

Once they do (if you have a public IPv4 address, you should have plenty after just a day), you can go to the HaaS website again and browse through the logs of trapped visitors, or even view some statistics, like which country attacks you the most!

So enjoy the hunt and let’s trap a lot of bad guys 🙂 By the way, anonymized data from those honeypot sessions are later available to download, and CZ.NIC has some security researchers from the CSIRT team working on them. So you are having fun, not compromising your own security, and helping the world all at once! A win, win, win situation 🙂

February 21, 2018
Arun Raghavan a.k.a. ford_prefect (homepage, bugs)
Applicative Functors for Fun and Parsing (February 21, 2018, 07:24 UTC)

PSA: This post has a bunch of Haskell code, but I’m going to try to make it more broadly accessible. Let’s see how that goes.

I’ve been proceeding apace with my 3rd year in Abhinav’s Haskell classes at Nilenso, and we just got done with the section on Applicative Functors. I’m at that point when I finally “get” it, so I thought I’d document the process, and maybe capture my a-ha moment of Applicatives.

I should point out that the ideas and approach in this post are all based on Abhinav’s class material (and I’ve found them really effective in understanding the underlying concepts). Many thanks are due to him, and any lack of clarity you find ahead is in my own understanding.

Functors and Applicatives

Functors represent a type or a context on which we can meaningfully apply (map) a function. The Functor typeclass is pretty straightforward:

class Functor f where
  fmap :: (a -> b) -> f a -> f b

Easy enough. fmap takes a function that transforms something of type a to type b and a value of type a in a context f. It produces a value of type b in the same context.

The Applicative typeclass adds two things to Functor. Firstly, it gives us a means of putting things inside a context (also called lifting). Secondly, it lets us apply a function that is itself within a context.

class Functor f => Applicative f where
  pure :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b

We can see pure lifts a given value into a context. The apply function (<*>) intuitively looks like fmap, with the difference that the function is within a context. This becomes key when we remember that Haskell functions are curried (and can thus be partially applied). This would then allow us to write something like:

maybeAdd :: Maybe Int -> Maybe Int -> Maybe Int
maybeAdd ma mb = pure (+) <*> ma <*> mb

This function takes two numbers in the Maybe context (that is, they either exist, or are Nothing), and adds them. The result will be the sum if both numbers exist, or Nothing if either or both do not.

Go ahead and convince yourself that it is painful to express this generically with just fmap.

Parsers

There are many ways of looking at what a parser is. Let’s work with one definition: A parser,

  • Takes some input
  • Converts some or all of it into something else if it can
  • Returns whatever input was not used in the conversion

How do we represent something that converts something to something else? It’s a function, of course. Let’s write that down as a type:

newtype Parser i o = Parser (i -> (Maybe o, i))

This more or less directly maps to what we just said. A Parser is a data type which has two type parameters — an input type and an output type. It contains a function that takes one argument of the input type, and produces a tuple of Maybe the output type (signifying if parsing succeeded) and the rest of the input.

We can name the field runParser, so it becomes easier to get a hold of the function inside our Parser type:

newtype Parser i o = Parser { runParser :: i -> (Maybe o, i) }

Parser combinators

The “rest” part is important for the reason that we would like to be able to chain small parsers together to make bigger parsers. We do this using “parser combinators” — functions that take one or more parsers and return a more complex parser formed by combining them in some way. We’ll see some of those ways as we go along.

Parser instances

Before we proceed, let’s define Functor and Applicative instances for our Parser type.

instance Functor (Parser i) where
  fmap f p = Parser $ \input ->
    let (mo, i) = runParser p input
    in (f <$> mo, i)

The intuition here is clear — if I have a parser that takes some input and provides some output, fmaping a function on that parser translates to applying that function on the output of the parser.

instance Applicative (Parser i) where
  pure x = Parser $ \input -> (Just x, input)

  pf <*> po = Parser $ \input ->
    case runParser pf input of
         (Just f, rest) -> case runParser po rest of
                                (Just o, rest') -> (Just (f o), rest')
                                (Nothing, _)    -> (Nothing, input)
         (Nothing, _)   -> (Nothing, input)

The Applicative instance is a bit more involved than Functor. What we’re doing first is “running” the first parser which gives us the function we want to apply (remember that this is a curried function, so rather than parsing out a function, we are most likely parsing out a value and creating a function with that). If we succeed, then we run the second parser to get a value to apply the function to. If this is also successful, we apply the function to the value, and return the result within the parser context (i.e. the result, and the rest of the input).

Implementing some parsers

Now let’s take our new data type and instances for a spin. Before we write a real parser, let’s write a helper function. A common theme while parsing a string is to match a single character on a predicate — for example, “is this character alphabetic”, or “is this character a semicolon”. We write a function that takes a predicate and returns the corresponding parser:

satisfy :: (Char -> Bool) -> Parser String Char
satisfy p = Parser $ \input ->
  case input of
       (c:cs) | p c -> (Just c, cs)
       _            -> (Nothing, input)

Now let’s try to make a parser that takes a string, and if it finds an ASCII digit character, provides the corresponding integer value. We have a function from the Data.Char module to match ASCII digit characters — isDigit. We also have a function to take a digit character and give us an integer — digitToInt. Let’s put this together with satisfy above:

import Data.Char (digitToInt, isDigit)

digit :: Parser String Int
digit = digitToInt <$> satisfy isDigit

And that’s it! Note how we used our higher-order satisfy function to match an ASCII digit character and the Functor instance to apply digitToInt to the result of that parser (reminder: <$> is just the infix form of writing fmap — this is the same as fmap digitToInt (satisfy isDigit)).

Another example — a character parser, which succeeds if the next character in the input is a specific character we choose.

char :: Char -> Parser String Char
char x = satisfy (x ==)

Once again, the satisfy function makes this a breeze. I must say I’m pleased with the conciseness of this.

Finally, let’s combine character parsers to create a word parser — a parser that succeeds if the input is a given word.

word :: String -> Parser String String
word ""     = Parser $ \input -> (Just "", input)
word (c:cs) = (:) <$> char c <*> word cs

A match on an empty word always succeeds. For anything else, we just break down the parser to a character parser of the first character and a recursive call to the word parser for the rest. Again, note the use of the Functor and Applicative instance. Let’s look at the type signature of the (:) (list cons) function, which prepends an element to a list:

(:) :: a -> [a] -> [a]

The function takes two arguments — a single element of type a, and a list of elements of type a. If we expand the types some more, we’ll see that the first argument we give it is a Parser String Char and the second is a Parser String [Char] (String is just an alias for [Char]).

In this way we are able to take the basic list prepend function and use it to construct a list of characters within the Parser context. (a-ha!?)

JSON

JSON is a relatively simple format to parse, and makes for a good example for building a parser. The JSON website has a couple of good depictions of the JSON language grammar front and center.

So that defines our parser problem then — we want to read a string input, and convert it into some sort of in-memory representation of the JSON value. Let’s see what that would look like in Haskell.

data JsonValue = JsonString String
               | JsonNumber JsonNum
               | JsonObject [(String, JsonValue)]
               | JsonArray [JsonValue]
               | JsonBool Bool
               | JsonNull

-- We represent a number as an infinite precision
-- floating point number with a base 10 exponent
data JsonNum = JsonNum { negative :: Bool
                       , signif   :: Integer
                       , expo     :: Integer
                       }

The JSON specification does not really tell us what type to use for numbers. We could just use a Double, but to make things interesting, we represent it as an arbitrary precision floating point number.

Note that the JsonArray and JsonObject constructors are recursive, as they should be — a JSON array is an array of JSON values, and a JSON object is a mapping from string keys to JSON values.

Parsing JSON

We now have the pieces we need to start parsing JSON. Let’s start with the easy bits.

null

To parse a null we literally just look for the word “null”.

jsonNull :: Parser String JsonValue
jsonNull = word "null" $> JsonNull

The $> operator is a flipped shortcut for fmap . const — it evaluates the argument on the left, and then fmaps the argument on the right onto it. If the word "null" parser is successful (Just "null"), we’ll fmap the JsonValue representing null to replace the string "null" (i.e. we’ll get a (Just JsonNull, <rest of the input>)).

true and false

First a quick detour:

instance Alternative (Parser i) where
  empty = Parser $ \input -> (Nothing, input)
  p1 <|> p2 = Parser $ \input ->
      case runParser p1 input of
           (Nothing, _) -> case runParser p2 input of
                                (Nothing, _) -> (Nothing, input)
                                justValue    -> justValue
           justValue    -> justValue

The Alternative instance is easy to follow once you understand Applicative. We define an empty parser that matches nothing. Then we define the alternative operator (<|>) as we might intuitively imagine.

We run the parser given as the first argument first, if it succeeds we are done. If it fails, we run the second parser on the whole input again, if it succeeds, we return that value. If both fail, we return Nothing.

Parsing true and false with this in our belt looks like:

jsonBool :: Parser String JsonValue
jsonBool =  (word "true" $> JsonBool True)
        <|> (word "false" $> JsonBool False)

We are easily able to express the idea of trying to parse the string “true”, and if that fails, trying again for the string “false”. If either matches, we have a boolean value; if not, Nothing. Again, nice and concise.

String

This is only slightly more complex. We need a couple of helper functions first:

hexDigit :: Parser String Int
hexDigit = digitToInt <$> satisfy isHexDigit

digitsToNumber :: Int -> [Int] -> Integer
digitsToNumber base digits = foldl (\num d -> num * fromIntegral base + fromIntegral d) 0 digits

hexDigit is easy to follow. It just matches anything from 0-9 and a-f or A-F.

digitsToNumber is a pure function that takes a list of digits, and interprets it as a number in the given base. We do some jumping through hoops with fromIntegral to take Int digits (mapping to a normal word-sized integer) and produce an Integer (arbitrary sized integer).

Now follow along one line at a time:

jsonString :: Parser String String
jsonString = (char '"' *> many jsonChar <* char '"')
  where
    jsonChar =  satisfy (\c -> not (c == '\"' || c == '\\' || isControl c))
            <|> word "\\\"" $> '"'
            <|> word "\\\\" $> '\\'
            <|> word "\\/"  $> '/'
            <|> word "\\b"  $> '\b'
            <|> word "\\f"  $> '\f'
            <|> word "\\n"  $> '\n'
            <|> word "\\r"  $> '\r'
            <|> word "\\t"  $> '\t'
            <|> chr . fromIntegral . digitsToNumber 16 <$> (word "\\u" *> replicateM 4 hexDigit)

A string is a valid JSON character, surrounded by quotes. The *> and <* operators allow us to chain parsers whose output we wish to discard (since the quotes are not part of the actual string itself). The many function comes from the Alternative typeclass. It represents zero or more instances of context. In our case, it tries to match zero or more jsonChar parsers.

So what does jsonChar do? Following the definition of a character in the JSON spec, first we try to match something that is not a quote ("), a backslash (\) or a control character. If that doesn’t match, we try to match the various escape characters that the specification mentions.

Finally, if we get a \u followed by 4 hexadecimal characters, we put them in a list (replicateM 4 hexDigit chains 4 hexDigit parsers and provides the output as a list), convert that list into a base 16 integer (digitsToNumber), and then convert that to a Unicode character (chr).

The order of chaining these parsers does matter for performance. The first parser in our <|> chain is the one that is most likely (most characters are not escaped). This follows from our definition of the Alternative instance. We run the first parser, then the second, and so on. We want this to succeed as early as possible so we don’t run more parsers than necessary.

Arrays

Arrays and objects have something in common — they have items which are separated by some value (commas for array values, commas for each key-value pair in an object, and colons separating keys and values). Let’s just factor this commonality out:

sepBy :: Parser i v -> Parser i s -> Parser i [v]
sepBy v s = (:) <$> v <*> many (s *> v) 
         <|> pure []

We take a parser for our values (v) and a parser for our separator (s). We try to parse one or more v separated by s, or just return an empty list in the parser context if there are none.

Now we write our JSON array parser as:

jsonArray :: Parser String JsonValue
jsonArray = JsonArray <$> (char '[' *> (json `sepBy` char ',') <* char ']')

Nice, that’s really succinct. But wait! What is json?

Putting it all together

We know that arrays contain JSON values. And we know how to parse some JSON values. Let’s try to put those together for our recursive definition:

json :: Parser String JsonValue
json =  jsonNull
    <|> jsonBool
    <|> jsonString
    <|> jsonArray
--  <|> jsonNumber
--  <|> jsonObject

And that’s it!

The JSON object and number parsers follow the same pattern. So far we’ve ignored spaces in the input, but those can be consumed and ignored easily enough based on what we’ve learned.

You can find the complete code for this exercise on Github.

Some examples of what this looks like in the REPL:

*Json> runParser json "null"
(Just null,"")

*Json> runParser json "true"
(Just true,"")

*Json> runParser json "[null,true,\"hello!\"]"
(Just [null, true, "hello!" ],"")

Concluding thoughts

If you’ve made it this far, thank you! I realise this is long and somewhat dense, but I am very excited by how elegantly Haskell allows us to express these ideas, using fundamental aspects of its type(class) system.

A nice real world example of how you might use this is the optparse-applicative package which uses these ideas to greatly simplify the otherwise dreary task of parsing command line arguments.

I hope this post generates at least some of the excitement in you that it has in me. Feel free to leave your comments and thoughts below.

February 19, 2018
Jason A. Donenfeld a.k.a. zx2c4 (homepage, bugs)
WireGuard in Google Summer of Code (February 19, 2018, 14:55 UTC)

WireGuard is participating in Google Summer of Code 2018. If you're a student — bachelors, masters, PhD, or otherwise — who would like to be funded this summer for writing interesting kernel code, studying cryptography, building networks, making mobile apps, contributing to the larger open source ecosystem, doing web development, writing documentation, or working on a wide variety of interesting problems, then this may be appealing. You'll be mentored by world-class experts, and the summer will certainly boost your skills. Details are on this page — simply contact the WireGuard team to get a proposal into the pipeline.

Alice Ferrazzi a.k.a. alicef (homepage, bugs)

Gentoo has been accepted as a Google Summer of Code 2018 mentoring organization

Gentoo has been accepted into the Google Summer of Code 2018!

Contacts:
If you want to help in any way with Gentoo GSoC 2018, please add yourself here [1] as soon as possible; it is important! Someone from the Gentoo GSoC team will then contact you as soon as possible. We are always searching for people willing to help with Gentoo GSoC 2018!

Students:
If you are a student and want to spend your summer on Gentoo projects and have fun writing code, you can start discussing ideas in the #gentoo-soc IRC channel on Freenode [2].
You can find some example ideas here [3], but of course it could also be something different.
Just remember that we strongly recommend that you work with a potential mentor to develop your idea before proposing it formally. Don’t waste any time, because there’s typically some polishing which needs to occur before the deadline (March 28th).

I can assure you that Gentoo GSoC is really fun and a great experience for improving yourself. It is also useful for making yourself known in the Gentoo community. Contributing to Gentoo GSoC means contributing to Gentoo, a bleeding edge distribution considered by many to be for experts, with near-unlimited adaptability.

If you require any further information, please do not hesitate to contact us [4] or check the Gentoo GSoC 2018 wiki page [5].

[1] https://wiki.gentoo.org/wiki/Google_Summer_of_Code/2018/Mentors
[2] http://webchat.freenode.net/?channels=gentoo-soc
[3] https://wiki.gentoo.org/wiki/Google_Summer_of_Code/2018/Ideas
[4] soc-mentors@gentoo.org
[5] https://wiki.gentoo.org/wiki/Google_Summer_of_Code/2018

Gentoo accepted into Google Summer of Code 2018 (February 19, 2018, 00:00 UTC)

Students who want to spend their summer having fun and writing code can do so now for Gentoo. Gentoo has been accepted as a mentoring organization for this year’s Google Summer of Code.

The GSoC is an excellent opportunity for gaining real-world experience in software design and making one’s self known in the broader open source community. It also looks great on a resume.

Initial project ideas can be found here, although new project ideas are welcome. For new projects, time is of the essence: there is typically some idea-polishing which must occur before the March 27th deadline. Because of this it is strongly recommended that students refine new project ideas with a mentor before proposing the idea formally.

GSoC students are encouraged to begin discussing ideas in the #gentoo-soc IRC channel on the Freenode network.

Further information can be found on the Gentoo GSoC 2018 wiki page. Those with unanswered questions should not hesitate to contact the Summer of Code mentors via the mailing list.

February 14, 2018
Luca Barbato a.k.a. lu_zero (homepage, bugs)
Rust-av: Rust and Multimedia (February 14, 2018, 19:44 UTC)

Recently I presented my new project at Fosdem and since I was afraid of not having enough time for the questions I trimmed the content to the bare minimum. This blog post should add some more details.

What is it?

Rust-av aims to be a complete multimedia toolkit written in rust.

Rust is a quite promising language that aims to offer high execution speed while granting a number of guarantees about the code’s behavior that you cannot have in C, C++, Java and so on.

Its zero-cost abstractions, coupled with the fact that the compiler actively prevents you from committing a large class of mistakes related to memory access, seem a perfect match for implementing a multimedia toolkit that is easy to use, fast enough and trustworthy.

Why something completely new?

Since Rust code can be intermixed with C code, an evolutionary approach of replacing small components little by little in a larger project is perfectly feasible, and it is what we are currently trying to do with vlc.

But Rust is not just good for writing inner routines so that they are fast and correct; its trait system is also quite useful for building a more expressive API.

Most multimedia concepts are pretty simple at the high level (e.g. a frame is just a picture or some sound with a timestamp), with an excruciating amount of quirks and details that require your toolkit to either make choices for you or make you face a large amount of complexity.

That leads to APIs that are either easy but quite inflexible (and opinionated), or APIs providing all the flexibility but forcing the user to learn a lot of information in order to achieve what the simpler API would let you implement in a handful of lines of code.

I wanted to leverage Rust to make the low-level implementations with fewer bugs and, at the same time, try to provide a better API to use them.

Why now?

Since 2016 I have kept bouncing ideas around with Kostya and Geoffroy, but between my work duties and other projects I couldn’t devote enough time to it. Thanks to the Mozilla Open Source Support initiative, which awarded me enough to develop it full time, the project already has some components published and more will follow during the next months.

Philosophy

I’m trying to leverage the experience I have from contributing to vlc and libav, keep what is working well, and try not to make the same mistakes.

Ease of use

I want the whole toolkit to be useful to a wide audience. Developers often fight against a library in order to undo what is happening under the hood, or end up vendoring part of it since they need only a tiny subset of all the features it provides.

Rust makes it quite natural to split large projects into independent components (called crates), and it is already quite common to have meta-crates re-exporting many smaller crates to provide some uniform access.

The rust-av code, as opposed to the rather monolithic approach taken in Libav, can be reused with the granularity of the bare codec or format:

  • Integrating it in a foreign toolkit won’t require undoing what the common utility code does.
  • Even when using it through the higher level layers, rust-av won’t force the developer to bring in any unrelated dependencies.
  • On the other hand users that enjoy a fully integrated and all-encompassing solution can simply depend on the meta-crates and get the support for everything.

Speed

Multimedia playback boils down to efficiently doing complex computations so that an arbitrarily large amount of data can be rendered within a fraction of a second; real-time multimedia streaming requires compressing an equally large amount of data in the same time.

Speed in multimedia is important.

Rust provides high-level idiomatic constructs that surprisingly lead to pretty decent runtime speed. The stdsimd effort and the seamless C ABI support make it easier to leverage the SIMD instructions provided by recent CPU architectures.

Trustworthy

Traditionally, the most effective way to write fast multimedia code has been to pair C and assembly. Sadly, that combination makes it quite easy to overlook corner cases and end up with all kinds of memory hazards (use-after-free, out-of-bounds reads and writes, NULL dereferences…).

Rust effectively prevents a good deal of those issues at compile time. Since its abstractions usually do not cause slowdowns it is possible to write code that is, arguably, less misleading and as fast.

Structure

The toolkit is composed of multiple, loosely coupled, crates. They can be grouped by level of abstraction.

Essential

av-data: Used by nearly all the other crates, it provides basic data types and a minimal amount of functionality on top of them. It mainly provides the following structs:

  • Frame: it binds together a time reference and a buffer, representing either a video picture or some audio samples.
  • Packet: it binds together a time reference and a buffer containing compressed data.
  • Value: a simple key-value type abstraction, used to pass arbitrary data to the configuration functions.

Core

They provide the basic abstractions (traits) implemented by specific sets of components.

  • av-format: It provides a set of traits to implement muxers and demuxers, and a utility Context to bridge the normal Rust I/O Write and Read traits and the actual muxers and demuxers.
  • av-codec: It provides a set of traits to implement encoders and decoders, and a utility Context that wraps them.

Utility

They provide building blocks that may be used to implement actual codecs and formats.

  • av-bitstream: Utility crate to write and read bits and bytes
  • av-audio: Audio-specific utilities
  • av-video: Video-specific utilities

Implementation

Actual implementations of codecs and formats; they can be used directly or through the utility Contexts.

The direct usage is suggested only if you are integrating it in larger frameworks that already implement, possibly in different ways, the integration code provided by the Context (e.g. binding it together with the I/O for the formats or internal queues for the codecs).

Higher-level

They provide higher level Contexts to playback or encode data through a simplified interface:

  • av-player reads bytes through a provided Read and outputs decoded Frames. Under the hood it probes the data, then allocates and configures a Demuxer and a Decoder for each stream of interest.
  • av-encoder consumes Frames and outputs encoded and muxed data through a Write output. It automatically sets up the encoders and the muxer.

Meta-crates

They ease the use in bulk of everything provided by rust-av.

There are 4 crates providing a list of specific components: av-demuxers, av-muxers, av-decoders and av-encoders; and 2 grouping them by type: av-formats and av-codecs.

Their use is suggested when you’d like to support every format and codec available.

So far

All the development happens on the github organization and so far the initial Core and Essential crates are ready to be used.

There is a nom-based matroska demuxer in working condition and some non-native wrappers providing implementations for some decoders and encoders.

Thanks to est31 we have native vorbis support.

I’m working on a native implementation of opus and soon I’ll move to a video codec.

There is a tiny player called avp, and an encoder tool (named ave) will appear once the matroska muxer is complete.

What’s missing in rust-av

API-wise, right now rust-av only provides simple decoding and encoding, muxing and demuxing. There are already enough wrapped codecs to let people play with the library and, hopefully, help in polishing it.

For each crate I’m trying to prepare some easy tasks so that people willing to contribute to the project can start from them; all help is welcome!

What’s missing in rust

So far my experience with rust had been quite positive, but there are a number of features that are missing or that could be addressed.

  • SIMD support is shaping up nicely and it is coming soon.
  • The natural fallback, going down to assembly, is available since Rust supports the C ABI; inline assembly support, on the other hand, still seems to be pending some discussion before it reaches stable.
  • Arbitrarily aligned allocation is a MUST in order to support hardware acceleration, and SIMD usually works better with aligned buffers.
  • I’d love to have const generics now; luckily, associated constants with traits allow some workarounds that let you specialize by constants (and result in neat speedups).
  • I think that focusing a little more on array/slice support would lead to the best gains, since right now there isn’t an equivalent of collect() to fill arrays in an idiomatic way, and in multimedia large lookup tables are pretty much a staple.

In closing

Rust and multimedia seem a really good match; in my experience, aside from a number of missing features, the language seems quite good for the purpose.

Once I have more complete native implementations, I will have better means to evaluate the speed difference compared to writing the same code in C.