
Contributors:
. Aaron W. Swenson
. Agostino Sarubbo
. Alexey Shvetsov
. Alexis Ballier
. Alexys Jacob
. Alice Ferrazzi
. Andreas K. Hüttel
. Anthony Basile
. Arun Raghavan
. Bernard Cafarelli
. Brian Harring
. Christian Ruppert
. Chí-Thanh Christopher Nguyễn
. Denis Dupeyron
. Detlev Casanova
. Diego E. Pettenò
. Domen Kožar
. Doug Goldstein
. Eray Aslan
. Fabio Erculiani
. Gentoo Haskell Herd
. Gentoo Miniconf 2016
. Gentoo Monthly Newsletter
. Gentoo News
. Gilles Dartiguelongue
. Greg KH
. Göktürk Yüksek
. Hanno Böck
. Hans de Graaff
. Ian Whyman
. Jan Kundrát
. Jason A. Donenfeld
. Jeffrey Gardner
. Joachim Bartosik
. Johannes Huber
. Jonathan Callen
. Jorge Manuel B. S. Vicetto
. Kristian Fiskerstrand
. Lance Albertson
. Liam McLoughlin
. Luca Barbato
. Marek Szuba
. Mart Raudsepp
. Matt Turner
. Matthew Thode
. Michael Palimaka
. Michal Hrusecky
. Michał Górny
. Mike Doty
. Mike Gilbert
. Mike Pagano
. Nathan Zachary
. Pacho Ramos
. Patrick Kursawe
. Patrick Lauer
. Patrick McLean
. Paweł Hajdan, Jr.
. Piotr Jaroszyński
. Rafael G. Martins
. Remi Cardona
. Richard Freeman
. Robin Johnson
. Ryan Hill
. Sean Amoss
. Sebastian Pipping
. Sergei Trofimovich
. Steev Klimaszewski
. Stratos Psomadakis
. Sven Vermeulen
. Sven Wegener
. Tom Wijsman
. Yury German
. Zack Medico

Last updated:
February 23, 2018, 18:06 UTC

Disclaimer:
Views expressed in the content published here do not necessarily represent the views of Gentoo Linux or the Gentoo Foundation.


Bugs? Comments? Suggestions? Contact us!

Powered by:
Planet Venus

Welcome to Gentoo Universe, an aggregation of weblog articles on all topics written by Gentoo developers. For a more refined aggregation of Gentoo-related topics only, you might be interested in Planet Gentoo.

February 23, 2018
Michal Hrusecky a.k.a. miska (homepage, bugs)
Honeypot as a service (February 23, 2018, 15:00 UTC)

I’m currently working at CZ.NIC, the Czech domain registry, on project Turris, which makes awesome open source WiFi (or WiFi-free) routers. For those we developed quite a few interesting features. One of them is a honeypot that you don’t run on your own hardware (what if somebody managed to escape?); instead, you essentially man-in-the-middle the attacker and forward him to the honeypot we run behind many firewalls. We have had this option on our routers for quite some time. But because plenty of people around the world found the idea really interesting and wanted to join, this part of the project got separated out, has its own team of developers and maintainers, and you can now join with your own server as well! And to make it super easy, packages are already available in Tumbleweed and also in the security repo, where they are being built for Leap as well.

How do you get started, how does it work, and what will you get when you join? The first step is to register on the HaaS website, where you can also find an explanation of what HaaS actually is. When you log in, you can create a new computer and generate a token for it. Once you have a token, it’s time to set up the software on your server.

The second step, obviously, is to install the software. Given that you are using the cool Linux distribution openSUSE Tumbleweed, it is pretty easy: just zypper in haas-proxy.

The last step is configuration. You need to either disable your real SSH daemon or move it to a different port. You can do so easily in /etc/ssh/sshd_config: look for the Port option and change it from 22 to some other fancy number. Don’t forget to open that port on the firewall as well. After calling systemctl restart sshd you should be able to ssh in on the new port, and your port 22 should be free.
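A sketch of those steps, assuming a firewalld-based setup and port 2222 as the replacement port (both are just examples):

# in /etc/ssh/sshd_config: move the real SSH daemon off port 22
Port 2222

# open the new port and restart sshd (hypothetical firewalld example)
firewall-cmd --permanent --add-port=2222/tcp
firewall-cmd --reload
systemctl restart sshd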

Now, do you still remember the token you generated on the HaaS website? You need to enter it into /etc/haas-proxy, in the TOKEN option. And that’s all: call systemctl enable haas-proxy and systemctl start haas-proxy, and the trap is set. All you need to do is wait for your victims to fall in.
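For instance (the token value is a placeholder, and the exact file syntax may differ):

# in /etc/haas-proxy: paste the token generated on the HaaS website
TOKEN="your-token-here"

systemctl enable haas-proxy
systemctl start haas-proxy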

Once they do (if you have a public IPv4 address, you should have plenty after just a day), you can go to the HaaS website again and browse through the logs of trapped visitors, or even view some statistics, like which country attacks you the most!

So enjoy the hunt, and let’s trap a lot of bad guys 🙂 By the way, anonymized data from those honeypot sessions are later made available for download, and CZ.NIC has some security researchers from the CSIRT team working on them. So you are having fun, not compromising your own security, and helping the world, all at once! A win-win-win situation 🙂

February 21, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)

It feels like most of what I end up writing nowadays consists of my misadventures across a wide range of financial service companies. But here we go (I promise I’ll go back to writing about reverse engineering Really Soon Now™).

The last post on this topic was my rant about how Fineco lacks some basic tools needed to use it as a sole, or primary, bank account in the UK. Hopefully they will address this soon, and a sane bank will be available in this country, but for now I had to find alternatives.

Since the various fintech companies also don’t provide the features I needed, I found myself having to find a “high street bank”. And since my experience up to this point with both Barclays and NatWest was not particularly positive, I decided to look for a different option. Since I have been a mostly-happy customer of Tesco Bank for nearly four years, I decided to give their UK service a try.

At first it appeared to have an online sign-up flow that looked sweet for this kind of problem… except at the end of it, they told me to wait for them to ask for paperwork to send through. The request turned out to be for proof of identity (which needs to be certified) and proof of address (which needs to be an original) — the letter and form I could swear are the same ones they sent me when I applied for the Irish credit card, except the information is now correct (in Ireland, the Garda will not certify a passport copy, though it appears the UK police forces would).

Let’s ignore the fact that, by mailing me at that address, Tesco Bank provided their own proof of address, and let’s focus instead on the fact that they do not accept online print-outs, despite almost every service (including, as I found out now, themselves) defaulting to paperless bills and statements. I have actually had a number of bills mailed to me, including from Hounslow Council, so I had a wide range of choices of what to provide them. But as it turns out, I like a challenge, and having some fun with corner cases (particularly as I had already solved the immediate need for a bank account by the time I looked into this, but that’s a story for another day).

Here is a part of the story I have not told yet. When I moved to the UK I expected to have to close every account I still had in Ireland, both because Ulster Bank Private is a bloody expensive service, and because, at least in Italy, I was told I was not entitled to keep credit cards open after leaving the country. So as soon as I was in working order over here, I switched all the billings over to Revolut. Unfortunately I couldn’t do that for at least three services (Online.net, Vodafone Italy and Wind/3 Italy) — in two cases because they insist they do not accept anything but Italian cards, while somehow still accepting Tesco Ireland cards.

While trying to figure out an interim solution, I got to find out that Tesco Bank has no problem with me still having the “Irish” credit card, and they even allowed me to change the address (and phone number) on file to my new London one. We hit a snag regarding the SEPA direct debit, but once I pointed out that what they were suggesting would breach the SEPA directives, all was good, and indeed the card is debited from the EUR Fineco account.

This also means I get that card’s statements at my London address. So of course I ended up sending, to Tesco Bank, as proof of address… a Tesco Bank Ireland credit card statement. As a way of saying “Do you feel silly enough, now?” to whoever had to manually verify my address and send the paperwork back to me. Turns out it worked just fine, and I did not get even a passive-aggressive note about it.

Now let’s put the registration aside and take a look at the services provided. Because if I have to rant, I would like at least to rant with some information that lets others make up their own minds.

First off, as I said, the first part of the registration is online, after which they get in touch with you so you can send them the proofs they need. It’s very nice that during the whole process they “keep in touch” by SMS: they remind you to send the paperwork back, they tell you that the account was opened before you receive the snail mail, and so on.

I got a lot of correspondence from Tesco Bank: in addition to the request for proofs, and the proofs being mailed back, I received a notification about the account being opened, the debit card PIN, and a “temporary access number” to sign up online. The debit card arrived separately, via a signature-required delivery. This is a first for me in the UK, as most other cards just got sent through normal mail — except for Fineco’s, which came via FedEx, and they let me receive it directly at the office, despite it not being the proof of address I sent them.

When signing up for the online banking, they ask you for an 8-digit security code, a long(er) password, and a selection of verbal questions/answers, which are the usual terrible security (so, as usual, I’ve answered them at random and noted down what I told them the answers were). They let you choose your username, but suggest keeping it as the email address on file.

Logging in for the first time from a different computer is extremely awkward: it starts with two digits of the security code, followed by SMS second-factor authentication, followed by the password (not a subset thereof, so you can easily use a password manager for this one), all through different forms. The same happens for the Mobile Banking application (which is at least linked directly from their website, and very easy to install). The mobile banking login appears to work fairly reliably (you’ll see in the next post why I call this out explicitly).

I set up the rent standing order on this account, and it was a straightforward and painless process: the same as a one-time transaction, except for ticking an “I want to repeat this every month” checkbox. All in all, it looks to me like a saner UI than Barclays’, and proper enough for my needs. I will of course report back if I find anything particularly different over time.

Arun Raghavan a.k.a. ford_prefect (homepage, bugs)
Applicative Functors for Fun and Parsing (February 21, 2018, 07:24 UTC)

PSA: This post has a bunch of Haskell code, but I’m going to try to make it more broadly accessible. Let’s see how that goes.

I’ve been proceeding apace with my 3rd year of Abhinav’s Haskell classes at Nilenso, and we just got done with the section on Applicative Functors. I’m at that point where I finally “get” it, so I thought I’d document the process, and maybe capture my a-ha moment with Applicatives.

I should point out that the ideas and approach in this post are all based on Abhinav’s class material (and I’ve found them really effective in understanding the underlying concepts). Many thanks are due to him, and any lack of clarity you find ahead is in my own understanding.

Functors and Applicatives

Functors represent a type or a context on which we can meaningfully apply (map) a function. The Functor typeclass is pretty straightforward:

class Functor f where
  fmap :: (a -> b) -> f a -> f b

Easy enough. fmap takes a function that transforms something of type a to type b and a value of type a in a context f. It produces a value of type b in the same context.

The Applicative typeclass adds two things to Functor. First, it gives us a means of putting things inside a context (also called lifting). Second, it gives us a means of applying a function that is itself within a context.

class Functor f => Applicative f where
  pure :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b

We can see pure lifts a given value into a context. The apply function (<*>) intuitively looks like fmap, with the difference that the function is within a context. This becomes key when we remember that Haskell functions are curried (and can thus be partially applied). This would then allow us to write something like:

maybeAdd :: Maybe Int -> Maybe Int -> Maybe Int
maybeAdd ma mb = pure (+) <*> ma <*> mb

This function takes two numbers in the Maybe context (that is, they either exist, or are Nothing), and adds them. The result will be the sum if both numbers exist, or Nothing if either or both do not.

Go ahead and convince yourself that it is painful to express this generically with just fmap.
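To see why, here is a minimal sketch (using the same Maybe example) of how far fmap alone gets us:

-- fmap (+) ma has type Maybe (Int -> Int), and fmap gives us no way
-- to apply a function that is itself stuck inside a context, so we
-- have to pattern match by hand:
maybeAdd' :: Maybe Int -> Maybe Int -> Maybe Int
maybeAdd' ma mb =
  case fmap (+) ma of
       Nothing -> Nothing
       Just f  -> fmap f mb

That case expression is exactly the plumbing that (<*>) abstracts away.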

Parsers

There are many ways of looking at what a parser is. Let’s work with one definition: A parser,

  • Takes some input
  • Converts some or all of it into something else if it can
  • Returns whatever input was not used in the conversion

How do we represent something that converts something to something else? It’s a function, of course. Let’s write that down as a type:

newtype Parser i o = Parser (i -> (Maybe o, i))

This more or less directly maps to what we just said. A Parser is a data type which has two type parameters — an input type and an output type. It contains a function that takes one argument of the input type, and produces a tuple of Maybe the output type (signifying if parsing succeeded) and the rest of the input.

We can name the field runParser, so it becomes easier to get a hold of the function inside our Parser type:

newtype Parser i o = Parser { runParser :: i -> (Maybe o, i) }

Parser combinators

The “rest” part is important for the reason that we would like to be able to chain small parsers together to make bigger parsers. We do this using “parser combinators” — functions that take one or more parsers and return a more complex parser formed by combining them in some way. We’ll see some of those ways as we go along.

Parser instances

Before we proceed, let’s define Functor and Applicative instances for our Parser type.

instance Functor (Parser i) where
  fmap f p = Parser $ \input ->
    let (mo, i) = runParser p input
    in (f <$> mo, i)

The intuition here is clear — if I have a parser that takes some input and provides some output, fmaping a function on that parser translates to applying that function on the output of the parser.

instance Applicative (Parser i) where
  pure x = Parser $ \input -> (Just x, input)

  pf <*> po = Parser $ \input ->
    case runParser pf input of
         (Just f, rest) -> case runParser po rest of
                                (Just o, rest') -> (Just (f o), rest')
                                (Nothing, _)    -> (Nothing, input)
         (Nothing, _)   -> (Nothing, input)

The Applicative instance is a bit more involved than Functor. What we’re doing first is “running” the first parser which gives us the function we want to apply (remember that this is a curried function, so rather than parsing out a function, we are most likely parsing out a value and creating a function with that). If we succeed, then we run the second parser to get a value to apply the function to. If this is also successful, we apply the function to the value, and return the result within the parser context (i.e. the result, and the rest of the input).

Implementing some parsers

Now let’s take our new data type and instances for a spin. Before we write a real parser, let’s write a helper function. A common theme while parsing a string is to match a single character on a predicate — for example, “is this character an alphabet”, or “is this character a semi-colon”. We write a function to take a predicate and return the corresponding parser:

satisfy :: (Char -> Bool) -> Parser String Char
satisfy p = Parser $ \input ->
  case input of
       (c:cs) | p c -> (Just c, cs)
       _            -> (Nothing, input)

Now let’s try to make a parser that takes a string and, if it finds an ASCII digit character, provides the corresponding integer value. We have a function from the Data.Char module to match ASCII digit characters — isDigit. We also have a function to take a digit character and give us an integer — digitToInt. Let’s put these together with satisfy above:

import Data.Char (digitToInt, isDigit)

digit :: Parser String Int
digit = digitToInt <$> satisfy isDigit

And that’s it! Note how we used our higher-order satisfy function to match an ASCII digit character, and the Functor instance to apply digitToInt to the result of that parser (reminder: <$> is just the infix form of fmap — this is the same as writing fmap digitToInt (satisfy isDigit)).

Another example — a character parser, which succeeds if the next character in the input is a specific character we choose.

char :: Char -> Parser String Char
char x = satisfy (x ==)

Once again, the satisfy function makes this a breeze. I must say I’m pleased with the conciseness of this.
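For instance, given the definitions so far, one would expect the following in the REPL:

*Json> runParser (char 'a') "abc"
(Just 'a',"bc")

*Json> runParser (char 'a') "xyz"
(Nothing,"xyz")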

Finally, let’s combine character parsers to create a word parser — a parser that succeeds if the input is a given word.

word :: String -> Parser String String
word ""     = Parser $ \input -> (Just "", input)
word (c:cs) = (:) <$> char c <*> word cs

A match on the empty word always succeeds. For anything else, we break the parser down into a character parser for the first character and a recursive call to the word parser for the rest. Again, note the use of the Functor and Applicative instances. Let’s look at the type signature of the (:) (list cons) function, which prepends an element to a list:

(:) :: a -> [a] -> [a]

The function takes two arguments — a single element of type a, and a list of elements of type a. If we expand the types some more, we’ll see that the first argument we give it is a Parser String Char and the second is a Parser String [Char] (String is just an alias for [Char]).

In this way we are able to take the basic list prepend function and use it to construct a list of characters within the Parser context. (a-ha!?)
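As a quick check, here is the behaviour one would expect from word in the REPL:

*Json> runParser (word "null") "nullable"
(Just "null","able")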

JSON

JSON is a relatively simple format to parse, and makes for a good example for building a parser. The JSON website has a couple of good depictions of the JSON language grammar front and center.

So that defines our parser problem then — we want to read a string input, and convert it into some sort of in-memory representation of the JSON value. Let’s see what that would look like in Haskell.

data JsonValue = JsonString String
               | JsonNumber JsonNum
               | JsonObject [(String, JsonValue)]
               | JsonArray [JsonValue]
               | JsonBool Bool
               | JsonNull

-- We represent a number as an infinite precision
-- floating point number with a base 10 exponent
data JsonNum = JsonNum { negative :: Bool
                       , signif   :: Integer
                       , expo     :: Integer
                       }

The JSON specification does not really tell us what type to use for numbers. We could just use a Double, but to make things interesting, we represent it as an arbitrary precision floating point number.

Note that the JsonArray and JsonObject constructors are recursive, as they should be — a JSON array is an array of JSON values, and a JSON object is a mapping from string keys to JSON values.

Parsing JSON

We now have the pieces we need to start parsing JSON. Let’s start with the easy bits.

null

To parse a null we literally just look for the word “null”.

jsonNull :: Parser String JsonValue
jsonNull = word "null" $> JsonNull

The $> operator is a flipped shortcut for fmap . const — it evaluates the argument on the left, and then fmaps the argument on the right onto it. If the word "null" parser is successful (Just "null"), we’ll fmap the JsonValue representing null to replace the string "null" (i.e. we’ll get a (Just JsonNull, <rest of the input>)).

true and false

First a quick detour:

instance Alternative (Parser i) where
  empty = Parser $ \input -> (Nothing, input)
  p1 <|> p2 = Parser $ \input ->
      case runParser p1 input of
           (Nothing, _) -> case runParser p2 input of
                                (Nothing, _) -> (Nothing, input)
                                justValue    -> justValue
           justValue    -> justValue

The Alternative instance is easy to follow once you understand Applicative. We define an empty parser that matches nothing. Then we define the alternative operator (<|>) as we might intuitively imagine.

We run the parser given as the first argument first, if it succeeds we are done. If it fails, we run the second parser on the whole input again, if it succeeds, we return that value. If both fail, we return Nothing.
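For instance, combining it with the word parser from before, one would expect:

*Json> runParser (word "true" <|> word "false") "falsehood"
(Just "false","hood")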

Parsing true and false with this in our belt looks like:

jsonBool :: Parser String JsonValue
jsonBool =  (word "true" $> JsonBool True)
        <|> (word "false" $> JsonBool False)

We are easily able to express the idea of trying to parse the string “true”, and, if that fails, trying again for the string “false”. If either matches, we have a boolean value; if not, Nothing. Again, nice and concise.

String

This is only slightly more complex. We need a couple of helper functions first:

hexDigit :: Parser String Int
hexDigit = digitToInt <$> satisfy isHexDigit

digitsToNumber :: Int -> [Int] -> Integer
digitsToNumber base digits = foldl (\num d -> num * fromIntegral base + fromIntegral d) 0 digits

hexDigit is easy to follow. It just matches anything from 0-9 and a-f or A-F.

digitsToNumber is a pure function that takes a list of digits and interprets it as a number in the given base. For example, digitsToNumber 10 [4, 2] is 42, and digitsToNumber 16 [1, 10] is 26. We do some jumping through hoops with fromIntegral to take Int digits (mapping to a normal word-sized integer) and produce an Integer (an arbitrary-sized integer).

Now follow along one line at a time:

jsonString :: Parser String String
jsonString = (char '"' *> many jsonChar <* char '"')
  where
    jsonChar =  satisfy (\c -> not (c == '\"' || c == '\\' || isControl c))
            <|> word "\\\"" $> '"'
            <|> word "\\\\" $> '\\'
            <|> word "\\/"  $> '/'
            <|> word "\\b"  $> '\b'
            <|> word "\\f"  $> '\f'
            <|> word "\\n"  $> '\n'
            <|> word "\\r"  $> '\r'
            <|> word "\\t"  $> '\t'
            <|> chr . fromIntegral . digitsToNumber 16 <$> (word "\\u" *> replicateM 4 hexDigit)

A string is a sequence of valid JSON characters, surrounded by quotes. The *> and <* operators allow us to chain parsers whose output we wish to discard (since the quotes are not part of the actual string itself). The many function comes from the Alternative typeclass. It represents zero or more instances of a context. In our case, it tries to match zero or more jsonChar parsers.

So what does jsonChar do? Following the definition of a character in the JSON spec, first we try to match something that is not a quote ("), a backslash (\) or a control character. If that doesn’t match, we try to match the various escape characters that the specification mentions.

Finally, if we get a \u followed by 4 hexadecimal characters, we put them in a list (replicateM 4 hexDigit chains 4 hexDigit parsers and provides the output as a list), convert that list into a base 16 integer (digitsToNumber), and then convert that to a Unicode character (chr). For example, the input \u0041 yields the digit list [0, 0, 4, 1], digitsToNumber 16 turns that into 65, and chr 65 is 'A'.

The order of chaining these parsers does matter for performance. The first parser in our <|> chain is the one that is most likely (most characters are not escaped). This follows from our definition of the Alternative instance. We run the first parser, then the second, and so on. We want this to succeed as early as possible so we don’t run more parsers than necessary.

Arrays

Arrays and objects have something in common — they have items which are separated by some value (commas for array values, commas for each key-value pair in an object, and colons separating keys and values). Let’s just factor this commonality out:

sepBy :: Parser i v -> Parser i s -> Parser i [v]
sepBy v s = (:) <$> v <*> many (s *> v) 
         <|> pure []

We take a parser for our values (v), and a parser for our separator (s). We try to parse one or more v separated by s, or just return an empty list in the parser context if there are none.
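For instance, reusing the digit parser from earlier, one would expect:

*Json> runParser (digit `sepBy` char ',') "1,2,3]"
(Just [1,2,3],"]")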

Now we write our JSON array parser as:

jsonArray :: Parser String JsonValue
jsonArray = JsonArray <$> (char '[' *> (json `sepBy` char ',') <* char ']')

Nice, that’s really succinct. But wait! What is json?

Putting it all together

We know that arrays contain JSON values. And we know how to parse some JSON values. Let’s try to put those together for our recursive definition:

json :: Parser String JsonValue
json =  jsonNull
    <|> jsonBool
    <|> jsonString
    <|> jsonArray
--  <|> jsonNumber
--  <|> jsonObject

And that’s it!

The JSON object and number parsers follow the same pattern. So far we’ve ignored spaces in the input, but those can be consumed and ignored easily enough based on what we’ve learned.
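As an illustration of that pattern, here is one possible sketch of the object parser, reusing sepBy (whitespace handling omitted like everywhere above; the actual code on Github may differ):

jsonObject :: Parser String JsonValue
jsonObject = JsonObject <$> (char '{' *> (pair `sepBy` char ',') <* char '}')
  where
    -- a key-value pair is a JSON string key, a colon, and any JSON value
    pair = (,) <$> (jsonString <* char ':') <*> json

Note how the structure mirrors jsonArray, with key-value pairs instead of bare values.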

You can find the complete code for this exercise on Github.

Some examples of what this looks like in the REPL:

*Json> runParser json "null"
(Just null,"")

*Json> runParser json "true"
(Just true,"")

*Json> runParser json "[null,true,\"hello!\"]"
(Just [null, true, "hello!" ],"")

Concluding thoughts

If you’ve made it this far, thank you! I realise this is long and somewhat dense, but I am very excited by how elegantly Haskell allows us to express these ideas, using fundamental aspects of its type(class) system.

A nice real world example of how you might use this is the optparse-applicative package which uses these ideas to greatly simplify the otherwise dreary task of parsing command line arguments.

I hope this post generates at least some of the excitement in you that it has in me. Feel free to leave your comments and thoughts below.

February 19, 2018
Jason A. Donenfeld a.k.a. zx2c4 (homepage, bugs)
WireGuard in Google Summer of Code (February 19, 2018, 14:55 UTC)

WireGuard is participating in Google Summer of Code 2018. If you're a student — bachelors, masters, PhD, or otherwise — who would like to be funded this summer for writing interesting kernel code, studying cryptography, building networks, making mobile apps, contributing to the larger open source ecosystem, doing web development, writing documentation, or working on a wide variety of interesting problems, then this may be appealing. You'll be mentored by world-class experts, and the summer will certainly boost your skills. Details are on this page — simply contact the WireGuard team to get a proposal into the pipeline.

Alice Ferrazzi a.k.a. alicef (homepage, bugs)

Gentoo has been accepted as a Google Summer of Code 2018 mentoring organization

Gentoo has been accepted into the Google Summer of Code 2018!

Contacts:
If you want to help in any way with Gentoo GSoC 2018, please add yourself here [1] as soon as possible; it’s important! Someone from the Gentoo GSoC team will then contact you as soon as possible. We are always searching for people willing to help with Gentoo GSoC 2018!

Students:
If you are a student and want to spend your summer working on Gentoo projects and having fun writing code, you can start discussing ideas in the #gentoo-soc IRC Freenode channel [2].
You can find some example ideas here [3], but your proposal can of course be something different.
Just remember that we strongly recommend that you work with a potential mentor to develop your idea before proposing it formally. Don’t waste any time, because there’s typically some polishing which needs to occur before the deadline (March 28th).

I can assure you that Gentoo GSoC is really fun and a great experience for improving yourself. It is also useful for making yourself known in the Gentoo community. Contributing to Gentoo GSoC means contributing to Gentoo, a bleeding-edge distribution considered by many to be for experts, with near-unlimited adaptability.

If you require any further information, please do not hesitate to contact us [4] or check the Gentoo GSoC 2018 wiki page [5].

[1] https://wiki.gentoo.org/wiki/Google_Summer_of_Code/2018/Mentors
[2] http://webchat.freenode.net/?channels=gentoo-soc
[3] https://wiki.gentoo.org/wiki/Google_Summer_of_Code/2018/Ideas
[4] soc-mentors@gentoo.org
[5] https://wiki.gentoo.org/wiki/Google_Summer_of_Code/2018

February 14, 2018
Luca Barbato a.k.a. lu_zero (homepage, bugs)
Rust-av: Rust and Multimedia (February 14, 2018, 19:44 UTC)

Recently I presented my new project at FOSDEM, and since I was afraid of not having enough time for the questions I trimmed the content to the bare minimum. This blog post should add some more details.

What is it?

Rust-av aims to be a complete multimedia toolkit written in Rust.

Rust is a quite promising language that aims to offer high execution speed while granting a number of guarantees about code behavior that you cannot have in C, C++, Java and so on.

Its zero-cost abstractions, coupled with the fact that the compiler actively prevents you from committing a large class of mistakes related to memory access, seem a perfect match for implementing a multimedia toolkit that is easy to use, fast enough, and trustworthy.

Why something completely new?

Since Rust code can be intermixed with C code, an evolutionary approach of replacing small components little by little in a larger project is perfectly feasible, and it is what we are currently trying to do with vlc.

But Rust is not just good for writing inner routines that are fast and correct; its trait system is also quite useful for building a more expressive API.

Most multimedia concepts are pretty simple at a high level (e.g. a frame is just a picture or some sound with a timestamp), but come with an excruciating amount of quirks and details that require your toolkit either to make choices for you or to make you face a large amount of complexity.

That leads to APIs that are either easy but quite inflexible (and opinionated), or APIs providing all the flexibility but forcing the user to learn a lot of information in order to achieve what the simpler API would let you implement in a handful of lines of code.

I wanted to leverage Rust to build the low-level implementations with fewer bugs and, at the same time, try to provide a better API around them.

Why now?

Since 2016 I have kept bouncing ideas around with Kostya and Geoffroy, but between my work duties and other projects I couldn’t devote enough time to it. Thanks to the Mozilla Open Source Support initiative, which awarded me enough to develop it full time, the project already has some components published, and more will follow over the next months.

Philosophy

I’m trying to leverage the experience I have from contributing to vlc and libav: keep what is working well, and try not to make the same mistakes.

Ease of use

I want the whole toolkit to be useful to a wide audience. Developers often fight against a library in order to undo what is happening under the hood, or end up vendoring some part of it because they need only a tiny subset of all the features it provides.

Rust makes it quite natural to split large projects into independent components (called crates), and it is already quite common to have meta-crates re-exporting many smaller crates to provide uniform access.

The rust-av code, as opposed to the rather monolithic approach taken in Libav, can be reused at the granularity of the bare codec or format:

  • Integrating it in a foreign toolkit won’t require undoing what the common utility code does.
  • Even when using it through the higher level layers, rust-av won’t force the developer to bring in any unrelated dependencies.
  • On the other hand users that enjoy a fully integrated and all-encompassing solution can simply depend on the meta-crates and get the support for everything.

Speed

Multimedia playback boils down to efficiently doing complex computations so that an arbitrarily large amount of data can be rendered within a fraction of a second; multimedia real-time streaming requires compressing an equally large amount of data in the same time.

Speed in multimedia is important.

Rust provides high-level idiomatic constructs that, surprisingly, lead to pretty decent runtime speed. The stdsimd effort and the seamless C ABI support make it easier to leverage the SIMD instructions provided by recent CPU architectures.

Trustworthy

Traditionally, the most effective way to write fast multimedia code has been pairing C and assembly. Sadly, the combination makes it quite easy to overlook corner cases and introduce all kinds of memory hazards (use-after-free, out-of-bounds reads and writes, NULL dereferences…).

Rust effectively prevents a good deal of those issues at compile time. Since its abstractions usually do not cause slowdowns, it is possible to write code that is, arguably, less misleading and just as fast.

Structure

The toolkit is composed of multiple, loosely coupled, crates. They can be grouped by level of abstraction.

Essential

av-data: Used by nearly all the other crates, it provides basic data types and a minimal amount of functionality on top of them. It mainly provides the following structs:

  • Frame: it binds together a time reference and a buffer, representing either a video picture or some audio samples.
  • Packet: it binds together a time reference and a buffer, containing compressed data.
  • Value: a simple key-value type abstraction, used to pass arbitrary data to the configuration functions.

Core

They provide the basic abstractions (traits) implemented by specific sets of components.

  • av-format: It provides a set of traits to implement muxers and demuxers, and a utility Context to bridge the normal Rust I/O Write and Read traits and the actual muxers and demuxers.
  • av-codec: It provides a set of traits to implement encoders and decoders, and a utility Context that wraps them.

Utility

They provide building blocks that may be used to implement actual codecs and formats.

  • av-bitstream: Utility crate to write and read bits and bytes
  • av-audio: Audio-specific utilities
  • av-video: Video-specific utilities

Implementation

Actual implementations of codecs and formats; they can be used directly or through the utility Contexts.

Direct usage is suggested only if you are integrating rust-av in larger frameworks that already implement, possibly in different ways, the integration code provided by the Context (e.g. binding it together with the I/O for the formats, or internal queues for the codecs).

Higher-level

They provide higher-level Contexts to play back or encode data through a simplified interface:

  • av-player reads bytes through a provided Read and outputs decoded Frames. Under the hood it probes the data, allocates and configures a Demuxer and a Decoder for each stream of interest.
  • av-encoder consumes Frames and outputs encoded and muxed data through a Write output. It automatically sets up the encoders and the muxer.

Meta-crates

They ease the bulk use of everything provided by rust-av.

There are 4 crates providing a list of specific components: av-demuxers, av-muxers, av-decoders and av-encoders; and 2 grouping them by type: av-formats and av-codecs.

Their use is suggested when you’d like to support every format and codec available.

So far

All the development happens on the GitHub organization, and so far the initial Core and Essential crates are ready to be used.

There is a nom-based matroska demuxer in working condition and some non-native wrappers providing implementations for some decoders and encoders.

Thanks to est31 we have native vorbis support.

I’m working on a native implementation of Opus, and soon I’ll move on to a video codec.

There is a tiny player called avp, and an encoder tool (named ave) will appear once the matroska muxer is complete.

What’s missing in rust-av

API-wise, right now rust-av provides only simple decoding and encoding, muxing and demuxing. There are already enough wrapped codecs to let people play with the library and, hopefully, help in polishing it.

For each crate I’m trying to prepare some easy tasks so people willing to contribute to the project can start from them, all help is welcome!

What’s missing in rust

So far my experience with Rust has been quite positive, but there are a number of features that are missing or that could be improved.

  • SIMD support is shaping up nicely and it is coming soon.
  • The natural fallback, going down to assembly, is available since Rust supports the C ABI; inline assembly support, on the other hand, seems to still be pending some discussion before it reaches stable.
  • Arbitrarily aligned allocation is a MUST in order to support hardware acceleration, and SIMD usually works better with aligned buffers.
  • I’d love to have const generics now; luckily, associated constants with traits allow some workarounds that let you specialize by constants (and result in neat speedups).
  • I think that focusing a little more on array/slice support would lead to the best gains, since right now there isn’t an equivalent of collect() to fill arrays in an idiomatic way, and in multimedia large lookup tables are pretty much a staple.

In closing

Rust and multimedia seem a really good match; in my experience, aside from a number of missing features, the language seems quite good for the purpose.

Once I have more complete native implementations I will have a better means of evaluating the speed difference from writing the same code in C.

Sebastian Pipping a.k.a. sping (homepage, bugs)
I love free software… and Gentoo does! #ilovefs (February 14, 2018, 14:25 UTC)

Some people care whether software is free of cost or whether it has the best features, above everything else. I don’t. I care that I can legally inspect its inner workings, modify it, and share modified versions. That’s why I happily avoid macOS, Windows, Skype and Photoshop.

I ran into these two pieces involving Gentoo in the Gallery of Free Software lovers and would like to share them with you:

Images are licensed under CC BY-SA 4.0 (with attribution going to Free Software Foundation Europe) as confirmed by Max Mehl.

February 12, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)

I don’t really like the idea of having to write about proprietary software here, but I only found terrible alternative suggestions on the web, so I thought I would at least try to write this down in the hope of keeping people from falling for very bad advice.

The problem: after updating my BIOS, BitLocker asks for the key at boot, and PIN login for Windows 10 (Microsoft Account) fails, requiring me to log in with the full Microsoft account password. Trying to re-enable the PIN fails with the error message “Sorry, there was a problem signing you in”.

The amount of misleading information I found on the Internet is astonishing, including a phrase from what appeared to be a Microsoft support person, stating «The operating system should always go with the BIOS. If the BIOS is freshly updated, then the OS has to be fresh as well.» Facepalms all over the place.

The solution (for me): go back in the BIOS and re-enable the TPM (“Security Module”).

Some background is needed. The 2017 Gamestation I’m using nowadays is built on an MSI X299 SLI PLUS motherboard with a plug-in TPM, which is a requirement for using BitLocker (and if you think that makes it very safe, think again).

I had just updated the firmware of the motherboard (which at this point we all still call “BIOS”, despite it clearly being UEFI-based), and it turns out that MSI just silently drops most of the customizations to the BIOS settings after an update. In particular this disabled a few required settings, including the TPM itself (and Secure Boot — I wonder if Matthew Garrett would have some words about this particular board’s implementation of it).

I see reports on this for MSI and Gigabyte boards alike, so I assume Gigabyte does the same, and also requires re-enabling the TPM in the settings after updating the BIOS.

I would say that MSI’s firmware engineering does not quite fully convince me. It’s not just the dropping of all the settings on update (which I still find is a bad thing to do to users), but also the fact that one of the settings is explicitly marked as “crypto miner mode” — I’m sure looking forward to the time after the bubble bursts, so that we don’t have to pay a premium for decent hardware just because someone thinks they can make money from nothing. Oh well.

February 10, 2018
Matthew Thode a.k.a. prometheanfire (homepage, bugs)
Native ZFS encryption for your rootfs (February 10, 2018, 06:00 UTC)

Disclaimer

I'm not responsible if you ruin your system; this guide functions as documentation for future me. Remember to back up your data.

Why do this instead of luks

I wanted to remove a layer from the file-to-disk layering: before, it was ZFS -> LUKS -> disk; now it's ZFS -> disk.

Prework

I just got a new laptop and wanted to simply migrate the data. Luckily the old laptop was using ZFS as well, so the data could be sent/received through native ZFS means.

The actual setup

Set up your root pool with the encryption key; it will be inherited by all child datasets, and no child dataset will be allowed to be unencrypted.

In my case the existing pool was named slaanesh-zp00, so I ran the following to create a fresh encrypted pool (here a test pool named zfstest, backed by a zvol on the existing pool):

zpool create -O encryption=on -O keyformat=passphrase zfstest /dev/zvol/slaanesh-zp00/zfstest

After that, just go on and create your datasets as normal, transferring old data as needed (it'll be encrypted as it's written). See https://wiki.gentoo.org/wiki/ZFS for a good general guide on setting up your datasets.
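A minimal sketch of what that might look like (dataset names are illustrative):

# child datasets inherit the pool's encryption settings
zfs create zfstest/home
zfs get encryption,keystatus zfstest/home

# keys can also be loaded and datasets mounted manually, e.g. from a rescue shell
zfs load-key -a
zfs mount -a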

decrypting at boot

If you are using dracut, it should just work; no changes to what you pass on the kernel command line are needed. The code is upstream at https://github.com/zfsonlinux/zfs/blob/master/contrib/dracut/90zfs/zfs-load-key.sh.in

notes

Make sure you install from git master; there was an on-disk format change for encrypted datasets that went in only a week or so ago.

Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)
UK Banking, Fineco is not enough (February 10, 2018, 00:03 UTC)

You may remember that the last time I blogged about UK banking, I had just dismissed Barclays in favour of Fineco, the Italian investment bank and branch of UniCredit. This seemed a good move, both because people in my circles spoke very well of Fineco, at least in Italy, and because the sign-up flow seemed so good that it sounded like a good idea.

I found out almost right away that something was not quite perfect for the UK market, in particular because there was (and is) no way to set up a standing order, which is the standard way to pay your rent in the UK (and Ireland, for what it’s worth). But it seemed a minor thing to worry about, given the rest of the bank’s features (the ability to spend up to £10k in a single transaction by requesting an explicit lift of the card limits with SMS authentication, just to name one).

Unfortunately, a couple of months later I know for sure that it is not possible to use Fineco as a primary account in the UK at all. There are two problems: the first is a problem for anyone, and the second is a problem for my particular situation. I’ll start with the first one: direct debit support.

The direct debit system, for those not used to it in Europe, is one where you give a “debtor” (usually a utility service, such as your ISP or power company) your account details (Sort Code and Account Number in the case of the UK), and they tell the bank to give them money at a certain time of the month. It is also the way Jeremy Clarkson lost £200, ten years ago. There is a nearly identical system in the rest of the EU, called SEPA Direct Debit (with SDD Core being the more commonly known scheme, as it deals with B2C, business-to-consumer, debits).

After I opened the Fineco account, I checked which features were enabled for it (sort code 30-02-48) on Payments UK’s Sort Code Checker, which then, as well as at the time of writing, said «Bacs Direct Debits can be set up on this sort code.» So I had no qualms about closing my Barclays account and moving all the money into the newly created account. All of my utilities were more than happy to switch, except for Thames Water, which refused to let me set up the debit online. Turns out they were the only ones with a clue.

Indeed, when the first debit was supposed to land in January, instead of seeing the debit on the statement, I saw a BACS credit of the same amount. I contacted my ISP (Hyperoptic, with their awesome customer support) to check whether something had failed on their side, but they didn’t see anything amiss. When even Netflix showed up the same way, and both of the transactions showed up with an “entry reversal” of the same amount, I knew something was off with the bank and contacted them, originally to no avail.

Indeed, a few more debits showed up the same way, so I was waiting for the other shoe to drop, which it did at the end of January, when Fineco sent me an email (or actually a ticket; it even has a ticket number!) telling me that they had processed the debits as a one-off, but to cancel them, because they wouldn’t do this again. This was professional of them, particularly as this way it does not hit my credit score at all, but it is still a major pain in the neck.

My first hope was to be able to just use Revolut to pay the direct debits, since they are all relatively low amounts, which fits my usual auto top-up strategy. When you look at the Revolut information page with your account details for GBP, the text explicitly says «Use this personal UK current account to get salary and to pay bills», which gave me hope, and indeed Payment UK’s checker also confirmed that it supposedly accepts Bacs Direct Debit. But when I checked with the in-app chat support, I was told that no, Revolut does not support direct debits, which makes that phrase extremely misleading. At least TransferWise explicitly denies supporting Direct Debit in the sort code checker; kudos to them.

The next problem with Fineco is not actually their fault, but it is still due to them not having the “full features” of a UK high street bank. I was contacted by Dexters, the real estate company that, among other things, manages my apartment and collects my rent. While they had told me the money arrived all good when I moved to Fineco (I asked them explicitly), they sent me a scary and threatening email (after failing to reach me on the phone; I would have been charged excessively high roaming fees to answer an unknown number)… because £12 were missing from my payments. The email exchange wasn’t particularly productive (I sent them a photo of the payment confirmation; they told me «[they] received a large sum of money[sic] however it is £12.00 that is outstanding on the account.»). So I called them on Monday, and they managed to confirm that it was actually £6 missing in December, and another £6 missing in January.

Throwing this around with colleagues, and then discussing it with a more reasonable person from Dexters on the phone, we came to figure out that Barclays (the bank used by Dexters to receive the rent) was charging them £6 to receive these transfers because they are “international” — despite the fact that they are indeed local, it appears Barclays applies that fee to any transfer received over the SWIFT network rather than through the Faster Payments system used by most other UK banks. I didn’t want to keep arguing with Dexters over the fact that it’s their bank charging them the fee, so I paid the extra £12 and decided to switch the rent payment over to the new account as well. I really dislike Barclays.

I’ll post later this month about the following attempts with other bank accounts. For now I have decided that I’ll keep getting my salary paid into Fineco, and keep a running balance on the “high street” account for the direct debits and the rent. Right now, for my only GBP credit card (American Express), I still pay the balance from Fineco via debit card payment anyway, because the credit limit they gave me is quite limited for my usual spending, particularly now that I can actually charge that card when booking flights on BA without having to spend extra money in fees.

February 06, 2018
Gentoo at FOSDEM 2018 (February 06, 2018, 19:49 UTC)

Gentoo Linux participated with a stand during this year's FOSDEM 2018, as has been the case for the past several years. Three Gentoo developers had talks this year: Haubi was back with a Gentoo-related talk on Unix? Windows? Gentoo! - POSIX? Win32? Native Portability to the max!, dilfridge talked about Perl in the Physics Lab …

February 05, 2018
Greg KH a.k.a. gregkh (homepage, bugs)
Linux Kernel Release Model (February 05, 2018, 17:13 UTC)

Note

This post is based on a whitepaper I wrote at the beginning of 2016 to be used to help many different companies understand the Linux kernel release model and encourage them to start taking the LTS stable updates more often. I then used it as a basis of a presentation I gave at the Linux Recipes conference in September 2017 which can be seen here.

With the recent craziness of Meltdown and Spectre, I’ve seen lots of things written about how Linux is released and how we handle security patches that are totally incorrect, so I figured it is time to dust off the text, update it in a few places, and publish it here for everyone to benefit from.

I would like to thank the reviewers who helped shape the original whitepaper, which has helped many companies understand that they need to stop “cherry picking” random patches into their device kernels. Without their help, this post would be a total mess. All problems and mistakes in here are, of course, all mine. If you notice any, or have any questions about this, please let me know.

Overview

This post describes how the Linux kernel development model works, what a long term supported kernel is, how the kernel developers approach security bugs, and why all systems that use Linux should be using all of the stable releases and not attempting to pick and choose random patches.

Linux Kernel development model

The Linux kernel is the largest collaborative software project ever. In 2017, over 4,300 different developers from over 530 different companies contributed to the project. There were 5 different releases in 2017, with each release containing between 12,000 and 14,500 different changes. On average, 8.5 changes are accepted into the Linux kernel every hour, every hour of the day. A non-scientific study (i.e. Greg’s mailbox) shows that each change needs to be submitted 2-3 times before it is accepted into the kernel source tree due to the rigorous review and testing process that all kernel changes are put through, so the engineering effort happening is much larger than the 8.5 changes per hour.

At the end of 2017 the size of the Linux kernel was just over 61 thousand files consisting of 25 million lines of code, build scripts, and documentation (kernel release 4.14). The Linux kernel contains the code for all of the different chip architectures and hardware drivers that it supports. Because of this, an individual system only runs a fraction of the whole codebase. An average laptop uses around 2 million lines of kernel code from 5 thousand files to function properly, while the Pixel phone uses 3.2 million lines of kernel code from 6 thousand files due to the increased complexity of a SoC.

Kernel release model

With the release of the 2.6 kernel in December of 2003, the kernel developer community switched from the previous model of having a separate development and stable kernel branch, and moved to a “stable only” branch model. A new release happened every 2 to 3 months, and that release was declared “stable” and recommended for all users to run. This change in development model was due to the very long release cycle prior to the 2.6 kernel (almost 3 years), and the struggle to maintain two different branches of the codebase at the same time.

The numbering of the kernel releases started out as 2.6.x, where x was an incrementing number that changed with every release. The value of the number has no meaning, other than that it is newer than the previous kernel release. In July 2011, Linus Torvalds changed the version number to 3.x after the 2.6.39 kernel was released. This was done because the higher numbers were starting to cause confusion among users, and because Greg Kroah-Hartman, the stable kernel maintainer, was getting tired of the large numbers and bribed Linus with a fine bottle of Japanese whisky.

The change to the 3.x numbering series did not mean anything other than a change of the major release number, and this happened again in April 2015 with the movement from the 3.19 release to the 4.0 release number. It is not remembered if any whisky exchanged hands when this happened. At the current kernel release rate, the number will change to 5.x sometime in 2018.

Stable kernel releases

The Linux kernel stable release model started in 2005, when the existing development model of the kernel (a new release every 2-3 months) was determined to not be meeting the needs of most users. Users wanted bugfixes that were made during those 2-3 months, and the Linux distributions were getting tired of trying to keep their kernels up to date without any feedback from the kernel community. Trying to keep individual kernels secure and with the latest bugfixes was a large and confusing effort by lots of different individuals.

Because of this, the stable kernel releases were started. These releases are based directly on Linus’s releases, and are released every week or so, depending on various external factors (time of year, available patches, maintainer workload, etc.).

The numbering of the stable releases starts with the number of the kernel release, and an additional number is added to the end of it.

For example, the 4.9 kernel is released by Linus, and then the stable kernel releases based on this kernel are numbered 4.9.1, 4.9.2, 4.9.3, and so on. This sequence is usually shortened with the number “4.9.y” when referring to a stable kernel release tree. Each stable kernel release tree is maintained by a single kernel developer, who is responsible for picking the needed patches for the release, and doing the review/release process. Where these changes are found is described below.

Stable kernels are maintained for as long as the current development cycle is happening. After Linus releases a new kernel, the previous stable kernel release tree is stopped and users must move to the newer released kernel.

Long-Term Stable kernels

After a year of this new stable release process, it was determined that many different users of Linux wanted a kernel to be supported for longer than just a few months. Because of this, the Long Term Supported (LTS) kernel release came about. The first LTS kernel was 2.6.16, released in 2006. Since then, a new LTS kernel has been picked once a year. That kernel will be maintained by the kernel community for at least 2 years. See the next section for how a kernel is chosen to be an LTS release.

Currently the LTS kernels are the 4.4.y, 4.9.y, and 4.14.y releases, and a new kernel is released on average, once a week. Along with these three kernel releases, a few older kernels are still being maintained by some kernel developers at a slower release cycle due to the needs of some users and distributions.

Information about all long-term stable kernels, who is in charge of them, and how long they will be maintained, can be found on the kernel.org release page.

LTS kernel releases average 9-10 patches accepted per day, while the normal stable kernel releases contain 10-15 patches per day. The number of patches fluctuates per release given the current time of the corresponding development kernel release, and other external variables. The older an LTS kernel is, the fewer patches are applicable to it, because many recent bugfixes are not relevant to older kernels. However, the older a kernel is, the harder it is to backport the changes that need to be applied, due to the changes in the codebase. So while there might be a lower number of overall patches being applied, the effort involved in maintaining an LTS kernel is greater than maintaining the normal stable kernel.

Choosing the LTS kernel

The method of picking which kernel the LTS release will be, and who will maintain it, has changed over the years from a semi-random method to something that is hopefully more reliable.

Originally it was merely based on what kernel the stable maintainer’s employer was using for their product (2.6.16.y and 2.6.27.y), in order to make the effort of maintaining that kernel easier. Other distribution maintainers saw the benefit of this model and got together and colluded to get their companies to all release a product based on the same kernel version without realizing it (2.6.32.y). After that was very successful, and allowed developers to share work across companies, those companies decided to not do that anymore, so future LTS kernels were picked based on individual distributions’ needs and maintained by different developers (3.0.y, 3.2.y, 3.12.y, 3.16.y, and 3.18.y), creating more work and confusion for everyone involved.

This ad-hoc method of catering to only specific Linux distributions was not beneficial to the millions of devices that used Linux in an embedded system and were not based on a traditional Linux distribution. Because of this, Greg Kroah-Hartman decided that the choice of the LTS kernel needed to change to a method in which companies can plan on using the LTS kernel in their products. The rule became “one kernel will be picked each year, and will be maintained for two years.” With that rule, the 3.4.y, 3.10.y, and 3.14.y kernels were picked.

Because the large number of different LTS kernels released in the same year caused lots of confusion for vendors and users, the rule was created that no new LTS kernels would be based on an individual distribution’s needs. This was agreed upon at the annual Linux kernel summit and started with the 4.1.y LTS choice.

During this process, the LTS kernel would only be announced after the release happened, making it hard for companies to plan ahead of time what to use in their new product, causing lots of guessing and misinformation to be spread around. This was done on purpose as previously, when companies and kernel developers knew ahead of time what the next LTS kernel was going to be, they relaxed their normal stringent review process and allowed lots of untested code to be merged (2.6.32.y). The fallout of that mess took many months to unwind and stabilize the kernel to a proper level.

The kernel community discussed this issue at its annual meeting and decided to mark the 4.4.y kernel as a LTS kernel release, much to the surprise of everyone involved, with the goal that the next LTS kernel would be planned ahead of time to be based on the last kernel release of 2016 in order to provide enough time for companies to release products based on it in the next holiday season (2017). This is how the 4.9.y and 4.14.y kernels were picked as the LTS kernel releases.

This process seems to have worked out well, without many problems being reported against the 4.9.y tree, despite it containing over 16,000 changes, making it the largest kernel to ever be released.

Future LTS kernels should be planned based on this release cycle (the last kernel of the year). This should allow SoC vendors to plan ahead on their development cycle to not release new chipsets based on older, and soon to be obsolete, LTS kernel versions.

Stable kernel patch rules

The rules for what can be added to a stable kernel release have remained almost identical for the past 12 years. The full list of the rules for patches to be accepted into a stable kernel release can be found in the Documentation/process/stable_kernel_rules.rst kernel file and are summarized here. A stable kernel change:

  • must be obviously correct and tested.
  • must not be bigger than 100 lines.
  • must fix only one thing.
  • must fix something that has been reported to be an issue.
  • can be a new device id or quirk for hardware, but not add major new functionality.
  • must already be merged into Linus’s tree.

The last rule, “a change must be in Linus’s tree”, prevents the kernel community from losing fixes. The community never wants a fix to go into a stable kernel release that is not already in Linus’s tree, so that anyone who upgrades will never see a regression. This prevents many problems that other projects which maintain separate stable and development branches run into.
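As an illustration of how a fix is nominated for the stable trees, developers tag the patch in Linus’s tree rather than submitting it separately; a minimal sketch of such a commit footer, with hypothetical names and hash (the Cc: tag convention itself is documented in stable_kernel_rules.rst):

Fixes: 1234567890ab ("subsys: example of the original buggy commit")
Cc: stable@vger.kernel.org # 4.9.x and newer
Signed-off-by: Jane Developer <jane@example.com>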

Kernel Updates

The Linux kernel community has promised its userbase that no upgrade will ever break anything that is currently working in a previous release. That promise was made in 2007 at the annual Kernel developer summit in Cambridge, England, and still holds true today. Regressions do happen, but those are the highest priority bugs and are either quickly fixed, or the change that caused the regression is quickly reverted from the Linux kernel tree.

This promise holds true for both the incremental stable kernel updates, as well as the larger “major” updates that happen every three months.

The kernel community can only make this promise for the code that is merged into the Linux kernel tree. Any code that is merged into a device’s kernel that is not in the kernel.org releases is unknown and interactions with it can never be planned for, or even considered. Devices based on Linux that have large patchsets can have major issues when updating to newer kernels, because of the huge number of changes between each release. SoC patchsets are especially known to have issues with updating to newer kernels due to their large size and heavy modification of architecture specific, and sometimes core, kernel code.

Most SoC vendors do want to get their code merged upstream before their chips are released, but the reality of project-planning cycles and ultimately the business priorities of these companies prevent them from dedicating sufficient resources to the task. This, combined with the historical difficulty of pushing updates to embedded devices, results in almost all of them being stuck on a specific kernel release for the entire lifespan of the device.

Because of the large out-of-tree patchsets, most SoC vendors are starting to standardize on using the LTS releases for their devices. This allows devices to receive bug and security updates directly from the Linux kernel community, without having to rely on the SoC vendor’s backporting efforts, which traditionally are very slow to respond to problems.

It is encouraging to see that the Android project has standardized on the LTS kernels as a “minimum kernel version requirement”. Hopefully that will allow the SoC vendors to continue to update their device kernels in order to provide more secure devices for their users.

Security

When doing kernel releases, the Linux kernel community almost never declares specific changes as “security fixes”. This is because it is very difficult to determine, at the time a bugfix is created, whether it is a security fix or not. Also, many bugfixes are only determined to be security related after much time has passed, so, to keep users from getting a false sense of security by skipping patches, the kernel community strongly recommends always taking all bugfixes that are released.

Linus summarized the reasoning behind this behavior in an email to the Linux Kernel mailing list in 2008:

On Wed, 16 Jul 2008, pageexec@freemail.hu wrote:
>
> you should check out the last few -stable releases then and see how
> the announcement doesn't ever mention the word 'security' while fixing
> security bugs

Umm. What part of "they are just normal bugs" did you have issues with?

I expressly told you that security bugs should not be marked as such,
because bugs are bugs.

> in other words, it's all the more reason to have the commit say it's
> fixing a security issue.

No.

> > I'm just saying that why mark things, when the marking have no meaning?
> > People who believe in them are just _wrong_.
>
> what is wrong in particular?

You have two cases:

 - people think the marking is somehow trustworthy.

   People are WRONG, and are misled by the partial markings, thinking that
   unmarked bugfixes are "less important". They aren't.

 - People don't think it matters

   People are right, and the marking is pointless.

In either case it's just stupid to mark them. I don't want to do it,
because I don't want to perpetuate the myth of "security fixes" as a
separate thing from "plain regular bug fixes".

They're all fixes. They're all important. As are new features, for that
matter.

> when you know that you're about to commit a patch that fixes a security
> bug, why is it wrong to say so in the commit?

It's pointless and wrong because it makes people think that other bugs
aren't potential security fixes.

What was unclear about that?

    Linus

This email can be found here, and the whole thread is recommended reading for anyone who is curious about this topic.

When security problems are reported to the kernel community, they are fixed as soon as possible and pushed out publicly to the development tree and the stable releases. As described above, the changes are almost never described as a “security fix”, but rather look like any other bugfix for the kernel. This is done to allow affected parties the ability to update their systems before the reporter of the problem announces it.

Linus describes this method of development in the same email thread:

On Wed, 16 Jul 2008, pageexec@freemail.hu wrote:
>
> we went through this and you yourself said that security bugs are *not*
> treated as normal bugs because you do omit relevant information from such
> commits

Actually, we disagree on one fundamental thing. We disagree on
that single word: "relevant".

I do not think it's helpful _or_ relevant to explicitly point out how to
tigger a bug. It's very helpful and relevant when we're trying to chase
the bug down, but once it is fixed, it becomes irrelevant.

You think that explicitly pointing something out as a security issue is
really important, so you think it's always "relevant". And I take mostly
the opposite view. I think pointing it out is actually likely to be
counter-productive.

For example, the way I prefer to work is to have people send me and the
kernel list a patch for a fix, and then in the very next email send (in
private) an example exploit of the problem to the security mailing list
(and that one goes to the private security list just because we don't want
all the people at universities rushing in to test it). THAT is how things
should work.

Should I document the exploit in the commit message? Hell no. It's
private for a reason, even if it's real information. It was real
information for the developers to explain why a patch is needed, but once
explained, it shouldn't be spread around unnecessarily.

    Linus

Full details of how security bugs can be reported to the kernel community in order to get them resolved and fixed as soon as possible can be found in the kernel file Documentation/admin-guide/security-bugs.rst.

Because security bugs are not announced to the public by the kernel team, CVE numbers for Linux kernel-related issues are usually released weeks, months, and sometimes years after the fix was merged into the stable and development branches, if at all.

Keeping a secure system

When deploying a device that uses Linux, it is strongly recommended that the manufacturer take all LTS kernel updates and push them out to users after proper testing shows that the update works well (a sketch of applying a full LTS update follows the list below). As was described above, it is not wise to try to pick and choose individual patches from the LTS releases because:

  • The releases have been reviewed by the kernel developers as a whole, not in individual parts.
  • It is hard, if not impossible, to determine which patches fix “security” issues and which do not. Almost every LTS release contains at least one known security fix, and many more that are not yet known to be security relevant.
  • If testing shows a problem, the kernel developer community will react quickly to resolve the issue. If you wait months or years to do an update, the kernel developer community will not even be able to remember what the updates were, given the long delay.
  • Changes to parts of the kernel that you do not build/run are fine and cause no problems on your system. Trying to filter out only the changes you run results in a kernel tree that is impossible to merge correctly with future upstream releases.
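A minimal sketch of what taking a whole LTS update looks like, assuming the product kernel lives in git with the vendor patches on a local branch (the branch name and release number are hypothetical):

$ git remote add stable git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
$ git fetch stable
$ git checkout product-4.9      # hypothetical branch carrying the vendor patchset
$ git merge v4.9.80             # merge the entire LTS release, not hand-picked patches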

Note, this author has audited many SoC kernel trees that attempt to cherry-pick random patches from the upstream LTS releases. In every case, severe security fixes have been ignored and not applied.

As proof of this, I demoed at the Kernel Recipes talk referenced above how trivial it was to crash all of the latest flagship Android phones on the market with a tiny userspace program. The fix for this issue was released 6 months prior in the LTS kernel that the devices were based on, yet none of the devices had upgraded or fixed their kernels for this problem. As of this writing (5 months later) only two devices have fixed their kernel and are no longer vulnerable to that specific bug.

February 02, 2018
Sergei Trofimovich a.k.a. slyfox (homepage, bugs)
Linker script weird tricks or EFI on ia64 (February 02, 2018, 00:00 UTC)


Linker script weird tricks or EFI on ia64

One of the Gentoo bugs was about the Gentoo install CD not being able to boot Itanium boxes.

The bug

The symptom is EFI loader error:

Loading.: Gentoo Linux
ImageAddress: pointer is outside of image
ImageAddress: pointer is outside of image
LoadPe: Section 1 was not loaded
Load of Gentoo Linux failed: Not Found
Paused - press any key to continue

which looks like a simple error if you have the hardware and source code of the loader that prints the error.

I had neither.

My plan was to extend the ski emulator to be able to handle early EFI stuff. I started reading docs on early Itanium boot life: on SALs (system abstraction layers), PALs (platform abstraction layers), monarch processors, rendezvous protocols to handle MCAs (machine check aborts), and the expected system state when the hand-off to EFI happens.

But the SAL spec touches EFI only at the SAL->EFI interface boundary. I dreaded opening the EFI spec before finishing the SAL paper.

How ia64 actually boots

But recently I got access to a real management processor (sometimes also called MP, BMC (baseboard management controller), or iLO (integrated lights-out)) on an rx3600 machine, and I gave up on extending ski for a while.

The very first thing I attempted was to get to the serial console of the operating system from iLO. It happened to be not as straightforward as passing console=ttyS0 to the kernel. I ended up learning a bit about the EFI shell, ia64-specific serial port numbering, and elilo.

Eventually I was able to boot an arbitrary kernel directly from the EFI shell and get to the kernel’s console! I needed that as a fallback in case I screwed up the default boot loader, the boot loader config, or the default boot options, as I don’t have physical access to the ia64 machine.

Here is an example of booting from CD and controlling it from iLO virtual serial console:

fs0:\efi\boot\bootia64.efi -i gentoo.igz gentoo initrd=gentoo.igz root=/dev/ram0 init=/linuxrc dokeymap looptype=squashfs loop=/image.squashfs cdroot console=ttyS1,115200n8

Here fs0 is a FAT file system (on CD; it could be on HDD) with EFI applications. bootia64.efi is a sys-boot/elilo application in disguise. console=ttyS1,115200n8 matches the EFI setup of the second serial console.

After a successful boot from CD of a known-good old kernel (from 2009), I was not afraid of messing with the main HDD install or even upgrading to brand new kernels. I even managed to squash another ia64-specific kernel and GCC bug! I’ll try to write another post about that.

This poking also helped me to understand the boot process in more detail. ia64 boots in the following way:

  1. SAL: when machine powers on it executes SAL (and PAL) code to initialize System and Processors. This code is not easily updateable and usually stays the same across OS updates.
  2. EFI: Then SAL hands control to EFI (also not easily updateable) where I can interactively pick boot device or even get EFI shell to run EFI applications (programs for EFI environment).
  3. Bootloader (EFI application): Normally EFI is set up to start boot loader after a few seconds. It’s up to the user (finally!) to provide a bootloader implementation. Gentoo ia64 handbook suggests sys-boot/elilo package.
  4. OS kernel: the bootloader finds the OS kernel, loads it, and hands control off to the kernel.

Searching for the clues

Our problem happened at stage 3, where EFI failed to load the elilo application.

Gentoo builds .iso images automatically and continuously. Here you can find ia64 isos. Older disks are available on actual builder machines.

As elilo used to work on images from 2008 and even 2016 (!), I went through a few autobuilt ISO images, collected a few working and non-working samples, and started comparing them.

I extracted elilo.efi files from 3 disks:

  • elilo-2014-works.efi: good, picked from machine I have access to
  • elilo-2016-works.efi: good, picked from 2016 install CD
  • elilo-2018-does-not-work.efi: bad, freshly built on modern software

Normally I would start by running readelf -a on each executable and diffing for suspicious changes. The files, however, are not ELFs:

$ file *.efi
elilo-2014-works.efi:         PE32+ executable (EFI application) Intel Itanium (stripped to external PDB), for MS Windows
elilo-2016-works.efi:         PE32 executable (EFI application) Intel Itanium (stripped to external PDB), for MS Windows
elilo-2018-does-not-work.efi: PE32+ executable (EFI application) Intel Itanium (stripped to external PDB), for MS Windows

One of them is not even PE32+ but still happens to boot.

Binutils has a more generic readelf -a equivalent: objdump -x. Comparing the two good files:

$ objdump -x elilo-2014-works.efi > 2014.good
$ objdump -x elilo-2016-works.efi > 2016.good
--- 2014.good 2018-01-27 23:34:10.118197637 +0000
+++ 2016.good 2018-01-27 23:34:23.590191456 +0000
@@ -2,2 +2,2 @@
-elilo-2014-works.efi: file format pei-ia64
-elilo-2014-works.efi
+elilo-2016-works.efi: file format pei-ia64
+elilo-2016-works.efi
@@ -6 +6 @@
-start address 0x0000000000043a20
+start address 0x000000000003a6a0
@@ -14,2 +14,2 @@
-Time/Date Tue Jun 24 22:05:17 2014
-Magic 020b (PE32+)
+Time/Date Mon Jan 9 21:18:46 2006
+Magic 010b (PE32)
@@ -17,3 +17,3 @@
-MinorLinkerVersion 23
-SizeOfCode 00036e00
-SizeOfInitializedData 00020800
+MinorLinkerVersion 56
+SizeOfCode 0002e000
+SizeOfInitializedData 00028a00
@@ -21 +21 @@
-AddressOfEntryPoint 0000000000043a20
+AddressOfEntryPoint 000000000003a6a0
@@ -34,2 +34,2 @@
-SizeOfHeaders 000002c0
-CheckSum 00067705
+SizeOfHeaders 00000400
+CheckSum 00069054

There is a lot of oddness going on here: the file on the 2016 live CD is actually from 2006 and is older than the file from 2014. It has a different PE type and, as a result, a different file alignment. Thus I discarded elilo-2016-works.efi as too old.

Comparing bad/good:

$ objdump -x elilo-2014-works.efi > 2014.good
$ objdump -x elilo-2018-does-not-work.efi > 2018.bad
$ diff -U0 2014.good 2018.bad
--- 2014.good 2018-01-27 23:42:58.355002114 +0000
+++ 2018.bad 2018-01-27 23:43:02.042000991 +0000
@@ -2,2 +2,2 @@
-elilo-2014-works.efi: file format pei-ia64
-elilo-2014-works.efi
+elilo-2018-does-not-work.efi: file format pei-ia64
+elilo-2018-does-not-work.efi
@@ -6 +6 @@
-start address 0x0000000000043a20
+start address 0x0000000000046d80
@@ -14 +14 @@
-Time/Date Tue Jun 24 22:05:17 2014
+Time/Date Thu Jan 1 01:00:00 1970
@@ -17,3 +17,3 @@
-MinorLinkerVersion 23
-SizeOfCode 00036e00
-SizeOfInitializedData 00020800
+MinorLinkerVersion 29
+SizeOfCode 0003a200
+SizeOfInitializedData 00020e00
@@ -21,2 +21,2 @@
-AddressOfEntryPoint 0000000000043a20
-BaseOfCode 0000000000001000
+AddressOfEntryPoint 0000000000046d80
+BaseOfCode 0000000000000000
@@ -33 +33 @@
-SizeOfImage 0005c000
+SizeOfImage 0005f000
@@ -35 +35 @@
-CheckSum 00067705
+CheckSum 0005f6a3
@@ -51 +51 @@
-Entry 5 0000000000058000 0000000c Base Relocation Directory [.reloc]
+Entry 5 000000000005b000 0000000c Base Relocation Directory [.reloc]
@@ -66,3 +66,3 @@
-Virtual Address: 00043a20 Chunk size 12 (0xc) Number of fixups 2
- reloc 0 offset 0 [43a20] DIR64
- reloc 1 offset 8 [43a28] DIR64
+Virtual Address: 00046d80 Chunk size 12 (0xc) Number of fixups 2
+ reloc 0 offset 0 [46d80] DIR64
+ reloc 1 offset 8 [46d88] DIR64
...
@@ -87,571 +87,585 @@
-[ 0](sec 3)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x0000000000006a04 edd30_guid
-[ 1](sec 2)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000000001f8 done_fixups
...
-[570](sec 2)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x0000000000000110 Optind
+[ 0](sec 3)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x0000000000006ccc edd30_guid
+[ 1](sec 2)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x0000000000000208 done_fixups
...
+[584](sec 2)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x0000000000000110 Optind

This looks better. A few notable things:

  • Time/Date field is not initialized for the bad file: sane date vs. all zeros
  • BaseOfCode looks uninitialized: 0x1000 vs. 0x0000
  • MinorLinkerVersion says good file was built with binutils-2.23, bad file was built with binutils-2.29
  • File has only 2 relocations: Number of fixups 2. We could check them manually!
  • And a lot more symbols: 571 vs. 585

I tried to build elilo with binutils-2.25: it produced the same binary as elilo-2018-does-not-work.efi.

My only clue was that BaseOfCode is zero. It felt like something used to reside in the first page, before the code section, and now it does not anymore. What could it be?

GNU binutils linker scripts

Time to look at how the decision is made about what to put into the first page at link time!

The build process of elilo.efi is truly unusual. Let’s run emerge -1 sys-boot/elilo and check what commands are being executed to yield it:

# emerge -1 sys-boot/elilo
...
make -j1 ... ARCH=ia64
...
ia64-unknown-linux-gnu-gcc \
-I. -I. -I/usr/include/efi -I/usr/include/efi/ia64 -I/usr/include/efi/protocol -I./efi110 \
-O2 -fno-stack-protector -fno-strict-aliasing -fpic -fshort-wchar \
-Wall \
-DENABLE_MACHINE_SPECIFIC_NETCONFIG -DCONFIG_LOCALFS -DCONFIG_NETFS -DCONFIG_CHOOSER_SIMPLE \
-DCONFIG_CHOOSER_TEXTMENU \
-frename-registers -mfixed-range=f32-f127 \
-DCONFIG_ia64 \
-c glue_netfs.c -o glue_netfs.o
ia64-unknown-linux-gnu-ld \
-nostdlib -znocombreloc \
-T /usr/lib/elf_ia64_efi.lds \
-shared -Bsymbolic \
-L/usr/lib -L/usr/lib \
/usr/lib/crt0-efi-ia64.o elilo.o getopt.o strops.o loader.o fileops.o util.o vars.o alloc.o \
chooser.o config.o initrd.o alternate.o bootparams.o gunzip.o console.o fs/fs.o choosers/choosers.o \
devschemes/devschemes.o ia64/sysdeps.o glue_localfs.o glue_netfs.o \
\
-o elilo.so \
\
-lefi -lgnuefi \
/usr/lib/gcc/ia64-unknown-linux-gnu/7.2.0/libgcc.a
ia64-unknown-linux-gnu-ld: warning: creating a DT_TEXTREL in a shared object.
objcopy -j .text -j .sdata -j .data -j .dynamic -j .dynsym -j .rel \
-j .rela -j .reloc --target=efi-app-ia64 elilo.so elilo.efi
>>> Source compiled.

Here we see the following steps:

  • With the exception of -frename-registers -mfixed-range=f32-f127, the process of building an EFI application is almost like building a normal shared library.
  • A non-standard /usr/lib/elf_ia64_efi.lds linker script is used.
  • objcopy --target=efi-app-ia64 is used to create a PE32+ file out of ELF64. Very dark magic.

Luckily gnu-efi and elilo have superb documentation (10 pages). All the obscure corners are explained in every detail. Part 2: Inner Workings is a short description of how every dynamic linker works, with a light touch of ELF -> PE32+ conversion. I wish I had seen this doc years ago :)

Let’s look at the elf_ia64_efi.lds linker script to get a full understanding of where every byte comes from when elilo.so is being linked (sourceforge viewer):

OUTPUT_FORMAT("elf64-ia64-little")
OUTPUT_ARCH(ia64)
ENTRY(_start_plabel)
SECTIONS
{
  . = 0;
  ImageBase = .;
  .hash : { *(.hash) }    /* this MUST come first! */
  . = ALIGN(4096);
  .text :
  {
    _text = .;
    *(.text)
    *(.text.*)
    *(.gnu.linkonce.t.*)
    . = ALIGN(16);
  }
  _etext = .;
  _text_size = . - _text;
  . = ALIGN(4096);
  __gp = ALIGN (8) + 0x200000;
  .sdata :
  {
    _data = .;
    *(.got.plt)
    *(.got)
    *(.srodata)
    *(.sdata)
    *(.sbss)
    *(.scommon)
  }
  . = ALIGN(4096);
  .data :
  {
    *(.rodata*)
    *(.ctors)
    *(.data*)
    *(.gnu.linkonce.d*)
    *(.plabel)    /* data whose relocs we want to ignore */
    /* the EFI loader doesn't seem to like a .bss section, so we stick
       it all into .data: */
    *(.dynbss)
    *(.bss)
    *(COMMON)
  }
  .note.gnu.build-id : { *(.note.gnu.build-id) }
  . = ALIGN(4096);
  .dynamic : { *(.dynamic) }
  . = ALIGN(4096);
  .rela :
  {
    *(.rela.text)
    *(.rela.data*)
    *(.rela.sdata)
    *(.rela.got)
    *(.rela.gnu.linkonce.d*)
    *(.rela.stab)
    *(.rela.ctors)
  }
  _edata = .;
  _data_size = . - _etext;
  . = ALIGN(4096);
  .reloc :    /* This is the PECOFF .reloc section! */
  {
    *(.reloc)
  }
  . = ALIGN(4096);
  .dynsym : { *(.dynsym) }
  . = ALIGN(4096);
  .dynstr : { *(.dynstr) }
  /DISCARD/ :
  {
    *(.rela.plabel)
    *(.rela.reloc)
    *(.IA_64.unwind*)
    *(.IA64.unwind*)
  }
}

Even though the file is very big, it’s easy to read. A linker script defines symbols (as symbol = expression; . (dot) means “current address”) and output sections (as .output-section : { expressions }) in terms of input sections.

Here is what linker script tries to achieve:

  • . = 0; sets the load address to 0. It does not really matter: at load time the module will be relocated anyway.
  • ImageBase = . means that a new symbol ImageBase is created and pointed at the current address: the very start of the whole output, as nothing has been collected yet.
  • .hash : { *(.hash) } means to collect all DT_HASH input sections into the output DT_HASH section. DT_HASH defines a mandatory section for fast symbol lookup. The linker uses that section to resolve a symbol name to a symbol type and offset in the file.
  • . = ALIGN(4096) forces the linker to pad the output (with zeros) to a 4K boundary (EFI defines the page size as 4K).
  • Only then comes the .text section. BaseOfCode is the very first byte of the .text section.
  • Other things go below.

GNU hash mystery

Later, objcopy is used to produce the final (PE32+) binary by copying the whitelisted sections passed via -j:

objcopy -j .text -j .sdata -j .data -j .dynamic -j .dynsym -j .rel \
        -j .rela -j .reloc --target=efi-app-ia64 elilo.so elilo.efi

While objcopy does not copy the .hash section into the final binary, its mere presence in the elilo.so file changes the .text offset, as the linker already allocated space for it in elilo.so and resolved other relocations taking that offset into account.

So why did the offset disappear? Simple! Gentoo has not generated .hash sections since 2014! .gnu.hash (DT_GNU_HASH) is used instead. DT_GNU_HASH was added to binutils/glibc around 2006 as an optional mechanism to speed up dynamic linking and dynamic loading.
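This is easy to verify on the intermediate shared object; a quick sketch (the output is abbreviated and illustrative):

$ readelf -S elilo.so | egrep '\.gnu\.hash|\.hash'
  [ 2] .gnu.hash         GNU_HASH         ...

On an affected toolchain only .gnu.hash shows up; passing --hash-style=both (or sysv) to ld would bring the classic .hash section back.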

But the linker script does not deal with .gnu.hash sections!

It’s easy to mimic handling of both section types:

--- a/gnuefi/elf_ia64_efi.lds
+++ b/gnuefi/elf_ia64_efi.lds
@@ -7,2 +7,3 @@ SECTIONS
ImageBase = .;
.hash : { *(.hash) } /* this MUST come first! */
+ .gnu.hash : { *(.gnu.hash) }

This fix alone was enough to restore elilo.efi! A few other architectures did not handle it either. See the full upstream fix.

Breakage mechanics

But why does it matter? What does it mean to drop the .hash section completely? The PE format does not have a .hash equivalent.

Let’s inspect what actually changes in the resulting elilo.efi before and after the patch:

$ objdump -x elilo.efi.no.gnu.hash > elilo.efi.no.gnu.hash.od
$ objdump -x elilo.efi.gnu.hash > elilo.efi.gnu.hash.od
--- elilo.efi.no.gnu.hash.od 2018-01-29 23:05:25.776000000 +0000
+++ elilo.efi.gnu.hash.od 2018-01-29 23:05:31.700000000 +0000
@@ -2,2 +2,2 @@
-elilo.efi.no.gnu.hash: file format pei-ia64
-elilo.efi.no.gnu.hash
+elilo.efi.gnu.hash: file format pei-ia64
+elilo.efi.gnu.hash
@@ -6 +6 @@
-start address 0x0000000000046d80
+start address 0x0000000000047d80
@@ -21,2 +21,2 @@
-AddressOfEntryPoint 0000000000046d80
-BaseOfCode 0000000000000000
+AddressOfEntryPoint 0000000000047d80
+BaseOfCode 0000000000001000
@@ -33 +33 @@
-SizeOfImage 0005f000
+SizeOfImage 00060000
...
@@ -633,39 +633,38 @@
-[546](sec 1)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x0000000000000000 ImageBase
-[547](sec 3)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x0000000000007560 Udp4ServiceBindingProtocol
...
-[584](sec 2)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x0000000000000110 Optind
+[546](sec 3)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x0000000000007560 Udp4ServiceBindingProtocol
...
+[583](sec 2)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x0000000000000110 Optind
...

Here the internal ImageBase symbol became an external symbol! This means that EFI would have to resolve ImageBase at application startup. That’s why we saw a loader error (as opposed to elilo.efi runtime errors or kernel boot errors).

The ELF spec says shared objects and dynamic executables are required to have a .hash (or .gnu.hash) section to be valid executables, but the linker script did not maintain this requirement, and all hell broke loose.

Perhaps the linker could be tweaked to report a warning when the symbol table is missing from the output file. But as you can see, linker scripts are much more powerful than just implementing a fixed output format.

Fun details

gnu-efi has a lot of other jewels!

For example, the entry point has to point to an ia64 function descriptor (FDESCR). An FDESCR is a pair of pointers: a pointer to the code section (the actual entry point) and the value of the gp register (global pointer, the base pointer used by PIC code).

These two pointers are both absolute addresses. But an EFI application needs to be relocatable (loadable at different addresses).

The entry point FDESCR therefore needs to be relocated by the EFI loader.

How would you inject an ia64 relocation for two 64-bit pointers into the PE32+ format? gnu-efi does a very crazy thing (even more crazy than relying on objcopy to Just Work): it injects a PE32+ relocation directly into ia64 ELF code! That’s how it does the trick (the snippet below is the very tail of the crt0-efi-ia64.S file):

// PE32+ wants a PLABEL, not the code address of the entry point:
.align 16
.global _start_plabel
.section .plabel, "a"
_start_plabel:
data8 _start
data8 __gp
// hand-craft a .reloc section for the plabel:
#define IMAGE_REL_BASED_DIR64 10
.section .reloc, "a"
data4 _start_plabel // Page RVA
data4 12 // Block Size (2*4+2*2)
data2 (IMAGE_REL_BASED_DIR64<<12) + 0 // reloc for plabel's entry point
data2 (IMAGE_REL_BASED_DIR64<<12) + 8 // reloc for plabel's global pointer

This code generates two sections:

  • .plabel with the yet-unrelocated _start and __gp pointers (it crafts the FDESCR itself)
  • .reloc with synthesised raw bytes that look like a PE32+ relocation. Its format is slightly more complicated and is defined by the PE32+ spec:
    • [4 bytes] 32-bit offset to some blob where relocations are to be applied, in our case it’s _start_plabel
    • [4 bytes] full size of this relocation entry
    • [2 bytes * N relocations] list of tuples: (4 bits for relocation type, 12-bit offset to data to relocate)

Here we have two relocations that add ImageBase: one to _start and one to __gp. And it’s precisely these two relocations that the EFI loader reported as invalid:

ImageAddress: pointer is outside of image
ImageAddress: pointer is outside of image
LoadPe: Section 1 was not loaded

Before the .gnu.hash fix, ImageBase (ImageAddress in EFI terminology) was indeed pointing somewhere else.

How about searching the internets for the source of this EFI loader error? tianocode has one hit, in commented-out code:

//...
Base = EfiLdrPeCoffImageAddress (Image, (UINTN)Section->VirtualAddress);
End  = EfiLdrPeCoffImageAddress (Image, (UINTN)(Section->VirtualAddress + Section->Misc.VirtualSize));
if (EFI_ERROR(Status) || !Base || !End) {
    // DEBUG((D_LOAD|D_ERROR, "LoadPe: Section %d was not loaded\n", Index));
    PrintHeader ('L');
    return EFI_LOAD_ERROR;
}
//...

Not very useful but still fun :)

Parting words

Itanium was the first system EFI targeted; EFI was later morphed into UEFI. Surprisingly, I had managed to ignore (U)EFI-based boot on modern machines until now, and this bug was my first experience dealing with it. And it was not too bad! :)

I found out a few things along the way:

  • ia64 EFI is a rich interface to the OS, and iLO is a rich way to control the machine remotely
  • Linker scripts can do a lot more than I realized:
    • merge sections
    • discard sections
    • inject symbols
    • handle alignments
    • rename sections
  • With certain care, objcopy can convert binaries across different object file formats. In this case, ELF64 to PE32+.
  • Assembler allows you to inject absolutely arbitrary sections into object files. Just make sure you handle those in your linker script.
  • Modern GNU toolchain still can breathe life into decade old hardware to teach us a trick or two.
  • objdump -x is a cool equivalent of readelf.

Have fun!


January 30, 2018
FOSDEM is near (January 30, 2018, 00:00 UTC)

Excitement is building with FOSDEM 2018 only a few days away. There are now 14 current developers and one former developer in planned attendance, along with many from the Gentoo community.

This year, one Gentoo-related talk, titled Unix? Windows? Gentoo!, will be given by Michael Haubenwallner (haubi) of the Gentoo Prefix project.

Two more non-Gentoo related talks will be given by current Gentoo developers:

If you attend, don’t miss out on your opportunity to build the Gentoo web-of-trust by bringing a valid governmental ID, a printed key fingerprint, and the various UIDs you want certified. See you at FOSDEM!

January 24, 2018
Sven Vermeulen a.k.a. swift (homepage, bugs)
Documenting a rule (January 24, 2018, 19:40 UTC)

In the first post I talked about why configuration documentation is important. In the second post I looked into a good structure for configuration documentation of a technological service, and ended with an XCCDF template in which this documentation can be structured.

The next step is to document the rules themselves, i.e. the actual content of a configuration baseline.

Fine-grained rules

While, from a high-level point of view, configuration items could be documented in a coarse-grained manner, a proper configuration baseline documents rules in a fine-grained way. Let's first consider a bad example:

All application code files are root-owned, with read-write privileges for owner and group, and executable where it makes sense.

While such a rule could be interpreted correctly, it also leaves room for misinterpretation and ambiguity. Furthermore, it is not explicit. What are application code files? Where are they stored? What about group ownership? The executable permission, when does that make sense? Does the rule also imply that there is no privilege for world-wide access, or does it just ignore that?

A better example (or set of examples) would be:

  • /opt/postgresql is recursively user-owned by root
  • /opt/postgresql is recursively group-owned by root
  • No files under /opt/postgresql are executable except when specified further
  • All files in /opt/postgresql/bin are executable
  • /opt/postgresql has system_u:object_r:usr_t:s0 as SELinux context
  • /opt/postgresql/bin/postgres has system_u:object_r:postgresql_exec_t:s0 as SELinux context

And even that list is still not complete, but you get the gist. The focus here is to have fine-grained rules which are explicit and not ambiguous.
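To illustrate how directly such fine-grained rules translate into automatable checks, here is a sketch of shell one-liners for three of the PostgreSQL rules above (paths as in the example; any output from the find commands means a deviation):

# files under /opt/postgresql not user- or group-owned by root
~$ find /opt/postgresql ! -user root -o ! -group root
# executable files outside /opt/postgresql/bin
~$ find /opt/postgresql -path /opt/postgresql/bin -prune -o -type f -perm /111 -print
# SELinux context of the main binary (expects postgresql_exec_t)
~$ stat -c '%C' /opt/postgresql/bin/postgres
system_u:object_r:postgresql_exec_t:s0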

Of course, the above configuration rule is still a "simple" permission set. Configuration baselines go further than that of course. They can act on file content ("no PAM configuration files can refer to pam_rootok.so except for runuser and su"), run-time processes ("The processes with /usr/sbin/sshd as command and with -D as option must run within the sshd_t SELinux domain"), database query results, etc.

This granularity is especially useful later on when you want to automate compliance checks, because the more fine-grained a description is, the easier it is to develop and maintain checks on it. But before we look into remediation, let's document the rule a bit further.

Metadata on the rules

Let's consider the following configuration rule:

/opt/postgresql/bin/postgres has system_u:object_r:postgresql_exec_t:s0 as SELinux context

In the configuration baseline, we don't just want to state that this is the rule, and be finished. We need to describe the rule in more detail, as was described in the previous post. More specifically, we definitely want to:

  • know what the rule's severity is, or how "bad" it would be if we detect a deviation from the rule
  • have an indication if the rule is security-sensitive or more oriented to manageability
  • have a more elaborate description of the rule than just the title
  • have an indication why this rule is in place (what does it solve, fix or simplify)
  • have information on how to remediate if a deviation is found
  • know if the rule is applicable to our environment or not

The Security Content Automation Protocol (SCAP) standard, which defines the XCCDF standard as well as OVAL and a few others like CVSS, uses the following possible values for severity: unknown, info, low, medium, and high.

To indicate if a rule is security-oriented or not, XCCDF's role attribute is best used. With the role attribute, you state if a rule is to be included in the final scoring (a weighted value given to the compliance of a system) or not. If it is, then it is security sensitive.

The indication of a rule's applicability in the environment might seem strange. If you document the configuration baseline, shouldn't it include only those settings you want? Well, yes and no. Personally, I like to also include, in the baseline, recommendations that we do not follow.

Suppose for instance that an audit comes along and says you need to enable data encryption on the database. Putting aside that an auditor should focus mainly/solely on the risks and let the solutions be managed by the team (while being involved in accepting solutions, of course), the team might do an assessment and find that data encryption on the database level (i.e. the database files are encrypted so non-DBA users with operating system interactive rights cannot read the data) is actually not going to remediate any risk, yet introduces more complexity.

In that situation, and assuming that the auditor agrees with a different control, you might want to add a rule to the configuration baseline about this. Either you document the wanted state (database files do not need to be encrypted), or you document the suggestion (database files should be encrypted) but explicitly state that you do not require or implement it, and document the reasoning for it. The rule is then augmented with references to the audit recommendation for historical reasons and to facilitate future discussions.

And yes, I know the rule "database files should be encrypted" is still ambiguous. The actual rule should be more specific to the technology.

Documenting a rule in XCCDF

In XCCDF, a rule is defined through the Rule XML entity, and is placed within a Group. The Group entities are used to structure the document, while the Rule entities document specific configuration directives.

The postgres related rule of above could be written as follows:

<Rule id="xccdf_com.example_rule_pgsql-selinux-context"
      role="full"
      selected="1"
      weight="5.1"
      severity="high"
      cluster-id="network">
  <title>
    /opt/postgresql/bin/postgres has system_u:object_r:postgresql_exec_t:s0 as SELinux context
  </title>
  <description>
    <xhtml:p>
      The postgres binary is the main binary of the PostgreSQL database daemon. Once started, it launches the necessary workers. To ensure that PostgreSQL runs in the proper SELinux domain (postgresql_t) its binary must be labeled with postgresql_exec_t.
    </xhtml:p>
    <xhtml:p>
      The current state of the label can be obtained using stat, or even more simple, the -Z option to ls:
    </xhtml:p>
    <xhtml:pre>~$ ls -Z /opt/postgresql/bin/postgres
-rwxr-xr-x. root root system_u:object_r:postgresql_exec_t:s0 /opt/postgresql/bin/postgres
    </xhtml:pre>
  </description>
  <rationale>
    <xhtml:p>
      The domain in which a process runs defines the SELinux controls that are active on the process. Services such as PostgreSQL have an established policy set that controls what a database service can and cannot do on the system.
    </xhtml:p>
    <xhtml:p>
      If the PostgreSQL daemon does not run in the postgresql_t domain, then SELinux might either block regular activities of the database (service availability impact), block behavior that impacts its effectiveness (integrity issue) or allow behavior that shouldn't be allowed. The latter can have significant consequences once a vulnerability is exploited.
    </xhtml:p>
  </rationale>
  <fixtext>
    Restore the context of the file using restorecon or chcon.
  </fixtext>
  <fix strategy="restrict" system="urn:xccdf:fix:script:sh">restorecon /opt/postgresql/bin/postgres
  </fix>
  <ident system="http://example.com/configbaseline">pgsql-01032</ident>
</Rule>

Although this is lots of XML, it is easy to see what each element declares. The NIST IR 7275 document is a very good resource to continuously consult in order to find the right elements and their interpretation.

There is one element added that is "specific" to the content of this blog post series and not the XCCDF standard, namely the identification. As mentioned in an earlier post, organizations might have their own taxonomy for technical service identification, and requirements on how to number or identify rules. In the above example, the rule is identified as pgsql-01032.

There is another attribute in use above that might need more clarification: the weight of the rule.

Abusing CVSS for configuration weight scoring

In the above example, a weight is given to the rule scoring (weight of 5.1). This number is obtained through a CVSS calculator, which is generally used to identify the risk of a security issue or vulnerability. CVSS stands for Common Vulnerability Scoring System and is a popular way to weight security risks (which are then associated with vulnerability reports, Common Vulnerabilities and Exposures (CVE)).

Misconfigurations can also be interpreted as a security risk, although that requires some mental bridges. Rather than scoring the rule, you score the risk that it mitigates, and consider the worst thing that could happen if that rule is not implemented correctly. Now, worst-case thinking is subjective, so there will always be discussion on the weight of a rule. It is therefore important to have a consensus in the team (if the configuration baseline is team-owned) if this weight is actively used. Of course, an organization might choose to ignore the weight, or use a different scoring mechanism.

In the above situation, I scored what would happen if a vulnerability in PostgreSQL was successfully exploited, and SELinux couldn't mitigate the risk as the label of the file was wrong. The result of a wrong label could be that the PostgreSQL service runs in a higher privileged domain, or even in an unconfined domain (no SELinux restrictions active), so there is a heightened risk of confidentiality loss (beyond the database) and even integrity risk.

However, the confidentiality risk is scored as low, and the integrity risk sits in between (the base risk is low, but due to other constraints put in place the integrity impact is reduced further), because PostgreSQL runs as a non-administrative user on the system, and perhaps because the organization uses dedicated systems for database hosting (so other services are not easily impacted).

As mentioned, this is somewhat abusing the CVSS methodology, but is imo much more effective than trying to figure out your own scoring methodology. With CVSS, you start with scoring the risk regardless of context (CVSS Base), then adjust based on recent state or knowledge (CVSS Temporal), and finally adjust further with knowledge of the other settings or mitigating controls in place (CVSS Environmental).

Personally, I prefer to only use the CVSS Base scoring for configuration baselines, because the other two are highly depending on time (which is, for documentation, challenging) and the other controls (which is more of a concern for service technical documentation). So in my preferred situation, the rule would be scored as 5.4 rather than 5.1. But that's just me.

Isn't this CCE?

People who use SCAP a bit more might already be wondering whether I'm reinventing the wheel here. After all, SCAP also has a standard called Common Configuration Enumeration (CCE), which seems to be exactly what I'm doing here: enumerating the configuration of a technical service. And indeed, if you look at the CCE list you'll find a number of Excel sheets (sigh) that define common configurations.

For instance, for Red Hat Enterprise Linux v5, there is an enumeration identified as CCE-4361-2, which states:

File permissions for /etc/pki/tls/ldap should be set correctly

The CCE description then goes on to state that this is a permission setting (CCE Parameter), which can be rectified with chmod (CCE Technical Mechanism), and refers to a source for the setting.

However, CCE has a number of downsides.

First of all, it isn't being maintained anymore. And although XCCDF itself is also a quite old standard, it is still being looked into (a draft new version is being prepared) and is actively used as a standard. Red Hat is investing time and resources into secure configurations and compliance aligned with SCAP, and other vendors publish SCAP-specific resources as well. CCE however is a list, and thus requires continuous management. That RHELv5 is the most recent RHEL CCE list is a bad sign.

Second, CCE's structure is for me insufficient to use in configuration baselines. XCCDF has a much more mature and elaborate set of settings for this. What CCE does is actually what I use in the above example as the organization-specific identifier.

Finally, there aren't many tools that actively use CCE, unlike XCCDF, OVAL, CVSS and other standards under the SCAP umbrella, which are all still actively used and developed upon by tools such as Open-SCAP.

Profiling

Before finishing this post, I want to talk about profiling.

Within an XCCDF benchmark, several profiles can be defined. In the XCCDF template I defined a single profile that covers all rules, but this can be fine-tuned to the needs of the organization. In XCCDF profiles, you can select individual rules (which ones are active for a profile and which ones aren't) and even fine-tune values for rules. This is called tailoring in XCCDF.

A first use case for profiles is to group different rules based on the selected setup. In case of Nginx for instance, one can consider Nginx being used as either a reverse proxy, a static website hosting or a dynamic web application hosting. In all three cases, some rules will be the same, but several rules will be different. Within XCCDF, you can document all rules, and then use profiles to group the rules related to a particular service use.

XCCDF allows for profile inheritance. This means that you can define a base Profile (all the rules that need to be applied, regardless of the service use) and then extend the profiles with individual rule selections.

With profiles, you can also fine-tune values. For instance, you could have a password policy in place that states that passwords on internal machines have to be at least 10 characters long, but on DMZ systems they need to be at least 15 characters long. Instead of defining two rules, the rule could refer to a particular variable (Value in XCCDF) which is then selected based on the Profile. The value for a password length is then by default 10, but the Profile for DMZ systems selects the other value (15).

Now, value-based tailoring is imo already a more advanced use of XCCDF, and is best looked into when you also start using OVAL or other automated checks. The tailoring information is then passed on to the automated compliance check so that the right value is validated.
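With Open-SCAP, for instance, the tailoring file travels along with the evaluation; a sketch, where the profile id and file names are hypothetical:

~$ oscap xccdf eval --profile xccdf_com.example_profile_dmz \
     --tailoring-file tailoring.xml --results results.xml xccdf.xml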

Value-based tailoring also makes rules either more complex to write, or ambiguous to interpret without full profile awareness. Considering the password length requirement, the rule could become:

The /etc/pam.d/password-auth file must refer to pam_passwdqc.so for the password service with a minimal password length of 10 (default) or 15 (DMZ) for the N4 password category

At least the rule is specific. Another approach would be to document it as follows:

The /etc/pam.d/password-auth file must refer to pam_passwdqc.so with the proper organizational password controls

The documentation of the rule might document the proper controls further, but the rule is much less specific. Later checks might report that a system fails this check, referring to the title, which is insufficient for engineers or administrators to resolve.

Generating the guide

To close off this post, let's finish with how to generate the guide based on an XCCDF document. Personally, I use two approaches for this.

The first one is to rely on Open-SCAP. With Open-SCAP, you can generate guides easily:

~$ oscap xccdf generate guide xccdf.xml > ConfigBaseline.html

The second one, which I use more often, is a custom XSL style sheet, which also introduces the knowledge and interpretations of what this blog post series brings up (including the organizational identification). The end result is similar (the same content) but uses a structure/organization that is more in line with expectations.

For instance, in my company, the information security officers want to have a tabular overview of all the rules in a configuration baseline. So the XSL style sheet generates such a tabular overview, and uses in-document linking to the more elaborate descriptions of all the rules.

An older version is online for those interested. It uses JavaScript as well (in case you are security sensitive you might want to look into it) to allow collapsing rule documentation for faster online viewing.

The custom XSL has an additional advantage, namely that there is no dependency on Open-SCAP to generate the guides (even though it is perfectly possible to copy the XSL and continue). I can successfully generate the guide using Microsoft's msxml utility, xsltproc, etc., depending on the platform I'm on.
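For completeness, the xsltproc invocation is a one-liner as well; a sketch with hypothetical file names:

~$ xsltproc configbaseline.xsl xccdf.xml > ConfigBaseline.html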

Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)
Designing My Password Manager (January 24, 2018, 04:04 UTC)

So while at the (in my opinion excellent) Enigma, Tavis decided to nerd-snipe me into thinking about how I would design a password manager. This is because he knows (and disagrees with) my present choice of LastPass as my password manager, despite the existence of a range of opensource password managers.

The main reason why I have discarded the choice of the available password managers is that, in my opinion, they all miss the main point of what a password manager should be: convenient. Password reuse is a problem because it’s more convenient than using dozens, maybe hundreds of different passwords for all the different services. An easy-to-use password manager is even more convenient than reusing passwords.

The first problem I found is that effectively all of the opensource password managers I know of just use a single file blob, and leave the problem of syncing it to you. The best I have seen was one piece of software providing integration with Dropbox, Google Drive and ownCloud to sync the blob around, which works just as long as you don’t make conflicting changes on different devices. To me, the ability to make independent changes to services is actually a big requirement. This means that I would be using a file format that allows encrypting “per row”, both the keys and the values (because you don’t want to leak which accounts the user registered on, if the file is leaked). I would probably gravitate towards something like the Bigtable format.

Another problem, which is present in Chrome SmartLock too, is the lack of support for what LastPass calls “equivalent domains”. Due to many silly reasons, particularly if you travel a lot or if you live in different countries, you end up collecting a long list of websites using either separate domain names, or at least different TLDs. An example of this is Amazon, which uses a constellation of different domains, but all share the same account management (except for Amazon Japan). A sillier example of this are Yelp and TripAdvisor, which decide to change your TLD depending on the IP address you’re coming from, despite being the kind of services you would use particularly outside your usual country.

Admittedly, as Tavis suggested, these should be solved by the services themselves, using a single host for login/identity management. I do not expect this to happen any time soon. My previous proposal for this was defining equivalence as part of a well-known configuration file (together with other improvements to password management). I now have some personal introspection questions about this, because I wonder if there is a privacy risk in sending requests to other domains to validate the reciprocal equivalence configurations. So I think I’ll leave this up for debate for later.

The next design bit is figuring out how should the password generator behave. We already have a number of good password generators of different types, including software implementations of diceware passwords (despite the site repeatedly telling you not to use computer random generators — to add to the inconvenience), and xkcdpass, that generate passwords that are easier to remember or at least to type. I think that a good password manager should allow for more than just the random-bunch-of-characters passwords that LastPass uses.

In particular, for me, I have a few websites for which I use passwords generated by xkcdpass, because I need to actually type in the password, rather than use the password manager’s autofill capabilities. This is the case for Sony and Nintendo accounts, which need to be typed from consoles, and for Telegram, as I need to type the password on my watch to receive messages there. Unfortunately, implementing this is probably going to be a UX nightmare, one of the reasons being the ability to select different wordlists. Non-English speakers are likely interested in using their own language for it. Or even English speakers who are not afraid of other languages and may decide to throw off a possible attacker anyway.

Ideally, the password generation settings would be stored on a domain-by-domain basis, so that if a certain website only allows numbers in its passcode, or has a specific character limit, the same settings are used to generate a new password if it’s ever breached. This may sound minor, but to me it would be so much of a time (and frustration) saver that it would easily become a killer feature.

But all of these ideas come to nothing without good, convenient, and trustworthy client implementations. Right now one of the many weak spots of LastPass is its Chrome extension (and the Firefox one too). A convenient password manager, though, ought to be able to integrate with the browser and, since it’s 2018 after all, with your phone. Unfortunately, here is where any opensource offering can’t really help as much as we would all like: it still relies (hugely) on trust. As far as I can tell, there is no way to make absolutely certain that the code of a Chrome extension on the Web Store, or of an Android app on either the Play Store or F-Droid, corresponds exactly to a certain source distribution.

Don’t get me wrong, this is a real problem right now, with closed source extensions too. You need to trust that the extension is not injecting malicious content into your webpage, or exfiltrating data out of your browser session. Earlier this year a widely used Chrome extension was reported as malicious, and it wasn’t removed from Chrome’s repository until that was identified. At least I can have a (maybe undeserved) trust in LogMeIn not to intentionally ruin their reputation by pushing actively malicious code to the Store. Would I say the same for a random single developer maintaining their widely used extension?

What this means to me is that building a good replacement for LastPass is not just a technical problem that can be solved by actively syncing with cloud storage services… it’s a problem of social trust, and that requires quite a bit of thought from many actors in the market: browser vendors, device manufacturers, and developers, which I’m not sure is currently happening. So I don’t hold my breath, and keep making compromises. I made mine with my threat model in mind; you should make yours based on what concerns you the most.

January 23, 2018
Alexys Jacob a.k.a. ultrabug (homepage, bugs)
Evaluating ScyllaDB for production 1/2 (January 23, 2018, 17:31 UTC)

I have recently been conducting a quite deep evaluation of ScyllaDB to find out if we could benefit from this database in some of our intensive and latency critical data streams and jobs.

I’ll try to share this great experience within two posts:

  1. The first one (the one you’re reading) will walk through how to prepare yourself for a successful Proof Of Concept based evaluation with the help of the ScyllaDB team.
  2. The second post will cover the technical aspects and details of the POC I’ve conducted with the various approaches I’ve followed to find the most optimal solution.

But let’s start with how I got into this in the first place…


Selecting ScyllaDB

I got interested in ScyllaDB because of its philosophy and engagement, and I quickly got into it by being a modest contributor and its Gentoo Linux packager (not in portage yet).

Of course, I didn’t pick an interest in that technology by chance:

We’ve been using MongoDB in (mass) production at work for a very, very long time now. I can easily say we were early MongoDB adopters. But it’s no secret that MongoDB is not suited to every use case, and the Hadoop stack has since grown very strong in our data centers, with a predominance of Hive for the heavy duty and data hungry workflows.

One thing I was never satisfied with in MongoDB was its primary/secondary architecture, which makes you lose write throughput and is even more horrible when you want to set up what they call a “cluster”, which is in fact a mediocre abstraction they add on top of replica sets. To say the least, it is inefficient and cumbersome to operate and maintain.

So I obviously had Cassandra on my radar for a long time, but I was pushed back by its Java stack, heap size and silly tuning… Also, coming from the versatile MongoDB world, Cassandra’s CQL limitations looked dreadful at that time…

The day I found myself on ScyllaDB’s webpage and read their promises, I was sure to be challenging our current use cases with this interesting sea monster.


Setting up a POC with the people at ScyllaDB

Through my contributions around my packaging of ScyllaDB for Gentoo Linux, I got to know a bit about the people behind the technology. They got interested in why I was packaging this in the first place and when I explained my not-so-secret goal of challenging our production data workflows using Scylla, they told me that they would love to help!

I was a bit surprised at first, because this was the first time I ever saw real engagement by the people behind a technology in someone else’s POC.

Their pitch is simple: they will help (for free) anyone conducting a serious POC to make sure that the outcome, and the comprehension behind it, are the best possible. This is very mature reasoning to me, because it is easy to make false assumptions and conclude badly when testing a technology you don’t know, even more so when your use cases are complex and your expectations are very high, like ours.

Still, to my current knowledge, they’re the only ones in the data industry to have this kind of logic in place since the start. So I wanted to take this chance to thank them again for this!

The POC includes:

  • no bullshit, simple tech-to-tech relationship
  • a private slack channel with multiple ScyllaDB engineers
  • video calls to introduce ourselves and discuss our progress later on
  • help in schema design and logic
  • fast answers to every question you have
  • detailed explanations on the internals of the technology
  • hardware sizing help and validation
  • funny comments and French jokes (ok, not suitable for everyone)


Lessons for a successful POC

As I said before, you’ve got to be serious in your approach to make sure your POC will be efficient and will lead to an unbiased and fair conclusion.

This is a list of the main things I consider important to have prepared before you start.

Have some background

Make sure to read some literature to have the key concepts and words in mind before you go. It is even more important if, like me, you do not come from the Cassandra world.

I found the O’Reilly book Cassandra: The Definitive Guide to be a great read. Also, make sure to go through ScyllaDB’s documentation.

Work with a shared reference document

Make sure you share with the ScyllaDB guys a clear and detailed document explaining exactly what you’re trying to achieve and how you are doing it today (if you plan on migrating like we did).

I made a Google document for this because it felt like the easiest option. This document will be updated as you go and will serve as a reference for everyone participating in the POC.

This shared reference document is very important, so if you don’t know how to construct it or what to put in it, here is how I structured it:

  1. Who’s participating at <your company>
    • photo + name + speciality
  2. Who’s participating at ScyllaDB
  3. POC hardware
    • if you have your own bare metal machines you want to run your POC on, give every detail about their number and specs
    • if not, explain how you plan to set up and run your scylla cluster
  4. Reference infrastructure
    • give every detail on the technologies and on the hardware of the servers that are currently responsible for running your workflows
    • explain your clusters and their speciality
  5. Use case #1 : <name>
    • Context
      • give context about your use case by explaining it without tech words, think from the business / user point of view
    • Current implementations
      • that’s where you get technical
      • technology names and where they come into play in your current stack
      • insightful data volumes and cardinality
      • current schema models
    • Workload related to this use case
      • queries per second per data source / type
      • peak hours or no peak hours?
      • criticality
    • Questions we want answered
      • remember, the NoSQL world is led by query-based-modeling schema design logic; cassandra/scylla is no exception (see the sketch after this list)
      • write down the real questions you want your data model(s) to be able to answer
      • group them and rate them by importance
    • Validated models
      • this one comes during the POC when you have settled on the data models
      • write them down, explain them or relate them to the questions they answer
      • copy/paste some code showcasing how to work with them
    • Code examples
      • depending on the complexity of your use case, you may have multiple constraints or ways to compare your current implementation with your POC
      • try to explain what you test and copy/paste the best code you came up with to validate each point

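As a purely hypothetical illustration of query-based modeling (all names are invented), the table below is designed from the question it must answer, not from the shape of the source data; the CQL is carried here as a Python string:

# Question: "what are the latest events for a given source on a given day?"
CREATE_EVENTS_BY_SOURCE_AND_DAY = """
CREATE TABLE events_by_source_and_day (
    source   text,
    day      date,
    ts       timestamp,
    payload  blob,
    PRIMARY KEY ((source, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
"""

The partition key comes straight from the question’s filters, and the clustering order from how the answer should be sorted.
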
Have monitoring in place

ScyllaDB provides a monitoring platform based on Docker, Prometheus and Grafana that is efficient and simple to set up. I strongly recommend that you set it up, as it provides valuable insights almost immediately, and on an ongoing basis.

Also, you should strive to give the ScyllaDB guys access to your monitoring, if that’s possible for you. They will provide you with a fixed IP which you can authorize to access your grafana dashboards, so they can have a look at the performance of your POC cluster as you go. You’ll learn a great deal about ScyllaDB’s internals by sharing with them.

Know when to stop

The main trap in a POC is to work without boundaries. Since you’re looking for the best of what you can get out of a technology, you’ll get tempted to refine indefinitely.

So it is good to have at least an idea of the minimal figures you’d like to reach to be satisfied with your tests. You can always push a bit further, but not for too long!

Plan some high availability tests

Even if you first came to ScyllaDB for its speed, make sure to test its high availability capabilities based on your experience.

Most importantly, make sure you test it within your code base and guidelines. How will your code react to and handle a failure, partial or total? I was very surprised and saddened to discover how little literature there is on the subject in the Cassandra community.

POC != production

Remember that even when everything is right on paper, production load will have its share of surprises and unexpected behaviours. So keep a good deal of flexibility in your design and your capacity planning to absorb them.

Make time

Our POC lasted almost 5 months instead of the estimated 3, mostly because of my schedule’s unwillingness to cooperate…

As you can imagine, this interruption was not always optimal, for either me or the ScyllaDB guys, but they were kind enough not to complain about it. So, depending on how thorough you plan to be, make sure you set aside time matching your degree of demands. The reference document is also helpful for getting back up to speed.


Feedback for the ScyllaDB guys

Here are the main points I noted during the POC that the guys from ScyllaDB could improve on.

They are subjective of course, but it’s important to give feedback, so here it goes. I’m fully aware that everyone is trying to improve, so I’m not pointing any fingers at all.

I shared those comments with them already, and they received them very well.

More video meetings on start

When starting the POC, try to have some pre-scheduled video meetings to set it in motion right away. This will provide a good pace, as well as make sure that everyone is on the same page.

Make a POC kick starter questionnaire

Having a minimal plan to follow, with some key points to set up like the ones I explained before, would help. Maybe also a minimal questionnaire, to make sure that the key aspects and figures have been given some thought from the start. This will raise awareness of the real questions the POC aims to answer.

To put it more simply: some minimal formalism helps to check off the key aspects and questions.

Develop a higher client driver expertise

This one was the most painful to me, and is likely to be painful for anyone who, like me, is not coming from the Cassandra world.

Finding good, strong code examples and guidelines on the client side was hard, and that’s where I felt the most alone. This was not pleasant, because a technology is ultimately validated through its usage, which means on the client side.

Most of my tests were done using python and the python-cassandra driver, so I had tons of questions about it with no convincing answers. The same thing went for the spark-cassandra-connector when using scala, where some key (undocumented) configuration options can change the shape of your results drastically (more details in the next post).

High Availability guidelines and examples

This one still strikes me as the most awkward thing about the Cassandra community. I literally struggled to find clear and detailed explanations about how to handle failure more or less gracefully with the python driver (or any other driver).

This is kind of a disappointment to me for a technology that positions itself as highly available… I’ll get into more details about it in the next post.

A clearer sizing documentation

Even if there will never be a magic formula, there are some rules of thumb for sizing your hardware for ScyllaDB. They should be written down more clearly, maybe in dedicated documentation (the sizing guide is labeled as an admin guide at the time of writing).

Some examples:

  • RAM per core? What is a core? How does it relate to shards?
  • Maximal disk/RAM ratio?
  • Multiple SSDs vs one NVMe?
  • Hardware RAID vs software RAID? Is a RAID controller needed at all?

Maybe even provide a complete bare metal example from two different vendors such as Dell and HP.

What’s next?

In the next post, I’ll get into more details on the POC itself and the technical lessons we learned along the way. This will lead to the final conclusion and the next move we committed ourselves to.

Marek Szuba a.k.a. marecki (homepage, bugs)
Randomness in virtual machines (January 23, 2018, 15:47 UTC)

I always felt that entropy available to the operating system must be affected by running said operating system in a virtual environment – after all, unpredictable phenomena used to feed the entropy pool are commonly based on hardware and in a VM most hardware either is simulated or has the hypervisor mediate access to it. While looking for something tangentially related to the subject, I have recently stumbled upon a paper commissioned by the German Federal Office for Information Security which covers this subject, with particular emphasis on entropy sources used by the standard Linux random-number generator (i.e. what feeds /dev/random and /dev/urandom), in extreme detail:

https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Publikationen/Studien/ZufallinVMS/Randomness-in-VMs.pdf?__blob=publicationFile&v=3

Personally I have found this paper very interesting but since not everyone can stomach a 142-page technical document, here are some of the main points made by its authors regarding entropy.

  1. As a reminder – the RNG in recent versions of Linux uses five sources of random noise, three of which do not require dedicated hardware (emulated or otherwise): Human-Interface Devices, rotational block devices, and interrupts.
  2. Running in a virtual machine does not affect entropy sources implemented purely in hardware or purely in software. However, it does affect hybrid sources – which in case of Linux essentially means all of the default ones.
  3. Surprisingly enough, virtualisation seems to have no negative effect on the quality of delivered entropy. This is at least in part due to additional CPU execution-time jitter introduced by the hypervisor compensating for increased predictability of certain emulated devices.
  4. On the other hand, virtualisation can strongly affect the quantity of produced entropy. More on this below.
  5. Low quantity of available entropy means among other things that it takes virtual machines visibly longer to bring /dev/urandom to usable state. This is a problem if /dev/urandom is used by services started at boot time because they can be initialised using low-quality random data.

Why exactly is the quantity of entropy low in virtual machines? The problem is that in a lot of configurations, only the last of the three standard noise sources will be active. On the one hand, even physical servers tend to be fairly inactive on the HID front. On the other, the block-device source does nothing unless directly backed by a rotational device – which has been becoming less and less likely, especially when we talk about large cloud providers who, chances are, hold your persistent storage on distributed networked file systems which are miles away from actual hard drives. This leaves interrupts as the only available noise source. Now take a machine configured this way and have it run a VPN endpoint, a HTTPS server, a DNSSEC-enabled DNS server… You get the idea.
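
If you want to see this effect on a machine of your own, the kernel exposes the pool state under /proc; these paths are standard on Linux, and a persistently low value relative to the pool size suggests the interrupt source alone is not keeping up:

# Read the kernel's current entropy estimate and total pool size (in bits).
with open("/proc/sys/kernel/random/entropy_avail") as f:
    avail = int(f.read())
with open("/proc/sys/kernel/random/poolsize") as f:
    poolsize = int(f.read())
print("entropy: {}/{} bits".format(avail, poolsize))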

But wait, it gets worse. Chances are many devices in your VM, especially ones like network cards which are expected to be particularly active and therefore the main source of interrupts, are in fact paravirtualised rather than fully emulated. Will such devices still generate enough interrupts? That depends on the underlying hypervisor, or to be precise on the paravirtualisation framework it uses. The BSI paper discusses this for KVM/QEMU, Oracle VirtualBox, Microsoft Hyper-V, and VMWare ESXi:

  • the former two use the VirtIO framework, which is integrated with the Linux interrupt-handling code;
  • VMWare drivers trigger the interrupt handler similarly to physical devices;
  • Hyper-V drivers use a dedicated communication channel called VMBus which does not invoke the LRNG interrupt handler. This means it is entirely feasible for a Linux Hyper-V guest to have all noise sources disabled, and re-enabling the interrupt one comes at the cost of a performance loss caused by the use of emulated rather than paravirtualised devices.


All in all, the paper in question has surprised me with the news of unchanged quality of entropy in VMs and confirmed my suspicions regarding its quantity. It also briefly mentions (which is how I ended up finding it) the way I typically work around this problem, i.e. by employing VirtIO-RNG – a paravirtualised hardware RNG in KVM/VirtualBox guests which interfaces with a source of entropy (typically /dev/random, although other options are possible too) on the host. Combine that with haveged on the host, sprinkle a bit of rate limiting on top (I typically set it to 1 kB/s per guest, even though in theory haveged is capable of acquiring entropy at a rate several orders of magnitude higher) and chances are you will never have to worry about lack of entropy in your virtual machines again.

January 19, 2018
Greg KH a.k.a. gregkh (homepage, bugs)

I keep getting a lot of private emails about my previous post on the latest status of the Linux kernel patches to resolve both the Meltdown and Spectre issues.

These questions all seem to break down into two different categories: “What is the state of the Spectre kernel patches?”, and “Is my machine vulnerable?”

State of the kernel patches

As always, lwn.net covers the technical details about the latest state of the kernel patches to resolve the Spectre issues, so please go read that to find out that type of information.

And yes, it is behind a paywall for a few more weeks. You should be buying a subscription to get this type of thing!

Is my machine vulnerable?

For this question, there’s now a very simple answer: you can check it yourself.

Just run the following command at a terminal window to determine what the state of your machine is:

$ grep . /sys/devices/system/cpu/vulnerabilities/*

On my laptop, right now, this shows:

$ grep . /sys/devices/system/cpu/vulnerabilities/*
/sys/devices/system/cpu/vulnerabilities/meltdown:Mitigation: PTI
/sys/devices/system/cpu/vulnerabilities/spectre_v1:Vulnerable
/sys/devices/system/cpu/vulnerabilities/spectre_v2:Vulnerable: Minimal generic ASM retpoline

This shows that my kernel is properly mitigating the Meltdown problem by implementing PTI (Page Table Isolation), and that my system is still vulnerable to Spectre variant 1, and is trying really hard to resolve variant 2, but is not quite there (because I did not build my kernel with a compiler that properly supports the retpoline feature).

If your kernel does not have that sysfs directory or files, then obviously there is a problem and you need to upgrade your kernel!

Some “enterprise” distributions did not backport the changes for this reporting, so if you are running one of those types of kernels, go bug the vendor to fix that; you really want a unified way of knowing the state of your system.

Note that right now, these files are only valid for the x86-64 based kernels, all other processor types will show something like “Not affected”. As everyone knows, that is not true for the Spectre issues, except for very old CPUs, so it’s a huge hint that your kernel really is not up to date yet. Give it a few months for all other processor types to catch up and implement the correct kernel hooks to properly report this.

And yes, I need to go find a working microcode update to fix my laptop’s CPU so that it is not vulnerable against Spectre…

January 18, 2018
Matthew Thode a.k.a. prometheanfire (homepage, bugs)
Undervolting your CPU for fun and profit (January 18, 2018, 06:00 UTC)

Disclaimer

I'm not responsible if you ruin your system; this guide functions as documentation for future me. While doing this wrong should just cause MCEs or system resets, it might also be able to ruin things in a more permanent way.

Why do this

It generally lowers temps, and if you hit thermal throttling (as happens with my 5th Generation X1 Carbon) you may throttle less often or at a higher frequency.

Prework

Repaste first, it's generally easier to do and can result in better gains (and it's all about them gains). For instance, my Skull Canyon NUC was resetting itself thermally when stress testing. Repasting lowered the max temps by 20°C.

Undervolting - The How

I based my info on https://github.com/mihic/linux-intel-undervolt which seems to work on my Intel Kaby Lake based laptop.

The values are written to the MSR registers using the wrmsr binary from msr-tools.

The following python3 snippet is how I got the values to write.

# for a -110mv offset I run the following
format(0xFFE00000&( (round(-110*1.024)&0xFFF) <<21), '08x')

What you are writing is actually laid out as follows, with the plane index being 0, 1 or 2 for the CPU, GPU or cache respectively.

constant   plane index   constant   write/read   offset
80000      0             1          1            F1E00000

So we end up with 0x80000011F1E00000 to write to MSR register 0x150.
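
Putting the pieces together, here is a small helper of my own that computes the full 64-bit value from the layout above; this mirrors my reading of the linux-intel-undervolt notes, so treat it as a sketch rather than a reference:

# plane: 0 = CPU core, 1 = GPU, 2 = cache (per the table above)
def msr_value(plane, mv_offset, write=True):
    # 12-bit offset in ~1.024 units per mv, placed in the top bits of the low dword
    offset = 0xFFE00000 & ((round(mv_offset * 1.024) & 0xFFF) << 21)
    return (0x80000 << 44) | (plane << 40) | (1 << 36) | (int(write) << 32) | offset

# hex(msr_value(0, -110)) == '0x80000011f1e00000', matching the value above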

Undervolting - The Integration

I made a script called undervolt.sh (make sure it's executable) in /opt/bin/, which is where I place my custom system scripts.

#!/usr/bin/env sh
/usr/sbin/wrmsr 0x150 0x80000011F1E00000  # cpu core  -110
/usr/sbin/wrmsr 0x150 0x80000211F1E00000  # cpu cache -110
/usr/sbin/wrmsr 0x150 0x80000111F4800000  # gpu core  -90

I then made a custom systemd unit in /etc/systemd/system called undervolt.service with the following content.

[Unit]
Description=Undervolt Service

[Service]
ExecStart=/opt/bin/undervolt.sh

[Install]
WantedBy=multi-user.target

I then systemctl enable undervolt.service so I'll get my settings on boot.

The following script was also placed in /usr/lib/systemd/system-sleep/undervolt.sh to get the settings after recovering from sleep, as they are lost at that point.

#!/bin/sh
if [ "${1}" = "post" ]; then
  /opt/bin/undervolt.sh
fi

That's it, everything after this point is what I went through in testing.

Testing for stability

I stress tested with mprime in stress mode for the CPU and glxmark or gputest for the GPU. I only recorded the results for the CPU, as that's what I care about.

No changes:

  • 15-17W package TDP
  • 15W core TDP
  • 3.06-3.22 GHz

50mv Undervolt:

  • 15-16W package TDP
  • 14-15W core TDP
  • 3.22-3.43 GHz

70mv undervolt:

  • 15-16W package TDP
  • 14W core TDP
  • 3.22-3.43 GHz

90mv undervolt:

  • 15W package TDP
  • 13-14W core TDP
  • 3.32-3.54 GHz

100mv undervolt:

  • 15W package TDP
  • 13W core TDP
  • 3.42-3.67 GHz

110mv undervolt:

  • 15W package TDP
  • 13W core TDP
  • 3.42-3.67 GHz

I started getting graphics artifacts here, so I switched the GPU undervolt to 100mv.

115mv undervolt:

  • 15W package TDP
  • 13W core TDP
  • 3.48-3.72 GHz

115mv undervolt with repaste:

  • 15-16W package TDP
  • 14-15W core TDP
  • 3.63-3.81 GHz

120mv undervolt with repaste:

  • 15-16W package TDP
  • 14-15W core TDP
  • 3.63-3.81 GHz

I settled on a 110mv CPU and 90mv GPU undervolt for stability; with proper ventilation I get about 3.7-3.8 GHz out of a max of 3.9 GHz.

Other notes: undervolting made the CPU max speed less spiky.

January 17, 2018
Sven Vermeulen a.k.a. swift (homepage, bugs)
Structuring a configuration baseline (January 17, 2018, 08:10 UTC)

A good configuration baseline has a readable structure that allows all stakeholders to quickly see if the baseline is complete, as well as find a particular setting regardless of the technology. In this blog post, I'll cover a possible structure of the baseline which attempts to be sufficiently complete and technology agnostic.

If you haven't read the blog post on documenting configuration changes, it might be a good idea to do so as it declares the scope of configuration baselines and why I think XCCDF is a good match for this.

January 08, 2018
Andreas K. Hüttel a.k.a. dilfridge (homepage, bugs)
FOSDEM 2018 talk: Perl in the Physics Lab (January 08, 2018, 17:07 UTC)

FOSDEM 2018, the "Free and Open Source Developers' European Meeting", takes place 3-4 February at Université Libre de Bruxelles, Campus Solbosch, Brussels - and our measurement control software Lab::Measurement will be presented there in the Perl devroom! As with all of FOSDEM, the talk will also be streamed live and archived; more details on this will follow later. Here's the abstract:

Perl in the Physics Lab
Andreas K. Hüttel
    Track: Perl Programming Languages devroom
    Room: K.4.601
    Day: Sunday
    Start: 11:00
    End: 11:40 
Let's visit our university lab. We work on low-temperature nanophysics and transport spectroscopy, typically measuring current through experimental chip structures. That involves cooling and temperature control, dc voltage sources, multimeters, high-frequency sources, superconducting magnets, and a lot more fun equipment. A desktop computer controls the experiment and records and evaluates data.

Some people (like me) want to use Linux, some want to use Windows. Not everyone knows Perl, not everyone has equal programming skills, not everyone knows equally much about the measurement hardware. I'm going to present our solution for this, Lab::Measurement. We implement a layer structure of Perl modules, all the way from the hardware access and the implementation of device-specific command sets to high level measurement control with live plotting and metadata tracking. Current work focuses on a port of the entire stack to Moose, to simplify and improve the code.

FOSDEM

January 07, 2018
Sven Vermeulen a.k.a. swift (homepage, bugs)
Documenting configuration changes (January 07, 2018, 20:20 UTC)

IT teams are continuously under pressure to set up and maintain infrastructure services quickly, efficiently and securely. As an infrastructure architect, my main concerns are related to the manageability of these services and the secure setup. And within those realms, a properly documented configuration setup is in my opinion very crucial.

In this blog post series, I'm going to look into using the Extensible Configuration Checklist Description Format (XCCDF) as the way to document these. This first post is an introduction to XCCDF functionally, and what I position it for.

Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)

You may remember that just last week I was excited to announce that I had more work planned and lined up for my glucometer utilities, one of which was supporting OneTouch Verio IQ which is a slightly older meter that is still sold and in use in many countries, but for which no protocol is released.

In the issue I linked above you can find an interesting problem: LifeScan discontinued their Diabetes Management Software, and removed it from their website. Instead, they suggest you get one of their Bluetooth meters and use that with their software. While in general the idea of upgrading a meter is sane, the fact that they decided to discontinue the old software without providing protocols is at the very least annoying.

This shows the importance of having open source tools that can be kept alive as long as needed, because there will be people out there who still rely on their OneTouch Verio IQ, or even on the OneTouch Ultra Easy, which was served by the same software and is still being sold in the US. Luckily, they at least used to publish the Ultra Easy protocol specs, and those are still available on the Internet at large if you search for them (and I do have a copy, and I can rephrase that into a protocol specification if I find that’s needed).

On the bright side, the Tidepool project (of which I wrote before) has a driver for the Verio IQ. It’s not a particularly good driver, as I found out (I’ll get to that later), but it’s a starting point. It made me notice that the protocol was almost an in-between of the Ultra Easy and the Verio 2015, which I already reverse engineered before.

Of course I also managed to find a copy of the LifeScan software on a mostly shady website and a copy of the “cable drivers” package from the Middle East and Africa website of LifeScan, which still has the website design from five years ago. This is good because the latter package is the one that installs kernel drivers on Windows, while the former only contains userland software, which I can trust a little more.

Comparing the USB trace I got from the software with the commands implemented in the TidePool driver showed me a few interesting bits of information. The first is that the first byte of commands on the Verio devices is not actually fixed, but can be chosen from a few options, as the Windows software and the TidePool driver used different bytes (and with this I managed to simplify one corner case in the Verio 2015!). The second is that the TidePool driver does not extract all the information it should! In particular, the device allows before/after-meal marking, but they discard the byte before getting to it. Of course they don’t seem to expose that data even in the Ultra 2 driver, so it may be intentional.

A bit more concerning is that they don’t verify that the command returned a success status, but rather discard the first two bytes every time. Thankfully it’s very easy for me to check that.

On the other hand, reading through the TidePool driver (which I have to assume was developed with access to the LifeScan specifications, under NDA) I could identify two flaws in my own code. The first was not realizing that the packet format between the UltraEasy and the Verio 2015 was not subtly different as I thought, but almost identical, except that the link-control byte in both Verio models is not used, and is kept to 0. The second was that I’m not currently correctly dropping control solutions from the readings of the Verio 2015! I should find a way to get hold of the control solution for my models at the pharmacy and make sure I get this tested.

Oh yeah, and the TidePool driver does not do anything to get or set date and time; thankfully the commands were literally the same as in the Verio 2015, so that part was an actual copy-paste of code. I should probably tidy up a bit, but now I would have a two-tier protocol system: the base packet structure is shared between the UltraEasy, Verio IQ and Verio 2015. Some of the commands are shared between UltraEasy and Verio IQ, more of them are shared between the Verio IQ and the Verio 2015.

You can see now why I’ve been disheartened to hear that the development of drivers, even for open source software, is done through closed protocol specifications that cannot be published (nor the drivers thoroughly commented). Since TidePool is not actually using all of the information, there is no way to tell what certain bytes of the responses represent. And unless I get access to all the possible variants of the information, I can’t tell what some bytes that look constant to me actually represent. Indeed, since the Verio 2015 does not have meal information, I assumed that the values were 32-bit, until I got a report of invalid data on another model which shares the same protocol and driver. This is why I am tempted to build “virtual” fakes of these devices with Facedancer to feed variants of the data to the original software and see how they are represented there.

On the bright side I feel proud of myself (maybe a little too much) for having spent the time to rewrite those two drivers with Construct while at 34C3 and afterwards. If I hadn’t refactored the code before looking at the Verio IQ, I wouldn’t have noticed the similarities so clearly and likely wouldn’t have come to the conclusion it’s a shared similar protocol. And no way I could have copy-pasted between the drivers so easily as I did.

January 06, 2018
Greg KH a.k.a. gregkh (homepage, bugs)
Meltdown and Spectre Linux kernel status (January 06, 2018, 12:36 UTC)

By now, everyone knows that something “big” just got announced regarding computer security. Heck, when the Daily Mail does a report on it, you know something is bad…

Anyway, I’m not going to go into the details about the problems being reported, other than to point you at the wonderfully written Project Zero paper on the issues involved here. They should just give out the 2018 Pwnie award right now, it’s that amazingly good.

If you do want technical details for how we are resolving those issues in the kernel, see the always awesome lwn.net writeup for the details.

Also, here’s a good summary of lots of other postings that includes announcements from various vendors.

As for how this was all handled by the companies involved, well this could be described as a textbook example of how NOT to interact with the Linux kernel community properly. The people and companies involved know what happened, and I’m sure it will all come out eventually, but right now we need to focus on fixing the issues involved, and not pointing blame, no matter how much we want to.

What you can do right now

If your Linux systems are running a normal Linux distribution, go update your kernel. They should all have the updates in them already. And then keep updating them over the next few weeks, we are still working out lots of corner case bugs given that the testing involved here is complex given the huge variety of systems and workloads this affects. If your distro does not have kernel updates, then I strongly suggest changing distros right now.

However there are lots of systems out there that are not running “normal” Linux distributions for various reasons (rumor has it that it is way more than the “traditional” corporate distros). They rely on the LTS kernel updates, or the normal stable kernel updates, or they are in-house franken-kernels. For those people here’s the status of what is going on regarding all of this mess in the upstream kernels you can use.

Meltdown – x86

Right now, Linus’s kernel tree contains all of the fixes we currently know about to handle the Meltdown vulnerability for the x86 architecture. Go enable the CONFIG_PAGE_TABLE_ISOLATION kernel build option, and rebuild and reboot and all should be fine.

However, Linus’s tree is currently at 4.15-rc6 + some outstanding patches. 4.15-rc7 should be out tomorrow, with those outstanding patches to resolve some issues, but most people do not run a -rc kernel in a “normal” environment.

Because of this, the x86 kernel developers have done a wonderful job in their development of the page table isolation code, so much so that the backport to the latest stable kernel, 4.14, has been almost trivial for me to do. This means that the latest 4.14 release (4.14.12 at this moment in time), is what you should be running. 4.14.13 will be out in a few more days, with some additional fixes in it that are needed for some systems that have boot-time problems with 4.14.12 (it’s an obvious problem, if it does not boot, just add the patches now queued up.)

I would personally like to thank Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, Peter Zijlstra, Josh Poimboeuf, Juergen Gross, and Linus Torvalds for all of the work they have done in getting these fixes developed and merged upstream in a form that was so easy for me to consume to allow the stable releases to work properly. Without that effort, I don’t even want to think about what would have happened.

For the older long term stable (LTS) kernels, I have leaned heavily on the wonderful work of Hugh Dickins, Dave Hansen, Jiri Kosina and Borislav Petkov to bring the same functionality to the 4.4 and 4.9 stable kernel trees. I also had immense help from Guenter Roeck, Kees Cook, Jamie Iles, and many others in tracking down nasty bugs and missing patches. I want to also call out David Woodhouse, Eduardo Valentin, Laura Abbott, and Rik van Riel for their help with the backporting and integration as well; their help was essential in numerous tricky places.

These LTS kernels also have the CONFIG_PAGE_TABLE_ISOLATION build option that should be enabled to get complete protection.

As this backport is very different from the mainline version that is in 4.14 and 4.15, there are different bugs happening, right now we know of some VDSO issues that are getting worked on, and some odd virtual machine setups are reporting strange errors, but those are the minority at the moment, and should not stop you from upgrading at all right now. If you do run into problems with these releases, please let us know on the stable kernel mailing list.

If you rely on any other kernel tree other than 4.4, 4.9, or 4.14 right now, and you do not have a distribution supporting you, you are out of luck. The lack of patches to resolve the Meltdown problem is so minor compared to the hundreds of other known exploits and bugs that your kernel version currently contains. You need to worry about that more than anything else at this moment, and get your systems up to date first.

Also, go yell at the people who forced you to run an obsoleted and insecure kernel version, they are the ones that need to learn that doing so is a totally reckless act.

Meltdown – ARM64

Right now the ARM64 set of patches for the Meltdown issue are not merged into Linus’s tree. They are staged and ready to be merged into 4.16-rc1 once 4.15 is released in a few weeks. Because these patches are not in a released kernel from Linus yet, I can not backport them into the stable kernel releases (hey, we have rules for a reason…)

Due to them not being in a released kernel, if you rely on ARM64 for your systems (i.e. Android), I point you at the Android Common Kernel tree. All of the ARM64 fixes have been merged into the 3.18, 4.4, and 4.9 branches as of this point in time.

I would strongly recommend just tracking those branches, as more fixes get added over time due to testing, and things catch up with what gets merged into the upstream kernel releases, especially as I do not know when these patches will land in the stable and LTS kernel releases.

For the 4.4 and 4.9 LTS kernels, odds are these patches will never get merged into them, due to the large number of prerequisite patches required. All of those prerequisite patches have been long merged and tested in the android-common kernels, so I think it is a better idea to just rely on those kernel branches instead of the LTS release for ARM systems at this point in time.

Also note, I merge all of the LTS kernel updates into those branches usually within a day or so of being released, so you should be following those branches no matter what, to ensure your ARM systems are up to date and secure.

Spectre

Now things get “interesting”…

Again, if you are running a distro kernel, you might be covered, as some of the distros have merged various patches into them that they claim mitigate most of the problems here. I suggest updating and testing for yourself to see if you are worried about this attack vector.

For upstream, well, the status is there is no fixes merged into any upstream tree for these types of issues yet. There are numerous patches floating around on the different mailing lists that are proposing solutions for how to resolve them, but they are under heavy development, some of the patch series do not even build or apply to any known trees, the series conflict with each other, and it’s a general mess.

This is due to the fact that the Spectre issues were the last to be addressed by the kernel developers. All of us were working on the Meltdown issue, and we had no real information on exactly what the Spectre problem was at all, and the patches that were floating around were in even worse shape than what has been publicly posted.

Because of all of this, it is going to take us in the kernel community a few weeks to resolve these issues and get them merged upstream. The fixes are coming in to various subsystems all over the kernel, and will be collected and released in the stable kernel updates as they are merged, so again, you are best off just staying up to date with either your distribution’s kernel releases, or the LTS and stable kernel releases.

It’s not the best news, I know, but it’s reality. If it’s any consolation, it does not seem that any other operating system has full solutions for these issues either, the whole industry is in the same boat right now, and we just need to wait and let the developers solve the problem as quickly as they can.

The proposed solutions are not trivial, but some of them are amazingly good. The Retpoline post from Paul Turner is an example of some of the new concepts being created to help resolve these issues. This is going to be an area of lots of research over the next years to come up with ways to mitigate the potential problems involved in hardware that wants to try to predict the future before it happens.

Other arches

Right now, I have not seen patches for any other architectures than x86 and arm64. There are rumors of patches floating around in some of the enterprise distributions for some of the other processor types, and hopefully they will surface in the weeks to come to get merged properly upstream. I have no idea when that will happen; if you are dependent on a specific architecture, I suggest asking on the arch-specific mailing list about this to get a straight answer.

Conclusion

Again, update your kernels, don’t delay, and don’t stop. The updates to resolve these problems will be continuing to come for a long period of time. Also, there are still lots of other bugs and security issues being resolved in the stable and LTS kernel releases that are totally independent of these types of issues, so keeping up to date is always a good idea.

Right now, there are a lot of very overworked, grumpy, sleepless, and just generally pissed off kernel developers working as hard as they can to resolve these issues that they themselves did not cause at all. Please be considerate of their situation right now. They need all the love and support and free supply of their favorite beverage that we can provide them to ensure that we all end up with fixed systems as soon as possible.

January 03, 2018
FOSDEM 2018 (January 03, 2018, 00:00 UTC)

FOSDEM 2018 logo

Put on your cow bells and follow the herd of Gentoo developers to Université libre de Bruxelles in Brussels, Belgium. This year FOSDEM 2018 will be held on February 3rd and 4th.

Our developers will be ready to candidly greet all open source enthusiasts at the Gentoo stand in building K. Visit this year’s wiki page to see which developer will be running the stand during the different visitation time slots. So far seven developers have confirmed their attendance, with most likely more on the way!

Unlike past years, Gentoo will not be hosting a keysigning party, however participants are encouraged to exchange and verify OpenPGP key data to continue building the Gentoo Web of Trust. See the wiki article for more details.

January 02, 2018
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)

You may remember glucometerutils, my project of an open source Python tool to download glucometer data from meters that do not provide Linux support (as in, any of them).

While the tool started a few years ago out of my personal need, this year there has been a bigger push than before, with more contributors trying the tool out, finding problems, and fixing bugs. For my part, I managed to have a few fits of productivity on the tool, particularly this past week at 34C3, when I decided it was due time to start making the package shine a bit more.

So let’s see what are the more recent developments for the tool.

First of all, I decided to bring up the Python version requirement to Python 3.4 (previously, it was Python 3.2). The reason for this is that it gives access to the mock module for testing, and the enum module to write actual semantically-defined constants. While both of these could be provided as dependencies to support the older versions, I can’t think of any good reason not to upgrade from 3.2 to 3.4, and thus no need to support those versions. I mean, even Debian Stable has Python 3.5.

And while talking about Python versions, Hector pointed me at construct, which looked right away like an awesome library for dealing with binary data structures. It turned out to be a bit rougher around the edges than I expected from the docs, particularly because the docs do not contain enough information to actually use it with proper dynamic objects, but it does make a huge difference compared to dealing with bytestrings manually. I had already started, while in Leipzig, to use it to parse the basic frames of the FreeStyle protocol, and then proceeded to rewrite the other binary-based protocols, between the airports and home.

This may sound like a minor detail, but I actually found this made a huge difference, as the library already provides proper support for validating expected constants, as well as dealing with checksums — although in some cases it’s a bit heavier-handed than I expected. Also, the library supports defining bit structures too, which simplified considerably the OneTouch Ultra Easy driver, that was building its own poor developer version of the same idea. After rewriting the two binary LifeScan drivers I have (for the OneTouch Verio 2015 and Select Plus, and the Ultra Easy/Ultra Mini), the similarities between the two protocols are much easier to spot. Indeed, after porting the second driver, I also decided to refactor a bit on the first, to make the two look more alike.
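
To give a flavour of what this looks like, here is a sketch of a construct Struct for a hypothetical framed packet; the field layout is invented for illustration and is not any actual LifeScan format:

from construct import Byte, Bytes, Const, Int16ul, Struct, this

link_frame = Struct(
    "stx" / Const(b"\x02"),              # start-of-frame marker, validated on parse
    "length" / Byte,                     # total frame length, including overhead
    "payload" / Bytes(this.length - 5),  # message body, sized from the length byte
    "etx" / Const(b"\x03"),              # end-of-frame marker
    "checksum" / Int16ul,                # checksum over the preceding bytes
)

Parsing then becomes link_frame.parse(data), with the constants checked for you; a mismatched marker raises an error instead of silently producing garbage.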

This is going to be useful again soon, because two people have asked for support for the OneTouch Verio IQ (which, despite the name, shares nothing with the normal Verio — this one uses an on-board cp210x USB-to-serial adapter), and I somehow expect that, while not compatible, the protocol is likely to be similar to the other two. I ended up finding one for cheap on Amazon Germany, and I ordered it — it would be the easiest one to reverse engineer from my backlog, because it uses a driver I already know is easy to sniff (unlike other serial adapters that use strange framing, I’m looking at you FT232RL!), and the protocol is likely not to stray too far from the other LifeScan protocols, even though it’s not directly compatible.

I have also spent some time on the tests that are currently present. Unfortunately, they don’t currently cover much of anything besides some common internal libraries. I have decided to improve the situation, if a bit slowly. First of all, I picked up a few of the recommendations I give my peers at work during Python reviews, and started using the parameterized module that comes from Abseil, which was recently released as open source by Google. This reduces tedious repetition when building similar tests to exercise different paths in the code. Then, I’m very thankful to Muhammad for setting up Travis for me, as that now allows the tests to show breakage, if there is any coverage at all. I’ll try to write more tests this month to make sure to exercise more drivers.
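
As a small illustration (not taken from glucometerutils itself) of the repetition this removes, a single parameterized test can replace a handful of near-identical ones; the checksum routine here is just a stand-in:

from absl.testing import absltest, parameterized

class ChecksumTest(parameterized.TestCase):

    @parameterized.parameters(
        (b"", 0x0000),
        (b"\x02\x06", 0x0008),
        (b"\xff\xff", 0x01fe),
    )
    def test_checksum(self, data, expected):
        # stand-in for a driver's real checksum routine
        self.assertEqual(sum(data) & 0xFFFF, expected)

if __name__ == "__main__":
    absltest.main()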

I’ve also managed to make the setup.py in the project more useful. Indeed it now correctly lists dependencies for most of the drivers as extras, and I may even be ready to make a first release on PyPI, now that I tested most of the devices I have at home and they all work. Unfortunately this is currently partly blocked on Python SCSI not having a release on PyPI itself. I’ll get back to that possibly next month at this point. For now you can install it from GitHub and it should all work fine.

As for the future, there are two in-progress pull requests/branches from contributors to add support for graphing the results, one using rrdtool and one using gnuplot. These are particularly interesting for users of FreeStyle Libre (which is the only CGM I have a driver for), and someone expressed interest in adding a Prometheus export, because why not — this is not as silly as it may sound, the output graphs of the Libre’s own software look more like my work monitoring graphs than the usual glucometer graphs. Myself, I am now toying with the idea of mimicking the HTML output that the Accu-Check Mobile generate on its own. This would be the easiest to just send by email to a doctor, and probably can be used as a basis to extend things further, integrating the other graphs output.

So onwards and upwards, the tooling will continue being built on. And I’ll do my best to make sure that Linux users who have a need to download their meters’ readings have at least some tooling that they can use, and that does not require setting up unsafe MongoDB instances on cloud providers.

January 01, 2018
Sebastian Pipping a.k.a. sping (homepage, bugs)

Escaping Docker container using waitid() – CVE-2017-5123 (twistlock.com)

December 30, 2017
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)
SMS for two-factors authentication (December 30, 2017, 01:04 UTC)

Having spent a few days now at 34C3 (I’ll leave comments on that for a discussion in front of a very warm coffee with a nice croissant on the side), I have heard presenters reference the SS7 hack a few times already as an everyday security threat. You probably are not surprised that I don’t agree with it being an everyday security issue, while still thinking it is a real security issue.

Indeed, while some websites refer to the SS7 hack as the reason not to use SMS for auth, you can at least find more reasonable articles talking about the updated NIST recommendation. My own preference for TOTP (as used in authenticator apps) is because I don’t need to be registered on a mobile network, and I can use it to log in while on flights.

I want to share some more thoughts that add up to the list of reasons to not use SMS as the second factor authentication. Because while I do not believe SS7 attack is something that all users should be accounting for in their day-to-day life, it is an addition to the list of reasons why SMS auth is not a good idea at all, and I don’t want people to think that, since they are unlikely to be attacked by someone leveraging SS7, they are okay with using SMS authentication instead.

The obvious first problem is reliability: as I said in the post linked above, SMS-based authentication requires you to have access to the phone with the SIM, and that it’s connected to the network and allowed to receive the SMS. This is fine if you never travel and you have good reception where you need to use this. But myself, I have had multiple situations in which I was unable to receive SMS (in flights as I said already, or while visiting China the first time, with 3 Ireland not giving access to any roaming partners to pay-as-you-go customers, or even at my own desk at my office with 3 UK after I moved to London).

This lack of reliability is unfortunate, but not by itself a security issue: it prevents you from accessing the account that the 2FA is set on, and that would sound like it’s doing what it’s designed to do, failing closed. At the same time, this increases the friction of using 2FA, reducing usability and pushing users, bit by bit, to stop using 2FA altogether. Which is a security problem.

The other problem is the ability to intercept or hijack those messages. As you can guess by what I wrote above, I’m not referring to SS7 hacks or equivalent. It’s all significantly more simple.

The first way to intercept the SMS auth messages is having access to the phone itself. Most phones, iOS and Android alike, are configured to show new text messages on the standby screen. In some cases, only a part of the message is visible, and to have access to the rest of the message you’d have to unlock the phone – assuming the phone is locked at all – but 2FA authentication messages tend to be very short and to the point, showing the number in the preview, for ease of access. On Android, such a message can also be swiped away without unlocking the phone. A user who fell victim to this type of interception might have a very hard time noticing it, as nowadays it’s likely the SMS app is not opened very often, and a swiped-off notification would take time to be noticed.[1]

The hijacking I have in mind is a bit more complicated and (probably) noticeable. Instead of using the SS7 hack, you can just take over a phone number by leveraging the phone providers. And this can be done in (at least) two ways: you can convince the victim’s phone provider to reprovision the number to a new SIM card within the same operator, or you can port the number to a new operator. The complexity or easiness of these two processes change a lot between countries, some countries are safer than others.

For instance, the UK system for number portability is what I expect to be the most secure (if not the most user-friendly) I have seen. The first step is to get a Porting Authorisation Code (PAC) for the number you want to port. You do that by calling the current provider. None of the three providers I have had so far in the UK had any way to get this code online, which is a tad safer, as a misplaced password cannot bring full access to the account’s line. And while the code could be “intercepted” the same way as I pointed out above for authentication codes, the (new) operator does get in touch with you, reminding you when the portability will take place, and giving you a chance to contact them if it doesn’t sound right. In the case of Vodafone, they also send you an email when you request the PAC, meaning just swiping away the notification is not enough to hide the fact that it was requested in the first place.

In Ireland, a portability request completes in the span of less than an hour, and only requires you to have (brief) access to the line you want to take over, as the new operator will send you a code to confirm the request. Which means the process, while being significantly easier for the customers, is also extremely insecure. In Italy, I actually went to the store with the line that I wanted to port, and I don’t remember if they asked anything but my IDs to open the new line. No authentication code is involved at all, so if you can fake enough documents, you likely can take over any lines. I do not remember if they notified my old SIM card before the move. I have not tried number portability in France, but it appears you can get the RIO (the equivalent transfer code) from the online system of Free at the very least.

The good thing about all the portability processes I’ve seen up to now is that at least they do not drop a new number on the old SIM card. I was told this is (or at least was) actually common for US providers, where porting a number out just assigns a new number to the old SIM. In that case, it would probably take a while for a victim to notice their account had been taken over. And that would not be a surprise.

If you’re curious, you can probably try that yourself. Call your phone provider from a number other than your own, and see how many and which security questions they ask to verify that it is actually their customer calling, rather than a random stranger. I think the funniest I’ve had was Three Ireland, which asked me for the number I “recently” sent a text message to, or a recent call made or received — you can imagine that it’s extremely easy to get someone to send you a text message, or to call you, if you’re close enough to them, or even just to have them pick up the phone so you can report to the provider that you were last called by the number you used.

And then there is the other interesting point of SMS-based authentication: the codes last longer in time. A TOTP has a short lifetime by design, as it’s time based. Add some fuzzing: most of the implementations I’ve seen allow a single code to be valid for 2× the time the code is displayed in the authenticator, by accepting a code from the past or from the future, generally at half the expected duration. Since the normal case has the code lasting for 60 seconds, they would accept a code 30 seconds before it’s supposed to be used, and 30 seconds after. But text messages can (and do) take much longer than that to arrive.
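
For reference, here is a minimal sketch of RFC 6238 TOTP with the acceptance window described above, using only the Python standard library; the 30-second timestep and ±1-step window are common defaults, not a universal rule:

import base64, hashlib, hmac, struct, time

def totp(secret_b32, timestep=30, digits=6, at=None):
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int((time.time() if at is None else at) // timestep)
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # RFC 4226 dynamic truncation
    code = (struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF) % 10 ** digits
    return str(code).zfill(digits)

def verify(secret_b32, candidate, window=1, timestep=30):
    # accept one step in the past and one in the future, to absorb clock skew
    now = time.time()
    return any(totp(secret_b32, timestep, at=now + step * timestep) == candidate
               for step in range(-window, window + 1))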

And this is particularly useful for attackers of systems that do not implement 2FA correctly. Entering the wrong OTP usually does not invalidate a login attempt, at least not on the first mistake, because users can mistype, or they can miss the window to send an OTP over. But sometimes there are egregious errors, such as the one made by N26, which neither rate-limited nor invalidated the OTP requests, allowing simple bruteforcing of a valid OTP. Since TOTPs change without a new code being requested, bruteforcing them gives you a particularly short window of viability… SMS, on the other hand, opens a much larger window of opportunity.

Oh and remember the Barclays single-factor-authentication? How long do you think it would take to spoof the outgoing number of the SMS that needs to be sent (with the text “Y”) to authorize the transaction, even without having access to the text message that was sent?


  1. This is an argument for using another of your messaging apps to receive text messages, whether it is Facebook Messenger, Hangouts or Signal. Assuming you use any of those on a day-to-day basis, you would then have an easy way to notice if you received messages you have not seen before. [return]

December 28, 2017
Michael Palimaka a.k.a. kensington (homepage, bugs)
Helper scripts for CMake-based ebuilds (December 28, 2017, 14:13 UTC)

I recently pushed two new scripts to the KDE overlay to help with CMake-based ebuild creation and maintenance.

cmake_dep_check inspects all CMakeLists.txt files under the current directory and maps find_package calls to Gentoo packages.

cmake_ebuild_check builds on cmake_dep_check and produces a list of CMake dependencies that are not present in the ebuild (and optionally vice versa).

If you find it useful or run into any issues, feel free to drop me a line.

December 24, 2017
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)
Barclays and the single factor authentication (December 24, 2017, 21:04 UTC)

In my previous post on the topic I barely touched on one of the important reasons why I did not like Barclays at all. The reason was that I still had money in my account with them, and I wanted to make sure that was taken care of before lamenting further on the state of their security. As I have now managed to close my account, I can go on and discuss this further, even though I have already touched upon the major points.

Barclays online banking system relies heavily on what I would define as “single factor authentication”.

Usually, you define authentication factors as things you have or things you know. In the case of Barclays, the only thing they effectively rely upon is access to the debit card. Okay, technically you could say that by itself it’s a two-factor system, as it requires access to the debit card and to its PIN. And since the EMV-CAP protocol they use for this factor executes directly on the chipcard, it is not susceptible to the usual PIN-stripping attacks that most card fraud against chip-and-PIN cards relies on.

But this does not count for much when the PIN of the card they issued me was 7766 — and lamenting about that is why I waited to close the account and give them back the card. It seems like there’s a pattern of banks issuing “easy to remember” 4-digit PINs: XYYX, XXYY, etc. One of my previous (again, cancelled) cards had a PIN terribly easy to remember for a computerist, though probably not for the average person: 0016.

Side note: I have read someone suggesting to badly scribble a wrong PIN on the back of a card as theft prevention. Though I like that idea, I’m just afraid the banks won’t like it anyway. Also, it would take some work to make the scribble easy to misread as different digits, so that a thief would burn the three wrong attempts needed to block the card.

You access the Barclays online banking account through the Identify method provided by CAP: you put the card into the reader, provide the PIN, and you get an 8-digit identifier that can be used to log in on the website. Since I’m no expert in how CAP works internally, I will only venture a guess that this is similar to a counter-based OTP, as the card has no access to a real-time clock and no challenge is provided for this operation.
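
If that guess is right, the card would be computing something in the spirit of HOTP (RFC 4226): HMAC a monotonic counter with a key stored on the chip, then truncate the digest down to a short decimal number. Here is a sketch of that construction using OpenSSL’s HMAC(); this only illustrates counter-based OTPs in general, it is not a description of what CAP actually does.

#include <stdint.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>

/* Counter-based OTP as per RFC 4226: HMAC-SHA1 over the big-endian
   counter, then "dynamic truncation" down to a few decimal digits. */
uint32_t hotp(const uint8_t *key, int keylen, uint64_t counter, int digits)
{
    uint8_t msg[8], digest[20];
    unsigned int dlen = sizeof(digest);

    for (int i = 7; i >= 0; i--) {  /* encode the counter big-endian */
        msg[i] = counter & 0xff;
        counter >>= 8;
    }
    HMAC(EVP_sha1(), key, keylen, msg, sizeof(msg), digest, &dlen);

    /* The low nibble of the last byte picks an offset; 31 bits starting
       there become the OTP value. */
    int off = digest[19] & 0x0f;
    uint32_t bin = ((uint32_t)(digest[off] & 0x7f) << 24)
                 | ((uint32_t)digest[off + 1] << 16)
                 | ((uint32_t)digest[off + 2] << 8)
                 |  (uint32_t)digest[off + 3];

    uint32_t mod = 1;
    for (int i = 0; i < digits; i++)  /* e.g. 8 digits, like the CAP identifier */
        mod *= 10;
    return bin % mod;
}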

This account access sounds secure, but it’s really no more secure than a username and password, at least when it comes to dealing with phishing. You may think that producing a façade that shows the full Barclays login and proxies the responses in real time is a lot of work, but phishing tools are known for being flexible, and they don’t really need to reproduce the whole website, just the parts they care about getting data from. The rest can easily be proxied as-is without any further change, of course.

So what can an attacker do once they have fooled someone into logging in to the bank? Well, not much directly, as most of the actions require further CAP confirmation: wires, new standing orders, and so on and so forth. They can, though, get a lot of information about the victim, including enough proof of address or identity to really mess with their life. It also makes it possible to cancel things like standing orders to pay for rent, which would be quite messy to deal with for most people — although most phishing is not done for the purpose of messing with people, but rather to get their money.

As I said, for sending money you need to have access to the CAP codes. That includes having access not only to the device itself, but also the card and the PIN. To execute those transactions, Barclays will ask you to sign a transaction by providing the CAP device with the account number and the amount to wire. This is good and it’s pretty hard to tamper with, hopefully (I do not make any guarantee on the implementation of CAP), so even if you’re acting through a proxy-phishing site, your wires are probably safe.

I say probably, because of the way the challenge-response is implemented: only the 8-digit account number is used during the signature. If the phishers are attacking a victim they have studied for long enough, which may be the case when attacking businesses, they could know which account the victim pays every month manually, and set up an account with the same number at a different bank (different sort code). The signature would be valid for both.

To be fair to Barclays, implementing CAP fully, the way they did here, is actually more secure than what Ulster Bank (and I assume the rest of RBS Group) does, with an opaque “challenge” token. While that token may encode more information, the fact that it’s opaque means there is no way for the user to know whether what they are signing is indeed what they meant to sign.

Now, these mitigations are actually good. They require continuous access to the card on request, and that makes it very hard for phishing to just keep using the site in the background after the user logged in. But they still rely on effectively a single factor. If someone gets a hold of the card and the PIN (and we know at least some people will write the real one on the back of the card), then it’s game over: it’s like the locks on my flat’s door: two independent locks… except they use the same key. Sure, it’s a waste of time to pick both, so it increases the chances a neighbour would walk in on wannabe burglars trying to open the apartment door. But there’s a single key, I can’t just use two separate keychains to make sure a thief would only grab one of the two, and if anyone gets it from me, well, it’s game over.

Of course Barclays knows that this is not enough, so they include a risk engine. If something in the transactions doesn’t comply with their profile of your activity, it’s considered risky and they require an additional verification. This verification happens to be in the form of text messages. I will not suggest that the problem with these is GSM-layer attacks, as those are still not (yet) in the hands of the type of criminals aiming at personal bank accounts, but there is at the very least the risk that a thief would get hold of my bag with both my card and my phone, so the only “factors” that are still in my head, rather than tied to physical objects, are the (provided) PIN of the card, and the PIN of the phone.

This profile fitting is actually the main reason why I got frustrated with Barclays: since I had just opened the account, most of my transactions were “exceptional”, and that is extremely annoying. This was compounded by the fact that my phone provider didn’t even let me receive SMS at the office, due to lack of coverage (now fixed), and the fact that, at least for wires, the Barclays UI does not warn you to check your phone!

There is also the problem with the way Barclays handle these “exceptional transactions”: debit card transactions are out-and-out rejected. The Verified by Visa screen tells you to check your phone, but the phone will only ask you if it was your transaction or not, and after you confirm it is, it’ll ask you to “retry in a couple of minutes” — retrying too quickly will lead to the transactions being blocked by the processor directly, with a temporary card lock. The wire transfer one will unblock the execution of the wire, which is good, but it can also push the wire to after the cut-off time for non-“Faster Payments” wires.

Update (2017-12-30): since I did not make this very clear, I have added a note about this at the bottom of my new post, about the fact that confirming these transactions only needs you to spoof the sender, since the content and destination of the text message to send are known (it only has to say “Y”, and it’s always to the same service number). So this validation should not really count as a second authentication factor for a skilled attacker.

These are all the reasons for which I abandoned Barclays as fast as I could. Some of those are actually decent mitigation strategies, but the fact that they do not really increase security, while increasing inconvenience, makes me doubt the validity of their concerns and threat models.

December 23, 2017
Sergei Trofimovich a.k.a. slyfox (homepage, bugs)
GCC bit shift handling on ia64 final part (December 23, 2017, 00:00 UTC)



In my previous post I tracked down a gcc RTL optimisation bug around the instruction combiner, but did not get to the very bottom of it. The gcc bug report has advanced a bit and a few things got clearer.

To restate the problem:


GCC bit shift handling on ia64 (December 23, 2017, 00:00 UTC)



Over the past week I’ve spent quite some time poking at toolchain packages in Gentoo and fixing a fresh batch of bugs. The bugs popped up after the recent introduction of gcc profiles that enable PIE (position independent executables) and SSP (stack smash protected executables) by default in Gentoo. One would imagine SSP to be the more invasive of the two, as it changes the on-stack layout of things. But somehow PIE yields bugs in every direction you throw it:

  • [not really from last week] i386 static binaries SIGSEGV at startup: bug (TL;DR: calling vdso-implemented SYSENTER syscall entry point needs TLS to be already initialised in glibc. In this case brk() syscall was called before TLS was initialised.)
  • ia64’s glibc miscompiles librt.so: bug (TL;DR: pie-by-default gcc tricked glibc into believing binutils supports IFUNC on ia64. It does not, exactly as in the sparc bug 7 years ago.)
  • sparc’s glibc produces broken static binaries on pie-by-default gcc: bug. (TL;DR: C startup files for sparc were PIC-unfriendly)
  • powerpc’s gcc-7.2.0 and older produces broken binaries: bug (TL;DR: two bugs: binutils generates relocations to non-existent symbols, gcc hardcoded wrong C startup files to be used)
  • mips64 gcc fails to produce glibc that binutils can link without assert() failures (TODO: debug and fix)
  • m68k glibc fails to compile (TODO: debug and fix)
  • sh4 gcc fails to compile by crashing with stack smash detection (TODO: debug and fix)
  • more I forgot about

These are only toolchain bugs, and I did not yet try things on arm. I’m sure other packages will uncover more interesting corner cases once we fix the toolchain.

I won’t write about any of the above here, but it’s a nice reference point if you want to backport something to older toolchains to get PIE/SSP working better, and a good reference on how to debug this kind of bug.

Today’s story will be about gcc code generation bug on ia64 that has nothing to do with PIE or SSP.

Bug

On #gentoo-ia64 Jason Duerstock asked about openssl failing its tests on modern gcc-7.2.0; it used to pass with gcc-6.4.0 for him.

Having successfully dealt with the IFUNC bug recently, I was confident things were mostly working in ia64 land, and gave it a try.

openssl DES tests were indeed failing, claiming that the encryption/decryption roundtrip did not yield the original data, with errors like:

des_ede3_cbc_encrypt decrypt error ...

What does DES test do?

The test lives at https://github.com/openssl/openssl/blob/OpenSSL_1_0_2-stable/crypto/des/destest.c#L421 and looks like this (simplified):

printf("Doing cbcm\n");
DES_set_key_checked(&cbc_key, &ks);
DES_set_key_checked(&cbc2_key, &ks2);
DES_set_key_checked(&cbc3_key, &ks3);
i = strlen((char *)cbc_data) + 1;
DES_ede3_cbcm_encrypt(cbc_data, cbc_out, 16L, &ks, &ks2, &ks3, &iv3, &iv2, DES_ENCRYPT);
DES_ede3_cbcm_encrypt(&cbc_data[16], &cbc_out[16], i - 16, &ks, &ks2, &ks3, &iv3, &iv2, DES_ENCRYPT);
DES_ede3_cbcm_encrypt(cbc_out, cbc_in, i, &ks, &ks2, &ks3, &iv3, &iv2, DES_DECRYPT);
if (memcmp(cbc_in, cbc_data, strlen((char *)cbc_data) + 1) != 0) {
printf("des_ede3_cbcm_encrypt decrypt error\n");
// ...
}

The test encrypts a few bytes of data in two takes (1: encrypt 16 bytes, 2: encrypt the rest) and then decrypts the result in one go. It works on static data. Very straightforward.

Fixing variables

My first suspect was a gcc change, as 7.2.0 fails and 6.4.0 works, so I tried building openssl at lowered optimisation levels:

# CFLAGS=-O1 FEATURES=test emerge -1 =dev-libs/openssl-1.0.2n
...
des_ede3_cbc_encrypt decrypt error ...

# CFLAGS=-O0 FEATURES=test emerge -1 =dev-libs/openssl-1.0.2n
...
OK!

Hah! At least -O0 seems to work. It means it will be easier to cross-check which optimisation pass affects code generation and find out if it’s an openssl bug or something else.

Setting up A/B test

Debugging cryptographic algorithms like DES is very easy from this standpoint because they don’t need any external state: no services running, no files created, and the input data is not randomized. They are a pure function of their input bit stream(s).

I unpacked a single openssl source tree and started building it in two directories: one with -O0 optimisations, another with -O1:

~/openssl/openssl-1.0.2n-O0/
    `openssl-1.0.2n-.ia64/
        `crypto/des/destest.c (file-1)
        ...
~/openssl/openssl-1.0.2n-O1/
    `openssl-1.0.2n-.ia64/
        `crypto/des/destest.c (symlink to file-1)
        ...

Then I started sprinkling printf() statements into destest.c and other crypto/des/ files, ran destest, and diffed the stdout text to find where exactly the difference first appears.

Relatively quickly I nailed it down to the following trace:

unsigned int u;
...
# define D_ENCRYPT(LL,R,S) {\
LOAD_DATA_tmp(R,S,u,t,E0,E1); \
t=ROTATE(t,4); \
LL^=\
DES_SPtrans[0][(u>> 2L)&0x3f]^ \
DES_SPtrans[2][(u>>10L)&0x3f]^ \
DES_SPtrans[4][(u>>18L)&0x3f]^ \
DES_SPtrans[6][(u>>26L)&0x3f]^ \
DES_SPtrans[1][(t>> 2L)&0x3f]^ \
DES_SPtrans[3][(t>>10L)&0x3f]^ \
DES_SPtrans[5][(t>>18L)&0x3f]^ \
DES_SPtrans[7][(t>>26L)&0x3f]; }

See an error? Me neither.

Minimizing test

printf() debugging suggested that DES_SPtrans[6][(u>>26L)&0x3f] returns different data in the -O0 and -O1 cases. Namely, the following expression changed:

  • -O0: (u>>26L)&0x3f yielded 0x33
  • -O1: (u>>26L)&0x3f yielded 0x173

Note how it’s logically infeasible to get anything more than 0x3f from that expression. And yet here we are with our 0x173 value.

I spent some time deleting lines one by one from all the macros, as long as the result kept producing the diff. Removing lines is safe in most of the DES code because all the test does is flip a few bits and rotate them within a single unsigned int u local variable.

After a while I came up with the following minimal reproducer:

#include <stdio.h>

typedef unsigned int u32;

u32 bug (u32 * result) __attribute__((noinline));
u32 bug (u32 * result)
{
    // non-static and volatile to inhibit constant propagation
    volatile u32 ss = 0xFFFFffff;
    volatile u32 d = 0xEEEEeeee;
    u32 tt = d & 0x00800000;
    u32 r = tt << 8;
    // rotate
    r = (r >> 31)
      | (r << 1);
    u32 u = r^ss;
    u32 off = u >> 1;
    // seemingly unrelated but bug-triggering side-effect
    *result = tt;
    return off;
}

int main() {
    u32 l;
    u32 off = bug(&l);
    printf ("off>>: %08x\n", off);
    return 0;
}

$ gcc -O0 a.c -o a-O0 && ./a-O0 > o0
$ gcc -O1 a.c -o a-O1 && ./a-O1 > o1
$ diff -U0 o0 o1

-off>>: 7fffffff
+off>>: ffffffff

The test itself is very straightforward: it does only 32-bit arithmetic on unsigned int values and prints the result. It is also a very fragile test: if you try to remove seemingly unrelated code like *result = tt; the bug will disappear.

What gcc actually does

I tried to look at the assembly code. If I could spot an obvious problem, I could inspect intermediate gcc steps to get an idea of which pass precisely introduces the wrong code. Despite being Itanium, the code is not that complicated (I added detailed comments):

Dump of assembler code for function bug:
mov r14=r12 ; r12: register holding stack pointer
; r14 = r12 (new temporary variable)
mov r15=-1 ; r15=0xFFFFffff ('ss' variable)
st4.rel [r14]=r15,4 ; write 'ss' at address 'r14', then advance r14 by 4
movl r15=0xffffffffeeeeeeee ; r15=0xEEEEeeee ('d' variable)
st4.rel [r14]=r15 ; write 'd' on stack address 'r14'
ld4.acq r15=[r14] ; and quickly read 'd' back into 'r15' :)
movl r14=0x800000 ; r14=0x00800000
and r15=r14,r15 ; u32 tt = d & 0x00800000;
; doing u32 r = tt << 8;
dep.z r14=r15,8,24 ; "deposit" 24 bits from r15 into r14
; starting at offset 8 and zeroing the rest.
; Or in pictures (with offsets):
; r15 = 0xAABBCCDD11223344
; r14 = 0x0000000022334400
ld4.acq r8=[r12] ; read 'ss' into r8
st4 [r32]=r15 ; *result = tt
; // rotate
; r = (r >> 31)
; | (r << 1);
mix4.r r14=r14,r14 ; This one is tricky: mix4.r duplicates the lower 32
; bits of r14 into both the lower and upper 32 bits of r14.
; Or in pictures:
; r14 = 0xAABBCCDDF1223344 ->
; r14 = 0xF1223344F1223344
shr.u r14=r14,31 ; And shift right by 31 bits (with zero padding)
; r14 = 0x00000001E2446688
; Note how the '1' now sits at bit 32 (the 33rd bit)
xor r8=r8,r14 ; u32 u = r^ss;
extr.u r8=r8,1,32 ; u32 off = u >> 1
; "extract" 32 bits at offset 1 from r8 and put them to r8
; Or in pictures:
; r8 = 0x0000000100000000 -> (note how all lower 32 bits are 0)
; r8 = 0x0000000080000000
br.ret.sptk.many b0 ; return r8 as a computation result

TL;DR: extr.u r8=r8,1,32 extracts 32 bits at offset 1 from the 64-bit register r8 and puts them back into r8. The problem is that for u32 off = u >> 1 to work correctly it should extract only 31 bits, not 32.

If I patch the assembly file to contain extr.u r8=r8,1,31 the sample works correctly.
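
To see why that single bit of extract length matters, here is a small C emulation of extr.u; this is my own illustration, not gcc code. Following the annotated assembly above, right before the extract the 64-bit register holds the leftover rotate carry in bit 32 and u = 0xFFFFFFFE in the lower half:

#include <stdint.h>
#include <stdio.h>

/* Emulate ia64 extr.u: take `len` bits of `reg` starting at bit `pos`
   and zero-extend them into the result. */
static uint64_t extr_u(uint64_t reg, unsigned pos, unsigned len)
{
    uint64_t mask = (len < 64) ? ((1ULL << len) - 1) : ~0ULL;
    return (reg >> pos) & mask;
}

int main(void)
{
    uint64_t r8 = 0x00000001FFFFFFFEULL; /* bit 32 holds the rotate carry */

    /* len=31 keeps the stray bit out; len=32 lets it leak into the result */
    printf("len=31: %08llx\n", (unsigned long long)extr_u(r8, 1, 31));
    printf("len=32: %08llx\n", (unsigned long long)extr_u(r8, 1, 32));
    /* prints 7fffffff (the correct -O0 answer) and ffffffff (the -O1 bug) */
    return 0;
}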

At this point I shared my sample with Jason and filed a gcc bug, as it was clear the compiler was doing something very unexpected. Jason pulled in James and they produced a working gcc patch for me while I was having dinner(!) :)

gcc passes

What surprised me is the fact that gcc recognised the “rotate” pattern and used the clever mix4.r/shr.u trick to achieve the bit rotation effect.

Internally gcc has a few frameworks to perform optimisations:

  • relatively high-level, target-independent “tree” optimisations (working on the GIMPLE representation)
  • low-level, target-specific “register” optimisations (working on the RTL representation)

GIMPLE passes run before RTL passes. Let’s check how many passes are run on our small sample by using -fdump-tree-all and -fdump-rtl-all:

$ ia64-unknown-linux-gnu-gcc -O1 -fdump-tree-all -fdump-rtl-all -S a.c && ls -1 | nl
1   a.c
2   a.c.001t.tu
3   a.c.002t.class
4   a.c.003t.original
5   a.c.004t.gimple
6   a.c.006t.omplower
7   a.c.007t.lower
8   a.c.010t.eh
9   a.c.011t.cfg
10  a.c.012t.ompexp
11  a.c.019t.fixup_cfg1
12  a.c.020t.ssa
13  a.c.022t.nothrow
14  a.c.027t.fixup_cfg3
15  a.c.028t.inline_param1
16  a.c.029t.einline
17  a.c.030t.early_optimizations
18  a.c.031t.objsz1
19  a.c.032t.ccp1
20  a.c.033t.forwprop1
21  a.c.034t.ethread
22  a.c.035t.esra
23  a.c.036t.ealias
24  a.c.037t.fre1
25  a.c.039t.mergephi1
26  a.c.040t.dse1
27  a.c.041t.cddce1
28  a.c.046t.profile_estimate
29  a.c.047t.local-pure-const1
30  a.c.049t.release_ssa
31  a.c.050t.inline_param2
32  a.c.086t.fixup_cfg4
33  a.c.091t.ccp2
34  a.c.094t.backprop
35  a.c.095t.phiprop
36  a.c.096t.forwprop2
37  a.c.097t.objsz2
38  a.c.098t.alias
39  a.c.099t.retslot
40  a.c.100t.fre3
41  a.c.101t.mergephi2
42  a.c.105t.dce2
43  a.c.106t.stdarg
44  a.c.107t.cdce
45  a.c.108t.cselim
46  a.c.109t.copyprop1
47  a.c.110t.ifcombine
48  a.c.111t.mergephi3
49  a.c.112t.phiopt1
50  a.c.114t.ch2
51  a.c.115t.cplxlower1
52  a.c.116t.sra
53  a.c.118t.dom2
54  a.c.120t.phicprop1
55  a.c.121t.dse2
56  a.c.122t.reassoc1
57  a.c.123t.dce3
58  a.c.124t.forwprop3
59  a.c.125t.phiopt2
60  a.c.126t.ccp3
61  a.c.127t.sincos
62  a.c.129t.laddress
63  a.c.130t.lim2
64  a.c.131t.crited1
65  a.c.134t.sink
66  a.c.138t.dce4
67  a.c.139t.fix_loops
68  a.c.167t.no_loop
69  a.c.170t.veclower21
70  a.c.172t.printf-return-value2
71  a.c.173t.reassoc2
72  a.c.174t.slsr
73  a.c.178t.dom3
74  a.c.182t.phicprop2
75  a.c.183t.dse3
76  a.c.184t.cddce3
77  a.c.185t.forwprop4
78  a.c.186t.phiopt3
79  a.c.187t.fab1
80  a.c.191t.dce7
81  a.c.192t.crited2
82  a.c.194t.uncprop1
83  a.c.195t.local-pure-const2
84  a.c.226t.nrv
85  a.c.227t.optimized
86  a.c.229r.expand
87  a.c.230r.vregs
88  a.c.231r.into_cfglayout
89  a.c.232r.jump
90  a.c.233r.subreg1
91  a.c.234r.dfinit
92  a.c.235r.cse1
93  a.c.236r.fwprop1
94  a.c.243r.ce1
95  a.c.244r.reginfo
96  a.c.245r.loop2
97  a.c.246r.loop2_init
98  a.c.247r.loop2_invariant
99  a.c.249r.loop2_doloop
100 a.c.250r.loop2_done
101 a.c.254r.dse1
102 a.c.255r.fwprop2
103 a.c.256r.auto_inc_dec
104 a.c.257r.init-regs
105 a.c.259r.combine
106 a.c.260r.ce2
107 a.c.262r.outof_cfglayout
108 a.c.263r.split1
109 a.c.264r.subreg2
110 a.c.267r.asmcons
111 a.c.271r.ira
112 a.c.272r.reload
113 a.c.273r.postreload
114 a.c.275r.split2
115 a.c.279r.pro_and_epilogue
116 a.c.280r.dse2
117 a.c.282r.jump2
118 a.c.286r.ce3
119 a.c.288r.cprop_hardreg
120 a.c.289r.rtl_dce
121 a.c.290r.bbro
122 a.c.296r.alignments
123 a.c.298r.mach
124 a.c.299r.barriers
125 a.c.303r.shorten
126 a.c.304r.nothrow
127 a.c.306r.final
128 a.c.307r.dfinish
129 a.c.308t.statistics
130 a.s

128 passes! Quite a few. The t after the pass number means a “tree” pass, r an “rtl” pass. Note how all the “tree” passes precede the “rtl” ones.

The last “tree” pass outputs the following:

;; cat a.c.227t.optimized
;; Function bug (bug, funcdef_no=23, decl_uid=2078, cgraph_uid=23, symbol_order=23)

__attribute__((noinline))
bug (u32 * result)
{
  u32 off;
  u32 u;
  u32 r;
  u32 tt;
  volatile u32 d;
  volatile u32 ss;
  unsigned int d.0_1;
  unsigned int ss.1_2;

  <bb 2> [100.00%]:
  ss ={v} 4294967295;
  d ={v} 4008636142;
  d.0_1 ={v} d;
  tt_6 = d.0_1 & 8388608;
  r_7 = tt_6 << 8;
  r_8 = r_7 r>> 31;
  ss.1_2 ={v} ss;
  u_9 = ss.1_2 ^ r_8;
  off_10 = u_9 >> 1;
  *result_11(D) = tt_6;
  return off_10;

}



;; Function main (main, funcdef_no=24, decl_uid=2088, cgraph_uid=24, symbol_order=24) (executed once)

main ()
{
  u32 off;
  u32 l;

  <bb 2> [100.00%]:
  off_3 = bug (&l);
  __printf_chk (1, "off>>: %08x\n", off_3);
  l ={v} {CLOBBER};
  return 0;

}

The code does not differ much from what I originally wrote, because I specifically tried to write something that would not trigger any high-level transformations.

RTL dumps are even more verbose. I’ll show an incomplete snippet of the bug() function from the a.c.229r.expand file:

;;
;; Full RTL generated for this function:
;;
(note 1 0 4 NOTE_INSN_DELETED)
(note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 2 4 3 2 (set (reg/v/f:DI 347 [ result ])
(reg:DI 112 in0 [ result ])) "a.c":5 -1
(nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (reg:SI 348)
(const_int -1 [0xffffffffffffffff])) "a.c":7 -1
(nil))
(insn 7 6 8 2 (set (mem/v/c:SI (reg/f:DI 335 virtual-stack-vars) [1 ss+0 S4 A128])
(reg:SI 348)) "a.c":7 -1
(nil))
(insn 8 7 9 2 (set (reg:DI 349)
(reg/f:DI 335 virtual-stack-vars)) "a.c":8 -1
(nil))
(insn 9 8 10 2 (set (reg/f:DI 350)
(plus:DI (reg/f:DI 335 virtual-stack-vars)
(const_int 4 [0x4]))) "a.c":8 -1
(nil))
(insn 10 9 11 2 (set (reg:SI 351)
(const_int -286331154 [0xffffffffeeeeeeee])) "a.c":8 -1
(nil))
(insn 11 10 12 2 (set (mem/v/c:SI (reg/f:DI 350) [1 d+0 S4 A32])
(reg:SI 351)) "a.c":8 -1
(nil))
(insn 12 11 13 2 (set (reg:DI 352)
(reg/f:DI 335 virtual-stack-vars)) "a.c":9 -1
(nil))
(insn 13 12 14 2 (set (reg/f:DI 353)
(plus:DI (reg/f:DI 335 virtual-stack-vars)
(const_int 4 [0x4]))) "a.c":9 -1
(nil))
(insn 14 13 15 2 (set (reg:SI 340 [ d.0_1 ])
(mem/v/c:SI (reg/f:DI 353) [1 d+0 S4 A32])) "a.c":9 -1
(nil))
(insn 15 14 16 2 (set (reg:DI 355)
(const_int 8388608 [0x800000])) "a.c":9 -1
(nil))
(insn 16 15 17 2 (set (reg:DI 354)
(and:DI (subreg:DI (reg:SI 340 [ d.0_1 ]) 0)
(reg:DI 355))) "a.c":9 -1
(nil))
(insn 17 16 18 2 (set (reg/v:SI 342 [ tt ])
(subreg:SI (reg:DI 354) 0)) "a.c":9 -1
(nil))
(insn 18 17 19 2 (set (reg/v:SI 343 [ r ])
(ashift:SI (reg/v:SI 342 [ tt ])
(const_int 8 [0x8]))) "a.c":10 -1
(nil))
(insn 19 18 20 2 (set (reg:SI 341 [ ss.1_2 ])
(mem/v/c:SI (reg/f:DI 335 virtual-stack-vars) [1 ss+0 S4 A128])) "a.c":16 -1
(nil))
(insn 20 19 21 2 (set (mem:SI (reg/v/f:DI 347 [ result ]) [1 *result_11(D)+0 S4 A32])
(reg/v:SI 342 [ tt ])) "a.c":20 -1
(nil))
(insn 21 20 22 2 (set (reg:SI 357 [ r ])
(rotate:SI (reg/v:SI 343 [ r ])
(const_int 1 [0x1]))) "a.c":13 -1
(nil))
(insn 22 21 23 2 (set (reg:DI 358)
(xor:DI (subreg:DI (reg:SI 357 [ r ]) 0)
(subreg:DI (reg:SI 341 [ ss.1_2 ]) 0))) "a.c":16 -1
(nil))
(insn 23 22 24 2 (set (reg:DI 359)
(zero_extract:DI (reg:DI 358)
(const_int 31 [0x1f])
(const_int 1 [0x1]))) "a.c":17 -1
(nil))
(insn 24 23 25 2 (set (subreg:DI (reg:SI 356 [ off ]) 0)
(reg:DI 359)) "a.c":17 -1
(nil))
(insn 25 24 29 2 (set (reg/v:SI 346 [ <retval> ])
(reg:SI 356 [ off ])) "a.c":21 -1
(nil))
(insn 29 25 30 2 (set (reg/i:SI 8 r8)
(reg/v:SI 346 [ <retval> ])) "a.c":22 -1
(nil))
(insn 30 29 0 2 (use (reg/i:SI 8 r8)) "a.c":22 -1
(nil))

The above is the RTL representation of our bug() function. The S-expressions look almost, but not quite, like machine instructions.

For example, the following snippet (a single S-expression) introduces a new virtual register 351, which receives the literal value 0xeeeeeeee. SI means SImode, a 32-bit signed integer (gcc’s internals documentation describes the machine modes in more detail):

(insn 10 9 11 2 (set (reg:SI 351)
(const_int -286331154 [0xffffffffeeeeeeee])) "a.c":8 -1
(nil))

Or just r351 = 0xeeeeeeee :) Note that it also mentions source file line numbers, which is useful when mapping RTL logs back to source files (and I guess gcc uses the same information to emit DWARF debugging sections).

Another example of an RTL instruction:

(insn 22 21 23 2 (set (reg:DI 358)
(xor:DI (subreg:DI (reg:SI 357 [ r ]) 0)
(subreg:DI (reg:SI 341 [ ss.1_2 ]) 0))) "a.c":16 -1
(nil))

Here we see the RTL equivalent of the GIMPLE u_9 = ss.1_2 ^ r_8;. There is more subtlety here: the xor itself operates on 64-bit integers (DI) that carry 32-bit values in their lower 32 bits (subregs), which I don’t really understand.

The upstream bug attempts to decide which assumption is violated in the generic RTL optimisation pass (or in the backend implementation). At the time of writing this post a few candidate patches had been posted to address the bug.

gcc RTL optimisations

I was interested in the optimisation that converted extr.u r8=r8,1,31 (valid) to extr.u r8=r8,1,32 (invalid).

Its GIMPLE representation is:

// ...
off_10 = u_9 >> 1;
*result_11(D) = tt_6;
return off_10;

Let’s try to find the RTL representation of this construct in the very first RTL dump (the a.c.229r.expand file):

(insn 23 22 24 2 (set (reg:DI 359)
(zero_extract:DI (reg:DI 358)
(const_int 31 [0x1f])
(const_int 1 [0x1]))) "a.c":17 -1
(nil))
;; ...
(insn 24 23 25 2 (set (subreg:DI (reg:SI 356 [ off ]) 0)
(reg:DI 359)) "a.c":17 -1
(nil))
(insn 25 24 29 2 (set (reg/v:SI 346 [ <retval> ])
(reg:SI 356 [ off ])) "a.c":21 -1
(nil))
(insn 29 25 30 2 (set (reg/i:SI 8 r8)
(reg/v:SI 346 [ <retval> ])) "a.c":22 -1
(nil))
(insn 30 29 0 2 (use (reg/i:SI 8 r8)) "a.c":22 -1
(nil))

Or in a slightly more concise form:

reg359 = zero_extract(reg358, offset=1, length=31)
reg356 = (u32)reg359;
reg346 = (u32)reg356;
reg8   = (u32)reg346;

Very straightforward (and still correct). zero_extract is an RTL virtual operation that will have to be expressed as a real instruction at some point.

In the absence of other optimisations this is done with the following rule (at https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/ia64/ia64.md;h=b7cd52ba366ba3d63e98479df5f4be44ffd17ca6;hb=HEAD#l1381):

(define_insn "extzv"
[(set (match_operand:DI 0 "gr_register_operand" "=r")
(zero_extract:DI (match_operand:DI 1 "gr_register_operand" "r")
(match_operand:DI 2 "extr_len_operand" "n")
(match_operand:DI 3 "shift_count_operand" "M")))]
""
"extr.u %0 = %1, %3, %2"
[(set_attr "itanium_class" "ishf")])

In the unoptimised case we see that all the intermediate values are stored to stack addresses and loaded back:

;; gcc -O0 a.c
...
extr.u r15 = r15, 1, 31
;;
st4 [r14] = r15
mov r14 = r2
;;
ld8 r14 = [r14]
adds r16 = -32, r2
;;
ld4 r15 = [r16]
;;
st4 [r14] = r15
adds r14 = -20, r2
;;
ld4 r14 = [r14]
;;
mov r8 = r14
.restore sp
mov r12 = r2
br.ret.sptk.many b0

To get rid of all the needless operations the RTL framework applies an extensive list of optimisations. Each RTL dump contains a summary of the pass’s effect on the file. Let’s look at a.c.232r.jump:

Deleted 2 trivially dead insns
3 basic blocks, 2 edges.

Around the a.c.257r.init-regs pass our RTL representation shrinks to the following:

(insn 23 22 24 2 (set (reg:DI 359)
(zero_extract:DI (reg:DI 358)
(const_int 31 [0x1f])
(const_int 1 [0x1]))) "a.c":17 159 {extzv}
(expr_list:REG_DEAD (reg:DI 358)
(nil)))
(insn 24 23 29 2 (set (subreg:DI (reg:SI 356 [ off ]) 0)
(reg:DI 359)) "a.c":17 6 {movdi_internal}
(expr_list:REG_DEAD (reg:DI 359)
(nil)))
(insn 29 24 30 2 (set (reg/i:SI 8 r8)
(reg:SI 356 [ off ])) "a.c":22 5 {movsi_internal}
(expr_list:REG_DEAD (reg:SI 356 [ off ])
(nil)))
(insn 30 29 0 2 (use (reg/i:SI 8 r8)) "a.c":22 -1
(nil))

Or in a slightly more concise form:

reg359 = zero_extract(reg358, offset=1, length=31)
reg356 = (u32)reg359;
reg8   = (u32)reg356;

Note how reg346 = (u32)reg356; has already gone away.

The most magic happened in a single pass, a.c.259r.combine. Here is its report:

;; Function bug (bug, funcdef_no=23, decl_uid=2078, cgraph_uid=23, symbol_order=23)

starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
insn_cost 2: 4
insn_cost 6: 4
insn_cost 32: 4
insn_cost 7: 4
insn_cost 10: 4
insn_cost 11: 4
insn_cost 14: 4
insn_cost 15: 4
insn_cost 16: 4
insn_cost 18: 4
insn_cost 19: 4
insn_cost 20: 4
insn_cost 21: 4
insn_cost 22: 4
insn_cost 23: 4
insn_cost 24: 4
insn_cost 29: 4
insn_cost 30: 0
allowing combination of insns 2 and 20
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 2.
modifying insn i3    20: [in0:DI]=r354:DI#0
      REG_DEAD in0:DI
      REG_DEAD r354:DI
deferring rescan insn with uid = 20.
allowing combination of insns 23 and 24
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 23.
modifying insn i3    24: r356:SI#0=r358:DI 0>>0x1
  REG_DEAD r358:DI
deferring rescan insn with uid = 24.
allowing combination of insns 24 and 29
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 24.
modifying insn i3    29: r8:DI=zero_extract(r358:DI,0x20,0x1)
      REG_DEAD r358:DI
deferring rescan insn with uid = 29.
starting the processing of deferred insns
rescanning insn with uid = 20.
rescanning insn with uid = 29.
ending the processing of deferred insns

The essential output relevant to our instruction is:

allowing combination of insns 23 and 24
allowing combination of insns 24 and 29

This combines three instructions:

reg359 = zero_extract(reg358, offset=1, length=31)
reg356 = (u32)reg359;
reg8   = (u32)reg356;

Into one:

reg8 = zero_extract(reg358, offset=1, length=32)

The problem is in the combiner, which decided that the extra high bit was zero anyway (a false assumption) and widened the zero_extract to the full 32 bits. Why exactly it decided the upper 32 bits are zero (they are not) is not clear to me :) It happens somewhere in https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/combine.c;h=31e6a4f68254fab551300252688a52d8c3dcaaa4;hb=HEAD#l2628 where some of the if() statements take about a page. Hopefully the gcc RTL and ia64 maintainers will help and guide us here.

Parting words

  • Having good robust tests in packages is always great, as it helps gain confidence that a package is not completely broken on new target platforms (or toolchain versions).
  • printf() is still a good enough tool to trace compiler bugs :)
  • -fdump-tree-all and -fdump-rtl-all are useful gcc flags to get an idea of why optimisations fire (or don’t).
  • gcc performs very unobvious optimisations!

Have fun!


December 22, 2017
Alexys Jacob a.k.a. ultrabug (homepage, bugs)
Talks page (December 22, 2017, 13:39 UTC)

I finally decided to put up a page on this website referencing the talks I’ve been giving over the last few years.

You’ll see that I’m quite obsessed with fault tolerance and scalability designs… I guess I can’t deny it any more 🙂

December 21, 2017
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)
New, new gamestation (December 21, 2017, 00:04 UTC)

Full disclosure: the following post has a significantly higher number of Amazon Affiliates links than usual. That’s because I am talking about the hardware I just bought, and this post counts just as much as an update on my hardware as a recommendation of the stuff I bought; I have not been hacked or bought out by anyone.

As I noted in my previous quick update, my gamestation went missing in the move. I would even go as far as to say that it was stolen, but I have no way to prove whether it was stolen by the movers or during the move. This meant that I needed to get a new computer for my photo editing hobby, which meant more money spent, and still no news from the insurance. But oh well.

As I did two years ago, I want to post my current hardware setup here. You’ll notice a number of similarities with the previous configuration, because I decided to stick as much as possible to what I had before, which worked.

  • CPU: Intel Core i7 7820X, which still has a nice 3.6GHz base clock, and has more cores than I had before.
  • Motherboard: MSI X299 SLI PLUS. You may remember that I had problems with the ASUS motherboard.
  • Memory: 8×Crucial 16GB DDR4.
  • Case: Fractal Design Define S, as I really like the designs of Fractal Design (pun not intended), and I do not need the full cage or the optical disk bays for sure this time around.
  • CPU cooler: NZXT Kraken X52, because the 280mm version appears to be more aimed towards extreme overclockers than my normal usage; this way I had more leeway on how to mount the radiator in.
  • SSD: 2×Crucial MX300 M.2 SATA. While I liked the Samsung 850 EVO, the performance of the MX300 appears to be effectively the same, and this allowed me to get the M.2 version, leaving more space if I need to extend this further.
  • HDD: Toshiba X300 5TB because there is still need for spinning rust to archive data that is “at rest”.
  • GPU: Zotac GeForce GTX 1080Ti 11GB, because since I’m spending money I may just as well buy a top of the line card and be done with it.
  • PSU: Corsair RM850i, for the first time in years betraying beQuiet! as they didn’t have anything in stock at the time I ordered this.

This is the configuration in the chassis, but that ended up not being enough. In particular, because of my own stupidity, I ended up having to replace my beloved Dell U2711 monitor. I really like my UltraSharp, but earlier this year I ended up damaging the DisplayPort input on it — friends don’t let friends use DisplayPort cables with hooks on them! Get those without for extra safety, particularly if you have monitor arms or standing desks! Because of this I had been using a DVI-D DualLink cable instead. Unfortunately my new video card (and most new video cards I could see) do not have DVI ports anymore, preferring instead multiple DisplayPort outputs and (not even always) HDMI. The UltraSharp, unfortunately, does not support 2560x1440 output over HDMI, and the DisplayPort-to-DVI adapter in the box is only for SingleLink DVI, which is not fast enough for that resolution either. DualLink DVI adapters exist, but for the most part they are “active” converters that require a power supply and more cables, and are not cheap (I have seen a “cheap” one for £150!)

I ended up buying a new monitor too, and I settled for the BenQ BL2711U, a 27-inch, 4K, 10-bit monitor “for designers” that boasts 100% sRGB coverage. This is not my first BenQ monitor; a few months ago I bought a BenQ BL2420PT, a 24-inch monitor “for designers” that I use for both my XPS and my work laptop, switching between the two as needed over USB-C, and I have been pretty happy with it altogether.

Unfortunately the monitor came with DisplayPort cables with hooks, once again, so at first I decided to connect it over HDMI instead. That was a big mistake, for multiple reasons. The first was that calibrating it with the ColorMunki showed a huge gap between the uncalibrated and calibrated colours. The second was that, when I went to look into it, I could not enable 10-bit (10 bpc) mode in the NVIDIA display settings.

Repeat after me: if you want to use a BL-series BenQ monitor for photography you should connect it using DisplayPort.

The two problems were solved after switching to DisplayPort (temporarily with hooks; I have already ordered a proper cable): 10bpc mode is not available over HDMI when using 4K resolution at 60Hz. HDMI 2 can do 4K and 10-bit (HDR), but only at a lower framerate, which makes it fine for watching HDR movies and streaming, but not good for photo editing. The calibration problem was the same one I had noticed, but couldn’t be bothered to figure out how to fix, on my laptops: some of the gray highlighting of text would not actually be visible. For whatever reason, BenQ’s “designer” monitors ship with the HDMI colour range set to limited (16-235) rather than full (0-255). Why did they do that? I have no idea. Indeed, switching the monitor to sRGB mode, full range, made the calibration effectively unnecessary (I still calibrated it out of nitpickery), and switching to DisplayPort removes the whole question of whether it should use limited or full range.

While the BenQ monitors have fairly decent integrated speakers, which make it unnecessary to have a soundbar for hearing system notifications or chatting with my mother, that is not the greatest option to play games on. So I ended up getting a pair of Bose Companion 2 speakers which are more than enough for what I need to use them for.

Now I have an overly powerful computer, and a very nice looking monitor. How do I connect them to the Internet? Well, here’s the problem: the Hyperoptic socket is in the living room, way too far from my computer to be useful. I could have just put a random WiFi adapter on it, but I also needed a new router anyway, since the box with my fairly new Linksys also got lost in the moving process.

So upon suggestion from a friend, and a recommendation from Troy Hunt I ended up getting a UAP-AC-PRO for the living room, and a UAP-AC-LITE for the home office, topped it with an EdgeRouter X (the recommendation of which was rescinded afterwards, but it seems to do its job for now), and set them as a bridge between the two locations. I think I should write down networking notes later, but Troy did that already so why bother?

So at the end of this whole thing I spent way more money on hardware than I planned to, I got myself a very nice new computer, and I have way too many extra cables, plus the whole set of odds and ends of the old computer, router and scanner that are no longer useful (I still have the antennas for the router, and the power supply for the scanner). And I’m still short of a document scanner, which is a bit of a pain because I now have a collection of documents that need scanning. I could use the office’s scanners, but those don’t run OCR on the documents, and I have not seen anything decent to apply OCR to PDFs after the fact. I’m open to suggestions, as I’m not keen on ending up buying something like the EPSON DS-310 just for the duplex scanning and the OCR software.

December 18, 2017
Diego E. Pettenò a.k.a. flameeyes (homepage, bugs)
UK Banking, Attempt 2: Fineco Bank (December 18, 2017, 00:04 UTC)

So after a fairly negative experience with Barclays I quickly looked for alternatives. Two acquaintances who don’t know each other both suggested I look into Fineco, an Italian bank also operating in the United Kingdom. As you can tell from their website, their focus is on trading and traders, but it turns out they also make a fairly decent bank in and of themselves.

Indeed, opening the account with Fineco has been fairly straightforward: a few online forms, uploading documents to their identity verification system (very similar to what Revolut does, except using an Italian company that I already knew and was a customer of), and then sending £1 from a bank account that is already opened in your name. I found the forms also technically well-designed, particularly the fact that all the “I agree to” checkboxes automatically trigger JavaScript downloads of PDFs with the terms agreed, whether you clicked to read the agreement or not — I guess it’s a «No excuse, you have a copy of this» protection on their side, but it also made it very easy to archive all the needed information together with everything else I keep.

I should note here that it looks like Fineco’s target audience is Italian expats in the UK explicitly. It is common for most services to “special case” their local country as the first entry in the country drop-down, and then add the rest in alphabetical order. In the case of Fineco, the drop-down started with United Kingdom and Italy for all the options.

One of the good things about this bank being focused so much on trading is that the account is by default a multicurrency one, similar to the TransferWise Borderless Account. Indeed, in addition to the primary Sterling account, Fineco sets you up right away with accounts in Euro, Swiss Francs, and US Dollars, all connected to the same login. And in addition to this, they offer you the choice between a Sterling debit card, a Euro credit card, or both (for a reasonable fee of £10/yr). The two cards are connected to the respective currency accounts (no card is available for Francs or Dollars), and there are no foreign transaction fees on either. While Revolut mostly took care of my foreign transaction fees, it’s always good to have a local debit card with much higher availability, particularly as ATM access with Revolut has a relatively low monthly limit.

One of the interesting details of these currency accounts is that they all have Italian IBAN and BIC (with a separate SWIFT routing number, of its parent group UniCredit). For the main Sterling account, UK-style Sort Code and Account Number are available, which make it a proper local account.

This is actually very useful for me: for the past four years I have been keeping my old Italian account open, despite it costing me a fair bit of money just in service, because I have been paying the utilities for my mother’s house. And despite SEPA Direct Debit having been introduced over two years ago, the utilities I contacted failed to let me debit a foreign (Irish) account. Since I left Ireland, and the UK is not a Euro country, I was afraid I would have to keep my Italian account open even longer, but this actually solved the problem: for Italian utilities, the account is a perfectly valid Italian account, as for the most part they don’t even validate the billing address.

An aside: Vodafone Italy and Wind 3 Italy are still attached to my Tesco credit card, which Tesco Bank assures me I can keep using as long as I direct debit it into an Euro account anywhere. They even changed my mailing address to my new apartment in London. Those two companies insist that they only ever accept Italian credit cards, but they accepted my Irish credit card just fine before; in the case of Vodafone, they have an explicit whitelist of the BIN (for whatever reason), while Wind couldn’t get a hold of the concept that the card is Irish at all. Oh well.

Speaking of direct debits and odd combinations: while I should have now managed to switch all the utilities, including the council tax, to direct debit on this new account, I had some trouble setting it up with Thameswater, the water provider in my area. If I tried setting up the direct debit online, it would report Fineco’s sort code (30-02-48) as invalid. The Sort Code Checker provided by the category association says it’s valid and works for everything besides cheque and credit clearing (which is unneeded). I ended up having to call them and ask them to override the warning, but they have not sent me confirmation that they managed. This appears to be a common “feature” of Thameswater — oh, and by the way, their paper form to request the direct debit returned a 404 on their website. Sigh.

The UI of the bank (and of their app) is much more information-dense than any other bank’s I’ve ever used. It’s not a surprise when you consider that their target audience is investors and traders. It does work well for me, but I can see how this would not be the most pleasing interface for most home users. The only feature I have been unable to find in the interface yet is how to set up standing orders – I contacted them this weekend and will see what they say – so for the moment I just set up a few months’ worth of rent as scheduled payments, which works just as well for now.

The Android app supports fingerprint authentication (unlike Barclays’) and does not come with its own NFC payment system. Unfortunately the debit cards also appear not to be enabled for Android Pay, which is a bit of a shame. They also don’t leverage the app to send notifications, but they do send free SMS for new offline1 transactions happening on the debit card, which is great.

All in all, I may have found the bank I was looking for. It’s not a “cuddly” bank, but it appears to have what I need and to work for my needs. With a bit of luck it will mean that by Q1 I’ll be done with all the other bank accounts in both Ireland and Italy, and it will finally be simpler to keep an eye on how much money I have and how much of it is spent around the place (although GnuCash does help a bit there). I’ll keep you all posted if this changes.


  1. Confusingly enough, a transaction happening over the Internet is an “offline” transaction. The online/offline is referred to the chip for chip’n’pin cards. If the chip is connected to a terminal that is in turn connected to the bank, that’s an online transaction. Otherwise it’s offline. If you read or type the number manually, it’s also offline. [return]

December 16, 2017
Sergei Trofimovich a.k.a. slyfox (homepage, bugs)
Stack protection on mips64 (December 16, 2017, 00:00 UTC)



Bug

On #gentoo-toolchain Matt Turner reported an obscure SIGSEGV of the top program on one of his mips boxes:

07:01 <@mattst88> since upgrading to >=glibc-2.25 on mips:
07:01 <@mattst88> bcm91250a-be ~ # top
07:01 <@mattst88> *** stack smashing detected ***: <unknown> terminated
07:01 <@mattst88> Aborted

A simple (but widely used) tool seemingly managed to overwrite its own stack. Looks straightforward, right? Let’s see!

Reproducing

As I don’t have any mips hardware I will fake some:

# crossdev -t mips64-unknown-linux-gnu
# <set a few profile variables>
# mips64-unknown-linux-gnu-emerge -1 procps

We’ve built a toolchain that targets mips64 (N32 ABI). Checking if it crashes:

$ LD_LIBRARY_PATH=/lib64 qemu-mipsn32 -L /usr/mips64-unknown-linux-gnu /usr/mips64-unknown-linux-gnu/usr/bin/top
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault

This is not exactly a stack smash. But who knows, maybe my set of tools is slightly different from Matt’s and the same bug manifests in a slightly different way.

First workaround

Gentoo recently enabled stack protection and already exposed a bug at least on powerpc32.

First, let’s check if it’s related to the stack protector option:

# EXTRA_ECONF=--enable-stack-protector=no emerge -v1 cross-mips64-unknown-linux-gnu/glibc
$ LD_LIBRARY_PATH=/lib64 qemu-mipsn32 -L /usr/mips64-unknown-linux-gnu /usr/mips64-unknown-linux-gnu/usr/bin/top
<running top>

That worked! To unbreak mips users I disabled glibc stack self-protection by passing --enable-stack-protector=no.

Into the rabbit hole

Now let’s try to find out what is at fault here. In the case of powerpc it was bad gcc code generation. Could it be something similar here?

First, we need a backtrace from qemu. Unfortunately qemu-generated .core files are truncated and are not directly readable by gdb:

$ gdb --quiet /usr/mips64-unknown-linux-gnu/usr/bin/top qemu_top_20171216-220221_14697.core
Reading symbols from /usr/mips64-unknown-linux-gnu/usr/bin/top...done.
BFD: Warning: /home/slyfox/qemu_top_20171216-220221_14697.core is truncated: expected core file size >= 8867840, found: 1504.

Let’s try gdbserver mode instead, running qemu with -g 12345 to make it wait for a gdb session:

$ LD_LIBRARY_PATH=/lib64 qemu-mipsn32 -g 12345 -L /usr/mips64-unknown-linux-gnu /usr/mips64-unknown-linux-gnu/usr/bin/top

And in second terminal fire up gdb:

$ gdb --quiet /usr/mips64-unknown-linux-gnu/usr/bin/top
Reading symbols from /usr/mips64-unknown-linux-gnu/usr/bin/top...done.
(gdb) set sysroot /usr/mips64-unknown-linux-gnu
(gdb) target remote :12345
Remote debugging using :12345
Reading symbols from /usr/mips64-unknown-linux-gnu/lib32/ld.so.1...
Reading symbols from /usr/lib64/debug//usr/mips64-unknown-linux-gnu/lib32/ld-2.26.so.debug...done.
done.
0x40801d10 in __start () from /usr/mips64-unknown-linux-gnu/lib32/ld.so.1
(gdb) continue 
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x408cb908 in _dlerror_run (operate=operate@entry=0x408cadf0 <dlopen_doit>, args=args@entry=0x407feb58)
    at dlerror.c:163
163       result->errcode = _dl_catch_error (&result->objname, &result->errstring,
(gdb) bt
#0  0x408cb908 in _dlerror_run (operate=operate@entry=0x408cadf0 <dlopen_doit>, args=args@entry=0x407feb58)
    at dlerror.c:163
#1  0x408caf4c in __dlopen (file=file@entry=0x10012d58 "libnuma.so", mode=mode@entry=1) at dlopen.c:87
#2  0x1000306c in before (me=0x407ff3b6 "/usr/mips64-unknown-linux-gnu/usr/bin/top") at top/top.c:3308
#3  0x10001a10 in main (dont_care_argc=<optimized out>, argv=0x407ff1d4) at top/top.c:5721

Woohoo! Nice backtrace! Matt confirmed he sees the same backtrace.

Curious fact: top tries to load libnuma.so opportunistically (top/top.c):

// ...
#ifndef NUMA_DISABLE
#if defined(PRETEND_NUMA) || defined(PRETEND8CPUS)
   Numa_node_tot = Numa_max_node() + 1;
#else
   // we'll try for the most recent version, then a version we know works...
   if ((Libnuma_handle = dlopen("libnuma.so", RTLD_LAZY))
   || (Libnuma_handle = dlopen("libnuma.so.1", RTLD_LAZY))) {
      Numa_max_node = dlsym(Libnuma_handle, "numa_max_node");
      Numa_node_of_cpu = dlsym(Libnuma_handle, "numa_node_of_cpu");
      if (Numa_max_node && Numa_node_of_cpu)
         Numa_node_tot = Numa_max_node() + 1;
      else {
         dlclose(Libnuma_handle);
         Libnuma_handle = NULL;
      }
   }
#endif
#endif
// ...

As I did not have libnuma installed, it should not matter which library we try to load. I tried to write a minimal reproducer that only calls dlopen():

#include <dlfcn.h>

// mips64-unknown-linux-gnu-gcc dlopen-bug.c -o dlopen-bug -ldl
int main() {
    void * h = dlopen("libdoes-not-exist.so", RTLD_LAZY);
}
$ mips64-unknown-linux-gnu-gcc dlopen-bug.c -o dlopen-bug -ldl
$ qemu-mipsn32 -L /usr/mips64-unknown-linux-gnu ./dlopen-bug
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault

The backtrace is the same. OK, that’s bad. If it’s a stack overflow it definitely happens somewhere in glibc internals and not in the client.

Reproducing on master

Chances are we will need to fix glibc to get our stack protection back. I tried to reproduce the same failure on the master branch:

$ ../glibc/configure \
      --enable-stack-protector=all \
      --enable-stackguard-randomization \
      --enable-kernel=3.2.0 \
      --enable-add-ons=libidn \
      --without-selinux \
      --without-cvs \
      --disable-werror \
      --enable-bind-now \
      --build=x86_64-pc-linux-gnu \
      --host=mips64-unknown-linux-gnu \
      --disable-profile \
      --without-gd \
      --with-headers=/usr/mips64-unknown-linux-gnu/usr/include \
      --prefix=/usr \
      --sysconfdir=/etc \
      --localstatedir=/var \
      --libdir='$(prefix)/lib32' \
      --mandir='$(prefix)/share/man' \
      --infodir='$(prefix)/share/info' \
      --libexecdir='$(libdir)/misc/glibc' \
      --disable-multi-arch \
      --disable-systemtap \
      --disable-nscd \
      --disable-timezone-tools \
      CFLAGS="-O2 -ggdb3"
$ make
$ qemu-mipsn32 ./elf/ld.so --library-path .:dlfcn ~/tmp/dlopen-bug

No failure. It could mean the bug was fixed in master, or that something else is needed to trigger the error.

I checked out glibc-2.26 and retried above:

$ qemu-mipsn32 ./elf/ld.so --library-path .:dlfcn ~/tmp/dlopen-bug
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault

Aha, the problem is still there in the latest release.

I bisected glibc from 2.26 to master to find the commit that fixes the SIGSEGV. My bisection stopped at commit 2449ae7b:

commit 2449ae7b2da24c9940962304a3e44bc80e389265
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Aug 10 13:40:22 2017 +0200

    ld.so: Introduce struct dl_exception

    This commit separates allocating and raising exceptions.  This
    simplifies catching and re-raising them because it is no longer
    necessary to make a temporary, on-stack copy of the exception message.

Looking hard at that commit I found nothing that could fix such a bug. The change shuffled a few things around but did not change behaviour much.

I decided to fetch the parent commit f87cc2bfb and spend some time understanding the failure mode.

First, the crash happens in _dlerror_run() code:

0x40801c70 in __start () from /home/slyfox/tmp/lib32/ld.so.1
(gdb) continue 
Continuing.

Program received signal SIGSEGV, Segmentation fault.
_dlerror_run (operate=operate@entry=0x40838df0 <dlopen_doit>, args=args@entry=0x407ff058) at dlerror.c:163
163       result->errcode = _dl_catch_error (&result->objname, &result->errstring,
(gdb) bt
#0  _dlerror_run (operate=operate@entry=0x40838df0 <dlopen_doit>, args=args@entry=0x407ff058) at dlerror.c:163
#1  0x40838f4c in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#2  0x1000076c in main ()
(gdb) list
158           if (result->malloced)
159             free ((char *) result->errstring);
160           result->errstring = NULL;
161         }
162
163       result->errcode = _dl_catch_error (&result->objname, &result->errstring,
164                                          &result->malloced, operate, args);
165
166       /* If no error we mark that no error string is available.  */
167       result->returned = result->errstring == NULL;

A simplified version of _dlerror_run() looks like this:

static struct dl_action_result last_result;
static struct dl_action_result *static_buf = &last_result;
// ...
int
internal_function
_dlerror_run (void (*operate) (void *), void *args)
{
  struct dl_action_result *result;

  result = static_buf;
  if (result->errstring != NULL)
    {
      if (result->malloced)
        free ((char *) result->errstring);
      result->errstring = NULL;
    }

  result->errcode = _dl_catch_error (&result->objname, &result->errstring,
                                     &result->malloced, operate, args);

  /* If no error we mark that no error string is available. */
  result->returned = result->errstring == NULL;

  return result->errstring != NULL;
}

The SIGSEGV happens when we try to store the result of _dl_catch_error() into result->errcode.

I poked in gdb at the value of result right before the _dl_catch_error() call and after it. And the values are different! Time to look at where result is physically stored:

(gdb) disassemble /s _dlerror_run
163       result->errcode = _dl_catch_error (&result->objname, &result->errstring,
=> 0x40839918 <+184>:   sw      v0,0(s0)
(gdb) print (void*)$s0
$1 = (void *) 0x40834c44 <__stack_chk_guard>

The instruction stores a single word (32 bits) at the address held in the s0 register. But s0 no longer points to last_result (it did right before the call); it points at __stack_chk_guard.

How stack checks work on mips

What is __stack_chk_guard? It has something to do with stack checks. Short answer: it's a read-only variable that holds the stack canary value. glibc is not supposed to write to it after it is initialized. Something leaked the address of that variable into s0 (a callee-saved register).

Let’s familiarize ourselves with mips ABI a bit and look at how stack checks look in generated code in a simple example:

void g(void) {}

; mips64-unknown-linux-gnu-gcc -S b.c -fno-stack-protector -O1
        .file   1 "b.c"
        .section .mdebug.abiN32
        .previous
        .nan    legacy
        .module fp=64
        .module oddspreg
        .abicalls
        .text
        .align  2
        .globl  g
        .set    nomips16
        .set    nomicromips
        .ent    g
        .type   g, @function
g:
        .frame  $sp,0,$31           # vars= 0, regs= 0/0, args= 0, gp= 0
        .mask   0x00000000,0
        .fmask  0x00000000,0
        .set    noreorder
        .set    nomacro
        jr      $31
        nop
        .set    macro
        .set    reorder
        .end    g
        .size   g, .-g
        .ident  "GCC: (Gentoo 7.2.0 p1.1) 7.2.0"

Register 31 is also known as ra, the return address. The listing contains a lot of assembler directives, but they are needed only for debugging. The real code is two instructions: jr $31; nop.

Let’s check what fstack-protector-all does with our code:

; mips64-unknown-linux-gnu-gcc -S b.c -fstack-protector-all -O1
        .file   1 "b.c"
        .section .mdebug.abiN32
        .previous
        .nan    legacy
        .module fp=64
        .module oddspreg
        .abicalls
        .text
        .align  2
        .globl  g
        .set    nomips16
        .set    nomicromips
        .ent    g
        .type   g, @function
g:
        .frame  $sp,32,$31          # vars= 16, regs= 2/0, args= 0, gp= 0
        .mask   0x90000000,-8
        .fmask  0x00000000,0
        .set    noreorder
        .set    nomacro
        addiu   $sp,$sp,-32                          ; allocate 32 bytes on stack
        sd      $31,24($sp)                          ; backup $31 (ra)
        sd      $28,16($sp)                          ; backup $28 (gp)
        lui     $28,%hi(__gnu_local_gp)              ; compute address of GOT
        addiu   $28,$28,%lo(__gnu_local_gp)          ; (requires two instructions)
        lw      $2,%got_disp(__stack_chk_guard)($28) ; load address of __stack_chk_guard from GOT
        lw      $3,0($2)                             ; read canary value from __stack_chk_guard
        sw      $3,12($sp)                           ; store canary on stack
        ; ... time to check our canary!
        lw      $3,12($sp)                           ; load canary from stack
        lw      $2,0($2)                             ; reload canary from __stack_chk_guard
        bne     $3,$2,.L4                            ; on mismatch, crash the program
        ld      $31,24($sp)                          ; restore return address
        ld      $28,16($sp)                          ; restore gp
        jr      $31                                  ; (restore stack pointer and) return
        addiu   $sp,$sp,32
.L4:
        lw      $25,%call16(__stack_chk_fail)($28)
        .reloc  1f,R_MIPS_JALR,__stack_chk_fail
1:      jalr    $25
        nop
        .set    macro
        .set    reorder
        .end    g
        .size   g, .-g
        .ident  "GCC: (Gentoo 7.2.0 p1.1) 7.2.0"

Here is a quick table of mips registers.

These 15 instructions do the following: the canary value is loaded from __stack_chk_guard through intermediate registers 2 (v0) and 3 (v1), written onto the stack (addressed via the sp register), read back, and compared against the value stored in __stack_chk_guard.

Quite straightforward.

Who broke s0?

Back to our _dl_catch_error(): why did s0 change? The mips ABI says s0 is callee-saved. That means s0 must be preserved across calls: any function that modifies it has to restore the original value before returning.

To get more clues we need to dive into _dl_catch_error():

struct catch
  {
    const char **objname;      /* Object/File name.  */
    const char **errstring;    /* Error detail filled in here.  */
    bool *malloced;            /* Nonzero if the string is malloced
                                  by the libc malloc.  */
    volatile int *errcode;     /* Return value of _dl_signal_error.  */
    jmp_buf env;               /* longjmp here on error.  */
  };
// ...
int
internal_function
_dl_catch_error (const char **objname, const char **errstring,
                 bool *mallocedp, void (*operate) (void *), void *args)
{
  /* We need not handle `receiver' since setting a `catch' is handled
     before it.  */

  /* Only this needs to be marked volatile, because it is the only local
     variable that gets changed between the setjmp invocation and the
     longjmp call.  All others are just set here (before setjmp) and read
     in _dl_signal_error (before longjmp).  */
  volatile int errcode;

  struct catch c;
  /* Don't use an initializer since we don't need to clear C.env.  */
  c.objname = objname;
  c.errstring = errstring;
  c.malloced = mallocedp;
  c.errcode = &errcode;

  struct catch *const old = catch_hook;
  catch_hook = &c;

  /* Do not save the signal mask.  */
  if (__builtin_expect (__sigsetjmp (c.env, 0), 0) == 0)
    {
      (*operate) (args);
      catch_hook = old;
      *objname = NULL;
      *errstring = NULL;
      *mallocedp = false;
      return 0;
    }

  /* We get here only if we longjmp'd out of OPERATE.  _dl_signal_error has
     already stored values into *OBJNAME, *ERRSTRING, and *MALLOCEDP.  */
  catch_hook = old;
  return errcode;
}

This code is straightforward (but very scary): it wraps the call to the operate callback in __sigsetjmp() (really just a setjmp()).

setjmp() is a simple-ish function: it stores most of the current registers into c.env, and longjmp() later restores them. Caller-saved registers are not stored because, to the compiler, a longjmp() looks like a return from a normal C function call.

Normally longjmp() is called only when an error condition happens. In our case it's called when dlopen() fails (we are opening a non-existent file). longjmp() restores all the registers stored by setjmp(), including the instruction pointer pc, the stack pointer sp, the callee-saved registers s0..s7 and others.
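
To make the control flow concrete, here is a minimal standalone sketch of the same catch-via-setjmp pattern (hypothetical names, not actual glibc code):

#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;

/* A failing callback: instead of returning an error code it
   longjmp()s back to the setjmp() in run_catching(). */
static void operate(void *args)
{
    (void) args;
    longjmp(env, 1);
}

static int run_catching(void (*op)(void *), void *args)
{
    if (setjmp(env) == 0)
    {
        op(args);
        return 0;  /* op() returned normally: no error */
    }
    return 1;      /* we longjmp'd out of op(): an error was raised */
}

int main(void)
{
    printf("error: %d\n", run_catching(operate, NULL));  /* prints "error: 1" */
    return 0;
}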

The question arises: why and where do we lose the s0 register? At save time (setjmp()) or at restore time (longjmp())?

As we can see, setjmp() and longjmp() are very ABI-sensitive functions. Let's check how __sigsetjmp() is implemented in sysdeps/mips/mips64/setjmp.S:

ENTRY (__sigsetjmp)
        SETUP_GP
        SETUP_GP64_REG (v0, C_SYMBOL_NAME (__sigsetjmp))
        move    a2, sp
        move    a3, fp
        PTR_LA  t9, __sigsetjmp_aux
        RESTORE_GP64_REG
        move    a4, gp
        jr      t9
END (__sigsetjmp)

Note how __sigsetjmp does almost nothing here: it only passes sp, fp and gp along in argument registers and defers everything to __sigsetjmp_aux. Let's peek at that in sysdeps/mips/mips64/setjmp_aux.c.

Surprisingly, its implementation is in C:

int
__sigsetjmp_aux (jmp_buf env, int savemask, long long sp, long long fp,
                 long long gp)
{
  /* Store the floating point callee-saved registers...  */
  asm volatile ("s.d $f20, %0" : : "m" (env[0].__jmpbuf[0].__fpregs[0]));
  asm volatile ("s.d $f22, %0" : : "m" (env[0].__jmpbuf[0].__fpregs[1]));
  asm volatile ("s.d $f24, %0" : : "m" (env[0].__jmpbuf[0].__fpregs[2]));
  asm volatile ("s.d $f26, %0" : : "m" (env[0].__jmpbuf[0].__fpregs[3]));
  asm volatile ("s.d $f28, %0" : : "m" (env[0].__jmpbuf[0].__fpregs[4]));
  asm volatile ("s.d $f30, %0" : : "m" (env[0].__jmpbuf[0].__fpregs[5]));

  /* .. and the PC;  */
  asm volatile ("sd $31, %0" : : "m" (env[0].__jmpbuf[0].__pc));

  /* .. and the stack pointer;  */
  env[0].__jmpbuf[0].__sp = sp;

  /* .. and the FP; it'll be in s8.  */
  env[0].__jmpbuf[0].__fp = fp;

  /* .. and the GP;  */
  env[0].__jmpbuf[0].__gp = gp;

  /* .. and the callee-saved registers;  */
  asm volatile ("sd $16, %0" : : "m" (env[0].__jmpbuf[0].__regs[0]));
  asm volatile ("sd $17, %0" : : "m" (env[0].__jmpbuf[0].__regs[1]));
  asm volatile ("sd $18, %0" : : "m" (env[0].__jmpbuf[0].__regs[2]));
  asm volatile ("sd $19, %0" : : "m" (env[0].__jmpbuf[0].__regs[3]));
  asm volatile ("sd $20, %0" : : "m" (env[0].__jmpbuf[0].__regs[4]));
  asm volatile ("sd $21, %0" : : "m" (env[0].__jmpbuf[0].__regs[5]));
  asm volatile ("sd $22, %0" : : "m" (env[0].__jmpbuf[0].__regs[6]));
  asm volatile ("sd $23, %0" : : "m" (env[0].__jmpbuf[0].__regs[7]));

  /* Save the signal mask if requested.  */
  return __sigjmp_save (env, savemask);
}

The function duly stores every callee-saved register (including $16, aka s0) and more into c.env. But what happens when this function itself is compiled with -fstack-protector-all? Does it preserve the original register values?

Unfortunately the answer is: it does not.

Let’s compare assembly output with and without -fstack-protector-all:

; **-fno-stack-protector**:
Dump of assembler code for function __sigsetjmp_aux:
addiu sp,sp,-16
sd gp,0(sp)
lui gp,0x16
addu gp,gp,t9
sd ra,8(sp)
addiu gp,gp,7248
sdc1 $f20,104(a0)
sdc1 $f22,112(a0)
sdc1 $f24,120(a0)
sdc1 $f26,128(a0)
sdc1 $f28,136(a0)
sdc1 $f30,144(a0)
sd ra,0(a0)
sd a2,8(a0)
sd a3,80(a0)
sd a4,88(a0)
sd s0,16(a0)
sd s1,24(a0)
sd s2,32(a0)
sd s3,40(a0)
sd s4,48(a0)
sd s5,56(a0)
sd s6,64(a0)
sd s7,72(a0)
lw t9,-32236(gp)
bal 0x30000 <__sigjmp_save>
nop
ld ra,8(sp)
ld gp,0(sp)
jr ra
addiu sp,sp,16
; **-fstack-protector-all**:
Dump of assembler code for function __sigsetjmp_aux:
addiu sp,sp,-48
sd gp,32(sp)
lui gp,0x18
addu gp,gp,t9
addiu gp,gp,-23968
sd s0,24(sp) ; here we backup s0
lw s0,-27824(gp) ; and load into s0 stack canary address
sd ra,40(sp)
lw v1,0(s0)
sw v1,12(sp)
sdc1 $f20,104(a0)
sdc1 $f22,112(a0)
sdc1 $f24,120(a0)
sdc1 $f26,128(a0)
sdc1 $f28,136(a0)
sdc1 $f30,144(a0)
sd ra,0(a0)
sd a2,8(a0)
sd a3,80(a0)
sd a4,88(a0)
sd s0,16(a0)
sd s1,24(a0)
sd s2,32(a0)
sd s3,40(a0)
sd s4,48(a0)
sd s5,56(a0)
sd s6,64(a0)
sd s7,72(a0)
lw t9,-32100(gp)
bal 0x31940 <__sigjmp_save>
nop
lw a0,12(sp)
lw v1,0(s0)
bne a0,v1,0x31c7c <__sigsetjmp_aux+156>
ld ra,40(sp)
ld gp,32(sp)
ld s0,24(sp)
jr ra
addiu sp,sp,48
lw t9,-32644(gp)
jalr t9
nop

Or in diff form:

--- no-sp 2017-12-16 23:53:51.591627849 +0000
+++ spa 2017-12-16 23:53:37.952647838 +0000
@@ -1 +1 @@
- ; **-fno-stack-protector**:
+ ; **-fstack-protector-all**:
@@ -3,3 +3,3 @@
- addiu sp,sp,-16
- sd gp,0(sp)
- lui gp,0x16
+ addiu sp,sp,-48
+ sd gp,32(sp)
+ lui gp,0x18
@@ -7,2 +7,6 @@
- sd ra,8(sp)
- addiu gp,gp,7248
+ addiu gp,gp,-23968
+ sd s0,24(sp) ; here we backup s0
+ lw s0,-27824(gp) ; and load into s0 stack canary address
+ sd ra,40(sp)
+ lw v1,0(s0)
+ sw v1,12(sp)
@@ -27,2 +31,2 @@
- lw t9,-32236(gp)
- bal 0x30000 <__sigjmp_save>
+ lw t9,-32100(gp)
+ bal 0x31940 <__sigjmp_save>
@@ -30,2 +34,6 @@
- ld ra,8(sp)
- ld gp,0(sp)
+ lw a0,12(sp)
+ lw v1,0(s0)
+ bne a0,v1,0x31c7c <__sigsetjmp_aux+156>
+ ld ra,40(sp)
+ ld gp,32(sp)
+ ld s0,24(sp)
@@ -33 +41,4 @@
- addiu sp,sp,16
+ addiu sp,sp,48
+ lw t9,-32644(gp)
+ jalr t9
+ nop

The minimal reproducer and fix

To fully nail down the problem I’d like to have something nice for upstream. Here is the minimal reproducer:

#include <setjmp.h>
#include <stdio.h>

int main() {
    jmp_buf jb;
    volatile register long s0 asm ("$s0");

    s0 = 1234;
    if (setjmp(jb) == 0)
        longjmp(jb, 1);

    printf ("$s0 = %lu\n", s0);
}
$ qemu-mipsn32 -L ~/bad-libc ./mips-longjmp-bug
$s0 = 1082346564
$ qemu-mipsn32 -L ~/fixed-libc ./mips-longjmp-bug
$s0 = 1234

And the fix is to disable stack protection for __sigsetjmp_aux() in all glibc build modes. This does work:

--- a/sysdeps/mips/mips64/setjmp_aux.c
+++ b/sysdeps/mips/mips64/setjmp_aux.c
@@ -25,6 +25,7 @@
    access them in C. */
 int
+inhibit_stack_protector
 __sigsetjmp_aux (jmp_buf env, int savemask, long long sp, long long fp,
                  long long gp)
 {
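
For reference, inhibit_stack_protector is a glibc-internal annotation. As far as I can tell, it boils down to a per-function GCC attribute along the following lines (a sketch, not the exact definition, which is guarded by compiler-version checks):

/* Sketch of what glibc's inhibit_stack_protector amounts to: a
   per-function GCC optimize attribute that turns the protector off. */
#define inhibit_stack_protector \
  __attribute__ ((__optimize__ ("-fno-stack-protector")))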

Parting words

So it was not stack corruption after all, but register corruption triggered by a security feature.

What is more interesting is why Matt got "stack smashing detected" and not a SIGSEGV. It means that on his system the write into __stack_chk_guard actually succeeded, and the canary check failed not because the on-stack canary copy changed, but because the global canary changed. __stack_chk_guard sits in the .data.rel.ro section; qemu maps it read-only and crashes my process instead. How it behaves on real hardware is left as an exercise for the reader with a mips device :)

Fun facts:

  • It took me 12 days to find the cause of the failure. A working gdb and qemu-user made the fix happen.
  • setjmp()/longjmp() are no longer so opaque to me, and hopefully not to you either. But make sure you have read all the volatility gotchas (Notes section)
  • glibc uses setjmp()/longjmp() for error handling
  • The mips ABI is very pleasant to work with and not as complicated as I thought :)
  • qemu-mipsn32 needs a fix to generate readable .core files

Have fun!

Posted on December 16, 2017

December 12, 2017
Alexys Jacob a.k.a. ultrabug (homepage, bugs)
py3status v3.7 (December 12, 2017, 14:34 UTC)

This important release has been long awaited as it focused on improving overall performance of py3status as well as dramatically decreasing its memory footprint!

I want once again to salute the impressive work of @lasers, our amazing contributor from the USA who has become the top contributor of 2017 in terms of commits and PRs.

Thanks to him, this release brings a whole batch of improvements and QA clean-ups to various modules. I encourage you to go through the changelog to see everything.

Highlights

A deep rework of how threads are used and scheduled to run modules has been done by @tobes.

  • py3status no longer keeps one thread per module running permanently; instead, it uses a queue and spawns a thread to execute a module only when its cache expires
  • this new scheduling and usage of threads allows py3status to run under asynchronous event loops; gevent will be supported in the upcoming 3.8
  • the memory footprint of py3status got largely reduced thanks to the threading modifications and to a nice hunt for ever-growing and useless variables
  • module error reporting is now more detailed

Milestone 3.8

The next release will bring some awesome new features, such as gevent support, environment variable support in the config file and per-module persistent data storage, as well as new modules!

Thanks contributors!

This release is their work, thanks a lot guys!

  • JohnAZoidberg
  • lasers
  • maximbaz
  • pcewing
  • tobes

November 30, 2017
Nathan Zachary a.k.a. nathanzachary (homepage, bugs)

My love for Apple and especially MacOS does not run deep. Actually, it is essentially nonexistent. I was recently reminded of the myriad reasons why I don't like MacOS, and one of them is that an OS should never stand in the way of what the operator wants to do. In this case, I found that even the root account couldn't write to certain directories that MacOS deemed special. That "feature" is known as System Integrity Protection. I'm not going to rant about how absurd it is to deny the root account the ability to write, but instead I'd like to present the method of disabling System Integrity Protection.

First of all, one needs to get into the Recovery Mode of MacOS. Typically, this wouldn’t be all that difficult when following the instructions provided by Apple. Essentially, to get into Recovery Mode, one just has to hold Command+R when booting up the system. That’s all fine and dandy if it is a physical host and one has an Apple keyboard. However, my situation called for Recovery Mode from a virtual machine and using a non-Apple keyboard (so no Command key). Yes, yes, I know that MacOS offers the ability to set different key combinations, but then those would still have to be trapped by VMWare Fusion during boot. Instead, I figured that there had to be a way to do it from the MacOS terminal.

After digging through documentation and man pages (I’ll spare you the trials and tribulations of trying to find answers 😛 ), I finally found that, yes, one CAN reboot MacOS into Recovery Mode without the command key. To do so, open up the Terminal and type the following commands:

nvram "recovery-boot-mode=unused"
reboot recovery

The Apple host will reboot and the Recovery Mode screen will be presented:

MacOS Recovery Mode - Utilities - Terminal

Now, in the main window, there are plenty of tasks that can be launched. However, I needed a terminal, and it might not be readily apparent, but to get it, you click on the “Utilities” menu in the top menu bar (see the screenshot above), and then select “Terminal”. Thereafter, it is fairly simple to disable System Integrity Protection via the following command:

csrutil disable

All that’s left is to reboot by going to the Apple Menu and clicking on “Restart”.

Though the procedures of getting to the MacOS Recovery menu without using the Command key and disabling System Integrity Protection are not all that difficult, they were a pain to figure out. Furthermore, I’m not sure why SIP disallows root’s write permissions anyway. That seems absurd, especially in light of Apple’s most recent glaring security hole of allowing root access without a password. 😳

Cheers,
Zach

November 25, 2017
Sebastian Pipping a.k.a. sping (homepage, bugs)

November 25: International Day for the Elimination of Violence against Women
(Wikipedia)

November 24, 2017
Sebastian Pipping a.k.a. sping (homepage, bugs)

Not new at all but was new to me, and was well worth my time:

DEFCON 19: Bit-squatting: DNS Hijacking Without Exploitation (w speaker)

Only somewhat related: https://www.pytosquatting.org/

November 20, 2017
Sven Vermeulen a.k.a. swift (homepage, bugs)
SELinux and extended permissions (November 20, 2017, 16:00 UTC)

One of the features present in the August release of the SELinux user space is its support for ioctl xperm rules in modular policies. In the past, this was only possible in monolithic ones (and CIL). Through this, allow rules can be extended to cover not only source (domain) and target (resource) identifiers, but also a specific number to which they apply. And ioctls are the first (and currently only) permission for which this is implemented.

Note that ioctl-level permission control isn't a new feature by itself, but the fact that it can be used in modular policies is. An example rule is shown below.
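
For illustration, an extended permission rule in the kernel policy language looks roughly like this (hypothetical source and target types; 0x5401 is the TCGETS ioctl on Linux):

allowxperm user_t user_devpts_t:chr_file ioctl 0x5401;

Combined with a regular allow rule granting the ioctl permission, this restricts user_t domains to that specific ioctl on user_devpts_t character devices, rather than permitting all ioctls at once.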

November 18, 2017
Sergei Trofimovich a.k.a. slyfox (homepage, bugs)
Endianness of a single byte: big or little? (November 18, 2017, 00:00 UTC)

trofi's blog: Endianness of a single byte: big or little?

Endianness of a single byte: big or little?

Bug

Rolf Eike Beer noticed two test failures while testing the radvd package (an IPv6 route advertisement daemon, and more) on sparc. Both tests likely failed due to an endianness issue:

test/send.c:317:F:build:test_add_ra_option_lowpanco:0:
  Assertion '0 == memcmp(expected, sb.buffer, sizeof(expected))'
    failed: 0 == 0, memcmp(expected, sb.buffer, sizeof(expected)) == 1
test/send.c:342:F:build:test_add_ra_option_abro:0:
  Assertion '0 == memcmp(expected, sb.buffer, sizeof(expected))'
    failed: 0 == 0, memcmp(expected, sb.buffer, sizeof(expected)) == 1

I’ve confirmed the same failure on powerpc.

Eike noted that it's unusual, because sparc is a big-endian architecture and network byte order is also big-endian (thus there is no need to flip bytes). Something very specific must be lurking in the radvd code to break endianness in this case. Curiously, all the radvd tests were working fine on amd64.

Two functions failed to produce expected results:

START_TEST(test_add_ra_option_lowpanco)
{
    ck_assert_ptr_ne(0, iface);

    struct safe_buffer sb = SAFE_BUFFER_INIT;
    add_ra_option_lowpanco(&sb, iface->AdvLowpanCoList);

    unsigned char expected[] = {
        0x22, 0x03, 0x32, 0x48, 0x00, 0x00, 0xe8, 0x03, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    };

    ck_assert_int_eq(sb.used, sizeof(expected));
    ck_assert_int_eq(0, memcmp(expected, sb.buffer, sizeof(expected)));

    safe_buffer_free(&sb);
}
END_TEST

START_TEST(test_add_ra_option_abro)
{
    ck_assert_ptr_ne(0, iface);

    struct safe_buffer sb = SAFE_BUFFER_INIT;
    add_ra_option_abro(&sb, iface->AdvAbroList);

    unsigned char expected[] = {
        0x23, 0x03, 0x0a, 0x00, 0x02, 0x00, 0x02, 0x00, 0xfe, 0x80, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0xa2, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01,
    };

    ck_assert_int_eq(sb.used, sizeof(expected));
    ck_assert_int_eq(0, memcmp(expected, sb.buffer, sizeof(expected)));

    safe_buffer_free(&sb);
}
END_TEST

Both tests are straightforward: they verify 6LoWPAN and ABRO extension handling (both are related to route announcements for low-power devices). It does not look complicated.

I looked at the add_ra_option_lowpanco() implementation and noticed at least one bug: a missing endianness conversion:

// radvd.h
struct nd_opt_abro {
    uint8_t nd_opt_abro_type;
    uint8_t nd_opt_abro_len;
    uint16_t nd_opt_abro_ver_low;
    uint16_t nd_opt_abro_ver_high;
    uint16_t nd_opt_abro_valid_lifetime;
    struct in6_addr nd_opt_abro_6lbr_address;
};
// ...
// send.c
static void add_ra_option_mipv6_home_agent_info(struct safe_buffer *sb, struct mipv6 const *mipv6)
{
    struct HomeAgentInfo ha_info;
    memset(&ha_info, 0, sizeof(ha_info));
    ha_info.type = ND_OPT_HOME_AGENT_INFO;
    ha_info.length = 1;
    ha_info.flags_reserved = (mipv6->AdvMobRtrSupportFlag) ? ND_OPT_HAI_FLAG_SUPPORT_MR : 0;
    ha_info.preference = htons(mipv6->HomeAgentPreference);
    ha_info.lifetime = htons(mipv6->HomeAgentLifetime);
    safe_buffer_append(sb, &ha_info, sizeof(ha_info));
}

static void add_ra_option_abro(struct safe_buffer *sb, struct AdvAbro const *abroo)
{
    struct nd_opt_abro abro;
    memset(&abro, 0, sizeof(abro));
    abro.nd_opt_abro_type = ND_OPT_ABRO;
    abro.nd_opt_abro_len = 3;
    abro.nd_opt_abro_ver_low = abroo->Version[1];
    abro.nd_opt_abro_ver_high = abroo->Version[0];
    abro.nd_opt_abro_valid_lifetime = abroo->ValidLifeTime;
    abro.nd_opt_abro_6lbr_address = abroo->LBRaddress;
    safe_buffer_append(sb, &abro, sizeof(abro));
}

Note how add_ra_option_mipv6_home_agent_info() carefully flips bytes with htons() for all uint16_t fields, but add_ra_option_abro() does not.

It means ABRO does not really work on little-endian (i.e. most) systems in radvd, and the test checks for the wrong thing. I added the missing htons() calls and fixed the expected[] output in the tests by manually flipping two bytes in a few locations.
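
The fixed function then looks roughly like this (a sketch based on the snippet above, mirroring the htons() usage in add_ra_option_mipv6_home_agent_info(); the actual patch may differ in details):

static void add_ra_option_abro(struct safe_buffer *sb, struct AdvAbro const *abroo)
{
    struct nd_opt_abro abro;
    memset(&abro, 0, sizeof(abro));
    abro.nd_opt_abro_type = ND_OPT_ABRO;
    abro.nd_opt_abro_len = 3;
    /* Convert the multi-byte fields to network byte order. */
    abro.nd_opt_abro_ver_low = htons(abroo->Version[1]);
    abro.nd_opt_abro_ver_high = htons(abroo->Version[0]);
    abro.nd_opt_abro_valid_lifetime = htons(abroo->ValidLifeTime);
    abro.nd_opt_abro_6lbr_address = abroo->LBRaddress;
    safe_buffer_append(sb, &abro, sizeof(abro));
}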

Plot twist

The effect was slightly unexpected: this fixed only the ABRO test, but not the 6LoWPAN one. That's where things became interesting. Let's look at the add_ra_option_lowpanco() implementation:

// radvd.h
struct nd_opt_6co {
    uint8_t nd_opt_6co_type;
    uint8_t nd_opt_6co_len;
    uint8_t nd_opt_6co_context_len;
    uint8_t nd_opt_6co_res : 3;
    uint8_t nd_opt_6co_c : 1;
    uint8_t nd_opt_6co_cid : 4;
    uint16_t nd_opt_6co_reserved;
    uint16_t nd_opt_6co_valid_lifetime;
    struct in6_addr nd_opt_6co_con_prefix;
};
// ...
// send.c
static void add_ra_option_lowpanco(struct safe_buffer *sb, struct AdvLowpanCo const *lowpanco)
{
    struct nd_opt_6co co;
    memset(&co, 0, sizeof(co));
    co.nd_opt_6co_type = ND_OPT_6CO;
    co.nd_opt_6co_len = 3;
    co.nd_opt_6co_context_len = lowpanco->ContextLength;
    co.nd_opt_6co_c = lowpanco->ContextCompressionFlag;
    co.nd_opt_6co_cid = lowpanco->AdvContextID;
    co.nd_opt_6co_valid_lifetime = lowpanco->AdvLifeTime;
    co.nd_opt_6co_con_prefix = lowpanco->AdvContextPrefix;
    safe_buffer_append(sb, &co, sizeof(co));
}

The test still failed to match one single byte: the one that spans the 3 bitfields nd_opt_6co_res, nd_opt_6co_c and nd_opt_6co_cid. But why? Does endianness really matter within a byte? Apparently gcc happens to lay out those 3 fields in different orders on x86_64 and powerpc!

Let’s looks at a smaller example:

#include <stdio.h>
#include <stdint.h>

struct s {
    uint8_t a : 3;
    uint8_t b : 1;
    uint8_t c : 4;
};

int main() {
    struct s v = { 0x00, 0x1, 0xF, };
    printf("v = %#02x\n", *(uint8_t*)&v);
    return 0;
}

Output difference:

$ x86_64-pc-linux-gnu-gcc a.c -o a && ./a
v = 0xf8
# (0xF << 5) | (0x1 << 4) | 0x00
$ powerpc-unknown-linux-gnu-gcc a.c -o a && ./a
v = 0x1f
# (0x0 << 5) | (0x1 << 4) | 0xF

The C standard does not specify the layout of bitfields, and this is a great illustration of how things break :)

An interesting observation: the bitfield order on powerpc happens to be the desired order (as the 6LoWPAN RFC defines it).

It means the radvd code indeed happened to generate a correct bitstream on big-endian platforms (as Eike predicted), but did not work on little-endian systems. Unfortunately, the golden expected[] output was generated on a little-endian system.
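
By the way, a portable way to sidestep the problem is to avoid bitfields entirely and pack the byte explicitly. A minimal sketch (pack_6co_flags is a hypothetical helper; the shift amounts assume the RFC's wire layout, with res in the top 3 bits, c next, and cid in the low 4 bits):

#include <stdint.h>

uint8_t pack_6co_flags(uint8_t res, uint8_t c, uint8_t cid)
{
    /* Explicit shifts and masks produce the same byte on every
       architecture, regardless of the compiler's bitfield layout. */
    return (uint8_t)(((res & 0x7) << 5) | ((c & 0x1) << 4) | (cid & 0xF));
}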

Thus the 3 fixes:

That’s it :)

Posted on November 18, 2017

November 17, 2017
Nathan Zachary a.k.a. nathanzachary (homepage, bugs)
Python’s M2Crypto fails to compile (November 17, 2017, 21:29 UTC)

When updating my music server, I ran into a compilation error on Python’s M2Crypto. The error message was a little bit strange and not directly related to the package itself:

fatal error: openssl/ecdsa.h: No such file or directory

Obviously, that error is generated by OpenSSL and not directly within M2Crypto. Remembering that there are some known problems with the “bindist” USE flag, I took a look at OpenSSL and OpenSSH. Indeed, “bindist” was set. Simply removing the USE flag from those two packages took care of the problem:


# grep -i 'openssh\|openssl' /etc/portage/package.use
>=dev-libs/openssl-1.0.2m -bindist
net-misc/openssh -bindist

In this case, the problem makes sense based on the error message. The error indicated that the Elliptic Curve Digital Signature Algorithm (ECDSA) header was not found. In the previously-linked page about the “bindist” USE flag, it clearly states that having bindist set will “Disable/Restrict EC algorithms (as they seem to be patented)”.

Cheers,
Nathan Zachary

November 16, 2017
Hanno Böck a.k.a. hanno (homepage, bugs)
Some minor Security Quirks in Firefox (November 16, 2017, 16:24 UTC)

I discovered a couple of more or less minor security issues in Firefox lately. None of them is particularly scary, but they affect interesting corner cases or unusual behavior. I'm posting this mainly hoping that other people will find it inspiring to think about unusual security issues and maybe also come up with more realistic attack scenarios for these bugs.

I'd like to point out that Mozilla hasn't fixed most of those issues, despite all of them being reported several months ago.

Bypassing XSA warning via FTP

XSA, or Cross-Site Authentication, is an interesting and not very well known attack. It was discovered by Joachim Breitner in 2005.

Some web pages, mostly forums, allow users to include third party images. This can be abused by an attacker to steal other users' credentials. An attacker first posts something with an image from a server he controls. He then switches on HTTP authentication for that image. All visitors of the page will now see a login dialog on that page. They may be tempted to type their login credentials into the HTTP authentication dialog, which seems to come from the page they trust.

The original XSA attack is, as said, quite old. As a countermeasure Firefox implements a warning in HTTP authentication dialogs that were created by a subresource like an image. However it only does that for HTTP, not for FTP.

So an attacker can run an FTP server and include an image from there. By then requiring an FTP login and logging all login attempts to the server he can gather credentials. The password dialog will show the host name of the attacker's FTP server, but he could choose one that looks close enough to the targeted web page to not raise suspicion.

Firefox FTP password dialog

I haven't found any popular site that allows embedding images from non-HTTP protocols. The most popular page that allows embedding external images at all is Stack Overflow, but it only allows HTTPS. Generally, embedding third-party images is less common these days; most pages keep local copies if they embed external images.

This bug is yet unfixed.

Obviously one could fix it by showing the same warning for FTP that is shown for HTTP authentication. But I'd rather recommend completely blocking authentication dialogs on third-party content. This is also what Chrome does. Mozilla has been discussing this for several years with no result.

Firefox also has an open bug about disallowing FTP on subresources. This would obviously also fix this scenario.

Window-modal popup via FTP

In the early days of JavaScript web pages could annoy users with popups. Browsers have since changed the behavior of JavaScript popups. They are now tab-modal, which means they're not blocking the interaction with the whole browser, they're just part of one tab and will only block the interaction with the web page that created them.

So it is a goal of modern browsers to not allow web pages to create window-modal alerts that block the interaction with the whole browser. However I figured out FTP gives us a bypass of this restriction.

If Firefox receives some random garbage over an FTP connection that it cannot interpret as FTP commands it will open an alert window showing that garbage.

Window modal FTP alert

First we open up our fake "FTP-Server" that will simply send a message to all clients. We can just use netcat for this:

while true; do echo "Hello" | nc -l -p 21; done

Then we try to open a connection, e. g. by typing ftp://localhost in the address bar on the same system. Firefox will not show the alert immediately. However if we then click on the URL bar and press enter again it will show the alert window. I tried to replicate that behavior with JavaScript, which worked sometimes. I'm relatively sure this can be made reliable.

There are two problems here. One is that server-controlled content is shown to the user without any interpretation. This alert window seems to be intended as some kind of error message. However, it doesn't make a lot of sense like that. If shown at all, it should probably be prefixed by a message like "the server sent an invalid command". But ultimately, if the browser receives random garbage instead of protocol messages, it's probably not wise to display that at all. The second problem is that FTP error messages probably should be tab-modal as well.

This bug is also yet unfixed.

FTP considered dangerous

FTP is an old protocol with many problems. Some consider the fact that browsers still support it a problem. I tend to agree; ideally, FTP should simply be removed from modern browsers.

FTP in browsers is insecure by design. While TLS-enabled FTP exists, browsers have never supported it. The FTP code is probably not well audited, as it's rarely used. And the fact that another protocol exists that can be used similarly to HTTP has the potential for surprises. For example, I found it quite surprising to learn that it's possible to have unencrypted and unauthenticated FTP connections to hosts that enabled HSTS. (The lack of cookie support in FTP seems to avoid causing security issues, but it's still unexpected and feels dangerous.)

Self-XSS in bookmark manager export

The Firefox Bookmark manager allows exporting bookmarks to an HTML document. Before the current Firefox 57 it was possible to inject JavaScript into this exported HTML via the tags field.

I tried to come up with a plausible scenario where this could matter; however, this turned out to be difficult. This would be a problematic behavior if there were a way for a web page to create such a bookmark. While it is possible to create a bookmark dialog with JavaScript, this doesn't allow us to prefill the tags field. Thus there is no way a web page can insert any content here.

One could come up with implausible social engineering scenarios (web page asks user to create a bookmark and insert some specific string into the tags field), but that seems very far fetched. A remotely plausible scenario would be a situation where a browser can be used by multiple people who are allowed to create bookmarks and the bookmarks are regularly exported and uploaded to a web page. However that also seems quite far fetched.

This was fixed in the latest Firefox release as CVE-2017-7840 and considered as low severity.

Crashing Firefox on Linux via notification API

The notification API allows browsers to send notification alerts that the operating system will show in small notification windows. A notification can contain a small message and an icon.

When playing with this, one of the very first things that came to my mind was to check what happens if one simply sends a very large icon. A user has to approve that a web page is allowed to use the notification API; however, if he does, the result is an immediate crash of the browser. This only "works" on Linux. The proof of concept is quite simple, we just embed a large black PNG via a data URI:

<script>
Notification.requestPermission(function(status){
    new Notification("", {
        icon: "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAE4gAABOIAQAAAAB147pmAAAL70lEQVR4Ae3BAQ0AAADCIPuntscHD" + "A".repeat(4043) + "yDjFUQABEK0vGQAAAABJRU5ErkJggg==",
    });
});
</script>


I haven't fully tracked down what's causing this, but it seems that Firefox tries to send a message to the system's notification daemon with libnotify, and if that message is too large for the dbus message size limit, it will not properly handle the resulting error.

What I found quite frustrating is that when I reported it I learned that this was a duplicate of a bug that has already been reported more than a year ago. I feel having such a simple browser crash bug open for such a long time is not appropriate. It is still unfixed.

November 10, 2017
Alice Ferrazzi a.k.a. alicef (homepage, bugs)
Latex 001 (November 10, 2017, 06:03 UTC)

This is mainly a memo to remember, each time, which packages are needed for Japanese.

Usually I use xetex with cjk to write LaTeX documents; on Gentoo this is done by adding xetex and cjk support to texlive.

app-text/texlive cjk xetex
app-text/texlive-core cjk xetex

For platex, one needs to install:

dev-texlive/texlive-langjapanese

Google Summer of Code day41-51 (November 10, 2017, 05:59 UTC)

Google Summer of Code day 41

What was my plan for today?

  • testing and improving elivepatch

What I did today?

elivepatch work:
  • working on making the code for sending an unknown number of files with Werkzeug

Making and Testing patch manager

What I will do next time?

  • testing and improving elivepatch

Google Summer of Code day 42

What was my plan for today?

  • testing and improving elivepatch

What I did today?

elivepatch work:
  • Fixing sending multiple files using requests

I'm making the script for incrementally adding patches, but I'm stuck on a requests problem. It looks like I cannot concatenate files like this:

files = {'patch': ('01.patch', patch_01, 'multipart/form-data', {'Expires': '0'}),
         ('02.patch', patch_01, 'multipart/form-data', {'Expires': '0'}),
         ('03.patch', patch_01, 'multipart/form-data', {'Expires': '0'}),
         'config': ('config', open(temporary_config.name, 'rb'), 'multipart/form-data', {'Expires': '0'})}

or like this:

files = {'patch': ('01.patch', patch_01, 'multipart/form-data', {'Expires': '0'}),
         'patch': ('02.patch', patch_01, 'multipart/form-data', {'Expires': '0'}),
         'patch': ('03.patch', patch_01, 'multipart/form-data', {'Expires': '0'}),
         'config': ('config', open(temporary_config.name, 'rb'), 'multipart/form-data', {'Expires': '0'})}

I'm getting AttributeError: 'tuple' object has no attribute 'read'.

It looks like requests cannot send data with the same key, but the server part of flask-restful is using parser.add_argument('patch', action='append') to concatenate multiple arguments together, and it suggests sending the information like this:

curl http://api.example.com -d "patch=bob" -d "patch=sue" -d "patch=joe"

Unfortunately, as of now I'm still trying to understand how I can do that with requests.

What I will do next time?

  • testing and improving elivepatch

Google Summer of Code day 43

What was my plan for today?

  • making the first draft of incremental patches feature

What I did today?

elivepatch work:
  • Changed send_file to manage more than one patch at a time
  • Refactored function names to reflect the incremental patches change
  • Made a function to list patches in the eapply_user portage user patches folder and in the temporary patches folder, reflecting the applied livepatches

What I will do next time?

  • testing and improving elivepatch

Google Summer of Code day 44

What was my plan for today?

  • making the first draft of incremental patches feature

What I did today?

elivepatch work:
  • renamed the command client function as an internal function
  • put together the client incremental patch sender function

What I will do next time?

  • make the server side incremental patch part and test it

Google Summer of Code day 45

What was my plan for today?

  • testing and improving elivepatch

What I did today?

Meeting with mentor summary

elivepatch work:
  • working on the incremental patch feature design and implementation
  • putting patch files under /var/run/elivepatch
  • ordering patches by number
  • cleaning the folder when the machine is restarted
  • sending the patches to the server in order
  • cleaning the client terminal output by catching the exceptions

What I will do next time?

  • testing and improving elivepatch

Google Summer of Code day 46

What was my plan for today?

  • making the first draft of incremental patches feature

What I did today?

elivepatch work:
  • testing elivepatch with incremental patches
  • code refactoring

What I will do next time?

  • make the server side incremental patch part and test it

Google Summer of Code day 47

What was my plan for today?

  • making the first draft of incremental patches feature

What I did today?

elivepatch work:
  • testing elivepatch with incremental patches
  • trying to find a way to make testing faster; I'm thinking of something like unit tests and integration tests
  • merged build_livepatch with send_files, as it reduces the steps for making the livepatch and helps keep it more isolated
  • going on writing the incremental patch feature

Building with kpatch-build usually takes too much time; this makes testing the parts where building a live patch is needed long and complicated. It would probably be better to add some unit/integration tests to keep each task/function under control without needing to build live patches every time. Any thoughts?

What I will do next time?

  • make the server side incremental patch part and test it

Google Summer of Code day 48

What was my plan for today?

  • making the first draft of incremental patches feature

What I did today?

elivepatch work:
  • added PORTAGE_CONFIGROOT to set the path from which to get the incremental patches
  • cannot download kernel sources to /usr/portage/distfiles/
  • saving the created livepatch and patch on the client, but sending only the incremental patches and the patch to the elivepatch server
  • cleaned the output of bash commands

What I will do next time?

  • make the server side incremental patch part and test it

Google Summer of Code day 49

What was my plan for today?

  • testing and improving elivepatch incremental patches feature

What I did today?

1) We need a way of getting old genpatches. mpagano made a script that saves all the genpatches, and they are available here: http://dev.gentoo.org/~mpagano/genpatches/tarballs/ I think we can redirect the ebuild in our overlay to get the tarballs from there.

2) kpatch can work with initrd

What I will do next time?

  • testing and improving elivepatch

Google Summer of Code day 50

What was my plan for today?

  • testing and improving elivepatch incremental patches feature

What I did today?

  • refactoring code
  • starting to write the first draft of the automatic kernel livepatching system

What I will do next time?

  • testing and improving elivepatch

Google Summer of Code day 51

What was my plan for today?

  • testing and improving elivepatch incremental patches feature

What I did today?

  • checking eapply_user; eapply_user is using patch

  • writing the elivepatch wiki page: https://wiki.gentoo.org/wiki/User:Aliceinwire/elivepatch

What I will do next time?

  • testing and improving elivepatch

Google Summer of Code day31-40 (November 10, 2017, 05:57 UTC)

Google Summer of Code day 31

What was my plan for today?

  • testing and improving elivepatch

What I did today?

Dispatcher.py:

  • fixed comments
  • added a static method to return the kernel path
  • Added todo
  • fixed how the directory is created with the uuid

livepatch.py:
  • fixed docstring
  • removed sudo on kpatch-build
  • fixed comments
  • merged the ebuild commands into one on build livepatch

restful.py:
  • fixed comment

what I will do next time?

  • testing and improving elivepatch

Google Summer of Code day 32

What was my plan for today?

  • testing and improving elivepatch

What I did today?

client checkers:
  • used os.path.join in the ungz function
  • implemented the temporary folder using the python tempfile module

what I will do next time?

  • testing and improving elivepatch

Google Summer of Code day 33

What was my plan for today?

  • testing and improving elivepatch

What I did today?

  • Worked with tempfile to keep the uncompressed configuration file, using the appropriate tempfile module.
  • Refactoring and code cleaning.
  • Check that the ebuild is present in the overlay before trying to merge it.

what I will do next time?

  • testing and improving elivepatch

Google Summer of Code day 34

What was my plan for today?

  • testing and improving elivepatch

What I did today?

  • lpatch added locally
  • fix ebuild directory
  • Static patch and config filenames on send. This is useful for isolating the API requests, using only the uuid as the session identifier.
  • Removed the livepatchStatus and lpatch class configurations, because we need a request to be identified only by its own UUID, to isolate the transaction.
  • Return early in case a request with the same uuid is already present.

We still have that problem about "can't find special struct alt_instr size."

I will investigate it tomorrow

what I will do next time?

  • testing and improving elivepatch
  • Investigating the missing information in the livepatch

Google Summer of Code day 35

What was my plan for today?

  • testing and improving elivepatch

What I did today?

kpatch needs some special section data to find where to inject the livepatch. kpatch-build checks for the existence of this special section data in the given vmlinux file. The vmlinux file needs CONFIG_DEBUG_INFO=y so that the debug symbols contain the special section data. The special section data is found like this:

# Set state if name matches
a == 0 && /DW_AT_name.* alt_instr[[:space:]]*$/ {a = 1; next}
b == 0 && /DW_AT_name.* bug_entry[[:space:]]*$/ {b = 1; next}
p == 0 && /DW_AT_name.* paravirt_patch_site[[:space:]]*$/ {p = 1; next}
e == 0 && /DW_AT_name.* exception_table_entry[[:space:]]*$/ {e = 1; next}

what I will do next time?

  • testing and improving elivepatch
  • Investigating the missing information in the livepatch

Google Summer of Code day 36

What was my plan for today?

  • testing and improving elivepatch

What I did today?

  • Fixed the livepatch output name to reflect the statically set patch name
  • Removed module install, as it is not needed
  • The debug option now also copies build.log to the uuid directory, for investigating failed attempts
  • Refactored uuid_dir to uuid in the cases where uuid_dir did not actually contain the full path
  • Adding debug_info to the configuration file if not already present; this setting is only needed for the livepatch creation (not tested yet)

what I will do next time?

  • testing and improving elivepatch
  • Investigating the missing information in the livepatch

Google Summer of Code day 37

What was my plan for today?

  • testing and improving elivepatch

What I did today?

  • Fix return message for cve option
  • Added different configuration example [4.9.30,4.10.17]
  • Catch Gentoo-sources not available error
  • Catch missing livepatch errors on the client

I tested kpatch with patches for kernels 4.9.30 and 4.10.17, and it worked without any problem. I checked with coverage to see which code is not used. I think we can maybe remove the cve option for now, as it is not implemented yet, and we could use a modular implementation for it, so the configuration could change. We need some documentation about elivepatch on the Gentoo wiki. We need some unit tests to make development smoother and to make it simpler to check the working status with GitHub Travis.

I talked with kpatch creator and we got some feedback:

“this project could also be used for kpatch testing :)
imagine instead of just loading the .ko, the client was to kick off a series of tests and report back.”

“why bother a production or tiny machine when you might have a patch-building server”

what I will do next time?

  • testing and improving elivepatch
  • Investigating the missing information in the livepatch

Google Summer of Code day 38

What was my plan for today?

  • testing and improving elivepatch

What I did today?

Meeting with mentor summary

What we will do next:
  • incremental patch tracking on the client side
  • CVE security vulnerability checker
  • dividing the repository
  • ebuild
  • documentation [optional]
  • modularity [optional]

Kpatch work:
  • Started to make the incremental patch feature
  • Tested kpatch for permission issues

what I will do next time?

  • testing and improving elivepatch
  • Investigating the missing information in the livepatch

Google Summer of Code day 39

What was my plan for today?

  • testing and improving elivepatch

What I did today?

Meeting with mentor summary

elivepatch work:
  • working on the incremental patch feature design and implementation
  • putting patch files under /var/run/elivepatch
  • ordering patches by number
  • cleaning the folder when the machine is restarted
  • sending the patches to the server in order
  • cleaning the client terminal output by catching the exceptions

what I will do next time?

  • testing and improving elivepatch

Google Summer of Code day 40

What was my plan for today?

  • testing and improving elivepatch

What I did today?

Meeting with mentor

Summary of elivepatch work:
  • working with the incremental patch manager
  • cleaning client terminal output by catching the exceptions

Making and Testing patch manager

what I will do next time?

  • testing and improving elivepatch

November 09, 2017
Cigars and the Norwegian Government (November 09, 2017, 21:36 UTC)

[Updated 22nd November to add a link to the response to the proposed new regulations] As some of my readers know, I'm an aficionado of cigars, to the extent that I bought my own cigar importer and store. The picture below is from the 20th best bar in the world in 2017, and they sell our cigars of … Continue reading "Cigars and the Norwegian Government"