November 18 2022

.tar sorting vs .xz compression ratio

Michał Górny (mgorny) November 18, 2022, 7:09

It is pretty common knowledge that the ordering of members within an archive can affect the compression ratio. I’ve done some quick testing and the results somewhat surprised me. Firstly, the simplest lexical sort by name (path) turned out to give the best result. Secondly, the difference between that and sorting by size turned out to be as large as 8%.

Note that this is a pretty specific source archive, so results may vary. Test details and commands are in the remainder of the post.

Compression results per sort order

Sort order              Size in bytes   Compared to best
name                    108 011 756     100.00%
suffix                  108 573 612     100.52%
size (smallest first)   116 797 440     108.13%
size (largest first)    116 645 940     108.00%
suffix + size           111 709 128     103.42%

The conclusion? Sorting can affect the compression ratio more than I had anticipated. However, all the “obvious” optimizations made the result worse than plain lexical sorting. Perhaps it’s just a matter of well-organized source code keeping similar files in the same directories. Perhaps there is a way to optimize it even more (and beat sorting by name). One interesting option would be to group files into size buckets, and then sort by name within each bucket.
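The size-bucket idea above could be sketched as follows (hypothetical and untested here; the bucket is the power-of-two magnitude of the file size, computed with awk, and the sample tree merely stands in for a real one):

```shell
# Assign each file a power-of-two size bucket, sort by bucket and then
# lexically by path within the bucket, and feed the list to tar.
tmp=$(mktemp -d)
mkdir "$tmp/src"
head -c 100  /dev/urandom > "$tmp/src/b.c"
head -c 120  /dev/urandom > "$tmp/src/c.h"
head -c 5000 /dev/urandom > "$tmp/src/a.c"

find "$tmp/src" '!' -type d -printf '%s %p\n' |
    awk '{ b = 0; for (s = $1; s > 1; s = int(s / 2)) b++;
           path = $0; sub(/^[0-9]+ /, "", path);
           printf "%03d %s\n", b, path }' |
    sort |                 # bucket first, then lexical path within bucket
    cut -d' ' -f2- |
    tar -cf "$tmp/bybucket.tar" -T -

tar -tf "$tmp/bybucket.tar"    # small-bucket files (b.c, c.h) come first
```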

Special thanks to Adrien Nader and Lasse Collin from #tukaani for inspiring me to do this.

Testing details

Testing was done on data from llvm-project-f6f1fd443f48f417de9dfe23353055f1b20d87ef.tar.gz snapshot. The archive was unpacked, then repacked using sorted file lists. Directory entries were not included in the archive.

Sorting by name was done using plain lexical sort, using the following command:

find llvm-project-f6f1fd443f48f417de9dfe23353055f1b20d87ef '!' -type d | sort  | tar -cvvf /var/tmp/llvm.byname.tar -T -

Sorting by size involved numeric sort by file size, then by name:

find llvm-project-f6f1fd443f48f417de9dfe23353055f1b20d87ef '!' -type d -printf '%s %p\n' | sort -n | cut -d' ' -f2- | tar -cf /var/tmp/llvm.bysize.tar -T -
find llvm-project-f6f1fd443f48f417de9dfe23353055f1b20d87ef '!' -type d -printf '%s %p\n' | sort -k 1nr,2 | cut -d' ' -f2- | tar -cvvf /var/tmp/llvm.bysize.rev.tar -T -

Sorting by suffix involved extracting the final file suffix (if any) and sorting by it, then by full name:

find llvm-project-f6f1fd443f48f417de9dfe23353055f1b20d87ef '!' -type d | sed -r -e 's:^:_ :' -e 's:.*\.([^/]*):\1&:' | sort | cut -d' ' -f2- | tar -cf /var/tmp/llvm.bysuffix.tar -T -

Sorting by suffix + size involved sorting by suffix first (so that similar files are grouped together), then by size, then by full path. This took some find(1) and sort(1) hackery:

find llvm-project-f6f1fd443f48f417de9dfe23353055f1b20d87ef '!' -type d -printf '%s %p\n' | sed -r -e 's:^:_ :' -e 's:.*\.([^/]*):\1&:' | sort -n -k2 | sort -k 1,1 -s | cut -d' ' -f3- | tar -cf /var/tmp/llvm.bysuffix.size.tar -T -
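The compression step itself isn’t shown above; presumably each tar was then compressed with xz at identical settings and the output sizes compared. A self-contained sketch of that step (the file names are illustrative, and -9 is an assumption since the post doesn’t state the xz options used):

```shell
# Compress a tar with xz and compare sizes; -k keeps the original .tar
# around so both files can be measured.
tmp=$(mktemp -d)
seq 1 20000 > "$tmp/data.txt"
tar -cf "$tmp/demo.tar" -C "$tmp" data.txt
xz -k -9 "$tmp/demo.tar"
stat -c '%s %n' "$tmp/demo.tar" "$tmp/demo.tar.xz"
```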

October 07 2022

Clang in Gentoo now sets default runtimes via config file

Michał Górny (mgorny) October 07, 2022, 5:27

The upcoming clang 16 release features substantial improvements to configuration file support. Notably, it adds support for specifying multiple files and better default locations. This enabled Gentoo to finally replace the default-* flags used on sys-devel/clang, effectively empowering our users with the ability to change the defaults without rebuilding clang as a whole.

This change has also been partially backported to clang 15.0.2 in Gentoo, and (unless major problems are reported) will be part of the stable clang 15.x release (currently planned for upcoming 15.0.3).

In this post, I’d like to briefly describe the new configuration file features, how much of them has been backported to 15.x in Gentoo, and how the defaults are going to be selected from now on.

Configuration file support in clang 16.x

Configuration files have been supported for at least a few clang releases now, but they weren’t very useful to us before. With clang 16, I have taken the opportunity to finally change that.

In clang 16, configuration files can be both specified explicitly and loaded from default locations. Default configuration files are loaded first (unless explicitly disabled by --no-default-config or a non-empty CLANG_NO_DEFAULT_CONFIG envvar — the latter intended to be used in clang’s test suites), and configuration files specified via --config= options are loaded afterwards. This permits explicit files to override the options specified in default configs. However, it should be noted that some values are appended rather than overridden, and there is no way to “reset” them right now.

As built in Gentoo, clang looks for configuration files in two locations: /etc/clang and the executable directory. Technically, the build system also permits specifying a “user” configuration directory, but it’s not practically useful, as it provides no way of referencing the user’s home directory. Effectively, we only use /etc/clang. The sys-devel/clang-common package installs a default set of configuration files there.

The default config lookup algorithm looks for <triple>-<driver>.cfg first. If this file is not found, it looks for separate <triple>.cfg and <driver>.cfg files and loads both. This enables the first location to be used as an override, without suffering from the appending problem. The triple is the effective target triple (i.e. accounting for options such as --target= and -m32), and the driver is the string corresponding to the driver mode, e.g. clang, clang++ or clang-cpp (but it does not account for the -x c++ option!).

So for example, on a typical amd64 system clang will first try:

x86_64-pc-linux-gnu-clang.cfg

with fallback to loading both of:

x86_64-pc-linux-gnu.cfg
clang.cfg

If -m32 is used, this will be i386-pc-linux-gnu* instead. If clang++ is called, this will be *clang++.cfg, etc.
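The fallback logic described above can be sketched as a small shell function (illustrative only — clang implements this internally, and the function names here are made up):

```shell
# Mimic clang's default config lookup: prefer <triple>-<driver>.cfg;
# otherwise fall back to <triple>.cfg and <driver>.cfg (both, if present).
find_configs() {
    dir=$1; triple=$2; driver=$3
    if [ -f "$dir/$triple-$driver.cfg" ]; then
        echo "$dir/$triple-$driver.cfg"
        return 0
    fi
    [ -f "$dir/$triple.cfg" ] && echo "$dir/$triple.cfg"
    [ -f "$dir/$driver.cfg" ] && echo "$dir/$driver.cfg"
    return 0
}

d=$(mktemp -d)
touch "$d/x86_64-pc-linux-gnu.cfg" "$d/clang.cfg"
find_configs "$d" x86_64-pc-linux-gnu clang    # prints the two fallback files

touch "$d/x86_64-pc-linux-gnu-clang.cfg"
find_configs "$d" x86_64-pc-linux-gnu clang    # now prints only the combined file
```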

Explicit configuration files are specified using the --config=<file> option. They are loaded after the default configs, in order of being listed. They can either be specified by full path, or by bare filename. In the latter case, clang looks for them in the directories listed earlier.

Configuration files use response file syntax, i.e. you specify command-line options inside them as you would pass them on the command-line. They also support including additional files via @<filename> syntax. An example configuration file could specify:

-I/opt/mystuffs/include
-L/opt/mystuffs/lib
@gentoo-runtimes.cfg

Full documentation: Clang Compiler User’s Manual: configuration files.

The backport to clang 15.0.2

There are notable differences in configuration file support in clang 15.x:

  • the new --config=<file> spelling is not supported, you have to use --config <file> (yes, compilers don’t follow getopt_long rules)
  • only one configuration file can be loaded
  • --config disables loading default configuration files
  • there is no explicit --no-default-config option

The rules for loading configuration files were also different (and not very reliable). I have mostly backported the new lookup rules. However, since this version does not support loading multiple configuration files (and I did not want to diverge too far from vanilla), only the <triple>-<driver>.cfg and <driver>.cfg names are supported (i.e. the pure-triple variant is not).

I think this divergence is acceptable because Gentoo did not enable configuration file support before (i.e. did not specify the system configuration directory), so there is no reason to assume that any regular Gentoo users would have relied on the prior logic.

Use of configuration files in Gentoo

We currently install two “base” configuration files: gentoo-gcc-install.cfg and gentoo-runtimes.cfg.

gentoo-gcc-install.cfg is used to provide the path to the GCC installation. Its initial contents are e.g.:

# This file is maintained by gcc-config.
# It is used to specify the selected GCC installation.
--gcc-install-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0

The intent is that gcc-config will update this file when switching between versions, so that clang no longer has to include logic for reading Gentoo-specific files. However, the gcc-config part is not yet implemented, and it is not clear whether this solution will be workable long-term, as it probably breaks support for sysroots.

gentoo-runtimes.cfg is used to control the default runtimes and link editor used. Its initial contents are controlled by USE flags on the clang-common package and are e.g.:

# This file is initially generated by sys-devel/clang-runtime.
# It is used to control the default runtimes used by clang.

--rtlib=libgcc
--unwindlib=libgcc
--stdlib=libstdc++
-fuse-ld=bfd

On top of these two files, we install the actual configuration files for the three driver modes relevant to Gentoo: clang.cfg, clang++.cfg and clang-cpp.cfg. All of them have the following contents:

# This configuration file is used by clang driver.
@gentoo-runtimes.cfg
@gentoo-gcc-install.cfg

Effectively, they just chain-load the two base files.

Future use of config files

Why now?, you might ask. After all, configuration files have been there for a while now, and I hadn’t taken the effort to make them usable before. Well, it all started with the total mayhem caused by clang 15.x becoming more strict. While we managed to convince upstream to revert that change and defer it to 16.x, it became important for us to be able to easily test for the breakage caused by clang changing its behavior.

One part of this effort was starting to package clang 16 snapshots. If you’d like to help testing it, you can unmask them using the following package.accept_keywords snippet:

=dev-ml/llvm-ocaml-16.0.0_pre* **
=dev-python/lit-16.0.0_pre* **
=dev-util/lldb-16.0.0_pre* **
=dev-python/clang-python-16.0.0_pre* **
=sys-devel/clang-16.0.0_pre* **
=sys-devel/clang-common-16.0.0_pre* **
=sys-devel/clang-runtime-16.0.0_pre* **
=sys-devel/lld-16.0.0_pre* **
=sys-devel/llvm-16.0.0_pre* **
=sys-devel/llvm-common-16.0.0_pre* **
=sys-libs/compiler-rt-16.0.0_pre* **
=sys-libs/compiler-rt-sanitizers-16.0.0_pre* **
=sys-libs/libcxx-16.0.0_pre* **
=sys-libs/libcxxabi-16.0.0_pre* **
=sys-libs/libcxxrt-16.0.0_pre* **
=sys-libs/libomp-16.0.0_pre* **
=sys-libs/llvm-libunwind-16.0.0_pre* **
=dev-libs/libclc-16.0.0_pre* **

~sys-devel/llvmgold-16 **
~sys-devel/clang-toolchain-symlinks-16 **
~sys-devel/lld-toolchain-symlinks-16 **
~sys-devel/llvm-toolchain-symlinks-16 **

The other part is providing support for configuration files that can be used to reliably pass -Werror= and -Wno-error= flags to all clang invocations, and therefore adjust clang’s behavior while testing.
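For instance, a config file along these lines (a hypothetical example — the specific flag selection is illustrative, not from the actual Gentoo files) could be dropped into /etc/clang or passed via --config to exercise the stricter behavior:

```
# Hypothetical test config: promote diagnostics that a future clang
# plans to make errors by default, while keeping one relaxed.
-Werror=implicit-function-declaration
-Werror=implicit-int
-Wno-error=deprecated-declarations
```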

With this, we should be able to assess the damage earlier. However, between the size of the clang 16 tracker and upstream continuing to make the behavior more strict, I’m starting to have doubts whether clang will remain a general-purpose compiler that can reliably be used as a replacement for GCC, and not just a corporation-driven tool used to compile a few big projects.

September 12 2022

Refining ROCm Packages in Gentoo — project summary

Gentoo Google Summer of Code (GSoC) September 12, 2022, 13:07

Twelve weeks quickly slipped away, and I’m proud to say that the packaging quality of ROCm in Gentoo was genuinely improved by this project.

Two sets of major deliverables were achieved: new ebuilds of the ROCm-5.1.3 tool-chain that depend purely on vanilla llvm/clang, and rocm.eclass along with the ROCm-5.1.3 libraries utilizing it. Each brings a great QA improvement compared to the original ROCm packaging method.

Beyond these, I also maintained rocprofiler and rocm-opencl-runtime, bumping their versions with nontrivial changes. I discovered several bugs and talked to upstream about them. I also wrote ROCm wiki pages, which started my journey on the Gentoo wiki.

By writing rocm.eclass, I learnt a great deal about eclass writing — how to design, how to balance needs against QA concerns, how to write comments and examples well, etc. I’m really grateful to those Gentoo developers who pointed out my mistakes and helped me polish my eclass.

Since I was working on top of the Gentoo repo, my work is scattered around rather than living in a repo of my own. My major products can be seen in [0], where all my PRs to ::gentoo are located. My weekly reports can be found on the Gentoo GSoC blog.

[0] My finished PRs for gentoo during GSoC 2022

Details are as follows:

First: ROCm on vanilla llvm/clang

Originally, ROCm had its own llvm fork, with some modifications not yet upstreamed. In the original Gentoo ROCm packaging roadmap, sys-devel/llvm-roc was introduced as the ROCm fork of llvm/clang. This is the simple way, and it worked well for ROCm-only packages [1]. But it brings trouble when a large project like Blender pulls in dependencies using vanilla llvm, resulting in symbol collisions [2].

So, when I noticed [1] in week 1, I began my journey of porting ROCm to vanilla clang. I was very lucky, because at that time clang-14.0.5 had just been released, eliminating the major obstacles to porting (previous versions had bugs to various degrees). After some quick hacking I succeeded, which is recorded in the week 1 report [3]. That week I successfully built Blender with HIP Cycles (GPU-accelerated render code written in HIP), and rendered some example projects on a Radeon RX 6700XT.

While I was thrilled to be porting the ROCm tool-chain to vanilla clang, my mentor pointed out that I had carelessly introduced some serious bugs into ::gentoo. In week 2, I managed to fix the bugs I had created, and set up a reproducible test ground using docker, to make testing easier and cleaner and to prevent such bugs from happening again. Details can be found in week 2’s report [4].

After that there was no non-trivial progress on the port to vanilla clang, only bug fixes and ebuild polishing, until I met MIOpen in the last week.

The story of debugging MIOpen assemblies

In week 12, rocm.eclass was almost in its final shape, so I began to land the ROCm libraries [1], including sci-libs/miopen. ROCm libraries are usually written in “high-level” languages like HIP, and dev-util/hip was already ported to vanilla clang in good shape, so there was no need to worry about compilation problems. However, MIOpen has various hand-written assembly kernels for JIT, which caused several test failures [5]. It was frustrating because I’m unfamiliar with AMDGPU assembly, so I was close to giving up (my mentor also suggested giving up on it within GSoC). Thus, I reported my problem to upstream in [5], together with my debugging attempts.

Thanks to the testing system mentioned previously, I had set up not only standard environments, but also one snapshot with full llvm/clang debug symbols. I quickly located the problem and reported it to upstream via an issue, but I still didn’t know why the error was happening.

The next day, I decided to look at the assembly and the debugging results once again. This time fortune was on my side, and I discovered that the key issue was LLVM treating Y and N in metadata as boolean values, not strings (they should be kernel parameter names) [6]. I provided a fix in [7], and all tests passed on both a Radeon VII and a Radeon RX 6700XT. Amazing! I also mentioned how excited I was in week 12’s report [8].

[1] For example, ROCm libraries in github.com/ROCmSoftwarePlatform
[2] bugs.gentoo.org/693200
[3] Week 1 Report for Refining ROCm Packages in Gentoo
[4] Week 4 Report for Refining ROCm Packages in Gentoo
[5] github.com/ROCmSoftwarePlatform/MIOpen/issues/1731
[6] github.com/ROCmSoftwarePlatform/MIOpen/issues/1731#issuecomment-1236913096
[7] github.com/littlewu2508/gentoo/commit/40eb81f151f43eb5d833dc7440b02f12dab04b89
[8] Week 12 Report for Refining ROCm Packages in Gentoo

The second deliverable is rocm.eclass

The most challenging part for me was writing rocm.eclass. I started writing it in week 4 [9] and finished the design in week 8 [10] (including 10 days of temporary leave). In weeks 9-12, I posted 7 revisions of rocm.eclass to the gentoo-dev mailing list [10,11] and received many helpful comments. On the GitHub PR [12], I also got lots of suggestions from Gentoo developers.

Eventually, I finished rocm.eclass, providing the amdgpu_targets USE_EXPAND, ROCM_REQUIRED_USE, and ROCM_USE_DEP to control which GPU targets to compile for and to keep coherency among dependencies. The eclass provides get_amdgpu_flags for src_configure and check_amdgpu for ensuring AMDGPU device accessibility in src_test. Finally, rocm.eclass was merged into ::gentoo in [13].
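A hypothetical consumer fragment using the API described above might look like this (package name, version, and dependencies are purely illustrative — a sketch, not a real ebuild from the tree):

```shell
# Sketch of a rocm.eclass consumer; names other than the eclass API
# (get_amdgpu_flags, check_amdgpu, ROCM_REQUIRED_USE) are made up.
EAPI=8

ROCM_VERSION=5.1.3
inherit cmake rocm

DESCRIPTION="Hypothetical ROCm library"
REQUIRED_USE="${ROCM_REQUIRED_USE}"
DEPEND="dev-util/hip"

src_configure() {
    local mycmakeargs=(
        # expands the amdgpu_targets USE_EXPAND selection into GPU targets
        -DAMDGPU_TARGETS="$(get_amdgpu_flags)"
    )
    cmake_src_configure
}

src_test() {
    check_amdgpu    # ensure an AMDGPU device is accessible before testing
    cmake_src_test
}
```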

[9] Week 9 Report for Refining ROCm Packages in Gentoo
[10] archives.gentoo.org/gentoo-dev/threads/2022-08/
[11] archives.gentoo.org/gentoo-dev/threads/2022-09/
[12] github.com/gentoo/gentoo/pull/26784
[13] gitweb.gentoo.org/repo/gentoo.git/commit/?id=cf8a6a845b68b578772f2ae0d2703f203c6dec33

Other coding products

Merged ebuilds

rocprofiler

I bumped dev-util/rocprofiler and its dependencies to version 5.1.3, and fixed loading of the proprietary aql profiler library, so the ROCm stack on Gentoo stays fully open-source without losing most profiling functionality [14].

[14] github.com/ROCm-Developer-Tools/rocprofiler/issues/38

Unmerged ebuilds

Due to limited time and the long testing period, the ebuilds of the ROCm-5.1.3 libraries (the ones using rocm.eclass) did not get merged. They can be found in this PR.

dev-libs/rocm-opencl-runtime is a critical package because it provides OpenCL, and many users still use OpenCL for GPGPU since HIP is still new. I bumped it to 5.1.3 to match the vanilla clang tool-chain and enabled its src_test, so users can make sure that vanilla clang isn’t breaking anything. The PR is located here.

Bug fixes

Fixing existing bugs was also a part of my GSoC. I created various PRs and closed the corresponding bugs on Gentoo Bugzilla: #822828, #853718, #851795, #851792, #852236, #850937, #836248, #836274, #866839. Also, much of the bug fixing happened before new packages entered the Gentoo main repo, or the bugs were found by myself in the first place, so there is no record on Bugzilla.

Last but not least, the wiki page

I created 3 pages [15-17], filling in important information about ROCm. I also received a lot of help from the Gentoo community, mainly focused on refining my wiki pages to meet the standards.

[15] wiki.gentoo.org/wiki/ROCm
[16] wiki.gentoo.org/wiki/HIP
[17] wiki.gentoo.org/wiki/Rocprofiler

Comparison with original plan

The original plan in the proposal also contained rocm.eclass, but it only allocated the last week for “investigation on vanilla clang”. In week 1, my mentor and I added “porting ROCm to vanilla clang” to the plan, and this became the new major deliverable. Due to the time limit, packaging high-level frameworks like pytorch and tensorflow was abandoned. I only worked on getting CuPy to work [18], showing rocm.eclass functionality on packages that depend on ROCm libraries.

I think the change of plan and deliverables better fits the project title, “Refining”, because what I did greatly improves the quality of existing ebuilds, rather than introducing more of them.

[18] github.com/littlewu2508/gentoo/commit/3d142fa4b4ada560c053c2fd3c8c1501c82aace2

12 weeks quickly slips away, and I’m proud to say that the packaging quality of ROCm in Gentoo does gets improved in this project.

Two sets of major deliverables are achieved: New ebuilds of ROCm-5.1.3 tool-chain that purely depends on vanilla llvm/clang, and rocm.eclass along with ROCm-5.1.3 libraries utilizing them. Each brings one great QA improvement compare to the original ROCm packaging method.

Beyond these, I also maintained rocprofiler, rocm-opencl-runtimes, bumping their version with nontrivial changes. I discovered several bugs, and talked to upstream. I also wrote ROCm wiki pages, which starts my journey on Gentoo wiki.

By writing rocm.eclass, I learnt pretty much about eclass writing — how to design, how to balance needs and QA concerns, how to write comments and examples well, etc. I’m really grateful to those Gentoo developers who pointed out my mistakes and helped me polishing my eclass.

Since I’m working on top of Gentoo repo, my work is scattered around rather than having my own repo. My major products can be seen in [0], where all my PRs to ::gentoo located. My weekly report can be found on Gentoo GSoC blogs

[0] My finished PRs for gentoo during GSoC 2022

Details are as followed:

First, it’s about ROCm on vanilla llvm/clang

Originally, ROCm has its own llvm fork, which has some modifications not upstreamed yet. In the original Gentoo ROCm packaging roadmap, sys-devel/llvm-roc is introduced as the ROCm forked llvm/clang. This is the simple way, and worked well on ROCm-only packages [1]. But it brings troubles if a large project like blender pulls in dependencies using vanilla llvm, and results in symbol collision [2].

So, when I noticed [1] in week 1, I began my journey on porting ROCm on vanilla clang. I’m very lucky, because at that time clang-14.0.5 was just released, eliminating major obstacles for porting (previous versions more or less have bugs). After some quick hack I succeeded, which is recorded in the week 1 report [3]. In that week I successfully built blender with hip cycles (GPU-accelerated render code written in HIP), and rendered some example projects on a Radeon RX 6700XT.

While I was thrilled to have ported the ROCm tool-chain to vanilla clang, my mentor pointed out that I had carelessly introduced some serious bugs into ::gentoo. In week 2, I managed to fix the bugs I created, and set up a reproducible test ground using docker to make testing easier and cleaner and to prevent such bugs from happening again. Details can be found in week 2’s report [4].

After that there was no non-trivial progress in porting to vanilla clang, only bug fixes and ebuild polishing, until I met MIOpen in the last week.

The story of debugging MIOpen assemblies

In week 12 rocm.eclass was almost in its final shape, so I began to land the ROCm libraries [1], including sci-libs/miopen. ROCm libraries are usually written in “high level” languages like HIP, and dev-util/hip was already ported to vanilla clang in good shape, so there was no need to worry about compilation problems. However, MIOpen has various hand-written assemblies for JIT, which caused several test failures [5]. It was frustrating because I’m unfamiliar with AMDGPU assembly, so I was close to giving up (my mentor also suggested giving up on it within GSoC). Instead, I reported my problem to upstream in [5], along with my debugging attempts.

Thanks to the testing system mentioned previously, I had set up not only standard environments but also one snapshot with full llvm/clang debug symbols. I quickly located the problem and reported it to upstream via an issue, but I still didn’t know why the error was happening.

On the second day, I decided to look at the assembly and the debugging results once again. This time fortune was on my side, and I discovered that the key issue is LLVM treating Y and N in metadata as boolean values, not strings (they should be kernel parameter names) [6]. I provided a fix in [7], and all tests passed on both a Radeon VII and a Radeon RX 6700XT. Amazing! I also mentioned how excited I was in week 12’s report [8].

[1] For example, ROCm libraries in https://github.com/ROCmSoftwarePlatform
[2] https://bugs.gentoo.org/693200
[3] Week 1 Report for Refining ROCm Packages in Gentoo
[4] Week 4 Report for Refining ROCm Packages in Gentoo
[5] https://github.com/ROCmSoftwarePlatform/MIOpen/issues/1731
[6] https://github.com/ROCmSoftwarePlatform/MIOpen/issues/1731#issuecomment-1236913096
[7] https://github.com/littlewu2508/gentoo/commit/40eb81f151f43eb5d833dc7440b02f12dab04b89
[8] Week 12 Report for Refining ROCm Packages in Gentoo

The second deliverable is rocm.eclass

The most challenging part for me was writing rocm.eclass. I started writing it in week 4 [9] and finished my design in week 8 [10] (including 10 days of temporary leave). In weeks 9-12, I posted 7 revisions of rocm.eclass on the gentoo-dev mailing list [10,11] and received many helpful comments. On the Github PR [12], I also got lots of suggestions from Gentoo developers.

Eventually, I finished rocm.eclass, which provides the amdgpu_targets USE_EXPAND, plus ROCM_REQUIRED_USE and ROCM_USEDEP to control which GPU targets to compile for and to keep USE flags coherent among dependencies. The eclass provides get_amdgpu_flags for src_configure and check_amdgpu for ensuring AMDGPU device accessibility in src_test. Finally, rocm.eclass was merged into ::gentoo in [13].
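As a rough sketch of how a consumer ebuild might use these pieces (the names ROCM_REQUIRED_USE, ROCM_USEDEP, get_amdgpu_flags and check_amdgpu come from the description above, but the exact merged API and the package name in the example are illustrative, not copied from ::gentoo):

```bash
# Illustrative consumer-ebuild fragment, not an actual in-tree ebuild.
EAPI=8
inherit cmake rocm

# Require a coherent GPU-target selection, and require the same
# amdgpu_targets_* flags on the ROCm library dependency.
REQUIRED_USE="${ROCM_REQUIRED_USE}"
DEPEND="sci-libs/rocBLAS[${ROCM_USEDEP}]"

src_configure() {
	# get_amdgpu_flags turns the enabled amdgpu_targets_* USE flags
	# into the target list handed to the build system.
	local mycmakeargs=( -DAMDGPU_TARGETS="$(get_amdgpu_flags)" )
	cmake_src_configure
}

src_test() {
	# check_amdgpu verifies /dev/kfd and /dev/dri/render* are
	# accessible before running GPU tests.
	check_amdgpu
	cmake_src_test
}
```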

[9] Week 9 Report for Refining ROCm Packages in Gentoo
[10] https://archives.gentoo.org/gentoo-dev/threads/2022-08/
[11] https://archives.gentoo.org/gentoo-dev/threads/2022-09/
[12] https://github.com/gentoo/gentoo/pull/26784
[13] https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=cf8a6a845b68b578772f2ae0d2703f203c6dec33

Other coding products

Merged ebuilds

rocprofiler

I bumped dev-util/rocprofiler and its dependencies to version 5.1.3, and fixed the proprietary aql profiler library loading, so the ROCm stack on Gentoo stays fully open source without losing most profiling functionality [14].

[14] https://github.com/ROCm-Developer-Tools/rocprofiler/issues/38

Unmerged ebuilds

Due to limited time and the long testing period, the ebuilds of the ROCm-5.1.3 libraries (the ones using rocm.eclass) did not get merged. They can be found in this PR.
dev-libs/rocm-opencl-runtime is a critical package because it provides OpenCL, and many users still use OpenCL for GPGPU since HIP is still new. I bumped it to 5.1.3 to match the vanilla clang tool-chain, and enabled its src_test, so users can make sure that vanilla clang isn’t breaking anything. The PR is located here.

Bug fixes

Fixing existing bugs was also part of my GSoC. I created various PRs and closed the corresponding bugs on Gentoo Bugzilla: #822828, #853718, #851795, #851792, #852236, #850937, #836248, #836274, #866839. Also, much bug fixing happened before new packages entered the gentoo main repo, or the bugs were found by myself in the first place, so there is no record on Bugzilla.

Last but not least, the wiki pages

I created 3 pages [15-17], filling in important information about ROCm. I also received a lot of help from the Gentoo community, mainly focused on refining my wiki pages to meet the standards.

[15] https://wiki.gentoo.org/wiki/ROCm
[16] https://wiki.gentoo.org/wiki/HIP
[17] https://wiki.gentoo.org/wiki/Rocprofiler

Comparison with original plan

The original plan in my proposal also contained rocm.eclass, but it allocated only the last week for “investigation on vanilla clang”. In week 1, my mentor and I added “porting ROCm to vanilla clang” to the plan, and this became the new major deliverable. Due to the time limit, packaging high-level frameworks like pytorch and tensorflow was dropped. I only worked on getting CuPy to work [18], showing rocm.eclass functionality on packages that depend on ROCm libraries.

I think the change of plan and deliverables better reflects the project title “Refining”, because what I did greatly improves the quality of existing ebuilds rather than introducing more ebuilds.

[18] https://github.com/littlewu2508/gentoo/commit/3d142fa4b4ada560c053c2fd3c8c1501c82aace2

September 11 2022

Week 12 Report for Refining ROCm Packages in Gentoo

Gentoo Google Summer of Code (GSoC) September 11, 2022, 10:16

Although this is the final week, I would like to say that it is as exciting as the first week.

I kept polishing rocm.eclass with the help of Michał and my mentor, and it is now in good shape [1]. I must admit that writing an eclass took a beginner like me much longer than I expected. In my proposal, I left 4 weeks to finish it: 2 weeks of implementation and 2 weeks of polishing. In reality, I implemented it within 2 weeks, but polished it for 4 weeks. I introduced a lot of QA issues without being aware of them, which increased the number of review-modify cycles. During this process, I learned a lot:

1. Always re-read the eclass, especially the comments and examples, thoroughly after a modification. Many times I forgot that an example far from the change needed updating because a function changed its behavior.

2. Read the bash manual carefully, because proper use of features like bash arrays can greatly simplify code.

3. Consider the maintenance difficulty of the eclass. I wrote an oddly specific `src_test` that could cover all the cases of ROCm packages. But it wasn’t worth it, because specialized code should be placed in ebuilds, not in an eclass. So instead, I kept only the most common part, `check_amdgpu`, and got rid of the phase functions, which made the eclass much cleaner.
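As a tiny illustration of point 2 (the variable names here are hypothetical, not taken from the eclass): a bash array lets you build up a flag list without fragile string concatenation, and each element stays a single, properly quoted word:

```bash
# Hypothetical example: collect per-target compiler flags in a bash array.
amdgpu_targets=( gfx906 gfx90a gfx1031 )

flags=()
for target in "${amdgpu_targets[@]}"; do
	flags+=( "--offload-arch=${target}" )
done

# Expands to one word per flag:
echo "${flags[@]}"
# -> --offload-arch=gfx906 --offload-arch=gfx90a --offload-arch=gfx1031
```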

I also found some bugs and their solutions. As I mentioned in week 10’s report, I observed many test failures in sci-libs/miopen based on vanilla clang. This week, I figured out that they have 3 different causes, and I’ve provided fixes for two of the failures ([2, 3]). For the third issue, I’ve found its root cause [4]. I believe there will be a simple solution to it.

For the gcc-12 issues, I also came up with a brutal workaround [5]: undef the __noinline__ macro before including the stdc++ headers and define it again afterwards. I also observed that clang-15 does not fix this issue as expected, and provided an MWE at [6].

I’m also writing wiki pages, filling in the installation and development guides.

In this 12-week project, I proposed to deliver rocm.eclass and packages like pytorch and tensorflow with rocm enabled. Instead, I delivered rocm.eclass as proposed, but migrated the ROCm toolchain to vanilla clang. I think porting the ROCm toolchain to vanilla clang is closer to my project title “Refining ROCm Packages” 🙂

[1] https://github.com/gentoo/gentoo/pull/26784
[2] https://github.com/littlewu2508/gentoo/commit/2bfae2e26a23d78b634a87ef4a0b3f0cc242dbc4
[3] https://github.com/littlewu2508/gentoo/commit/cd11b542aec825338ec396bce5c63bbced534e27
[4] https://github.com/ROCmSoftwarePlatform/MIOpen/issues/1731
[5] https://github.com/littlewu2508/gentoo/commit/2a49b4db336b075f2ac1fdfbc907f828105ea7e1
[6] https://github.com/llvm/llvm-project/issues/57544

Week 11 Report for Refining ROCm Packages in Gentoo

Gentoo Google Summer of Code (GSoC) September 11, 2022, 10:14

My progress this week is mainly writing wiki and refining rocm.eclass.

Although the current eclass works with my new ebuilds [1], Michał Górny has pointed out various flaws on the Github PR [2]. He also questioned the necessity of rocm.eclass, because it looks like a combination of two eclasses. In my opinion, rocm.eclass has its value, mainly in handling the USE_EXPAND and the common phase functions. The ugly part is mainly in rocm_src_test: due to the inconsistent test methods of the packages in [3], I have to detect which method is in use and act accordingly. So my plan is to split the one-size-fits-all rocm_src_test into two functions corresponding to the two scenarios (cmake test or standalone binary), and let each ebuild decide which to use. This avoids the detailed detection code that makes rocm_src_test bloated.

Wiki writing: I think the main parts of the ROCm [4] and HIP [5] wiki pages are nearly finished. But due to the delay of rocm.eclass, the related information has not been added yet (ROCm#Developing guide). There is also a reserved section: ROCm#Installation guide. I have little clue how to write this part, because ROCm is a wide collection of packages. Maybe a meta package (there are users working on this) would be helpful.

To be honest I’m a bit anxious, because there is only one week left, but there is still a lot to be determined and tested in rocm.eclass along with the sci-libs/roc* ebuilds. I hope I can resolve these core issues in the last week.

[1] https://github.com/littlewu2508/gentoo/tree/rocm-5.1.3-scilibs
[2] https://github.com/gentoo/gentoo/pull/26784
[3] https://github.com/ROCmSoftwarePlatform
[4] https://wiki.gentoo.org/wiki/ROCm
[5] https://wiki.gentoo.org/wiki/HIP

Week 10 Report for Refining ROCm Packages in Gentoo

Gentoo Google Summer of Code (GSoC) September 11, 2022, 10:10

This week I learned a lot from Ulrich’s comments on rocm.eclass. I polished the eclass to v3 and sent it to the gentoo-dev mailing list. However, I noticed another error introduced in v3, and I’ll include a fix for it in v4 in the following days.

The other half of my time was spent on testing the sci-libs/roc-* packages on various platforms, utilizing rocm.eclass. I can say that rocm.eclass did its job as expected, so I believe it can be merged after v4.

With src_test enabled, I found various test failures. rocBLAS-5.1.3 fails 3 tests on a Radeon RX 6700XT, slightly exceeding tolerances, which does not seem like a big issue; rocFFT-5.1.3 fails 16 suites on a Radeon VII [1], which is serious and confirmed by upstream, so I suggest masking the <code>amdgpu_targets_gfx906</code> USE flag for rocFFT-5.1.3; and just today I observed MIOpen failing many tests, probably due to vanilla clang. I’ll open issues and report those test failures to upstream. Running the test suites takes a lot of time and often drains the GPU. It may take more than 15 hours to test rocBLAS, even on a performant CPU like the Ryzen 5950X. If I use the GPU to render graphics (run a desktop environment) and run tests simultaneously, it often results in an amdgpu driver failure. I hope one day we can have a testing farm for ROCm packages, but that would be expensive, because there are a lot of GPU architectures and compilation takes a lot of time.

I planned to finish the drafts of the wiki pages [2,3], but it turns out I’m running out of time. I’ll catch up in week 11. My mentor was also busy in week 10, so my PR for rocm-opencl-runtime is still pending review. Now we are working on solving the dependency issue of ROCm packages — gcc-12 and gcc-11.3.0 incompatibilities. Due to two bugs, the current stable gcc, gcc-11.3.0, cannot compile some ROCm packages [4], and the current unstable gcc, gcc-12, is unable to compile nearly all ROCm packages [5].

I’ll continue with what was postponed in week 10 — landing rocm.eclass and the sci-libs packages, preparing cupy, fixing bugs, and writing the wiki pages. I’ll investigate MIOpen’s situation as well.

[1] https://github.com/ROCmSoftwarePlatform/rocFFT/issues/369
[2] https://wiki.gentoo.org/wiki/ROCm
[3] https://wiki.gentoo.org/wiki/HIP
[4] https://bugs.gentoo.org/842405
[5] https://bugs.gentoo.org/857660

Week 9 Report for Refining ROCm Packages in Gentoo

Gentoo Google Summer of Code (GSoC) September 11, 2022, 10:07

This week I mainly focused on dev-libs/rocm-opencl-runtime.

I bumped dev-libs/rocm-opencl-runtime to 5.1.3. That was relatively easy. The difficult part was enabling its tests. I came across a major problem: the oclgl test requires an X server. I compiled with debug options and used gdb to dive into the code, but found there is no simple solution. Currently the test needs an X server whose OpenGL vendor is AMD. Xvfb only provides llvmpipe, which does not meet the requirement. I consulted some friends, who said NVIDIA recommends using EGL when there is no X [1], but apparently ROCm can only get OpenGL from X [2]. So my workaround is to let the user pass an X display into the ebuild via the environment variable OCLGL_DISPLAY (the DISPLAY variable is wiped when calling emerge, while this one survives). If no display is detected, or glxinfo shows the OpenGL vendor is not AMD, then src_test dies with instructions to run an X server using the amdgpu driver.
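A minimal sketch of that guard (the function name and messages are hypothetical; the real ebuild logic may differ):

```bash
# Hypothetical sketch of the src_test guard: read OCLGL_DISPLAY (since
# emerge wipes DISPLAY), query the OpenGL vendor via glxinfo, and fail
# unless the vendor is AMD.
check_oclgl_display() {
	if [ -z "${OCLGL_DISPLAY}" ]; then
		echo "oclgl tests need an X server: set OCLGL_DISPLAY" >&2
		return 1
	fi
	local vendor
	# Xvfb reports llvmpipe/Mesa here; an amdgpu-driven server reports AMD.
	vendor=$(DISPLAY="${OCLGL_DISPLAY}" glxinfo 2>/dev/null | grep 'OpenGL vendor')
	case "${vendor}" in
		*AMD*) return 0 ;;
		*)
			echo "OpenGL vendor is not AMD; run X with the amdgpu driver" >&2
			return 1 ;;
	esac
}
```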

I was also trapped by a CRLF problem in the src_test of dev-libs/rocm-opencl-runtime. Tests listed in oclperf.exclude should be skipped in the oclperf test, but they were not. After numerous trials, I finally found that this file uses CRLF line endings, not LF, which caused the exclusion to fail 🙁
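The failure mode is easy to reproduce outside the ebuild (a sketch with made-up test names; the real exclusion logic in the test harness may differ):

```bash
# An exclusion list saved with CRLF endings: each line really ends in "\r\n".
printf 'OCLPerfTestA\r\nOCLPerfTestB\r\n' > oclperf.exclude

# An exact-line match fails, because the line content is "OCLPerfTestA\r":
grep -qx 'OCLPerfTestA' oclperf.exclude && echo excluded || echo not-excluded
# -> not-excluded

# Stripping the carriage returns makes the exclusion work again:
tr -d '\r' < oclperf.exclude > oclperf.exclude.lf
grep -qx 'OCLPerfTestA' oclperf.exclude.lf && echo excluded || echo not-excluded
# -> excluded
```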

Nevertheless, the rocm-opencl-runtime tests passed on a Radeon RX 6700XT! That is a good thing, because I know many Gentoo users rely on this package to provide OpenCL for their computations, and correctness is vital. Previously we did not have src_test enabled. The PR is now in [6].

Other work included starting the wiki pages [3,4], refining rocm.eclass according to feedback (not much; see the gentoo-dev mailing list), and finding a bug in dev-util/hip: the FindHIP.cmake module is not in the correct place. The fix can be found in [5], but I need to polish the patch further before a PR.

If there are no further suggestions on rocm.eclass, I’ll land it in ::gentoo next week, and start bumping the sci-libs versions already done locally.

[1] https://developer.nvidia.com/blog/egl-eye-opengl-visualization-without-x-server/
[2] https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/blob/bbdc87e08b322d349f82bdd7575c8ce94d31d276/tests/ocltst/module/common/OCLGLCommonLinux.cpp
[3] https://wiki.gentoo.org/wiki/ROCm
[4] https://wiki.gentoo.org/wiki/HIP
[5] https://github.com/littlewu2508/gentoo/tree/hip-correct-cmake
[6] https://github.com/gentoo/gentoo/pull/26870

Week 8 Report for Refining ROCm Packages in Gentoo

Gentoo Google Summer of Code (GSoC) September 11, 2022, 10:04

This week there were two major pieces of progress: dev-util/rocprofiler and rocm.eclass.

I have implemented all the functions I think are necessary for rocm.eclass. I have just sent the rocm.eclass draft to the gentoo-dev mailing list (also with a Github PR at [1]); please have a look. In the following weeks, I will collect feedback and continue to polish it.

In summary, I have implemented the functions listed in my proposal:
the USE_EXPAND amdgpu_targets_*, and ROCM_USEDEP to keep the USE flags coherent among dependencies;
rocm_src_configure, which contains the common src_configure arguments;
rocm_src_test, which checks the permissions on /dev/kfd and /dev/dri/render*.

There are also some things listed in the proposal that I decided not to implement for now:
rocm_src_prepare: although there are some similarities among the ebuilds, src_prepare is highly customized for each ROCm component. Unifying it would take extra work.
SRC_URI: currently every SRC_URI is already specified in each ebuild. It does not hurt to keep the status quo.

Moreover, during implementation I found another feature necessary:
rocm_src_test correctly handles different scenarios. ROCm packages may have CMake tests, which can be run using cmake_src_test, or may only compile testing binaries which require execution from the command line. I made rocm_src_test automatically detect the method, so ROCm packages can just call this function directly without doing anything else.
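A simplified sketch of that detection (hypothetical function, paths, and file names; the real eclass logic is more involved):

```bash
# Hypothetical: decide how to run tests for a ROCm package's build directory.
rocm_detect_test_method() {
	local builddir=$1
	if [ -f "${builddir}/CTestTestfile.cmake" ]; then
		# CMake registered tests: take the cmake_src_test / ctest path.
		echo ctest
	elif ls "${builddir}"/*test* >/dev/null 2>&1; then
		# Only standalone test binaries were compiled: run them directly.
		echo binary
	else
		echo none
	fi
}
```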

Actually I had never imagined rocm.eclass would end up in this shape. Initially I just thought it would provide some utilities, mainly src_test and the USE_EXPAND. But while implementing it I found that all these features require careful treatment. The comments (mainly examples) also take up half of the length. It ends up at 278 lines, which is middle-sized among the current eclasses. Maybe it can be trimmed down further after polishing, because there could be awkward implementations or re-inventions in it.

Based on my draft rocm.eclass, I have prepared sci-libs/roc*-5.1.3, sci-libs/hip*-5.1.3 and dev-python/cupy ebuilds making use of it. It feels great to simplify the ebuilds, and portage handles the USE_EXPAND and dependencies just as expected. Once rocm.eclass gets into the tree, I’ll push those ROCm-5.1.3 ebuilds.

Another thing to mention is that the ROCm-5.1.3 toolchain finally got merged [5], with the fixed dev-util/rocprofiler-{4.3.0,5.0.2,5.1.3}. rocprofiler was actually buggy before, because I thought I had committed the patch which stripped the libhsa-amd-aqlprofile.so loading (I even claimed it in the commit message), but it was not committed and got lost in history. So I reproduced the patch. I also did some research on this proprietary library. By default, not loading it means tracing hsa/hip is not possible — you only get basic information like the name and time of each GPU kernel execution, but do not know the pipeline of kernel execution (which one spawned which kernel). AQL should be the HSA architected queuing language (HSA AQL), documented at https://llvm.org/docs/AMDGPUUsage.html#hsa-aql-queue. It does sound related to the pipeline of kernel dispatching, and by its description, libhsa-amd-aqlprofile.so is an extension API for AQL profiling. But in practice, patching the source code so that rocprofiler does not load libhsa-amd-aqlprofile.so does not break the tracing of hsa/hip. So, I’m not sure why libhsa-amd-aqlprofile.so is needed, and raised a question at [2]. I completed the fix in [3,4].

According to the renewed proposal (I had been away for two weeks, so there are changes in the plan), I should collect feedback and refine rocm.eclass, and prepare dev-python/cupy and sci-libs/rocWMMA. I’ll investigate ROCgdb, too. Also, rocm-device-libs is a major package because many users rely on it to provide OpenCL. I’ll work on bumping its version, too. What’s more, with hip-5.1.3 against vanilla clang, ROCm support for blender can land in ::gentoo.

[1] https://github.com/gentoo/gentoo/pull/26784
[2] https://github.com/RadeonOpenCompute/ROCm/issues/1781
[3] https://github.com/gentoo/gentoo/pull/26755
[4] https://github.com/gentoo/gentoo/pull/26771
[5] https://github.com/gentoo/gentoo/pull/26441

This week there are two major progress: dev-util/rocprofiler and rocm.eclass.

I have implemented all the functions I think necessary for rocm.eclass. It was just send to rocm.eclass draft to gentoo-dev mailing list (also with a Github PR at [1]), please have a review. In the following weeks, I will collect feedbacks and continue to polish it.

In summary, I have implemented those functions which is listed in my proposal:
USE_EXPNAD of amdgpu_targets_, and ROCM_USEDEP to make the use flag coherent among dependencies;
rocm_src_configure contains common arguments in src_prepare;
rocm_src_test which checks the permission on /dev/kfd and /dev/dri/render*

There are also something listed in proposal but I decided not to implement now:
rocm_src_prepare: although there are some similarities among ebuilds, src_prepare are highly customized to each ROCm components. Unifying would take extra work.
SRC_URI: currently all SRC_URI is already specified in each ebuilds. It does not hurt to keep the status quo.

Moreover, during implementation I found another feature necessary
rocm_src_test: correctly handles different scenarios. ROCm packages may have cmake test, which can be run using cmake_src_test, or only compiled some testing binaries which requires execution from command-line. I made rocm_src_test automatically detect the method, so ROCm packages just have to call this function directly without doing anything.

Actually I have never imagined rocm.eclass could be in this shape eventually. Initially I just thought it would provide some utilities, mainly src_test and USE_EXPAND. But when implementing I found all these feature requires careful treatment. The comments (mainly examples) also takes half of the length. It ends up in 278 lines, which is a middle-sized among current eclasses. Maybe it can be further trimmed down after polishing, because there could be awkward implementations or re-inventions in it.

Based on my draft rocm.eclass, I have prepared sci-libs/roc*=5.1.3, sci-lib/hip-*-5.1.3 and dev-python/cupy making use of it. It feels great to simplify the ebuilds, and portage can handles the USE_EXPAND and dependencies just as expected. Once the rocm.eclass get in tree, I’ll push those ROCm-5.1.3 ebuilds.

Anther thing to mention is that ROCm-5.1.3 toolchains finally get merged [5], with the fixed dev-util/rocprofiler-{4.3.0,5.0.2,5.1.3}. rocprofiler is actually buggy before, because I thought I committed the patch which stripped the libhsa-amd-aqlprofile.so loading (I even claimed it in the commit message), but it was not committed and lost in history. So I reproduced the patch. Also, I did some research about this proprietary lib. By default, not loading it means tracing hsa/hip is not possible — you only get basic information like name and time of each GPU kernel execution, but do not know the pipeline of kernel execution (which one has spawned which kernel). AQL should be HSA architected queuing language (HSA AQL), where https://llvm.org/docs/AMDGPUUsage.html#hsa-aql-queue documented. It did sound related to the pipeline of kernel dispatching. By the description, libhsa-amd-aqlprofile.so is an extension API of AQL Profile. But actually, patching the source code to let rocprofiler not loading libhsa-amd-aqlprofile.so does not breaks the tracing of hsa/hip. So, I’m not sure why libhsa-amd-aqlprofile.so is needed, and raised a question at [2]. So I complete the fix in [3,4].

According to the renewed proposal (I was away for two weeks, so the plan has changed), I should collect feedback and refine rocm.eclass, and prepare dev-python/cupy and sci-libs/rocWMMA. I'll investigate ROCgdb, too. Also, rocm-device-libs is a major package because many users rely on it to provide OpenCL; I'll work on bumping its version as well. What's more, with hip-5.1.3 built against vanilla clang, ROCm support for Blender can land in ::gentoo.

[1] https://github.com/gentoo/gentoo/pull/26784
[2] https://github.com/RadeonOpenCompute/ROCm/issues/1781
[3] https://github.com/gentoo/gentoo/pull/26755
[4] https://github.com/gentoo/gentoo/pull/26771
[5] https://github.com/gentoo/gentoo/pull/26441

September 05 2022

Week 12 Report for RISC-V Support for Gentoo Prefix

Gentoo Google Summer of Code (GSoC) September 05, 2022, 9:34

Hello all,
Hope you are all doing well. This is my report for the 12th week of my Google Summer of Code project.

I got the documentation on Porting Prefix reviewed and have incorporated the suggested changes.

My GSoC deliverables have been completed, so I played around with the compatibility layer and Ansible. I synced the latest changes to the bootstrap script from upstream and used it to install Prefix. I am working on updating main.yml[1] accordingly. The process has been smooth so far; within the next few weeks we might have a working compatibility layer for RISC-V.

I will start working on the final report and update the blogs on the Gentoo blog site. Although the official period is over, I will continue working on the compatibility layer, and there are also a few other things, like pkgcraft, on my bucket list that I will get my hands on.

The 12 weeks of GSoC have been super fun, thanks to my mentors and the community.

[1] https://github.com/EESSI/compatibility-layer/blob/main/ansible/playbooks/roles/compatibility_layer/defaults/main.yml

Regards,
wiredhikari

September 04 2022

Gentoo musl Support Expansion for Qt/KDE Week 12

Gentoo Google Summer of Code (GSoC) September 04, 2022, 22:41

This week has mostly been spent on writing documentation and fixing up some leftover things.

I started with looking over the *-standalone libraries. It turns out that tree.h is provided by libbsd and because libbsd works just fine on musl I removed the standalone. The second thing I did was removing error.h because it caused issues with some builds, and we suspect it works on Void Linux because they build packages inside a clean chroot (without error.h). The only one left is now cdefs.h. This header is an internal glibc header, and using it is basically a bug, so upstreaming fixes should be very easy. Therefore I feel like this doesn’t need to be added either, so I closed the pull request for now.

Next I rewrote Sam’s musl porting notes, moving them from his personal page to a “real” wiki page (https://wiki.gentoo.org/wiki/Musl_porting_notes). It’s now more like a wiki page and less like a list of errors with attached fixes. I’ve also added several things to it myself.

Another wiki page I’ve added content to is Chroot (https://wiki.gentoo.org/wiki/Chroot#Sound_and_graphics). In my GSoC planning I wanted to write documentation about using Gentoo musl, including information on how to work around glibc programs that do not work on musl, e.g. proprietary programs. Instead, I wrote documentation on running graphical applications with sound into the Chroot article, as it helps every Gentoo user. I don’t think Gentoo musl users will have any issues finding the Chroot wiki page. 🙂

I have also tested gettext-tiny on Gentoo musl. This is a smaller implementation of gettext with some functionality stubbed out. gettext-tiny is built for musl and makes use of the libintl provided by musl. For users who only want English this makes a lot of sense, because it is much smaller than gettext but still allows most packages to be built. When replacing gettext, Portage complained about two packages using uninstalled libraries from GNU gettext: bison and poxml. When re-emerging bison it errored out and I was sure it was because of gettext, but after debugging bison I found out it was caused by error-standalone. After unmerging error-standalone, bison detected that the library was not installed and compiled correctly. Poxml, on the other hand, hard-depends on libgettextpo, a library not provided by gettext-tiny. Running “equery d -a poxml” shows, however, that nothing important actually depends on poxml, so gettext-tiny should for the most part be fine.

$ equery d -a poxml
* These packages depend on poxml:
kde-apps/kdesdk-meta-22.04.3-r1 (>=kde-apps/poxml-22.04.3:5)
kde-apps/kdesdk-meta-22.08.0 (>=kde-apps/poxml-22.08.0:5)

Next week I will write my final evaluation and then I am done with GSoC! I will however continue working with some things like ebuildshell and crossdev when I have time.

August 29 2022

Gentoo musl Support Expansion for Qt/KDE Week 11

Gentoo Google Summer of Code (GSoC) August 29, 2022, 21:32

This week has mostly been dedicated to fixing older, harder problems that I had previously put off. I spent a whole lot of time learning about the AccountsService codebase and setting up systems with LDAP authentication, but it turned out it didn’t need a rewrite after reading a couple of issues on the GitLab page; more on that later.

To start with, I added a CMAKE_SKIP_TESTS variable to cmake.eclass. Currently you need to specify skipped tests by doing myctestargs=( -E '(test1|test2|test3)' ). This works fine for the most part, but if you need to specify skipped tests multiple times it gets really messy, because ctest does not allow passing -E multiple times. Personally I ran into this when fixing tests for kde-apps/okular. Most tests for Okular only pass when it’s installed (#653618), but the ebuild already skips some tests for other reasons, so I needed to first unconditionally disable some tests and then conditionally append more with "has_version ${P} || append tests". To solve it I introduced an array and then parsed it with myctestargs+=( -E '('$( IFS='|'; echo "${CMAKE_SKIP_TESTS[*]}")')' ), but as this was useful for a lot more ebuilds than just Okular I decided to implement it in the eclass.
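The joining step behind that bash one-liner can be sketched on its own; a minimal Python illustration of building a single ctest -E argument from a skip list (the function name is made up here, and the real eclass does this in bash with IFS='|'):

```python
# Sketch of the regex-joining logic behind CMAKE_SKIP_TESTS; ctest accepts
# only one -E, so multiple skip requests must be merged into one alternation.
def ctest_exclude_args(skip_tests):
    """Build a single `ctest -E` exclusion argument from a list of test names."""
    if not skip_tests:
        return []
    return ["-E", "(" + "|".join(skip_tests) + ")"]

# ctest would then skip any test matching test1|test2|test3.
print(ctest_exclude_args(["test1", "test2", "test3"]))
```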

The second thing I worked on was AccountsService, it’s a daemon that retrieves a list of users on the system and presents them with a DBus interface. It’s used for showing users in things like login managers and accounts settings panels. I actually worked on this a long time ago but I put it off for a while because it required a bigger rewrite, and I had more important things to do back then.
AccountsService has two issues on musl: it uses the glibc function fgetspent_r, and wtmp, which is only implemented as stubs in musl (https://wiki.musl-libc.org/faq.html#Q:-Why-is-the-utmp/wtmp-functionality-only-implemented-as-stubs?). I asked in #musl to figure out a fgetspent_r replacement, but we then discussed why it is bad to enumerate /etc/passwd to get a list of users in the first place; for example, it does not respect non-local (LDAP/NIS) users. So AS needed a bigger rewrite, we thought :).
So I started with setting up two virtual machines, one LDAP client, and one server. Having never used LDAP before this was a little hard but I got it working. I also needed to set up QEMU networking so that my VMs could connect to each other, and I also set up an LDAP webui called ldap-ui so I could easily get an overview of my LDAP instance. Because AS works by providing a DBus interface I also learned using the qdbusviewer and dbus-send tools. Before taking a deep dive into the AS source code I wrote some small test programs to get comfortable with the DBus C API, passwd+shadow database functions, and GLib.
I then started reading the AccountsService source code to understand it better. Its main.c just sets up a daemon that’s mostly defined in daemon.c; the rest of the source files are mostly helpers and wrappers. When the daemon initializes, it sets up user enumerators using the entry_generator_* functions. The main one is entry_generator_fgetpwent, which uses fgetspent_r to enumerate /etc/passwd, and my idea was to replace it with getpwent + getspnam. But there are two other generators, requested_users and cachedir. requested_users takes a requested user (e.g. one entered manually as username+password in the login manager) and adds it into /var/lib/AccountsService/users; cachedir looks at that directory and adds those entries into the daemon. It turns out that requesting a non-local LDAP user via the requested_users generator works completely fine, and the login information is cached in the directory so that the cachedir generator can expose it for future logins. I then looked at some issues in the AccountsService GitLab, and it turns out that enumerating /etc/passwd was intentional, so as not to blow up the login screen with thousands of users on a big LDAP domain, for example. So the rewrite was sadly not needed, but I learned a lot! Still, fixing fgetspent_r and wtmp needs to get done, but I already have a fix for that.
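The interplay of the generators described above can be sketched conceptually; every name below is illustrative, and none of this is the daemon's real (C/GLib) API, just the idea of local enumeration plus a cache directory for requested non-local users:

```python
# Conceptual sketch of AccountsService's three user generators.
import os
import tempfile

def entry_generator_fgetpwent(passwd_lines, uid_min=1000):
    """Enumerate local users from passwd-style lines; LDAP/NIS users never show up here."""
    for line in passwd_lines:
        name, _pw, uid, *_rest = line.split(":")
        if int(uid) >= uid_min:
            yield name

def request_user(cache_dir, name):
    """requested_users: a successful login drops a marker file into the cache dir."""
    open(os.path.join(cache_dir, name), "w").close()

def entry_generator_cachedir(cache_dir):
    """cachedir: expose previously requested users for future logins."""
    yield from sorted(os.listdir(cache_dir))

# Local passwd only lists "alice"; "bob" is a non-local LDAP user.
passwd = ["root:x:0:0:root:/root:/bin/sh", "alice:x:1000:1000::/home/alice:/bin/sh"]
with tempfile.TemporaryDirectory() as cache:
    request_user(cache, "bob")  # bob logs in once by typing his name
    users = set(entry_generator_fgetpwent(passwd)) | set(entry_generator_cachedir(cache))
print(users)  # both alice and bob are now visible to the login manager
```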

Another thing I spent a lot of time on this week was poxml. This is also an old issue that I put off, mostly because it was too hard at the time. The build crashes because it can’t find the function gl_get_setlocale_null_lock in libgettextpo.so. This shared object belongs to GNU gettext, so I suspected something was wrong there. Looking at it with nm --dynamic /usr/lib/libgettextpo.so I could see that the function was undefined: bad! We reported this issue upstream and got into a long conversation. Apparently Bruno (GNU) used Alpine Linux, which packages GNU libintl, while Gentoo uses the musl libintl implementation. GNU libintl actually provides gl_get_setlocale_null_lock, which explains why it worked on Alpine without issue. After grepping for gl_get_setlocale_null_lock I found this:
/* When it is known that the gl_get_setlocale_null_lock function is defined by a dependency library, it should not be defined here. */
#if OMIT_SETLOCALE_LOCK
*do nothing*
#else
*define gl_get_setlocale_null_lock*
#endif

So I tried just forcing the check to false, and it worked! I then looked at the build system expecting something like AC_SEARCH_LIBS([gl_get_setlocale_null_lock], [intl], ...) *set OMIT_SETLOCALE_LOCK*, but it turns out that autotools just forces OMIT_SETLOCALE_LOCK to 1. This is clearly wrong, so I sent another comment upstream and temporarily fixed it in the Gentoo tree. Instead of doing it properly I made an ugly hack so as not to get complacent (Sam's idea), and hopefully we can get it resolved upstream instead :D.

To summarize, I feel like this week has gone pretty well. I’ve solved everything that was left, and now I’m ready to start writing a lot of documentation. A lot of the AccountsService setup and work was ultimately unnecessary, but I still learned a lot.

August 28 2022

Week 11 Report for RISC-V Support for Gentoo Prefix

Gentoo Google Summer of Code (GSoC) August 28, 2022, 9:32

Hello all,

Hope everyone is doing well. This is my report for the 11th week of my GSoC project. This week I worked on documentation, closed some dangling PRs, and looked into bootstrapping the EESSI compatibility layer for RISC-V. I spent some of my time learning Ansible as part of the process.

The documentation[1] is almost complete; I will act on my mentors' feedback and run it through some review tools, fixing it accordingly. In the upcoming week I will look into the EESSI compatibility layer for RISC-V and write a blog post for the end-term evaluations.

[1] https://github.com/wiredhikari/prefix_on_riscv/blob/main/docs/porting.md

Regards,

wiredhikari

June 29 2022

Binding the World

Tim Harder (pkgcraft) June 29, 2022, 18:29

One of Gentoo’s major weaknesses is the lack of a shared implementation that natively supports bindings to other languages for core, specification-level features such as dependency format parsing. Due to this deficiency, over the years I’ve seen the same algorithms implemented in Python, C, Bash, Go, and more at varying levels of success.

Now, I must note that I don’t mean to disparage these efforts especially when done for fun or to learn a new language; however, it often seems they end up in tools or services used by the wider community. Then as the specification slowly evolves and authors move on, developers are stuck maintaining multiple implementations if they want to keep the related tools or services relevant.

In an ideal world, the canonical implementation for a core feature set is written in a language that can be easily bound by other languages offering developers the choice to reuse this support without having to write their own. To exhibit this possibility, one of pkgcraft’s goals is to act as a core library supporting language bindings.

Design

Interfacing rust code with another language often requires a C wrapper library to perform efficiently while sidestepping rust’s lifetime model that clashes with ownership-based languages. Bindings build on top of this C layer, allowing ignorance of the rust underneath.

For pkgcraft, this C library is provided via pkgcraft-c, currently wrapping pkgcraft’s core depspec functionality (package atoms) in addition to providing the initial interface for config, repo, and package interactions.
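The layering can be illustrated with a tiny hand-rolled binding built on ctypes. Here libc stands in for the C library being wrapped, since pkgcraft-c (and its real API) isn't assumed to be installed; pkgcraft's actual python bindings use cython, but the boundary concerns are the same, namely type conversion at the C border:

```python
# A minimal binding over a C library, using libc as a stand-in for pkgcraft-c.
import ctypes

# On POSIX, CDLL(None) exposes the symbols already loaded into the process,
# which includes libc.
libc = ctypes.CDLL(None)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def strlen(s: str) -> int:
    """Thin wrapper handling the str -> bytes conversion at the C boundary."""
    return libc.strlen(s.encode())

print(strlen("pkgcraft"))
```

In a real binding layer, each wrapper is also responsible for freeing whatever the C side allocated, which is exactly the manual resource management the post describes.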

For some languages it’s also possible to develop bindings or support directly in rust. There are a decent number of currently evolving, language-specific projects that allow non-rust language development including pyo3 for python, rutie for ruby, neon for Node.js, and others. These projects generally wrap the unsafe C layer internally, allowing for simpler development. Generally speaking, I recommend going this route if performance levels and project goals can be met.

Originally, pkgcraft used pyo3 for its python bindings. If one is familiar with rust and python, the development experience is relatively pleasant and allows simpler builds using maturin rather than the pile of technical debt that distutils, setuptools, and their extensions provide when trying to do anything outside the ordinary.

However, pyo3 has a couple of currently unresolved issues that led me to abandon it. First, its class instantiation is slower than that of native python classes, even for simple ones. It should be noted this only matters if your design involves creating thousands of native object instances at the python level. It’s often preferable to avoid this overhead by exposing functionality that interacts with large groups of rust objects at once. In addition, for most developers coming from native python the performance hit won’t be overly noticeable. In any case, class instantiation overhead will probably decrease as the project matures and more optimization work is done.

More importantly, pyo3 does not support exposing any object that contains fields using explicit lifetimes. This means any struct that contains borrowed fields can’t be directly exported due to the clashes between the memory models and ownership designs of rust and python. It’s quite possible to work around this, but that often means copying data in order for the python side to obtain ownership or redesigning the data structures used on the rust side. Whether this is acceptable will depend on how large the performance hit is or how much work the redesign takes.

For my part, having experience writing native extensions using the CPython API as well as cython, the workarounds necessary to avoid exposing borrowed objects weren’t worth the effort, especially because pkgcraft requires a C API anyway to support C itself and languages lacking compatibility-layer projects. Thus I rewrote pkgcraft’s python bindings using cython, which immediately raised performance close to the levels I was initially expecting. The downside is quite apparent, though: the bindings have to manually handle all the type conversions and resource deallocation while calling through the C wrapper. It’s a decent amount more work, but I think the performance benefits are worth it.

Development

First, the tools for building the code should be installed. This includes a recent rust compiler and C compiler. I leave it up to the reader to make use of rustup and/or their distro’s package manager to install the required build tools (and others such as git that are implied).

Next, the code must be pulled down. The easiest way to do this is to recursively clone pkgcraft-workspace which should include semi-recent submodules for all pkgcraft projects:

$ git clone --recurse-submodules https://github.com/pkgcraft/pkgcraft-workspace.git
$ cd pkgcraft-workspace

From this workspace, pkgcraft-c can be built and various shell variables set in order to build python bindings via the following command:

$ source ./build pkgcraft-c

This builds pkgcraft into a shared library that is exposed to the python build via setting $LD_LIBRARY_PATH and $PKG_CONFIG_PATH. Once that completes the python bindings can be built and tested via tox:

$ cd pkgcraft-python
$ tox -e python

When developing bindings built on top of a C library, it’s wise to run the same testsuite under valgrind to look for the seemingly inevitable memory leaks. This is exacerbated by rust requiring all allocations to be returned to it in order to be freed safely, since it historically didn’t use the system allocator. For pkgcraft, this is provided via another tox target:

$ tox -e valgrind

If you’re familiar with valgrind, we mainly care about the definitely and indirectly lost categories of memory leaks; the other types relate to global objects or caches that aren’t explicitly deallocated on exit. The valgrind target for tox errors out if any memory leaks are detected, so if it completes successfully no leaks were found.

Benchmarking vs pkgcore and portage

Stepping away from regular development towards more interesting data, pkgcraft provides rough processing and memory benchmark suites in order to compare its nascent python bindings with pkgcore and portage. Currently these only focus on atom object instantiation, but may be extended to include other functionality if the API access isn’t too painful for pkgcore and/or portage.

To run the processing time benchmarks that use pytest-benchmark:

$ tox -e bench

For a summary of benchmark results only including the mean and standard deviation:

----------------- benchmark 'test_bench_atom_random': 4 tests ------------------
Name (time in us) Mean StdDev
--------------------------------------------------------------------------------
test_bench_atom_random[pkgcraft-Atom]  4.5395 (1.0)  0.3722 (1.0) 
test_bench_atom_random[pkgcraft-cached]  6.2360 (1.37)  1.3386 (3.60) 
test_bench_atom_random[pkgcore-atom]  30.9767 (6.82)  1.1428 (3.07) 
test_bench_atom_random[portage-Atom]  50.2636 (11.07)  19.7562 (53.07) 
--------------------------------------------------------------------------------
--------------------- benchmark 'test_bench_atom_static': 4 tests ----------------------
Name (time in ns) Mean StdDev
----------------------------------------------------------------------------------------
test_bench_atom_static[pkgcraft-cached]  217.2820 (1.0)  5.9821 (1.0) 
test_bench_atom_static[pkgcraft-Atom]  725.2229 (3.34)  41.6775 (6.97) 
test_bench_atom_static[pkgcore-atom]  28,331.4369 (130.39)  942.0003 (157.47) 
test_bench_atom_static[portage-Atom]  33,794.6625 (155.53)  14,358.8390 (>1000.0)
----------------------------------------------------------------------------------------
----------------- benchmark 'test_bench_atom_sorting_best_case': 2 tests ----------------
Name (time in us) Mean StdDev
-----------------------------------------------------------------------------------------
test_bench_atom_sorting_best_case[pkgcraft-Atom]  6.1195 (1.0)  0.2011 (1.0) 
test_bench_atom_sorting_best_case[pkgcore-atom]  936.9403 (153.11)  5.5534 (27.61) 
-----------------------------------------------------------------------------------------
---------------- benchmark 'test_bench_atom_sorting_worst_case': 2 tests -----------------
Name (time in us) Mean StdDev
------------------------------------------------------------------------------------------
test_bench_atom_sorting_worst_case[pkgcraft-Atom]  6.2702 (1.0)  0.3301 (1.0) 
test_bench_atom_sorting_worst_case[pkgcore-atom]  924.1410 (147.39)  6.9942 (21.19) 
------------------------------------------------------------------------------------------

As seen above, pkgcraft is able to instantiate atom objects about 5-6x faster than pkgcore and about 10x faster than portage. For static atoms when using the cached implementation this increases to about 150x faster, meaning portage should look into using an LRU cache for directly created atom objects. With respect to pkgcore’s static result, it also appears to not use caching; however, it does support atom instance caching internally so the benchmark is avoiding that somehow.
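The LRU-cache suggestion is easy to demonstrate; a toy sketch with functools.lru_cache, where parse_atom is a trivial stand-in rather than portage's or pkgcraft's real parser:

```python
# Caching atom parsing: repeated strings hit the cache and skip re-parsing,
# and all callers share one object per distinct atom string.
from functools import lru_cache

@lru_cache(maxsize=None)
def parse_atom(s):
    """Split a 'category/pkg-version' string into its parts (toy parser)."""
    category, _, rest = s.partition("/")
    pkg, _, version = rest.partition("-")
    return (category, pkg, version)

a = parse_atom("dev-lang/python-3.10")
b = parse_atom("dev-lang/python-3.10")
assert a is b  # cache hit: the identical object, no second parse
print(a)
```

This is the same trade-off the memory benchmarks below show: for workloads with repeated atoms the cache saves both parsing time and memory, while for all-unique atoms it only adds bookkeeping.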

When comparing sorting, pkgcraft is well over two orders of magnitude ahead of pkgcore and I imagine portage would fare even worse, but it doesn’t natively support atom object comparisons so isn’t included here.

Beyond processing time, it’s often useful to track memory use, especially for languages such as python that are designed more for ease of development than memory efficiency. There are a number of techniques for tracking memory use, such as guppy3, but they often work only with native python objects, ignoring or misrepresenting allocations done in underlying implementations. Instead, pkgcraft includes a simple script that creates a list of a million objects for three different atom types while tracking elapsed time and overall memory use (using resident set size) in separate processes.

To run the memory benchmarks use:

1
$ tox -e membench

Which produces output similar to:

Static atoms (1000000)
----------------------------------------------
implementation memory (elapsed)
----------------------------------------------
pkgcraft 474.2 MB (0.94s)
pkgcraft-cached 8.7 MB (0.27s)
pkgcore 8.4 MB (1.12s)
portage 795.5 MB (10.62s)
Dynamic atoms (1000000)
----------------------------------------------
implementation memory (elapsed)
----------------------------------------------
pkgcraft 955.2 MB (2.93s)
pkgcraft-cached 957.9 MB (3.56s)
pkgcore 1.3 GB (31.01s)
portage 4.0 GB (56.22s)
Random atoms (1000000)
----------------------------------------------
implementation memory (elapsed)
----------------------------------------------
pkgcraft 945.4 MB (3.75s)
pkgcraft-cached 21.3 MB (1.30s)
pkgcore 20.9 MB (2.67s)
portage 3.6 GB (46.77s)

For static atoms, note that pkgcraft-cached and pkgcore’s memory usage is quite close with pkgcore slightly edging ahead due to the extra data pkgcraft stores to speed up comparisons. Another point of interest is that the uncached implementation still beats pkgcore in processing time. This is because the underlying rust implementation has its own cache allowing it to skip unnecessary parsing, leaving the majority of overhead from cython’s object instantiation. Portage is last by a large margin since it doesn’t directly cache atom objects.

Every dynamic atom is different making caching irrelevant so no implementation has a substantial memory usage edge. Without cache speedups, the uncached pkgcraft implementation is the fastest as it has the least overhead. Pkgcore’s memory usage is comparatively respectable, but uses about an order of magnitude more processing time for parsing and instantiation. Portage is again last by an increased margin and appears to perform inefficiently when storing more complex atoms.

Finally, random atoms try to model closer to what is found across the tree in terms of cache hits. As the results show, using cached implementations probably is a good idea for large sets of atoms with occasional overlap in order to save both processing time and memory usage; otherwise, both attributes suffer as seen from portage’s uncached implementation results.

Looking to the future

From the rough benchmarks above, it seems apparent both pkgcore and portage could decrease their overall processing time and/or memory usage by moving to using package atom support from pkgcraft python bindings. While I’m unsure how much of a performance difference it would make, it should at least be noticeably worthwhile when processing large amounts of data, e.g. scanning the entire tree with pkgcheck or sorting atoms during large dependency resolutions.

It’s also clear that using cython’s extension types and C support on top of rust code yield relatively sizeable wins over native python code. From my perspective, it seems worthwhile to implement all core functionality in a similar fashion for projects that last decades like portage already has. The downside of implementing support in a more difficult language should decrease the longer a project remains viable.

In terms of feasibility, it’s probably easier to inject the pkgcraft bindings into portage since its atom support subclasses string objects while pkgcore’s subclasses an internal restriction class, but both should be possible with some redesign. Realistically speaking, neither is likely to occur because both projects lack maintainers with the required combination of skill, time, and interest to perform the rework. In addition, currently doing so in a non-optional fashion would generally restrict projects to fewer targets due to rust’s lack of support for older architectures, but this downside may be somewhat resolved if a viable GCC rust implementation is released in the future.

Other than python, pkgcraft has more basic support available for go supporting package atom and version object interactions. As the core library gains more features, I’ll try to keep working on exposing the same functionality via bindings since I think initial interactions with pkgcraft may be easiest when leveraging it for data processing from scripting languages.

One of Gentoo’s major weaknesses is the lack of a shared implementation that natively supports bindings to other languages for core, specification-level features such as dependency format parsing. Due to this deficiency, over the years I’ve seen the same algorithms implemented in Python, C, Bash, Go, and more at varying levels of success.

Now, I must note that I don’t mean to disparage these efforts, especially when done for fun or to learn a new language; however, they often seem to end up in tools or services used by the wider community. Then, as the specification slowly evolves and authors move on, developers are stuck maintaining multiple implementations if they want to keep the related tools or services relevant.

In an ideal world, the canonical implementation for a core feature set is written in a language that can be easily bound by other languages offering developers the choice to reuse this support without having to write their own. To exhibit this possibility, one of pkgcraft’s goals is to act as a core library supporting language bindings.

Design

Interfacing rust code with another language often requires a C wrapper library to perform efficiently while sidestepping rust’s lifetime model, which clashes with the ownership models of most other languages. Bindings build on top of this C layer, allowing them to remain ignorant of the rust underneath.

For pkgcraft, this C library is provided via pkgcraft-c, currently wrapping pkgcraft’s core depspec functionality (package atoms) in addition to providing the initial interface for config, repo, and package interactions.
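As a rough illustration of the pattern on Unix (using python’s ctypes and the C standard library as stand-ins rather than pkgcraft-c’s actual API), bindings load the shared C library and declare each function’s signature before calling through it:

```python
import ctypes
import ctypes.util

# Stand-in for loading a wrapper library like pkgcraft-c; fall back to the
# current process's symbols if find_library() comes up empty.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the C signature so ctypes converts arguments correctly, just as
# real bindings declare every function exported by the C layer.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

assert libc.strlen(b"pkgcraft") == 8
```

Real bindings do exactly this kind of declaration for every exported function, whether via ctypes, cython, or generated headers.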

For some languages it’s also possible to develop bindings or support directly in rust. There are a decent number of evolving, language-specific projects that allow non-rust language development, including pyo3 for python, rutie for ruby, and neon for Node.js, among others. These projects generally wrap the unsafe C layer internally, allowing for simpler development. Generally speaking, I recommend going this route if performance levels and project goals can be met.

Originally, pkgcraft used pyo3 for its python bindings. If one is familiar with rust and python, the development experience is relatively pleasant and allows simpler builds using maturin rather than the pile of technical debt that distutils, setuptools, and their extensions provide when trying to do anything outside the ordinary.

However, pyo3 has a couple of currently unresolved issues that led me to abandon it. First, its class instantiation is slower than that of the native python implementation, even for simple classes. It should be noted this only matters if your design involves creating thousands of native object instances at the python level. It’s often preferable to avoid this overhead by exposing functionality that interacts with large groups of rust objects at once. In addition, for most developers coming from native python the performance hit won’t be overly noticeable. In any case, class instantiation overhead will probably decrease as the project matures and more optimization work is done.

More importantly, pyo3 does not support exposing any object that contains fields using explicit lifetimes. This means any struct that contains borrowed fields can’t be directly exported due to the clashes between the memory models and ownership designs of rust and python. It’s quite possible to work around this, but that often means copying data in order for the python side to obtain ownership or redesigning the data structures used on the rust side. Whether this is acceptable will depend on how large the performance hit is or how much work the redesign takes.

For my part, having experience writing native extensions using the CPython API as well as cython, the workarounds necessary to avoid exposing borrowed objects weren’t worth the effort, especially because pkgcraft requires a C API anyway to support C itself and languages lacking compatibility-layer projects. Thus I rewrote pkgcraft’s python bindings using cython instead, which immediately raised performance to near the levels I was initially expecting. The downside is quite apparent, however, since the bindings have to manually handle all the type conversions and resource deallocation while calling through the C wrapper. It’s a decent amount more work, but I think the performance benefits are worth it.
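To give a flavor of that manual bookkeeping, here is a minimal ctypes sketch (using libc’s strdup/free as a stand-in for the pkgcraft-c API) where the wrapper owns a C allocation and must convert and free it itself:

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
# Keep the raw pointer: a c_char_p restype would auto-convert to bytes
# and lose the pointer needed for free().
libc.strdup.argtypes = [ctypes.c_char_p]
libc.strdup.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]

class CString:
    """Owns a C-allocated string and frees it deterministically, the kind
    of bookkeeping cython bindings do for every wrapped object."""
    def __init__(self, s: str):
        self._ptr = libc.strdup(s.encode())
    def __str__(self):
        return ctypes.cast(self._ptr, ctypes.c_char_p).value.decode()
    def close(self):
        if self._ptr:
            libc.free(self._ptr)
            self._ptr = None

s = CString("cat/pkg-1.0")
assert str(s) == "cat/pkg-1.0"
s.close()
```

In the actual bindings this pattern repeats for every type crossing the boundary, which is where the extra work comes from.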

Development

First, the tools for building the code should be installed. This includes a recent rust compiler and C compiler. I leave it up to the reader to make use of rustup and/or their distro’s package manager to install the required build tools (and others such as git that are implied).

Next, the code must be pulled down. The easiest way to do this is to recursively clone pkgcraft-workspace which should include semi-recent submodules for all pkgcraft projects:

git clone --recurse-submodules https://github.com/pkgcraft/pkgcraft-workspace.git
cd pkgcraft-workspace

From this workspace, pkgcraft-c can be built and various shell variables set in order to build python bindings via the following command:

$ source ./build pkgcraft-c

This builds pkgcraft into a shared library that is exposed to the python build by setting $LD_LIBRARY_PATH and $PKG_CONFIG_PATH. Once that completes, the python bindings can be built and tested via tox:

$ cd pkgcraft-python
$ tox -e python

When developing bindings on top of a C library, it’s wise to run the same test suite under valgrind to catch the seemingly inevitable memory leaks. This is exacerbated by rust requiring that all allocations be returned to it in order to be freed safely, since it historically didn’t use the system allocator. For pkgcraft, this is provided via another tox target:

$ tox -e valgrind

If you’re familiar with valgrind, we mainly care about the definitely and indirectly lost categories of memory leaks; the other types relate to global objects or caches that aren’t explicitly deallocated on exit. The valgrind target for tox errors out if any memory leaks are detected, so a successful run means no leaks were found.

Benchmarking vs pkgcore and portage

Stepping away from regular development towards more interesting data, pkgcraft provides rough processing and memory benchmark suites in order to compare its nascent python bindings with pkgcore and portage. Currently these only focus on atom object instantiation, but may be extended to include other functionality if the API access isn’t too painful for pkgcore and/or portage.

To run the processing time benchmarks that use pytest-benchmark:

$ tox -e bench

A summary of the benchmark results, including only the mean and standard deviation:

----------------- benchmark 'test_bench_atom_random': 4 tests ------------------
Name (time in us) Mean StdDev
--------------------------------------------------------------------------------
test_bench_atom_random[pkgcraft-Atom]  4.5395 (1.0)  0.3722 (1.0) 
test_bench_atom_random[pkgcraft-cached]  6.2360 (1.37)  1.3386 (3.60) 
test_bench_atom_random[pkgcore-atom]  30.9767 (6.82)  1.1428 (3.07) 
test_bench_atom_random[portage-Atom]  50.2636 (11.07)  19.7562 (53.07) 
--------------------------------------------------------------------------------
--------------------- benchmark 'test_bench_atom_static': 4 tests ----------------------
Name (time in ns) Mean StdDev
----------------------------------------------------------------------------------------
test_bench_atom_static[pkgcraft-cached]  217.2820 (1.0)  5.9821 (1.0) 
test_bench_atom_static[pkgcraft-Atom]  725.2229 (3.34)  41.6775 (6.97) 
test_bench_atom_static[pkgcore-atom]  28,331.4369 (130.39)  942.0003 (157.47) 
test_bench_atom_static[portage-Atom]  33,794.6625 (155.53)  14,358.8390 (>1000.0)
----------------------------------------------------------------------------------------
----------------- benchmark 'test_bench_atom_sorting_best_case': 2 tests ----------------
Name (time in us) Mean StdDev
-----------------------------------------------------------------------------------------
test_bench_atom_sorting_best_case[pkgcraft-Atom]  6.1195 (1.0)  0.2011 (1.0) 
test_bench_atom_sorting_best_case[pkgcore-atom]  936.9403 (153.11)  5.5534 (27.61) 
-----------------------------------------------------------------------------------------
---------------- benchmark 'test_bench_atom_sorting_worst_case': 2 tests -----------------
Name (time in us) Mean StdDev
------------------------------------------------------------------------------------------
test_bench_atom_sorting_worst_case[pkgcraft-Atom]  6.2702 (1.0)  0.3301 (1.0) 
test_bench_atom_sorting_worst_case[pkgcore-atom]  924.1410 (147.39)  6.9942 (21.19) 
------------------------------------------------------------------------------------------

As seen above, pkgcraft is able to instantiate atom objects about 5-6x faster than pkgcore and about 10x faster than portage. For static atoms, the cached implementation increases this to about 150x faster, meaning portage should look into using an LRU cache for directly created atom objects. With respect to pkgcore’s static result, it also appears not to use caching; however, pkgcore does support atom instance caching internally, so the benchmark is somehow avoiding it.
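That LRU-cache suggestion can be sketched as follows (the Atom class below is a hypothetical stand-in, not portage’s actual implementation): wrapping the constructor in functools.lru_cache deduplicates repeated atom strings.

```python
from functools import lru_cache

class Atom:
    """Hypothetical stand-in for a package atom object."""
    def __init__(self, s):
        # Imagine expensive parsing and validation here.
        self.category, _, self.package = s.partition("/")

@lru_cache(maxsize=10000)
def cached_atom(s):
    # Repeated lookups of the same dependency string reuse one instance,
    # trading a dict probe for full reparsing and reallocation.
    return Atom(s)

a1 = cached_atom("dev-lang/python")
a2 = cached_atom("dev-lang/python")
assert a1 is a2  # same object, parsed only once
```

The memory results below show why this matters: identical atoms collapse to a single allocation instead of a million copies.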

When comparing sorting, pkgcraft is well over two orders of magnitude ahead of pkgcore, and I imagine portage would fare even worse, but it doesn’t natively support atom object comparisons so isn’t included here.

Beyond processing time, it’s often useful to track memory use, especially for languages such as python that are designed more for ease of development than memory efficiency. There are a number of techniques for tracking memory use, such as projects like guppy3, but they often only work with native python objects, ignoring or misrepresenting allocations done in underlying implementations. Instead, pkgcraft includes a simple script that creates a list of a million objects for three different atom types while tracking elapsed time and overall memory use (using resident set size) in separate processes.
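The measurement approach can be sketched as an in-process simplification (assuming a Unix platform for the resource module; the actual script isolates each implementation in a separate process so allocations don’t accumulate between runs):

```python
import resource
import time

def measure(make, n=1_000_000):
    """Build n objects and report the count, elapsed time, and peak RSS."""
    start = time.monotonic()
    objs = [make(i) for i in range(n)]
    elapsed = time.monotonic() - start
    # ru_maxrss is peak resident set size: kilobytes on Linux, bytes on macOS.
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return len(objs), elapsed, rss

count, elapsed, rss = measure(lambda i: ("cat/pkg", i))
print(f"{count} objects in {elapsed:.2f}s, peak RSS {rss}")
```

Since peak RSS never shrinks, running each implementation in its own process is what makes the per-implementation numbers below comparable.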

To run the memory benchmarks use:

$ tox -e membench

Which produces output similar to:

Static atoms (1000000)
----------------------------------------------
implementation memory (elapsed)
----------------------------------------------
pkgcraft 474.2 MB (0.94s)
pkgcraft-cached 8.7 MB (0.27s)
pkgcore 8.4 MB (1.12s)
portage 795.5 MB (10.62s)

Dynamic atoms (1000000)
----------------------------------------------
implementation memory (elapsed)
----------------------------------------------
pkgcraft 955.2 MB (2.93s)
pkgcraft-cached 957.9 MB (3.56s)
pkgcore 1.3 GB (31.01s)
portage 4.0 GB (56.22s)

Random atoms (1000000)
----------------------------------------------
implementation memory (elapsed)
----------------------------------------------
pkgcraft 945.4 MB (3.75s)
pkgcraft-cached 21.3 MB (1.30s)
pkgcore 20.9 MB (2.67s)
portage 3.6 GB (46.77s)

For static atoms, note that pkgcraft-cached and pkgcore’s memory usage is quite close, with pkgcore slightly edging ahead due to the extra data pkgcraft stores to speed up comparisons. Another point of interest is that the uncached implementation still beats pkgcore in processing time. This is because the underlying rust implementation has its own cache, allowing it to skip unnecessary parsing and leaving the majority of the overhead in cython’s object instantiation. Portage is last by a large margin since it doesn’t directly cache atom objects.

Every dynamic atom is different, making caching irrelevant, so no implementation has a substantial memory-usage edge. Without cache speedups, the uncached pkgcraft implementation is the fastest as it has the least overhead. Pkgcore’s memory usage is comparatively respectable, but it uses about an order of magnitude more processing time for parsing and instantiation. Portage is again last, by an increased margin, and appears to store more complex atoms inefficiently.

Finally, random atoms aim to model the cache-hit rates found across the tree. As the results show, using cached implementations is probably a good idea for large sets of atoms with occasional overlap, saving both processing time and memory; otherwise, both suffer, as portage’s uncached results demonstrate.

Looking to the future

From the rough benchmarks above, it seems apparent that both pkgcore and portage could decrease their overall processing time and/or memory usage by moving to the package atom support in pkgcraft’s python bindings. While I’m unsure how much of a performance difference it would make, it should at least be noticeably worthwhile when processing large amounts of data, e.g. scanning the entire tree with pkgcheck or sorting atoms during large dependency resolutions.

It’s also clear that using cython’s extension types and C support on top of rust code yields relatively sizeable wins over native python code. From my perspective, it seems worthwhile to implement all core functionality in a similar fashion for projects that, like portage, last for decades. The downside of implementing support in a more difficult language should decrease the longer a project remains viable.

In terms of feasibility, it’s probably easier to inject the pkgcraft bindings into portage since its atom support subclasses string objects while pkgcore’s subclasses an internal restriction class, but both should be possible with some redesign. Realistically speaking, neither is likely to occur because both projects lack maintainers with the required combination of skill, time, and interest to perform the rework. In addition, currently doing so in a non-optional fashion would generally restrict projects to fewer targets due to rust’s lack of support for older architectures, but this downside may be somewhat resolved if a viable GCC rust implementation is released in the future.

Beyond python, pkgcraft also has more basic go bindings supporting package atom and version object interactions. As the core library gains more features, I’ll keep working on exposing the same functionality via bindings, since I think initial interactions with pkgcraft may be easiest when leveraging it for data processing from scripting languages.

June 08 2022

Rustifying Bash

Tim Harder (pkgcraft) June 08, 2022, 19:08

In the previous post on extending bash, using builtins was mentioned as a way to improve extensibility. Rather than writing native bash functions or spawning processes to run external tools, pkgcraft implements all its bash command support using builtins.

For example, the inherit command used to load eclasses, die, and all install-related functionality (e.g. doins) are all implemented as builtins1. This allows for a more seamless experience compared to pkgcore which implements all of this natively in bash using a simple daemon that sends messages via shared fds to communicate between the python and bash sides.

Note that these builtins are not readily available for use in regular bash since most are highly Gentoo specific, often rely on underlying build state, and aren’t built in a fashion that can be externally exposed. However, the design work done to support bash in rust also allows creating builtins compatible with standard bash.

For those interested in bash and rust, the following walkthrough explains how dynamic builtins work, describes some of the rust support required for interoperability, and discusses why they’re useful.

Dynamic builtin basics

For background, bash includes builtins used daily by many, e.g. cd, echo, and source are all builtins. In addition to these, external builtins can be loaded dynamically from shared object files via enable:

$ enable -f path/to/shared/object.so builtin_name

which uses dlopen to open the shared object and dlsym to load the symbol for the related builtin, if it exists. The builtin is then registered internally as dynamically loaded and can be used until it is either unloaded or the shell exits.

To remove a previously loaded builtin use:

$ enable -d builtin_name

Running a builtin is the same as running most other commands:

$ builtin_name args to pass

In terms of default execution precedence, similarly named functions come first, then builtins, and finally external binaries. This means that if an in-scope function and a loaded builtin have the same name, running that name in the shell runs the function, not the builtin.

The version of bash installed by most distros should support dynamic builtins, since bash itself doesn’t provide a mechanism to disable them; however, Gentoo manually hacks the configure script to disable support by default. To enable it, make sure to build bash with the plugins USE flag enabled.

Builtins have access to nearly all bash’s underlying API; however, they are mainly limited to running in command form using simple string arguments. In other words, scoped builtins that form more complex expressions, e.g. bash’s conditional expression [[ ]], generally require parser and/or grammar level changes that aren’t possible to achieve in a basic builtin.

Creating builtins in rust

One of the tricky parts of supporting dynamic builtins in rust is that rust has no support for “life before main” or library init routines as C does. Therefore, we must find some way to provide external symbols for builtin structs that can’t be initialized globally before init. To do this, rust relies on linker support for runtime initialization via DT_INIT_ARRAY for ELF objects (and similar mechanisms on other platforms). This allows running a specified function during the library loading process that replaces Option-wrapped, globally defined, static mutables with the actual builtin structs required by bash2.

Beyond building the shared objects, pkgcraft provides support for interacting with bash’s C API in rust via scallop. This enables performing almost anything that can be done natively in bash; for example, bash variables can be bound, unbound, and marked readonly. However, it should be noted that scallop is a young project, so it only supports what pkgcraft has needed thus far, has many rough edges, and doesn’t come close to wrapping all of bash’s exported API.

In addition to scallop, pkgcraft also provides pkgcraft-bash which is mainly an example project to create dynamic builtins. For our purposes, we’ll be exploring scallop and pkgcraft-bash while using them to demonstrate how rust-based builtins work.

Development environment

First, the required tools for building the code should be installed. This includes a recent rust compiler, C compiler, and a recent version of bash that supports loading dynamic builtins from shared objects. I leave it up to the reader to leverage rustup and/or their distro’s package manager to install the required build tools (and others such as git that are implied requirements).

Next, the required pkgcraft subprojects must be pulled down. The easiest way to do this is to recursively clone pkgcraft-workspace which should include semi-recent submodule checkouts for all the subprojects:

git clone --recurse-submodules https://github.com/pkgcraft/pkgcraft-workspace.git
cd pkgcraft-workspace

From this workspace, the pkgcraft-bash project can be built via:

$ cargo build -p pkgcraft-bash --features pkgcraft

This should create the shared pkgcraft-bash library target/debug/libpkgcraft_bash.so from which dynamic builtins can be loaded.

Profiling

In order to aid in bash development with rust, scallop provides a rudimentary profiling builtin. To load and use it, see the following example:

$ enable -f target/debug/libpkgcraft_bash.so profile
$ profile sleep 1
profiling: sleep 1
elapsed 3.005011736s, loops: 3, per loop: 1.001670578s

In short, it profiles a user-specified command over a period of time while counting loops completed. This could be extended to run cache warmups and perform more accurate statistical analysis, but its current form works for simple benchmarking.
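Its approach, rerunning the command until a fixed wall-clock budget elapses and then reporting the loop count and mean per-loop time, looks roughly like this in python:

```python
import time

def profile(func, seconds=1.0):
    """Repeatedly call func until `seconds` of wall time elapse, then
    report completed loops and the mean per-loop time, mirroring the
    output format of the profile builtin."""
    loops = 0
    start = time.monotonic()
    deadline = start + seconds
    while time.monotonic() < deadline:
        func()
        loops += 1
    elapsed = time.monotonic() - start
    return loops, elapsed / loops

loops, per_loop = profile(lambda: sum(range(1000)), seconds=0.2)
print(f"loops: {loops}, per loop: {per_loop * 1e6:.1f}µs")
```

Note the budget can overrun by up to one command invocation, which is why the builtin’s output above reports an elapsed time slightly over its period.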

It’s quite fair to say that if you start benchmarking bash code then you probably shouldn’t be using bash; however, most Gentoo package managers include a relatively large amount of bash that should be optimized in cases where it runs often or in tight loops.

Pkgcraft leverages scallop to sidestep this entirely, allowing all native bash code required to support operating with ebuilds to be replaced with rust. Alongside that, this profile builtin helps highlight certain types of runtime regressions in pkgcraft’s builtin support.

Atom version comparisons

Now that you have some experience with the profile builtin, let’s compare the performance of an actual rust-based builtin to similar functionality written natively in bash for atom version comparisons.

First, download a copy of eapi7-ver.eclass that contains the bash implementation of the version comparison algorithm used in Gentoo for the ver_test command in portage and pkgcore.

$ wget raw.githubusercontent.com/gentoo/gentoo/master/eclass/eapi7-ver.eclass
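For intuition, the heart of that comparison can be sketched in python. This is heavily simplified: numeric components are compared as plain integers and only a single suffix plus an -r revision is handled, while the real algorithm also covers letter components, multiple suffixes, and leading-zero semantics.

```python
# Suffix ordering in Gentoo versioning: _alpha < _beta < _pre < _rc < (none) < _p
SUFFIXES = {"alpha": 0, "beta": 1, "pre": 2, "rc": 3, "": 4, "p": 5}

def split_ver(v):
    """Split e.g. '1.2.3_alpha1-r2' into ([1, 2, 3], (0, 1), 2)."""
    v, _, rev = v.partition("-r")
    base, _, suf = v.partition("_")
    nums = [int(x) for x in base.split(".")]
    name = suf.rstrip("0123456789")
    num = int(suf[len(name):] or 0)
    return nums, (SUFFIXES[name], num), int(rev or 0)

def ver_test(v1, op, v2):
    """Mimic ver_test's interface: ver_test('1.2', '-lt', '1.10') -> True."""
    a, b = split_ver(v1), split_ver(v2)
    cmp = (a > b) - (a < b)
    return {"-eq": cmp == 0, "-ne": cmp != 0, "-lt": cmp < 0,
            "-gt": cmp > 0, "-le": cmp <= 0, "-ge": cmp >= 0}[op]
```

Tuple comparison does the heavy lifting here: each version is normalized once into a comparable key, similar in spirit to how the rust implementation parses a version once and then compares cheaply.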

Next, check its performance using the profile builtin. Note that if you started a new bash shell, the profile builtin will have to be reloaded.

$ source eapi7-ver.eclass
$ enable -f target/debug/libpkgcraft_bash.so profile
$ profile ver_test 1.2.3_alpha1-r1 -gt 1.2.3_alpha1-r2
profiling: ver_test 1.2.3_alpha1-r1 -gt 1.2.3_alpha1-r2
elapsed 3.000030955s, loops: 10648, per loop: 281.745µs

With that baseline established for the native bash implementation, let’s create a new builtin that wraps pkgcraft support to provide the same functionality. It’s probably easiest to copy pkgcraft’s ver_test builtin into pkgcraft-bash with minor alterations in order to make it dynamically loadable.

Use the following diff that currently applies against the pkgcraft-bash repo to include ver_test support (or use it as a guide if it has fallen out of date).

diff --git a/src/builtins.rs b/src/builtins.rs
index 83a6e63..7055281 100644
--- a/src/builtins.rs
+++ b/src/builtins.rs
@@ -1,6 +1,7 @@
 use scallop::builtins::DynBuiltin;

 mod atom;
+mod ver_test;

 #[export_name = "profile_struct"]
 static mut PROFILE_STRUCT: Option<DynBuiltin> = None;
@@ -22,9 +23,10 @@ pub(super) extern "C" fn initialize() {
 // update struct pointers
 unsafe {
 atom::ATOM_STRUCT = Some(atom::BUILTIN.into());
+ ver_test::VER_TEST_STRUCT = Some(ver_test::BUILTIN.into());
 }

 // add builtins to known run() mapping
- update_run_map([&atom::BUILTIN]);
+ update_run_map([&atom::BUILTIN, &ver_test::BUILTIN]);
 }
 }
diff --git a/src/builtins/ver_test.rs b/src/builtins/ver_test.rs
new file mode 100644
index 0000000..24dfecc
--- /dev/null
+++ b/src/builtins/ver_test.rs
@@ -0,0 +1,46 @@
+#![cfg(feature = "pkgcraft")]
+use std::str::FromStr;
+
+use pkgcraft::atom::Version;
+use scallop::builtins::{Builtin, ExecStatus, DynBuiltin};
+use scallop::variables::string_value;
+use scallop::{Error, Result};
+
+const LONG_DOC: &str = "Perform comparisons on package version strings.";
+
+#[doc = stringify!(LONG_DOC)]
+pub(crate) fn run(args: &[&str]) -> Result<ExecStatus> {
+ let pvr = string_value("PVR").unwrap_or_else(|| String::from(""));
+ let pvr = pvr.as_str();
+ let (v1, op, v2) = match args.len() {
+ 2 if pvr.is_empty() => return Err(Error::Builtin("$PVR is undefined".into())),
+ 2 => (pvr, args[0], args[1]),
+ 3 => (args[0], args[1], args[2]),
+ n => return Err(Error::Builtin(format!("only accepts 2 or 3 args, got {n}"))),
+ };
+
+ let v1 = Version::from_str(v1)?;
+ let v2 = Version::from_str(v2)?;
+
+ let ret = match op {
+ "-eq" => v1 == v2,
+ "-ne" => v1 != v2,
+ "-lt" => v1 < v2,
+ "-gt" => v1 > v2,
+ "-le" => v1 <= v2,
+ "-ge" => v1 >= v2,
+ _ => return Err(Error::Builtin(format!("invalid operator: {op}"))),
+ };
+
+ Ok(ExecStatus::from(ret))
+}
+
+#[export_name = "ver_test_struct"]
+pub(super) static mut VER_TEST_STRUCT: Option<DynBuiltin> = None;
+
+pub(super) static BUILTIN: Builtin = Builtin {
+ name: "ver_test",
+ func: run,
+ help: LONG_DOC,
+ usage: "ver_test 1 -lt 2-r1",
+};
--
2.35.1

Once the diff is applied, rebuild pkgcraft-bash with pkgcraft support enabled from the root of the workspace, which will currently build the profile, atom, and ver_test builtins.

$ cargo build -p pkgcraft-bash --features pkgcraft

Now, profile ver_test again, making sure to use the builtin implementation.

$ unset -f ver_test
$ enable -f target/debug/libpkgcraft_bash.so profile ver_test
$ profile ver_test 1.2.3_alpha1-r1 -gt 1.2.3_alpha1-r2
profiling: ver_test 1.2.3_alpha1-r1 -gt 1.2.3_alpha1-r2
elapsed 3.000010097s, loops: 252482, per loop: 11.882µs

From the result, note that the rust implementation is over 20x faster than the native bash version. This can potentially be improved further with more changes to bash’s builtin support. For example, bash currently does a binary search in its builtins array to find whether a matching builtin exists before executing it; a simple hash table lookup should be quicker.

Rust or bash, your call

Overall, I personally find most programming languages to be more maintainable than bash in the long-term for any well-written code longer than relatively simple scripts. Add in rust’s ability to be exported via its FFI interface to any language that has C interoperability and it should become apparent why I prefer implementing such support in rust rather than bash.

If scallop keeps improving its wrapper API around bash, support for writing bash functionality in rust should continue to improve as well. Looking forward, it’s feasible that something like bats could be written in rust, or that scallop’s functionality could be exported to another language, for example allowing python to natively interact with bash.

For the time being, I’ll just continue using it for one of the main reasons I created it: trying to avoid writing extensive code in bash.

  1. They can currently be found in the pkgsh/builtins subdirectory of the pkgcraft crate. ↩︎

  2. The ctor crate can make this easier via procedural macros. ↩︎

In the previous post on extending bash, using builtins was mentioned as a way to improve extensibility. Rather than writing native bash functions or spawning processes to run external tools, pkgcraft implements all its bash command support using builtins.

For example, the inherit command used to load eclasses, die, and all install-related functionality (e.g. doins) are all implemented as builtins1. This allows for a more seamless experience compared to pkgcore which implements all of this natively in bash using a simple daemon that sends messages via shared fds to communicate between the python and bash sides.

Note that these builtins are not readily available for use in regular bash since most are highly Gentoo specific, often rely on underlying build state, and aren’t built in a fashion that can be externally exposed. However, the design work done to support bash in rust also allows creating builtins compatible with standard bash.

For those interested in bash and rust, the following walkthrough explains how dynamic builtins work, describes some of the rust support required for interoperability, and discusses why they’re useful.

Dynamic builtin basics

For background, bash includes builtins used daily by many, e.g. cd, echo, and source are all builtins. In addition to these, external builtins can be loaded dynamically from shared object files via enable:

1
$ enable -f path/to/shared/object.so builtin_name

which uses dlopen to open the shared object and dlsym to load the symbol for the related builtin, if it exists. The builtin is then registered internally as dynamically loaded and can be used until it is either unloaded or the shell exits.

To remove a previously loaded builtin use:

$ enable -d builtin_name

Running a builtin is the same as running most other commands:

$ builtin_name args to pass

In terms of default execution precedence, similarly named functions come first, then builtins, and finally external binaries. This means that if an in-scope function and a loaded builtin share a name, running that name in the shell runs the function, not the builtin.
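This precedence is easy to see in plain bash by shadowing a builtin with a function; the builtin command forces builtin resolution regardless:

```shell
# Functions shadow builtins of the same name.
echo() { builtin echo "from function"; }

echo hi          # runs the function: prints "from function"
builtin echo hi  # bypasses the function: prints "hi"

unset -f echo    # remove the function; the builtin takes over again
echo hi          # prints "hi"
```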

The bash installed by most distros should support dynamic builtins out of the box, since bash itself doesn't provide a mechanism to disable the feature; Gentoo, however, manually patches the configure script to disable support by default. To enable it on Gentoo, make sure to build bash with the plugins USE flag enabled.

Builtins have access to nearly all bash’s underlying API; however, they are mainly limited to running in command form using simple string arguments. In other words, scoped builtins that form more complex expressions, e.g. bash’s conditional expression [[ ]], generally require parser and/or grammar level changes that aren’t possible to achieve in a basic builtin.
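The distinction is visible via type: [[ is a parser-level keyword while [ and test are ordinary builtins, and a dynamically loaded builtin can only take the latter, command-style form:

```shell
# Keywords are handled by the parser/grammar; builtins are plain commands.
type -t '[['   # keyword
type -t '['    # builtin
type -t test   # builtin
```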

Creating builtins in rust

One of the tricky parts of supporting dynamic builtins in rust is that it has no support for "life before main" or library init routines similar to C. Therefore, we must find some way to provide external symbols for builtin structs that can't be initialized globally before init. To do this, rust relies on linker support for runtime initialization via DT_INIT_ARRAY for ELF objects (and similar mechanisms on other platforms). This allows running a specified function during the library loading process that replaces the Option-wrapped, globally defined static mutables with the actual builtin structs required by bash2.

Beyond building the shared objects, pkgcraft provides support for interacting with bash's C API in rust via scallop. This enables doing almost anything that can be done natively in bash; for example, bash variables can be bound, unbound, and marked readonly. However, note that scallop is a young project: it only supports what pkgcraft has needed thus far, has many rough edges, and doesn't come close to wrapping all of bash's exported API.

In addition to scallop, pkgcraft also provides pkgcraft-bash which is mainly an example project to create dynamic builtins. For our purposes, we’ll be exploring scallop and pkgcraft-bash while using them to demonstrate how rust-based builtins work.

Development environment

First, the required tools for building the code should be installed. This includes a recent rust compiler, C compiler, and a recent version of bash that supports loading dynamic builtins from shared objects. I leave it up to the reader to leverage rustup and/or their distro’s package manager to install the required build tools (and others such as git that are implied requirements).

Next, the required pkgcraft subprojects must be pulled down. The easiest way to do this is to recursively clone pkgcraft-workspace which should include semi-recent submodule checkouts for all the subprojects:

git clone --recurse-submodules https://github.com/pkgcraft/pkgcraft-workspace.git
cd pkgcraft-workspace

From this workspace, the pkgcraft-bash project can be built via:

$ cargo build -p pkgcraft-bash --features pkgcraft

This should create the shared pkgcraft-bash library target/debug/libpkgcraft_bash.so from which dynamic builtins can be loaded.

Profiling

In order to aid in bash development with rust, scallop provides a rudimentary profiling builtin. To load and use it, see the following example:

$ enable -f target/debug/libpkgcraft_bash.so profile
$ profile sleep 1
profiling: sleep 1
elapsed 3.005011736s, loops: 3, per loop: 1.001670578s

In short, it profiles a user-specified command over a period of time while counting loops completed. This could be extended to run cache warmups and perform more accurate statistical analysis, but its current form works for simple benchmarking.

It’s quite fair to say that if you start benchmarking bash code then you probably shouldn’t be using bash; however, most Gentoo package managers include a relatively large amount of bash that should be optimized in cases where it runs often or in tight loops.

Pkgcraft leverages scallop to sidestep this entirely, allowing all native bash code required to support operating with ebuilds to be replaced with rust. Alongside that, this profile builtin helps highlight certain types of runtime regressions in pkgcraft’s builtin support.
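For intuition, the profile builtin's loop-counting approach can be sketched in plain bash. The helper below (profile_sh is a hypothetical name, not part of scallop) runs a command repeatedly for about a second and reports completed loops; the real builtin does the timing in rust with much finer resolution:

```shell
# Rough pure-bash analogue of the profile builtin: run a command in a
# loop for ~1 second (using bash's SECONDS counter) and count iterations.
profile_sh() {
    local start=$SECONDS loops=0
    while (( SECONDS - start < 1 )); do
        "$@" > /dev/null
        loops=$((loops + 1))
    done
    echo "loops: $loops"
}

profile_sh true
```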

Atom version comparisons

Now that you have some experience with the profile builtin, let’s compare the performance of an actual rust-based builtin to similar functionality written natively in bash for atom version comparisons.

First, download a copy of eapi7-ver.eclass that contains the bash implementation of the version comparison algorithm used in Gentoo for the ver_test command in portage and pkgcore.

$ wget https://raw.githubusercontent.com/gentoo/gentoo/master/eclass/eapi7-ver.eclass
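Before profiling, it may help to see the kind of comparison ver_test performs. A rough stand-in using GNU sort -V (illustrative only; ver_lt is a hypothetical helper, and Gentoo's algorithm treats suffixes like _alpha and -r revisions specially, which sort -V does not):

```shell
# Approximate a "-lt" version comparison by version-sorting both operands
# and checking which comes first.
ver_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

ver_lt 1.2.3 1.2.10 && echo "1.2.3 < 1.2.10"
```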

Next, check its performance using the profile builtin. Note that if you started a new bash shell, the profile builtin will have to be reloaded.

$ source eapi7-ver.eclass
$ enable -f target/debug/libpkgcraft_bash.so profile
$ profile ver_test 1.2.3_alpha1-r1 -gt 1.2.3_alpha1-r2
profiling: ver_test 1.2.3_alpha1-r1 -gt 1.2.3_alpha1-r2
elapsed 3.000030955s, loops: 10648, per loop: 281.745µs

With that baseline established for the native bash implementation, let’s create a new builtin that wraps pkgcraft support to provide the same functionality. It’s probably easiest to copy pkgcraft’s ver_test builtin into pkgcraft-bash with minor alterations in order to make it dynamically loadable.

Use the following diff that currently applies against the pkgcraft-bash repo to include ver_test support (or use it as a guide if it has fallen out of date).

diff --git a/src/builtins.rs b/src/builtins.rs
index 83a6e63..7055281 100644
--- a/src/builtins.rs
+++ b/src/builtins.rs
@@ -1,6 +1,7 @@
 use scallop::builtins::DynBuiltin;

 mod atom;
+mod ver_test;

 #[export_name = "profile_struct"]
 static mut PROFILE_STRUCT: Option<DynBuiltin> = None;
@@ -22,9 +23,10 @@ pub(super) extern "C" fn initialize() {
 // update struct pointers
 unsafe {
 atom::ATOM_STRUCT = Some(atom::BUILTIN.into());
+ ver_test::VER_TEST_STRUCT = Some(ver_test::BUILTIN.into());
 }

 // add builtins to known run() mapping
- update_run_map([&atom::BUILTIN]);
+ update_run_map([&atom::BUILTIN, &ver_test::BUILTIN]);
 }
 }
diff --git a/src/builtins/ver_test.rs b/src/builtins/ver_test.rs
new file mode 100644
index 0000000..24dfecc
--- /dev/null
+++ b/src/builtins/ver_test.rs
@@ -0,0 +1,46 @@
+#![cfg(feature = "pkgcraft")]
+use std::str::FromStr;
+
+use pkgcraft::atom::Version;
+use scallop::builtins::{Builtin, ExecStatus, DynBuiltin};
+use scallop::variables::string_value;
+use scallop::{Error, Result};
+
+const LONG_DOC: &str = "Perform comparisons on package version strings.";
+
+#[doc = stringify!(LONG_DOC)]
+pub(crate) fn run(args: &[&str]) -> Result<ExecStatus> {
+ let pvr = string_value("PVR").unwrap_or_else(|| String::from(""));
+ let pvr = pvr.as_str();
+ let (v1, op, v2) = match args.len() {
+ 2 if pvr.is_empty() => return Err(Error::Builtin("$PVR is undefined".into())),
+ 2 => (pvr, args[0], args[1]),
+ 3 => (args[0], args[1], args[2]),
+ n => return Err(Error::Builtin(format!("only accepts 2 or 3 args, got {n}"))),
+ };
+
+ let v1 = Version::from_str(v1)?;
+ let v2 = Version::from_str(v2)?;
+
+ let ret = match op {
+ "-eq" => v1 == v2,
+ "-ne" => v1 != v2,
+ "-lt" => v1 < v2,
+ "-gt" => v1 > v2,
+ "-le" => v1 <= v2,
+ "-ge" => v1 >= v2,
+ _ => return Err(Error::Builtin(format!("invalid operator: {op}"))),
+ };
+
+ Ok(ExecStatus::from(ret))
+}
+
+#[export_name = "ver_test_struct"]
+pub(super) static mut VER_TEST_STRUCT: Option<DynBuiltin> = None;
+
+pub(super) static BUILTIN: Builtin = Builtin {
+ name: "ver_test",
+ func: run,
+ help: LONG_DOC,
+ usage: "ver_test 1 -lt 2-r1",
+};
--
2.35.1

Once the diff is applied, rebuild pkgcraft-bash with pkgcraft support enabled from the root of the workspace, which will currently build the profile, atom, and ver_test builtins.

$ cargo build -p pkgcraft-bash --features pkgcraft

Now, profile ver_test again making sure to use the builtin implementation.

$ unset -f ver_test
$ enable -f target/debug/libpkgcraft_bash.so profile ver_test
$ profile ver_test 1.2.3_alpha1-r1 -gt 1.2.3_alpha1-r2
profiling: ver_test 1.2.3_alpha1-r1 -gt 1.2.3_alpha1-r2
elapsed 3.000010097s, loops: 252482, per loop: 11.882µs

From the result, note that the rust implementation is over 20x faster than the native bash version. Further changes to bash's builtin support could potentially improve this even more; for example, bash currently performs a binary search over its builtins array to find a matching builtin before executing it, which would likely be faster as a simple hash table lookup.
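In shell terms, the suggested change amounts to swapping a search over a sorted list for a hash lookup; bash's own associative arrays (which are hash tables internally) make the idea easy to picture:

```shell
# Constant-time membership check via a bash associative array, standing in
# for bash's internal sorted-array binary search (illustrative only).
declare -A builtins=( [cd]=1 [echo]=1 [ver_test]=1 )

[[ -n ${builtins[ver_test]:-} ]] && echo "found ver_test"
```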

Rust or bash, your call

Overall, I personally find most programming languages to be more maintainable than bash in the long-term for any well-written code longer than relatively simple scripts. Add in rust’s ability to be exported via its FFI interface to any language that has C interoperability and it should become apparent why I prefer implementing such support in rust rather than bash.

If scallop keeps improving its wrapper API around bash, support for writing bash functionality in rust should continue to improve as well. Looking forward, it’s feasible something like bats could be written in rust or scallop’s functionality could be exported to another language, for example allowing python to natively interact with bash.

For the time being, I’ll just continue using it for one of the main reasons I created it: trying to avoid writing extensive code in bash.


  1. They can currently be found in the pkgsh/builtins subdirectory of the pkgcraft crate. ↩︎

  2. The ctor crate can make this easier via procedural macros. ↩︎
