Looks like a good project to learn container from scratch. Just wondering the ma...

cyphar · on Aug 7, 2023

As the maintainer of a Go container runtime (runc), and having worked with Rust in various other projects, while they can be better languages for building large projects, they make it harder to understand what exactly your program is doing when writing software like this.

One example that immediately comes to mind from Rust is a bug with O_PATH file descriptors I found a while ago[1], which would've made certain code we use in runc not work. And from Go, here is a bug I just found in their code for handling file descriptors for ForkExec[2] which is causing issues in a runc patch I'm working on. Neither of these issues exist in C programs. Though of course, C programs have their own issues. For better or worse, the Linux kernel APIs are easiest to use from C.

In runc we actually implement the core container setup code in C because Go doesn't allow you to do everything we need for setting up a container (it has gotten better though, in the past it was completely impossible to set up a container properly in pure Go -- now you can set one up but there are still certain configurations that are not possible to implement in pure Go, such as "docker exec"). You also cannot run Go in single-threaded mode, which means that certain kernel APIs (unshare(CLONE_NEWUSER) for instance) simply cannot be used from regular Go code.

[1]: https://github.com/rust-lang/rust/issues/62314

[2]: https://github.com/golang/go/issues/61751

kouteiheika · on Aug 7, 2023

> One example that immediately comes to mind from Rust is a bug with O_PATH file descriptors I found a while ago[1], which would've made certain code we use in runc not work. [...] Neither of these issues exist in C programs.

This issue doesn't intrinsically affect Rust as a language (when compared to C), because you can just do exactly the same thing as you'd have done in C:

    let fd = unsafe { libc::open(b"/path\0".as_ptr().cast(), libc::O_CLOEXEC | libc::O_PATH) };

Or just make the syscall directly-ish:

    let fd = unsafe { libc::syscall(libc::SYS_open, b"/path\0".as_ptr(), libc::O_CLOEXEC | libc::O_PATH) };

Or use rustix if you want more convenient idiomatic wrappers.

And for setting up containers you'll have to do this anyway, because Rust's standard library doesn't expose all of the necessary functionality.

cyphar · on Aug 7, 2023

I'm aware you can work around it, there are workarounds for issues in Go as well.

In general, C programs do not require workarounds for dealing with kernel APIs for the simple reason that the vast majority of kernel APIs are developed with test programs written in C, so kernel developers will usually not design an API that is awful to use in C.

Another thing that surprised me when I first started programming in Rust is that:

    let fd = File::open("foo")?.as_raw_fd();

and

    let f = File::open("foo")?;
    let fd = f.as_raw_fd();

have different behaviour, with the former being incorrect and a possible security bug if you use the file descriptor directly afterwards. But I guess this behaviour is obvious to seasoned Rust developers (at least, it seems obvious to me now).

kouteiheika · on Aug 8, 2023

It's not a workaround - the `File` in Rust was neither meant nor designed to support full `open` semantics. If you want to use `open` you should use `open` (or an idiomatic wrapper which is meant to model that) instead of forcing it through `File`.

And `open` is not a kernel API either. It's a libc API. If you want to directly access the API provided by the kernel you're supposed to make a syscall, which essentially is exactly the same in Rust and in C.

And to make the point of `open` in C *not* being a kernel API more clear, in glibc the `open` function *doesn't* actually call the `open` syscall, but `openat` with `AT_FDCWD`. Glibc doesn't guarantee that a given function will actually call a given syscall, and new versions of glibc often change which syscalls are called by a given function. This is important if you're also doing e.g. seccomp sandboxing, because suddenly your program might stop working if glibc is updated. For example, glibc 2.34 started using the `clone3` syscall under the hood, which broke Chromium Embedded Framework's sandbox.

So, again, your argument that a language like Rust "makes it harder to understand what exactly your program is doing" compared to C in this particular case isn't really valid, because C has exactly the same problem if you use libc functions, and the only way to guarantee that the program is doing exactly what you want is to use syscalls, which is the same both in C and Rust.

> Another thing that surprised me when I first started programming in Rust

Yep. That's one of Rust's few badly designed APIs.

LegionMammal978 · on Aug 8, 2023

As I understand it, that as_raw_fd() issue is a big reason that they added the BorrowedFd<'_> type [0] and corresponding AsFd trait in 1.63.0, to prevent the raw file descriptor from outliving its logical owner. Still, I agree that there is lots of potential for issues on the boundary between Rust's implicit lifetime management vs. C APIs' explicit lifetime management, since there won't always be a convenient preexisting mechanism to bridge the gap.

[0] https://doc.rust-lang.org/std/os/fd/struct.BorrowedFd.html
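
To make the BorrowedFd point concrete, here is a minimal sketch using only the standard library (`std::os::fd`, so a reasonably recent toolchain on a Unix target is assumed). The helper names `fd_number` and `open_and_inspect` are made up for illustration and are not from any project mentioned in this thread.

```rust
use std::fs::File;
use std::io;
use std::os::fd::{AsFd, AsRawFd, BorrowedFd, RawFd};

/// Hypothetical helper: takes a borrowed descriptor, so the borrow checker
/// requires the owning `File` to stay alive for the duration of the call.
fn fd_number(fd: BorrowedFd<'_>) -> RawFd {
    fd.as_raw_fd()
}

/// Hypothetical helper: opens `path` and inspects its descriptor while the
/// owning `File` is still in scope.
fn open_and_inspect(path: &str) -> io::Result<RawFd> {
    let file = File::open(path)?;
    let borrowed: BorrowedFd<'_> = file.as_fd();
    let n = fd_number(borrowed);

    // By contrast, the one-liner form is rejected at compile time, because
    // the temporary `File` would be dropped while still borrowed:
    //     let borrowed = File::open(path)?.as_fd();
    //     fd_number(borrowed);

    Ok(n)
}
```

Calling `open_and_inspect("/dev/null")` should return a valid (non-negative) descriptor number. The number itself is only meaningful while the `File` inside the function is alive, which is exactly what the borrowed type is there to enforce.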

rirze · on Aug 7, 2023

Where can I learn more about this? This seems very unintuitive to me, as I'm learning Rust myself.

cyphar · on Aug 7, 2023

I'm not sure if there is a document that mentions this in particular, but it is a consequence of how lifetimes work. The core issue is that .as_raw_fd() takes &File and returns an integer (which doesn't have lifetime information). As a result, the File is dropped at the end of the statement and thus the number you got from .as_raw_fd() is invalidated.

This does happen elsewhere in Rust, but often when you have methods on &self that return something you use later, the method returns something with the same lifetime (fn foo<'a>(&'a self) -> Foo<'a>) and thus the compiler will not let the result outlive the original object. It just so happens that file descriptors are tied to the lifetime of the File in a way that a plain integer cannot express and Rust therefore cannot detect.
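
For anyone who wants to see the drop happen, here is a small Linux-only sketch using just the standard library. It checks `/proc/self/fd/<n>` to see whether a descriptor number is still open. The function names are hypothetical, and it assumes nothing else in the process opens a file in between (otherwise the number could be reused).

```rust
use std::fs::{self, File};
use std::io;
use std::os::fd::{AsRawFd, RawFd};

/// Linux-only check: is descriptor number `fd` currently open in this process?
/// Uses lstat on the /proc entry, which does not itself open a descriptor.
fn fd_is_open(fd: RawFd) -> bool {
    fs::symlink_metadata(format!("/proc/self/fd/{fd}")).is_ok()
}

/// The temporary `File` is dropped at the end of the `let` statement, which
/// closes the descriptor. The returned integer is already stale.
fn stale_fd(path: &str) -> io::Result<RawFd> {
    let fd = File::open(path)?.as_raw_fd();
    Ok(fd)
}

/// Binding the `File` to a variable keeps the descriptor open until the
/// variable goes out of scope, so the number is valid while we check it.
fn live_fd_is_open(path: &str) -> io::Result<bool> {
    let file = File::open(path)?;
    let fd = file.as_raw_fd();
    Ok(fd_is_open(fd))
}
```

Under those assumptions, `fd_is_open(stale_fd("/dev/null")?)` should come back false, while `live_fd_is_open("/dev/null")?` should come back true. The security angle is that the stale number may later refer to whatever file the process opens next.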

I don't know if clippy has a warning for this particular case. It might be useful to add it.

pjmlp · on Aug 7, 2023

Those Rust and Go bugs aren't much different from C gotchas when writing portable UNIX code.

lelanthran · on Aug 7, 2023

> Those Rust and Go bugs aren't much different from C gotchas when writing portable UNIX code.

Maybe, but:

1. It's irrelevant to this product (no one is writing portable UNIX code when they are writing some Linux-specific software, like container implementations).

and

2. It's irrelevant to the author's goals (learning Linux kernel stuff using the language that the interface to the kernel uses is a better idea than using a different language and hacking shims for all the stuff you want to do).

and

3. The cost to switch to a new language is substantial, and only makes sense if you're either joining a team and project that uses that new language, or if the goal is to learn that new language.

pjmlp · on Aug 7, 2023

1. Containers predate Linux, appeared in other UNIXes before GNU/Linux, and Windows also has them.

2. Any language able to call into Linux API surface is usable. And if we go down the UNIX native languages route born at Bell Labs, C++ also counts.

3. That depends on how much someone knows C (properly), versus other alternatives.

aragilar · on Aug 7, 2023

It depends what you mean by "container". As far as I know, Windows containers aren't using namespaces, cgroups and seccomp. BSD Jails are definitely a different thing. So if you wanted to know how exactly Linux containers worked, it's probably easiest to use what the Linux docs provide (which is C).

pjmlp · on Aug 7, 2023

Of course Windows containers aren't using Linux APIs, they are using Win32 mechanisms for process sandboxing.

Just like HP-UX vaults and Solaris Zones, which aren't using namespaces, cgroups and seccomp, but rather the mechanisms of their own UNIX flavours.

bheadmaster · on Aug 7, 2023

Wouldn't "runtime.LockOSThread()" help with the single-threaded API?

cyphar · on Aug 7, 2023

No, that only pins the current goroutine to a single OS thread (which is needed for some APIs -- namely, all of the other namespace APIs and some thread-related APIs).

There is no way to make an entire Go program run as a single threaded program without using CGo the way we do in runc. Even GOMAXPROCS=1 doesn't work. CLONE_NEWUSER will always fail in a multi-threaded program.

serf · on Aug 7, 2023

I can't answer for the developer, but the answer to that with most small one-person-show projects is familiarity/comfort/ability.

The head-space that adopting a new language for a specific project takes is immense compared to tackling it in a familiar language that you already know you're capable in; there is rarely a benefit to doing so outside of team environments where a certain level of on-boarding is expected, or because you have a really niche language requirement/feature that your project is begging for.

nazgulsenpai · on Aug 7, 2023

I came across this last week when reading about different container runtimes -- crun is implemented in C[0].

Their explanation:

  "While most of the tools used in the Linux containers ecosystem are written in Go, I believe C is a better fit for a lower level tool like a container runtime. runc, the most used implementation of the OCI runtime specs written in Go, re-execs itself and use a module written in C for setting up the environment before the container process starts.

  crun aims to be also usable as a library that can be easily included in programs without requiring an external process for managing OCI containers."

[0] https://github.com/containers/crun

lucavallin · on Aug 7, 2023

I haven't written much C since college and I felt nostalgic, so I went for it.