So in this case, with some modest performance engineering, Golang is surprisingly fast out of the box, and Rust requires more effort and increasingly less idiomatic code to reach the same result.
I haven't dug into the code, but it's clear that the Golang library has had a ton of optimization work put into it by very knowledgeable people. Techniques like object pooling are highly error prone, and certainly not "out of the box" Golang.
The Rust code and the blog post, on the other hand, seem to be written by someone less familiar with Rust and high-performance parsing. I think they would have avoided all their problems if they had just used lifetimes to safely avoid copying from the start, instead of relying on increasingly elaborate workarounds like the `bytes` crate. Apart from one "forces one to deal with the contagion of lifetimes" comment in the conclusion, they never mention why they didn't do this, even though it's clearly the idiomatic Rust solution. Maybe they had technical reasons for not using lifetimes, but to me it just looks like unfamiliarity with Rust.
> Techniques like object pooling are highly error prone, and certainly not "out of the box" Golang.
This one actually is out of the box in Golang: it's called sync.Pool and lives in the standard library. It's very easy to use and not error-prone; I've used it many times without any issues.
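For readers who haven't used it, a minimal sketch of the usual sync.Pool pattern (the buffer type and the `process` helper here are illustrative, not from the article):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers; New is only called when the pool is empty.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

func process(data string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // reset before returning, or stale data leaks into the next use
		bufPool.Put(buf)
	}()
	buf.WriteString("processed: ")
	buf.WriteString(data)
	return buf.String() // String copies, so it's safe to recycle buf afterwards
}

func main() {
	fmt.Println(process("a"))
	fmt.Println(process("b"))
}
```

The only real rule is the one in the comment: reset state before putting an object back, since Get may return any previously pooled value.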
But the creator of VictoriaMetrics is indeed someone very knowledgeable, and well known in the Golang community for optimization work.
I’m unfamiliar with all of the libraries used here; however, the solutions from the blog post struck me as quite odd. serde::Deserializer solves a similar problem and makes it quite straightforward and safe to borrow from the input where possible. Obviously, the input needs to outlive the borrowed values.
> Apart from one "forces one to deal with the contagion of lifetimes" comment in the conclusion
I guess that must be the logical consequence of the “async functions are contagious” meme… I wonder if at some point we'll end up with people arguing that dynamic typing is obviously better because it avoids “contagion with types”.
For a language designed so that a fresh-out-of-college engineer can pick it up in a few weeks and be effective, it is very easy to squeeze a lot of performance out of it.
* Built-in profiler.
* Built-in escape analysis tool.
* It's easy to pass pointers instead of copying data.
* []byte is sub-slice-able, with a shared backing array. This does throw people off occasionally, but the trade-off is performance.
* Go lets you have real arrays of structs, optimizing CPU cache usage.
* Built-in memory pools.
And more.
And if you look at "non-idiomatic" performance code, it is surprisingly legible to said fresh engineer. It's as if the designers didn't want to give up the usual C performance tricks while making a Java/Python kind of friendly language, and it shows.
Of course Go can only go so far, due to the built-in runtime and GC. But it gets very far. Much farther than a first glance, or the second glance a language snob would give it, suggests.
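The sub-slicing point above is worth illustrating, since it is both the performance win and the occasional surprise: slicing never copies, so sub-slices alias the same backing array (a minimal sketch, not from the article):

```go
package main

import "fmt"

func main() {
	buf := []byte("hello, world")

	// Sub-slicing is O(1): the new slice header just points into buf's
	// backing array. No bytes are copied.
	word := buf[7:12]
	fmt.Println(string(word)) // world

	// The flip side that "throws people off": writes through one slice are
	// visible through the other, because both share the backing array.
	word[0] = 'W'
	fmt.Println(string(buf)) // hello, World
}
```

This is exactly what makes zero-copy parsers cheap in Go, and exactly why holding a sub-slice can keep a large buffer alive or expose it to mutation.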
While it’s a nice bedtime story if you like Go, the reality is no, it’s not typical.
Go has a good perf story, but typically Rust or C++ will be faster after heavy optimisation, and should be more or less on par for typical applications. This isn’t a critique of Go, and shouldn’t surprise anyone.
Typically go also has unexpected optimisation hoops to jump through and problems related to the heavy use of channels (see the well documented answer here: https://stackoverflow.com/questions/47312029/when-should-you...), so you would generally expect it to be slower…
…but, naive implementations are always slower, and really, it’s probably much of a muchness out the box for most day to day uses.
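A minimal sketch of the trade-off the linked Stack Overflow answer discusses: the same shared counter can be guarded by a mutex or serialized through a channel, and for fine-grained work like this the channel version carries extra scheduling overhead (the counter itself is illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// mutexCount increments a shared counter n times under a mutex.
func mutexCount(n int) int {
	var mu sync.Mutex
	var wg sync.WaitGroup
	count := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			count++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return count
}

// channelCount funnels every increment through a channel to a single
// owner goroutine, which is the "share memory by communicating" style.
func channelCount(n int) int {
	inc := make(chan struct{})
	done := make(chan int)
	go func() {
		count := 0
		for range inc {
			count++
		}
		done <- count
	}()
	for i := 0; i < n; i++ {
		inc <- struct{}{}
	}
	close(inc)
	return <-done
}

func main() {
	fmt.Println(mutexCount(1000), channelCount(1000))
}
```

Both are correct; the point from the linked answer is that channels are not free, so hot paths often end up with mutexes or atomics instead.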
In almost all situations (even python or Java) you can get great performance if you invest time and effort in it.
But is idiomatic Go code typically faster than Rust? No, not really.
It's really easy to improve the performance of a Go implementation compared to Python or Java. There are lots of built-in tools to help you (like the profiler), and the resulting code is very legible even to fresh college grads.
This is based on my first hand experience, but YMMV.
While I am not a big Go fan, most people evaluating Go's performance conflate the language and the implementation, and forget that gccgo also exists, sharing many optimizations offered by GCC's backend.
Unfortunately, it seems stuck at Go 1.18, without generics support, and with no roadmap for moving forward.
Given the Go folks' long-standing stance on generics, a lot of Go code still compiles with gccgo.
The Go code is already hyper-optimized by experts, just not by the blog authors, so you don't read about it here. As someone who has tried to write high-performance Go code on occasion, I can assure you that a ton of digging would have been required on that side as well.
I don't see why you labeled Aliaksandr Valialkin, the author, an "expert". I mean, he's no dummy, but what exactly makes him an expert on optimizing Go code?
As someone who also writes Go, I don't see any "hyper optimizations" in the code. It just decodes Protocol Buffers bytes using straightforward code that I would expect a competent developer to write.
It really is just: read bytes from memory ([]byte) and interpret them according to the PB spec.
There's only one trick there: unsafeBytesToString() that does no-allocation conversion of []byte to string. This is unsafe in general but safe in their specific case. And I've seen this trick before so it's not some secret, expert-only knowledge.
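For reference, a sketch of that trick using the Go 1.20+ `unsafe` helpers (older code did the same thing with reflect headers); as the comment notes, it is only safe if the byte slice is never mutated afterwards, since Go strings are assumed immutable:

```go
package main

import (
	"fmt"
	"unsafe"
)

// unsafeBytesToString reinterprets b as a string without allocating or copying.
// Safe only if b is never mutated after the call: the returned string aliases
// b's backing array, and Go strings must be immutable.
func unsafeBytesToString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	b := []byte("hello")
	s := unsafeBytesToString(b)
	fmt.Println(s)
}
```

The safe equivalent, `string(b)`, copies the bytes; skipping that copy is the entire point of the trick in a hot decoding path.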
Most comments here are like bad LLMs: hallucinating opinions without bothering to spend even a few minutes acquiring the data to base those opinions on.
He’s probably most well known in the golang community for fasthttp which is a widely used and highly optimized golang replacement for the golang stdlib. I’m a long term golang developer and I think calling him a golang optimization “expert” is fair.
That said, I agree with your assessment about this particular code. It’s fairly straightforward idiomatic go.
> I don't see why did you label Aliaksandr Valialkin, the author, an "expert". I mean, he's no dummy but what exactly makes him an expert on optimizing Go code?
I was trying to convey the meaning of "far more experienced than the blog post authors", but without having to insult the authors. It's a good writeup after all, and I'm glad they took the time.
We must have some different interpretations of what "optimized" means. This is the very first piece of code in the file you linked:
func (fc *FieldContext) NextField(src []byte) ([]byte, error) {
    if len(src) >= 2 {
        n := uint16(src[0])<<8 | uint16(src[1])
        if (n&0x8080 == 0) && (n&0x0700 == (uint16(wireTypeLen) << 8)) {
            // Fast path - read message with the length smaller than 0x80 bytes.
            msgLen := int(n & 0xff)
            src = src[2:]
            if len(src) < msgLen {
                return src, fmt.Errorf("cannot read field for from %d bytes; need at least %d bytes", len(src), msgLen)
            }
            fc.FieldNum = uint32(n >> (8 + 3))
            fc.wireType = wireTypeLen
            fc.data = src[:msgLen]
            src = src[msgLen:]
            return src, nil
        }
    }
    // ... function continues beyond this point
As far as I can tell, this entire codepath exists solely as an optimization. I spent many years working on a chess engine for fun, so I'm pretty well versed in bit twiddling, but I'm seriously struggling with this. Like, is it doing `(n&0x8080 == 0)` to check whether the length is less than 0x80? Is that even correct?
I think "hyper optimized" is a completely fair characterization. But we clearly work in different industries.
Protobuf uses a bunch of variable length encodings. Here it's decoding a TLV format, but the length is itself a variable length integer (and seems to be a kind of tag-value encoding?) where you basically get 7 bits per byte telling you the value, and the leftmost bit tells you whether there's another byte. So if you mask with 0x8080 and get zero, then it was a 1 byte (7 bit) integer.
If neither 0x8080 bit is set, then the tag-value record is 2 bytes: the left byte has the tag, the right the value. Then they mask with 0x0700 to get the type of record, which should be LEN.
So if it's a single byte LEN record, they can take that single byte as the length (they mask with 0x00ff, but really it's 0x007f. They already know the 0x80 bit is zero, and the value is contained in the least significant 7 bits). Otherwise they have to do some fiddly logic to decode the variable length integer to figure out the length (length here being the L in TLV).
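The masks check out on a concrete input. Field number 1 with wire type LEN (2) encodes its tag byte as (1<<3)|2 = 0x0A, so for a field with a 3-byte payload the first two bytes are {0x0A, 0x03}, and each fast-path condition from the quoted code holds (a sketch reusing the same constant name):

```go
package main

import "fmt"

const wireTypeLen = 2 // Protobuf wire type for length-delimited (LEN) fields

func main() {
	// Field number 1, wire type LEN, payload length 3:
	// tag byte = (1 << 3) | 2 = 0x0A, length byte = 0x03.
	src := []byte{0x0A, 0x03}
	n := uint16(src[0])<<8 | uint16(src[1])

	// 0x8080 masks the varint continuation bit of both bytes: if neither
	// is set, the tag and the length are each single-byte varints.
	fmt.Println(n&0x8080 == 0) // true

	// 0x0700 masks the wire-type bits of the tag byte.
	fmt.Println(n&0x0700 == uint16(wireTypeLen)<<8) // true

	// Field number sits above the 3 wire-type bits of the tag byte;
	// the length is the low byte (its 0x80 bit is already known to be 0).
	fmt.Println(n>>(8+3), n&0xff) // 1 3
}
```

So `(n&0x8080 == 0)` isn't checking "length < 0x80" directly; it's checking that both varints fit in one byte each, which implies the length is at most 0x7f.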
I'm not sure the presence of bit unpacking code in a decoder for a bit packed protocol is sufficient to call it hyper optimized. That seems like the nature of the problem.
> I don't see why did you label Aliaksandr Valialkin, the author, an "expert". I mean, he's no dummy but what exactly makes him an expert on optimizing Go code?
https://victoriametrics.com/team/ - Let's see. Author of multiple performance-optimized libraries with a masters degree in computer software engineering and a background in highly scalable systems (as needed for adtech). Sounds pretty much like an expert for optimizing code to me.
> And I've seen this trick before so it's not some secret, expert-only knowledge.
So if you know it, it's not expert knowledge? Or what's your argument here?
Yes, and this is what I was curious about, because I haven't seen the performance cost of this discussed a lot.
Worded differently: does the strict concept of ownership/lifetimes in Rust bias a default (naive) implementation towards lower performance (e.g. due to required copying) when compared to a naive Golang (or even Java) implementation?
I have no doubts that after heavy optimization, Rust beats languages such as Go & Java.
Using clone or Arc with boxing everywhere, to avoid using references with lifetimes at all, will lead to code that's slower than Go/Java, yes. The exceptions: you're only cloning small objects that don't internally use heap allocations, or your algorithm only needs moves rather than sharing. (Such code will likely use less RAM, which in some situations may still make it faster overall.) But such "newbie" code will probably still call into existing code that uses references internally, which speeds up those parts. Also, the difficulty of using references varies with how long, or how indirectly, they are held; in many places references are easy to deal with even for a beginner. So it becomes a question of how much the code relies on clone and reference counting.
When I learned Rust, I actually never followed the "use clone or Arc to make your life easier while learning" recommendation; I always used references and learned how to use lifetime declarations and program design to go as far with them as reasonable. To be fair, I already had experience with C and C++. But once you're reasonably experienced in Rust (after a year?), your code should be fast most of the time as you write it on the first try, without needing optimization work.
> In Go, a string is just a simple wrapper around []byte, and deserializing a string field can be done by simply assigning the original buffer's pointer and length to the string field. However, Rust's PROST, when deserializing String type fields, needs to copy the data from the original buffer into the String
It's interesting that the protobuf code generator for Go seems to allow this direct access. For C++, you also need a copy (and potentially a heap allocation) since `string` fields are returned via `const std::string&`. Protobuf support for `std::string_view` has been years in the making.
FlatBuffers does the trick as a better replacement for Protobuf when running on resource-constrained devices. We were working on a project that leverages Arrow IPC (internally FlatBuffers) and shared memory to collect metrics on edge devices with limited CPU and memory; hopefully we can open source it soon.
With indexing patterns like `self.vec[..self.len]` and setting `self.len = 0` to clear, you avoid the cost of dropping all items at once (as Vec::clear does). However, you still have to drop items eventually, so with this solution the cost is amortised across `RepeatedField::push` and other methods that do `self.vec[i] = new_item`.
> It's designed to avoid the drop overhead
That isn't true. You don't avoid the overhead; at most you delay/amortise it.
WriteRequest::timeseries is a vector (https://github.com/prometheus/prometheus/blob/main/prompb/re...) and the repeated fields `Timeseries::labels` and `Timeseries::samples` are reused across different timeseries. You don't have to allocate a new vector for the labels and samples of each new timeseries instance.
That would be true if you used `Vec::clear` too, it doesn't allocate a new vector. My point was that you still end up running Drop implementations with RepeatedField<T>, just not all at once. See https://play.rust-lang.org/?version=stable&mode=debug&editio...
prost is the most widely used Protobuf implementation in Rust, maintained by the Tokio organization. prost generates structs and serialization/deserialization code for you.
easyproto, according to GitHub search, is used by only two projects. easyproto provides primitives for serializing and deserializing Protobuf, and requires hand-writing code for both.
A fair comparison would be prost vs google.golang.org/protobuf, or easyproto vs parts of quick-protobuf.
In most cases you can make Go as fast as Rust, but in my experience writing performance-sensitive code in Go requires a significantly larger time investment and deeper language expertise. Pebble (a RocksDB replacement in Go by CockroachDB) is a good example of this: the codebase is littered with hand-inlined[1] functions and hand-unrolled loops, and it isn't[2] even using Go memory management for performance-critical parts; it uses the C memory allocator and manual memory management.
Yes, but their code appears to actually be `unsafe` (in the Rust terminology sense) without saying so in their function declarations. They use `unsafe` inside their `slice` function, but return a value that is unsafe to use, hence `slice` should be marked `unsafe`, as should `copy_to_bytes` and then `merge_bytes`. Same for PromLabel::merge_field and PromTimeSeries::merge_field as far as I can see, and maybe higher up in their actual app. This is definitely not how Rust code is supposed to work: if a function isn't marked unsafe, it should not be able to introduce UB, and they violate that. Security-wise, this approach is only on par with C/C++ code if programmers are aware of the pitfalls, which normally isn't the case here, since Rust programmers expect non-unsafe functions to be safe (i.e. not to require additional care to avoid undefined behaviour).
They either need to mark their functions `unsafe`, or use lifetimes (which may require changes in some APIs, which may be the reason they didn't).
I was looking at the main branch, and described the situation there. They have a different branch for the optimization work; in that branch, they do mark those functions as `unsafe` (and already did when I posted).
In the image at the top of the article, why is the Rust crab altered to have "angry" eyes and to hold a knife aimed at the Go gopher? Aside from the joke of "don't bring a knife to a protobuf fight", the implication of violence sucks and lessens the spirit of friendly competition and "all in good fun". I don't know if Rust has a code of conduct or rules for use of its mascot, but I bet this doesn't follow them.
Congratulations! You've won the Poe's Law Post of the Day Award!
"Poe's law is an adage of Internet culture which says that, without a clear indicator of the author's intent, any parodic or sarcastic expression of extreme views can be mistaken by some readers for a sincere expression of those views."
But it does appear to be the author's intent, considering their Twitter account has a photo of the crab using the gopher's carcass (with dead eyes) as a carpet, referencing the same article (if you translate the Chinese). Also, the author went out of their way to use a CrabLang logo (not a Rust logo) to add the knife.
https://x.com/ratuthomm/status/1775183479858483439
https://imgur.com/a/txMb4Kw
What intent? Have you asked the author what their intent is? The linked Twitter post does not have a knife in sight. What is wrong with using the CrabLang logo when he was using valid CrabLang?
Maybe you're not thinking about it enough? Do you know anyone who has been almost fatally stabbed (attacked by gangs) with knives? I do! It likely violates a code-of-conduct and is unprofessional.
Isn’t it tiring to think about every possible offence in the world and how someone somewhere will be offended by (knives, tires, cars, pens, saws, sharks, planes, needles, ropes, speakers, games, … you get the point) while writing an article about a programming language?
At some point one needs to exercise common sense and learn to live in a public society where people speak and not everyone is out to offend you.
I have no association with any of these communities, but the crab holding a knife was a somewhat well-known meme[1].
I guess it can also be viewed as a play on words, given that crablang is a fork.
Given that the creators of crablang explicitly say it was a "lighthearted response"[2] to some of Rust's changes, it makes sense that they'd use a meme for a logo.
That said, seems you're not alone in wanting a different logo[3].
Ah, I think I remember that image meme from a long time ago (I never would have connected those dots). Thanks for the context here, it actually helps take the edge off!
That's the challenge with these inside jokes. If you don't get the context (and I didn't recall it instantly either), the interpretation will usually be wildly different.
Exactly, then why does it matter that the author had anything in their post as a figure of speech or analogy?
Is it wrong to post a meme of a dog sitting near fire - https://knowyourmeme.com/memes/this-is-fine
As a joke from SREs who handle firefighting calls?
Does it offend dog lovers, people who are scared of fire?
When I see a knife, it never suggests a weapon, unless it is the kind of knife that is really a weapon, like a double-edged stiletto, or it is wielded by someone who has obvious intentions to use it as a weapon.
The knife from that logo does not look like a weapon, but just like a standard utility knife. Moreover, a crab cannot move a knife in the way in which it is used as a weapon, e.g. for stabbing, but only in a way similar to a human who eats using a knife. Therefore a crab does not suggest someone who uses a knife for violent purposes.
For people like myself, who do not buy industrially-made food, there is no other tool more important than knives. Without using knives every day, I would starve to death.
So you may be offended by seeing a knife that in your mind looks like a weapon, but I am offended when someone claims that one of the most essential, if not the most essential tool of the humans suggests violence or other bad things.
Some people may use knives seldom or never, but then their lives are completely dependent on the work of other humans who use knives to produce the things that sustain the lives of those who do not use knives.
Thank you. Sometimes these trains of sensitivity in thought threads can be annoying.
I definitely appreciate your contextual perspective and the ability to express it much more clearly and calmly than my own initial reaction.
I submit that we, as a society, need to see more Tex Avery cartoons as children.
I genuinely feel that either a new civil war in the US or a global conflict is imminent, and the young of today are largely ill-prepared, to say the least.
It’s simple. You ignore fire as a bad thing and don’t even consider it an equivalent comparison to a knife. But someone who was affected by arson would probably say “why include fire in a blog post?!”. My point is, it is not on the author to think about all these effects when they write/speak. As a mature society, we should learn not to expect every source to filter their thoughts, and instead expect the consumer to filter out what they don’t want.
By not giving fire/arson the same level of concern as the knife (which is important to you), you just validated the core of the problem.
How about the image of the dog / fire viewed by someone who was orphaned and horribly scarred for life as a child in a household fire?
Trauma exists in all forms in our world. Sometimes the extremity of trauma is used in jest simply because it’s so extreme and at odds with the situation. That’s a form of absurdist humor that absolutely runs the risk of triggering someone that the extreme situation is personal to, but was never intended to hurt anyone.
I think almost everyone considers that situation, someone triggered by personal trauma on seeing a cartoon crab attack a cartoon gopher with a knife on a blog post comparing the performance of two programming languages, sad, and has empathy for those suffering from relived trauma. That must be debilitating in life, and no one is insensitive to that level of embodied suffering.
However, likewise, almost no one feels sympathy for the person who pulls out a code of conduct to kill any cartoon humor not designed for the Sunday serialization of a national newspaper.
Is this typical?