If using indices is going to be your answer, then it seems to me you should at least contend with the OP's argument that this approach violates the very reason the borrowchecker was introduced in the first place.
From the post:
"The Rust community's whole thing is commitment to compiler-enforced correctness, and they built the borrowchecker on the premise that humans can't be trusted to handle references manually. When the same borrowchecker makes references unworkable, their solution is to... recommend that I manually manage them, with zero safety and zero language support?!? The irony is unreal."
>OP's argument that this approach violates the very reason the borrowchecker was introduced in the first place.
No, it doesn't. I just don't think the author understands the pitfalls of implementing something like a graph structure in a memory-unsafe language. The author doesn't write C, so I don't believe he has struggled with the pain of chasing a dangling pointer with valgrind.
There are plenty of C libraries that eventually decided to use indexes instead of juggling pointers around, because it's much harder to introduce a use-after-free when dereferencing nodes this way.
Entity component systems, which essentially implement this pattern, date back to 1998. I don't find it ironic that the Rust compiler herds people towards a safe design that has been rediscovered again and again.
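Concretely, the pattern is something like this (a minimal sketch of my own, not taken from the post or any particular ECS): nodes live in a Vec, and edges are plain usize indices rather than references.

    // Edges are indices into the node vector, so no node ever
    // holds a reference (and the borrow checker has nothing to fight).
    struct Node {
        value: u32,
        neighbors: Vec<usize>, // indices into Graph::nodes
    }

    struct Graph {
        nodes: Vec<Node>,
    }

    impl Graph {
        fn add_node(&mut self, value: u32) -> usize {
            self.nodes.push(Node { value, neighbors: Vec::new() });
            self.nodes.len() - 1
        }

        fn add_edge(&mut self, from: usize, to: usize) {
            self.nodes[from].neighbors.push(to);
        }
    }

    fn main() {
        let mut g = Graph { nodes: Vec::new() };
        let a = g.add_node(1);
        let b = g.add_node(2);
        g.add_edge(a, b);
        g.add_edge(b, a); // cycles are fine, they're just integers
        let first = g.nodes[a].neighbors[0];
        println!("{}", g.nodes[first].value); // prints 2
    }

A stale index is still a logic bug, but the failure modes are a panic or reading the wrong node, not scribbling over freed memory.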
The borrow checker was introduced to statically verify memory safety. Using indices into graphs has been a memory-safe option in languages like C for decades. I find his argument about as valid as someone saying "I can't use goto? You expect me to manually run my cleanup code before I return?" Taking away goto to make control flow easier doesn't become "ironic" just because certain legitimate uses of goto get harder. Surely you wouldn't accept this argument from someone lobbying for the return of goto in mainstream languages?
Whoah, hold on, the author isn't comparing writing graph structures in Rust to writing it in memory-unsafe languages --- they're comparing it to writing it in other memory-safe languages. You can't force a false dichotomy between Rust and C to rebut them.
The comparison is contrived precisely because he's comparing it to other memory-safe languages. The borrow checker exists because the goal was a memory-safe language without a runtime. Falling back to indices isn't "ironic", it's exactly how you would solve the problem in C/C++.
If your argument is "well, Rust should be like Julia and have a GC", then that's not Rust. That language also exists (it's possibly called OCaml), but it's not Rust.
The author is also suggesting that a data structure consisting of a sea of objects referencing each other works perfectly in Python. It doesn't: Python's GC can only collect it via the cycle collector, and that path is not immediate the way refcounting is. And you're paying for the refcounts. Even in a language with a full tracing GC (Java, etc.), you're not winning any performance points by making a ton of objects all reference each other.
Indices or arenas or such will result in better code in all these languages.
I think it's reasonable enough. The author already argued that there are reasons for non-GC languages to exist, even if the performance doesn't matter to them.
One interpretation of the article is just the author doesn't personally like the borrow checker, but another interpretation is the author saying the borrow checker is just a bad abstraction.
So under the assumption that we don't have a GC available, what else can we compare the borrow checker against?
Author here. Yeah, I don't like the borrowchecker. But the motivation for me writing the article is the almost religious zeal with which many Rustaceans refuse to even acknowledge that the borrowchecker has a cost in terms of ergonomics, and that this translates to e.g. iteration speed.
You encounter borrowchecker issues? Well, you're just a beginner or not skilled enough. Rust makes you jump through hoops? No it doesn't, it just makes you pay upfront what you otherwise would have. It slows development? No, studies show it doesn't, you must be imagining it.
This is extremely annoying to be on the receiving end of. Even though it comes from a good place (mostly just excitement), it can feel like gaslighting.
You should look up the term "zero cost abstractions".
It's the organizing principle of the second generation of Rust's leadership[1]. Formally, it means "zero runtime cost"[2], but the now-former maintainers operated as though it meant Rust could get rid of all cost. The belief was that you could have a language that's faster than C, safer than Ada, more ergonomic than Java, and more memory-safe than Go, by either making the compiler do more work or working more on the compiler. In practice, I think this belief caused massive complexity in the compiler, trade-off dishonesty in the community, and bad evangelism in domains unsuited to memory safety (e.g. games programming).
[1] Graydon, the original author of Rust, was against this idea.
[2] The term originates from C++ as "zero overhead" which was smaller in scope, and not a governing principle of the C++ language.
Indices can dangle in almost exactly the same way as pointers. Worse, it's easier for a use-after-free to accidentally clobber some other item in the same structure, because allocations are "dense". (Pointer designs on systems where malloc/free is ~LIFO have similar problems.)
This is well known to be solved by using generational indexes. It's not a big deal. The author's entire post is an overreaction/rage-bait.
Basically any data structure like this that you want to be relocatable in memory is going to use indirection like indexes instead of pointers. It's a very common use case outside of Rust.
Not sure I agree. UB has the benefit that there are a lot of tools, like valgrind's memcheck and various compiler features, to detect it both statically and at runtime. And when you hit UB, it is certainly an error.
Non-UB errors have no such powerful tools, only what application specific tests or assertions you create.
I've seen plenty of cases where people turned signed types into unsigned because signed overflow is undefined, and the real effect was to turn easily detected and diagnosed issues into much harder-to-find bugs.
Also like, it's not universally true that UB bugs (broad class of problems) are all harder to debug than non-UB bugs (another broad class of problems). Null pointer dereferences are UB, and among the easiest class of problem to debug.
It’s pretty common to implement graphs in terms of arrays, not because of indices but because of cache locality.
So your “UB” and “non-UB” code would look effectively identical to the CPU and would take the same amount of debugging.
The reality is that whether an index was tombstoned and then referenced, or "deallocated" and then referenced, it is still a programmer fault: a bug the compiler could not catch.
Ummm. Yeah it is. I'm sure you can come up with some cases where the impact of the bugs is roughly equivalent, but generally speaking, UB is a terrible class of bug.
For example, if the aho-corasick crate accidentally tries to look up a dangling state node, you'll get a panic. But if this were UB instead, that could easily lead to a security problem or just pretty much anything... because behavior is undefined.
So generally speaking, yes, this is absolutely a major difference and it matters in practice. You even have other people in this thread saying this technique is used in C for this exact reason. Leaving out this very obvious difference and pretending like these are the same thing is extremely misleading.
Author here. In scientific computing, there isn't much of a security implication to UB, and the most dangerous kind of error is when your program silently computes the wrong thing. And that is especially likely when you manage indices manually.
You could say that UB enables all behaviour, including silently wrong answers, and you'd be right. But it's more likely to crash your program and therefore be caught.
Most importantly, the comparison to a raw pointer is not relevant. My blog post states that integers come with zero safety (as in: preventing bugs, not risk of UB) and zero language support and that is true. My blog post compares Rust with GC languages, not with raw pointer arithmetic. And it's clear that, when you compare to using GC references, manual indices are horribly unsafe.
You're engaging in a motte-and-bailey fallacy. Your motte is your much more narrow claim about Rust's advantages in a context where you claim not to care about undefined behavior as a class of bugs more severe than logic errors. And you specifically make this claim in an additional context where a GC language is appropriate. That's an overall pretty easy position to defend, and if that were clearly the extent of it, I probably wouldn't have responded at all.
But that context has been dropped in this thread, and instead folks are making very general claims outside of that very restricted context. Moreover, your bailey is that you don't do a good job outlining that context in the post either. Your blog's opening paragraphs mention nothing about that more constrained context and instead seem to imply a very general one. This is much harder to defend. You go on to mention scientific computing, but instead of being the centerpiece of the context of your claim, it's just mentioned as an aside. Instead, your blog appears to be making very broad claims. But you've jumped in here to narrow them significantly, to the point that it materially changes your point IMO. Let's just look at what you said here:
> The first time someone gave me this advice, I had to do a double take. The Rust community's whole thing is commitment to compiler-enforced correctness, and they built the borrowchecker on the premise that humans can't be trusted to handle references manually. When the same borrowchecker makes references unworkable, their solution is to... recommend that I manually manage them, with zero safety and zero language support?!? The irony is unreal. Asking people to manually manage references is so hilariously unsafe and unergonomic, the suggestion would be funny if it wasn't mostly sad.
There's no circumspection about the context. You're just generally and broadly dismissing this entirely as if it weren't a valid thing ever. But it absolutely is a valid technique and it has real practical differences with an approach that uses raw pointers. If the comparison is with a GC and that context is made clear, then yes, absolutely, the comparison point changes entirely! If you can abide a GC, then a whole bunch of things get easier... at some cost. For example, I don't think it's possible to write a tool like ripgrep with its performance profile in a GC language. At least, I've never seen it done.
> I don't think it's possible to write a tool like ripgrep with its performance profile in a GC language.
I think it is possible to make a language that has both a GC and a borrow checker, treating GC types as a third level next to the stack and the heap, where complex referential cycles can be promoted to the GC, but the defaults push you towards fast execution patterns. I don't know if such a language would be successful in finding its niche. The only way I could see that is if a non-GC mode could be enforced, so that libraries can be written in the more restrictive, faster-by-default mode while being consumed by application developers with less stringent restrictions. This is no different in concept from Python libraries implemented in native languages. Making the mode part of the same language could help with prototyping pains: write with the GC, then refactor once at the end after the general design is mostly settled.
My standard of evidence here is existence. There are many greps written in a "GC language," and none of them have a broad performance profile in the same category as ripgrep (or other greps written in C and C++).
> This is no different in concept than Python libraries implemented in native languages.
If that's true, then why isn't there a fast grep written in Python? IMO, it's for the same reason that tools like csvkit are slow. Once you get to the boundary between GC and non-GC, some kind of cost gets paid. Now maybe Python is paying higher costs here than your hypothetical language, but who knows. Without actual existence, I find this sort of hypothetical to be very uncompelling. It's too broad of a stroke.
> I think it is possible to make a language that has both a GC and a borrow checker, treating the GC types as a third level next to the stack and the heap, where complex referential cycles can be 'promoted to the GC'
It's doable but there are a few issues with that whole idea. You need the ability to safely promote objects to GC-roots whenever they're being referenced by GC-unaware code, and demote them again afterwards. And if any GC object happens to control the lifecycle of any heap-allocated objects, it must have a finalizer that cleans them up RAII style to avoid resource leaks.
> And that is especially likely when you use indices manually.
But that is never the recommendation Rust practitioners would give for graph data structures. One would instead recommend using a generational arena, where each index carries its "generation". The arena can be growable or not. When an element is removed from the arena, its slot gets tombstoned, marked as no longer valid. If a new value reuses a tombstoned position, the slot's generation changes. This shifts the cost to verifying, at the read point, that the handle is still correct: if the handle points to a tombstoned slot or its generation doesn't match, the read returns None, meaning the handle is no longer valid. This is much better than the behavior of pointers.
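Roughly, a hand-rolled version looks like this (a sketch only; crates like slotmap or generational-arena do this properly, and every name here is just illustrative):

    #[derive(Clone, Copy, PartialEq, Eq)]
    struct Handle {
        index: usize,
        generation: u64,
    }

    struct Slot<T> {
        generation: u64,
        value: Option<T>, // None means the slot is tombstoned
    }

    struct Arena<T> {
        slots: Vec<Slot<T>>,
    }

    impl<T> Arena<T> {
        fn insert(&mut self, value: T) -> Handle {
            // Reuse the first tombstoned slot, bumping its generation
            // so any old handle to that index stops matching.
            let reusable = self.slots.iter().position(|s| s.value.is_none());
            if let Some(index) = reusable {
                let slot = &mut self.slots[index];
                slot.generation += 1;
                slot.value = Some(value);
                return Handle { index, generation: slot.generation };
            }
            // Otherwise grow the arena.
            self.slots.push(Slot { generation: 0, value: Some(value) });
            Handle { index: self.slots.len() - 1, generation: 0 }
        }

        fn remove(&mut self, handle: Handle) -> Option<T> {
            let slot = self.slots.get_mut(handle.index)?;
            if slot.generation == handle.generation { slot.value.take() } else { None }
        }

        // A stale handle (tombstoned slot or mismatched generation) yields None.
        fn get(&self, handle: Handle) -> Option<&T> {
            let slot = self.slots.get(handle.index)?;
            if slot.generation == handle.generation { slot.value.as_ref() } else { None }
        }
    }

So a use-after-remove becomes a None you have to handle at the call site, not memory corruption.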
For the record, I completely agree that using GC is appropriate for problems that deal with possibly-cyclic general graphs, and maybe even wrt. DAG structures whenever one wishes to maximize throughput and the use of simpler refcounting would involve heavy overhead. (This is a well-known pitfall wrt. languages like Swift, that enforce pervasive use of atomic reference counting.)
Since the use of "pluggable" GC in C-like or Rust-like languages is still uncommon, this generally means resorting to a language which happens to inherently rely on GC, such as Golang.
Might I suggest that the scpptool-enforced safe subset of C++ has a better solution for such data structures with cyclic or complex reference graphs, which is run-time checked non-owning pointers [1] that impose no restrictions on how or where the target objects are allocated. Unlike indices, they are safe against use-after-destruction, and they don't require the additional level of indirection either.
C doesn’t have objects, which are great for encapsulating data structures, but in C++ it’s not at all hard to write a graph data structure.
One wraps it behind a reliable interface, writes automated tests and runs them under valgrind/sanitizers, and that's pretty much it.
It’s normal for life and software development to have a certain degree of risk and to spend some effort solving problems.
Too many HN comments make it sound like it’s a jungle out there and a use-after-free will chop your head clean off.
I don't mean to imply that writing a graph in C++ is impossible. It's clearly possible in modern C++.
My contention is with treating index-based management as some tedious, manual workaround. Unity's ECS is written in C#, which has both a GC and the language expressiveness for actual graph objects, but it adopted an index-based system. It works and is performant, so it isn't a mistake for the compiler to herd users down that path.
In a similar vein, the Linux codebase is full of completely legitimate and readable uses of goto, but we are perfectly happy with languages that force us to use structured control flow.
The OP's argument is bunk. It's been said many times over the years, too. The fact is that the index approach does not give up everything. The obvious thing it doesn't give up is safety. It's true you can still get bugs via out-of-bounds accesses, but they won't result in undefined behavior. You get a panic instead.
This is how the regex crate works internally and uses almost no `unsafe`.
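For example (a trivial sketch of my own, not the regex crate's actual state lookup):

    fn main() {
        let states = vec!["start", "match"];
        let stale = 5; // imagine a state index that outlived its node

        // Unchecked indexing (states[stale]) panics deterministically with
        // "index out of bounds" instead of invoking undefined behavior, and
        // the checked accessor turns the failure into a value you can handle:
        match states.get(stale) {
            Some(s) => println!("state: {}", s),
            None => println!("stale index, handled without UB"),
        }
    }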
That seems rather easy? There's still plenty of safety. It's not an `unsafe` trick or worse: you still have bounds checking, safe concurrency, and well-defined behavior, and you can trivially guarantee that the vec cannot be changed unexpectedly while you hold ownership of it. The extreme ease of making that guarantee is kinda the core of their complaint.
The dangling-pointer equivalent (a stale index) is an issue, but it's still safe (unlike in C-like langs), and there are ways to mitigate the risk of accidental misbehavior (e.g. generational indices, or simply an ID check if you have a convenient ID).
That's quite a bit different than what you get in almost any other widely-used language - e.g. some will at best be able to claim "concurrent access won't lead to undefined behavior" via e.g. the GIL, but not prevent unexpected modification (e.g. "freeze" doesn't deep-freeze in the vast majority of languages).
An important saving grace that `unsafe` has is that it's local and clearly demarcated. If a core data structure of your program can be compared to `unsafe` and has to be manually managed for correctness, it's very valid to ask whether the hoops Rust makes you jump through are actually gaining you anything.