There are multiple reasons that contributing to various projects may be difficult. But, I was replying to a specific comment about writing code in a way that is easy to understand, and the comment author's acknowledgement that this idea/practice is hard to scale to a large number of developers (presumably because everyone's skills are different and because we each have different ideas about what is "clear", etc).
So, my comment was specifically about code. Yes, developing a kernel driver requires knowledge of the hardware and its quirks. But, if we're just talking about the code, why shouldn't a competent C developer be able to read the code for an existing hardware driver and come away understanding the hardware?
And what about the parts that are NOT related to fiddly hardware? For example, look at all of the recent drama with the Linux filesystem maintainer(s) and interfacing with Rust code. Forget the actual human drama aspect, but just think about the technical code aspect: The Rust devs can't even figure out what the C code's semantics are, and the lead filesystem guy made some embarrassing outbursts saying that he wasn't going to help them by explaining what the actual interface contracts are. It's probably because he doesn't even know what his own section of the kernel does in the kind of detail that they're asking for... That last part is my own speculation, but these Rust guys are also competent at working with C code and they can't figure out what assumptions are baked into the C APIs.
Web browser code has less to do with nitty-gritty hardware. Yet, even a very competent C++ dev is going to have a ton of trouble figuring out the Chromium code base. It's just too hard to keep trying to use our current tools for these giant, complex software projects. No amount of convention or linting or writing your classes and functions to be "easy to understand" is going to really matter in the big picture. Naming variables is hard and important to do well, but at the scale of these projects, individual variable names simply don't matter. It's hard to even figure out what code is being executed in a given context/operation.
> Yet, even a very competent C++ dev is going to have a ton of trouble figuring out the Chromium code base.
I don't think this is true, or at least it wasn't circa 2018 when I was writing C++ professionally and semi-competently. I sometimes had to read, understand and change parts of the Chromium code base since I was working on a component which integrated CEF. Over time I began to think of Chromium as a good reference for how to maintain a well-organized C++ code base. It's remarkably plain and understandable, greppable even. Eventually I was able to contribute a patch or two back to CEF.
The hardest thing by far with respect to making those contributions wasn't understanding the C++, it was understanding how to work the build system for development tasks.
That's true, and fair point: that example wasn't the best one. It was several years ago that I was poking at the Chromium code base to investigate something. I don't honestly remember much about the code itself, but I do remember struggling with the build system like you said. And that's probably why I just remember the whole endeavor as being difficult. Though, the build system being so complicated is not totally irrelevant to my point... Understanding how to actually build and use the code has some overlap with the idea of understanding the code or project as a whole.
I guess I just don't really get your point then. It's not like the Linux kernel or Chromium or Firefox are giant buggy messes that don't work at all. They certainly have bugs, but by and large they work very well with minimal issues for most people. I also think their codebases are pretty approachable. IMO a competent C or C++ developer can definitely read the code from any of them with a little effort. It's not the easiest thing but it's definitely not impossible; most people just don't ever try.
My point was that making meaningful contributions such as bug fixes requires understanding how the code is _supposed_ to function vs. how it actually functions, that's the hard part. In the majority of cases that's simply not something the code can tell you: there's no replacement for comparing the code to a datasheet or reading the HTML spec to understand how the rendering engine is supposed to work, and those things take time to learn. For the simpler parts, people do actively contribute without tons of previous experience (or because they already have experience with a library, etc.).
> My point was that making meaningful contributions such as bug fixes requires understanding how the code is _supposed_ to function vs. how it actually functions, that's the hard part. In the majority of cases that's simply not something the code can tell you [...]
That's kind of my point, though. I'm trying to zoom out and "think outside the box" for a minute. It's hard to compose smaller pieces into larger systems if the smaller pieces have behavior that's not very well defined. And our programming languages and tools don't always make it easy for the author of a piece of code to notice that they've introduced some unintended behavior.
To your first point: I'm not shitting on Chromium or Firefox or any other software projects, but they're honestly ALL "buggy messes" in a sense. I'm a middling software dev and the software I write for my day job is definitely more buggy, overall, than these projects. So, I'm not saying that other developers are stupid (quite the opposite!). But, the fact that there are plenty of bugs at any given point in any of these projects is saying something important, IMO. If I use our current programming tools to write a Base64 encode/decode library, I can do a pretty good job and there's a good chance that it'll have zero bugs in a fairly short amount of time. But, using the same tools, there's absolutely no hope that I (we, you, whoever) could write a web browser that doesn't have any bugs. That's actually a problem! We've come to accept it because that's all we've got today, but my point is that this isn't actually an ideal place to settle.
I don't know what the answer is, but I think a lot of people don't even seem to realize there's a problem. My claim is that there is a problem and that our current paradigms and tools simply don't scale well. I'm not creative enough to be the one who has the eureka moment that will bring us to the next stage of our evolution, but I suspect that it's what we'll need to be able to build complex software that actually works as we intend it to.