Most of it is dumb automatic header generation as the article points out, but looking at the source[0], there seems to be a lot of code duplication, e.g. 5 different vcn_*.c files that seem to be of significant size and largely identical at a glance.
How do you figure? VCN 1.0[0] and VCN 2.0[1] differ within the first few lines and diverge further the more you compare them. The files seem to correspond to the versions of AMD's Video Core Next hardware[2] and have nothing to do with autogeneration.
Actually, there's a lot of commonality between those two files. Just look at all the "is_idle", "get_wptr", "set_wptr" type functions. They're either literally identical or the v2_0 version is a superset of functionality of the v1_0 version.
This is a case where code is identical not by random chance but due to an underlying logical pattern (in this case: ring buffers for communication between CPU and GPU seem to continue working the same way). Hiding that via code duplication is going to make the code overall less maintainable, because the code lacks implicit knowledge of these patterns.
P.S.: It's interesting how many people missed that my complaint was specifically not about the auto-generated part, but about the part that humans copy&paste.
Code duplication sometimes makes sense to better decouple different parts of the code base; however, that often only becomes apparent long after the "code cleanup" to remove duplicates has been done.
IDK it seems pretty common in the embedded world to add support for a new piece of hardware by copying the entire driver and then changing the parts that the new hardware supports instead of trying to come up with a driver that supports all generations of the hardware. Pretty sad if you ask me and very unmaintainable, but, at least if you ask the companies, people should just buy newer generations of their hardware if they want those bugfixes :).
Wild speculation from someone who doesn't do hardware, but I'd guess that it's not so much writing the software as testing it that motivates that. It'd be hard to keep the testing matrix of apps X hardware X driver version from exploding out of control.
Your customers will be hella pissed at you if their favorite game suddenly doesn't work on their system after a driver update. So you need to test a representative sample of previously-working software across all hardware that the updated driver applies to.
That sounds like a nightmare, and a capital-intensive one because you need to keep machines up with all of these cards, both in continuous integration and for interactive debugging.
I can understand the impulse to keep the list of hardware that new driver versions support small.
This is the real reason in my experience, combined with the bringup/support cycle that comes with hardware development. If you fork the driver for bringup, you buy yourself a lot of freedom to change things. During support phase, when the hardware is already in the hands of the customer, changing things is more risky and so you have to be conservative.
For example, maybe you had a chip bug in the previous hardware generation which caused the system to hang after several days of stress testing. You found a software workaround for the bug, but every time you touch the code you need to re-verify the workaround, which takes days.
Of course, the downsides of forking the code are also very apparent...
Another example of this in the kernel is filesystems: in theory, the ext4 driver can read/write ext2/ext3 filesystems fine. And yet the kernel for years had ext2, ext3, and ext4 implementations, each later version derived from the previous. The ext3 driver was eventually removed in 2015, but ext2 and ext4 survive.
Yeah exactly. Every new driver at AMD and NVIDIA has to fill out a massive spreadsheet with most major game benchmarks from the last 5 years, along with a collection of particularly nasty games.
The thing is, DRY has some downsides. If you've pretty much nailed down the functionality of the old thing, especially if it involves some kind of manual testing/open beta test with users, the last thing you want to do is touch the old code.
I'm not defending them, but sometimes something is good enough and you have to move on.
In the context of the Linux kernel, you've never truly nailed down the functionality of the old thing. Interfaces external to the driver evolve over time (which bleeds into changes internal to the driver), requirements change, people want to add new features that old hardware ought to support -- all this adds up and puts evolutionary pressure on the drivers. Code duplication really hurts here.
It's possible that the AMD developers are from a Windows culture where they don't have to worry about this because Microsoft gives them a stable interface. Though even there, I find it questionable whether it's a win. Surely the occasional bug will surface that affects drivers for older hardware as well?
This doesn't sound like an issue with DRY as much as an issue of monolithic design.
If driver version X worked but X+1 doesn't on your system, don't update the driver. But when the driver is in the kernel trunk, that option isn't so simple.
> IDK it seems pretty common in the embedded world to add support for a new piece of hardware by copying the entire driver and then changing the parts that the new hardware supports instead of trying to come up with a driver that supports all generations of the hardware.
The rule of three is not exclusive to the embedded world. It's far better to have duplicate code around than to have to deal with poor generalizations that affect multiple independent modules, which in this case means hardware support.
To put it in perspective: how many pieces of hardware would you have to re-test and validate if you decided to refactor bits of drivers that might share a common code path?
This might not exist in C (I've never written it), but something like this seems like a fairly clean pattern you could use:
abstract class BaseDriver {
    abstract commonThing(): void
}

abstract class BaseDriverV1 extends BaseDriver {
    commonThing() {
        console.log('This is the common method implementation for all V1 drivers')
    }
    abstract doThingOnlyV1Does(): void
}

class DriverV1_1 extends BaseDriverV1 {
    doThingOnlyV1Does() {
        console.log('This is how Driver V1.1 does the V1 thing')
    }
}
This way you can use either an interface or an "abstract" definition that declares the base driver methods and the versioned driver methods, then provide version-dependent implementations where needed, or else share the common implementations by inheritance/extension.
Maybe this turns into spaghetti at scale and it actually is easier to just copy-paste all of it, who knows.
> This might not exist in C (I've never written it), but something like this seems like a fairly clean pattern you could use:
How sure are you?
I mean, if a driver's code path is touched, that triggers all the verification and validation steps required to ensure that every piece of hardware affected by it continues to work as expected.
Furthermore, how many bugs are introduced on a daily basis because people like you and me push "fairly clean patterns" that end up triggering all sorts of unexpected consequences? I know I did, and still do, my share.
C doesn't provide any language support for this, including any notion of abstracts or classes. You can still do it manually and that's basically how drivers are implemented in Linux, but it doesn't address the combinatorial test matrix explosion.
I don't know, to me this is a bit like taking two C programs that print "hello world" and "goodbye world", looking at the compiled assembly, and then remarking "pretty sad and very unmaintainable."
Point being, maintain the code generator and not the generated code.
Right. But here the generated code is checked into the source tree and the code generator is kept proprietary, which is why it's sad and unmaintainable.
They could still separate the extraction of the data (keeping it proprietary) from the code generation. The latter could work off of the extracted data kept in a more sensible format.
That's how we do things at work. If we need to change a class function, we outright copy/paste it and then change what we need, so that everything stays backwards compatible. A bug in one version might be expected or documented behaviour in a previous one. We can't fix the bug in an older version unless we get the OK from our clients, because they might be depending on it for their code to run correctly.
It's a good system. You shouldn't complain. And deleting code is easy, since not everything depends on everything.
That part of the comment was specifically talking about the parts that are not auto-generated, but copy&pasted from one hardware generation to the next -- and then bugs need to be fixed and interface changes need to be applied in N different places forever.
Looking at things like e.g. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin... , this is very obviously specific to the underlying hardware's control model in all sorts of ways. If the hardware has a gazillion control registers, then the software will have to handle a gazillion registers and will therefore need several gazillion definitions in order to do it. The hardware won't get any simpler just because it takes a long time to compile the driver.
> Compile times increase superlinearly with program size. How many cycles and developer hours are wasted if this duplication is not necessary?
Care to weigh in on the impact on developer hours and cycles of having to debug problems caused by how the autogenerated headers are created?
If the code is not expected to change and was already subjected to verification and validation, ensuring that it stays the same is a good way to avoid wasting time tracking bugs on code paths that assuredly were already verified.
The other time that you saw it was also probably me. It's from this talk, which is about how a large amount of generated protocol buffer code at Google led to a quadratic increase in compile times: https://youtu.be/rHIkrotSwcc?t=720.
TL;DW: The reasoning is because if you use a distributed build system, then your compile time is gated by the file with the longest compile time (tail latency). The more files you have, the greater chance that one of them takes a long time. When you generate source files, you tend to produce more files than if you didn't.
Most users don't use a distributed build system to compile the kernel, so on further thought, in that case compile times probably scale closer to linear with the number of translation units. But wasted cycles are still wasted cycles, and regardless of how exactly compile times scale, you should still consider the cost of longer compile times when you duplicate code.
Not knowing the context of the quote, I can guess a few causes:
* Compiler optimizations mostly work on a per-function basis, so that the 'n' in O(n) or O(n²) is the size of a function and not the size of the entire codebase. Not all optimization algorithms are linear, and while good superlinear optimizations should have code threshold cutoffs, in practice these may be omitted. Function sizes follow something like a power law distribution, which means larger code bases contain larger functions that are more expensive to optimize.
* For C/C++, you compile code by pasting all the header definitions into your translation unit, and feeding that to the actual lexer/parser/codegen. Larger projects are probably going to have correspondingly larger headers, and files will probably include more of them, so that the actual translation unit sizes are going to be larger even than the on-disk source files would indicate.
* This is more speculative, but larger projects are also probably likelier to blow out the various caches on your computer, so that the operations you figure are O(1) (such as hashtable lookup!) are actually more like O(lg N) in practice. And as you get larger projects that are less likely to fit any cache, the effect of the non-constant time becomes more apparent.
Yes. Despite being an avid advocate of code minimalism and refactoring, I have zero problems with auto-generated code size, as long as the size of "the source that generated the code" is reasonable.
Here's how I count the size of the code:
So if a codebase is 1 million lines, but 900k was autogenerated using, let's say, a 10k line 'metaprogram' (which is otherwise not included in the source), the real size of the code is 100k + 10k = 110k.
Now, the stupid vcn_* .c duplication, that I have serious problems with. But what you could do is take a diff of the vcn_* .c files, keep those diffs as part of the source code along with only one version of the vcn_* .c file, and then add a small script that generates the remaining vcn_* .c files during the build process from the diffs, e.g. using the patch command. The size of the diffs and the patch script, plus the size of the one vcn_* .c file, is then a better measure of the size of the code.
Of course this is a short-term workaround. There's still a need to refactor the code properly.
That can be a useful way to calculate lines of code, however it is only nominally correct.
In many cases it is more useful to calculate the lines of code according to what was checked into the build system, since those files can drift from the primary source. This is true, even if a million lines of checked-in source originated from 1k lines of meta-source.
I'm not saying a line-of-code calculating program like cloc should change its behavior.
I'm talking about the right way (IMO) of assessing the software complexity of the codebase.
And I'm also talking about the fact that it is trivial to make cloc match my definition of source code size *if* the developers are willing to do a small amount of work: move code-gen into the build process and commit the metaprogram scripts instead of the pre-generated code.
> I'm talking about the right way (IMO) of assessing the software complexity of the codebase.
Yes, I get that, and that is what I meant, too. The reason is that complexity scales with the code that is checked into the build system, not just with the meta-code.
I've written a lot of code generators for various purposes, and despite the efforts to only get meta-code checked into the build system, what has happened in almost every case is that the generated code has been checked into the VCS. Of course, this depends on the particular teams (or consumers) of the generator.
Eventually, the checked-in code gets changed. Maybe not right away, but it will most likely happen someday. That is why the complexity scales with the checked-in code and not with the original meta-code.
There are also other force-multipliers, so to speak. For example, if there is a vulnerability in generated code that was checked in, the actual attack surface of the company is multiplied by the number of instances of generated code, not by the meta-code. Fixing one instance doesn't fix any other instance. Complexity and risk are inseparably entwined and should not be looked at separately.
Why would you check-in generated code and not the meta code?
> Eventually, the checked-in code gets changed.
It doesn't have to be. Especially with giant headers like in AMDGPU, it should be much easier to change the meta-code if there is a need to make a modification in the generated code. Essentially you look at what needs to be changed and work backwards to figure out what change in the meta-code will result in the same change in the generated code.
I believe you might be referring to stuff like boilerplate code, the whole purpose of which is to be generated for further development. In which case I agree with you, but boilerplate code doesn't balloon the way the AMDGPU header files did.
> Why would you check-in generated code and not the meta code?
I wouldn't. However, many teams do that. Perhaps they view it as an efficiency. In many cases people check in generated code in order to perform their own risk reduction, removing a dependency and the possibility of the generated code changing outside of their control.
> I believe you might be referring to stuff like boilerplate code, the whole purpose of which is to be generated for further development. In which case I agree with you, but then boilerplate codes don't balloon the way AMDGPU header files did.
No. I'm definitely not referring to boilerplate code.
That frankly seems like horrible software design.
[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...