Most of it is dumb automatic header generation as the article points out, but looking at the source[0], there seems to be a lot of code duplication, e.g. 5 different vcn_*.c files that seem to be of significant size and largely identical at a glance.
How do you figure? VCN 1.0[0] and VCN 2.0[1] differ within the first few lines and diverge further the more you compare them. The files seem to correspond to the versions of AMD's Video Core Next hardware[2] and have nothing to do with autogeneration.
Actually, there's a lot of commonality between those two files. Just look at all the "is_idle", "get_wptr", "set_wptr" type functions. They're either literally identical or the v2_0 version is a superset of functionality of the v1_0 version.
This is a case where code is identical not by random chance but due to an underlying logical pattern (in this case: ring buffers for communication between CPU and GPU seem to continue working the same way). Hiding that via code duplication is going to make the code overall less maintainable, because the code lacks implicit knowledge of these patterns.
P.S.: It's interesting how many people missed that my complaint was specifically not about the auto-generated part, but about the part that humans copy&paste.
Code duplication sometimes makes sense to better decouple different parts of the code base; however, that often only becomes apparent long after the "code cleanup" to remove duplicates has been done.
IDK it seems pretty common in the embedded world to add support for a new piece of hardware by copying the entire driver and then changing the parts that the new hardware supports instead of trying to come up with a driver that supports all generations of the hardware. Pretty sad if you ask me and very unmaintainable, but, at least if you ask the companies, people should just buy newer generations of their hardware if they want those bugfixes :).
Wild speculation from someone who doesn't do hardware, but I'd guess that it's not so much writing the software as testing it that motivates that. It'd be hard to keep the testing matrix of apps X hardware X driver version from exploding out of control.
Your customers will be hella pissed at you if their favorite game suddenly doesn't work on their system after a driver update. So you need to test a representative sample of previously-working software across all hardware that the updated driver applies to.
That sounds like a nightmare, and a capital-intensive one because you need to keep machines up with all of these cards, both in continuous integration and for interactive debugging.
I can understand the impulse to keep the list of hardware that new driver versions support small.
This is the real reason in my experience, combined with the bringup/support cycle that comes with hardware development. If you fork the driver for bringup, you buy yourself a lot of freedom to change things. During support phase, when the hardware is already in the hands of the customer, changing things is more risky and so you have to be conservative.
For example, maybe you had a chip bug in the previous hardware generation which caused the system to hang after several days of stress testing. You found a software workaround for the bug, but every time you touch the code you need to re-verify the workaround, which takes days.
Of course, the downsides of forking the code are also very apparent...
Another example of this in the kernel is filesystems: in theory, the ext4 driver can read/write ext2/ext3 filesystems fine. And yet the kernel for years had ext2, ext3, and ext4 implementations, each later version derived from the previous. The ext3 driver was eventually removed in 2015, but ext2 and ext4 survive.
Yeah exactly. Every new driver at AMD and NVIDIA has to fill out a massive spreadsheet with most major game benchmarks from the last 5 years, along with a collection of particularly nasty games.
The thing is, DRY has some downsides. If you've pretty much nailed down the functionality of the old thing, especially if it involves some kind of manual testing/open beta test with users, the last thing you want to do is touch the old code.
I'm not defending them, but sometimes something is good enough and you have to move on.
In the context of the Linux kernel, you've never truly nailed down the functionality of the old thing. Interfaces external to the driver evolve over time (which bleeds into changes internal to the driver), requirements change, people want to add new features that old hardware ought to support -- all this adds up and puts evolutionary pressure on the drivers. Code duplication really hurts here.
It's possible that the AMD developers are from a Windows culture where they don't have to worry about this because Microsoft gives them a stable interface. Though even there, I find it questionable whether it's a win. Surely the occasional bug will surface that affects drivers for older hardware as well?
This doesn't sound like an issue with DRY as much as an issue of monolithic design.
If driver version X worked but X+1 doesn't on your system, don't update the driver. But when the driver is in the kernel trunk, that option isn't so simple.
> IDK it seems pretty common in the embedded world to add support for a new piece of hardware by copying the entire driver and then changing the parts that the new hardware supports instead of trying to come up with a driver that supports all generations of the hardware.
The rule of three is not exclusive to the embedded world. It's far better to have duplicate code around than to have to deal with poor generalizations that affect multiple independent modules, which in this case means hardware support.
To put it in perspective: how many pieces of hardware would you have to re-test and validate if you decided to refactor bits of drivers that might share a common code path?
This might not exist in C (I've never written it), but something like this seems like a fairly clean pattern you could use:
abstract class BaseDriver {
    abstract commonThing(): void
}

abstract class BaseDriverV1 extends BaseDriver {
    commonThing() {
        console.log('This is the common method implementation for all V1 drivers')
    }
    abstract doThingOnlyV1Does(): void
}

class DriverV1_1 extends BaseDriverV1 {
    doThingOnlyV1Does() {
        console.log('This is how Driver V1.1 does the V1 thing')
    }
}
This way you can use either an interface or an "abstract" definition that declares the base driver methods and the versioned driver methods, then provide version-dependent implementations where needed, or else share the common implementations by inheritance/extension.
Maybe this turns into spaghetti at scale and it actually is easier to just copy-paste all of it, who knows.
> This might not exist in C (I've never written it), but something like this seems like a fairly clean pattern you could use:
How sure are you?
I mean, if a driver's code path is touched, that triggers all the verification and validation steps required to ensure that every piece of hardware affected by it continues to work as expected.
Furthermore, how many bugs are introduced on a daily basis because people like you and me push "fairly clean patterns" that end up triggering all sorts of unexpected consequences? I know I did, and still do, my share.
C doesn't provide any language support for this, including any notion of abstracts or classes. You can still do it manually and that's basically how drivers are implemented in Linux, but it doesn't address the combinatorial test matrix explosion.
I don't know, to me this is a bit like taking two C programs that print "hello world" and "goodbye world", looking at the compiled assembly, and then remarking "pretty sad and very unmaintainable."
Point being, maintain the code generator and not the generated code.
Right. But here the generated code is checked into the source tree and the code generator is kept proprietary, which is why it's sad and unmaintainable.
They could still separate the extraction of the data (keeping it proprietary) from the code generation. The latter could work off of the extracted data kept in a more sensible format.
That's how we do things at work. If we need to change a class function, we outright copy/paste it and then change what we need, so that everything stays backwards compatible. A bug in one version might be expected or documented behaviour in a previous one. We can't fix the bug in an older version unless we get the OK from our clients, because they might be depending on it for their code to run correctly.
It's a good system. You shouldn't complain. And deleting code is easy, since not everything depends on everything.
That part of the comment was specifically talking about the parts that are not auto-generated, but copy&pasted from one hardware generation to the next -- and then bugs need to be fixed and interface changes need to be applied in N different places forever.
Looking at things like e.g. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin... , this is very obviously specific to the underlying hardware's control model in all sorts of ways. If the hardware has a gazillion control registers, then the software will have to handle a gazillion registers and will therefore need several gazillion definitions in order to do it. The hardware won't get any simpler just because it takes a long time to compile the driver.
> Compile times increase superlinearly with program size. How many cycles and developer hours are wasted if this duplication is not necessary?
Care to weigh in on the impact on developer hours and cycles of having to debug problems caused by how the autogenerated headers are created?
If the code is not expected to change and was already subjected to verification and validation, ensuring that it stays the same is a good way to avoid wasting time tracking bugs on code paths that assuredly were already verified.
The other time that you saw it was also probably me. It's from this talk, which is about how a large amount of generated protocol buffer code at Google led to a quadratic increase in compile times: https://youtu.be/rHIkrotSwcc?t=720.
TL;DW: The reasoning is because if you use a distributed build system, then your compile time is gated by the file with the longest compile time (tail latency). The more files you have, the greater chance that one of them takes a long time. When you generate source files, you tend to produce more files than if you didn't.
Most users don't use a distributed build system to compile the kernel, so on further thought, in that case compile times probably scale closer to linear with the number of translation units. But wasted cycles are still wasted cycles, and regardless of how exactly compile times scale, you should still consider the cost of longer compile times when you duplicate code.
Not knowing the context of the quote, I can guess a few causes:
* Compiler optimizations mostly work on a per-function basis, so that the 'n' in O(n) or O(n²) is the size of a function and not the size of the entire codebase. Not all optimization algorithms are linear, and while good superlinear optimizations should have code threshold cutoffs, in practice these may be omitted. Function sizes follow something like a power law distribution, which means larger code bases contain larger functions that are more expensive to optimize.
* For C/C++, you compile code by pasting all the header definitions into your translation unit, and feeding that to the actual lexer/parser/codegen. Larger projects are probably going to have correspondingly larger headers, and files will probably include more of them, so that the actual translation unit sizes are going to be larger even than the on-disk source files would indicate.
* This is more speculative, but larger projects are also probably likelier to blow out the various caches on your computer, so that the operations you figure are O(1) (such as hashtable lookup!) are actually more like O(lg N) in practice. And as you get larger projects that are less likely to fit any cache, the effect of the non-constant time becomes more apparent.
Yes. Despite being an avid advocate of code minimalism and refactoring, I have zero problems with auto-generated code size, as long as the size of "the source that generated the code" is reasonable.
Here's how I count the size of the code:
So if a codebase is 1 million lines, but 900k was autogenerated using, let's say, a 10k line 'metaprogram' (which is otherwise not included in the source), the real size of the code is 100k + 10k = 110k.
Now, the stupid vcn_* .c duplication, that I have serious problems with. But what you could do is take a diff of the vcn_* .c files, keep those diffs as part of the source code along with only one version of the vcn_* .c file, and then add a small script that generates the remaining vcn_* .c files during the build process from the diffs, e.g. using the patch command. The size of the diffs and the patch script, plus the size of the one vcn_* .c file, is then a better measure of the size of the code.
Of course this is a short-term workaround. There's still a need to refactor the code properly.
That can be a useful way to calculate lines of code, however it is only nominally correct.
In many cases it is more useful to calculate the lines of code according to what was checked into the build system, since those files can drift from the primary source. This is true, even if a million lines of checked-in source originated from 1k lines of meta-source.
I'm not saying a line-of-code calculating program like cloc should change its behavior.
I'm talking about the right way (IMO) of assessing the software complexity of the codebase.
And I'm also talking about the fact that it is trivial to make cloc match my definition of source code size *if* the developers are willing to do a small amount of work: move code-gen into the build process and commit the metaprogram scripts instead of the pre-generated code.
> I'm talking about the right way (IMO) of assessing the software complexity of the codebase.
Yes, I get that, and that is what I meant, too. The reason is that complexity scales with the code that is checked into the build system, not just with the meta-code.
I've written a lot of code generators for various purposes, and despite the efforts to only get meta-code checked into the build system, what has happened in almost every case is that the generated code has been checked into the VCS. Of course, this depends on the particular teams (or consumers) of the generator.
Eventually, the checked-in code gets changed. Maybe not right away, but it will most likely happen someday. That is why the complexity scales with the checked-in code and not with the original meta-code.
There are also other force-multipliers, so to speak. For example, if there is a vulnerability in generated code that was checked in, the actual attack surface of the company is multiplied by the number of instances of generated code, not by the meta-code. Fixing one instance doesn't fix any other instance. Complexity and risk are inseparably entwined and should not be looked at separately.
Why would you check-in generated code and not the meta code?
> Eventually, the checked-in code gets changed.
It doesn't have to be. Especially with giant headers like in AMDGPU, it should be much easier to change the meta-code if there is a need to make a modification in the generated code. Essentially you look at what needs to be changed and work backwards to figure out what change in the meta-code will result in the same change in the generated code.
I believe you might be referring to stuff like boilerplate code, the whole purpose of which is to be generated for further development. In which case I agree with you, but boilerplate code doesn't balloon the way the AMDGPU header files did.
> Why would you check-in generated code and not the meta code?
I wouldn't. However, many teams do that. Perhaps they view it as an efficiency. In many cases people check in generated code in order to perform their own risk reduction, removing a dependency and the possibility of the generated code changing outside of their control.
> I believe you might be referring to stuff like boilerplate code, the whole purpose of which is to be generated for further development. In which case I agree with you, but then boilerplate codes don't balloon the way AMDGPU header files did.
No. I'm definitely not referring to boilerplate code.
That frankly seems like horrible software design.
[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...