That's funny, I usually find a common source of unnecessary complexity is people trying way too hard to de-duplicate everything. Which has a tendency to cause abstractions that are more cognitively and technically expensive than the code they replaced... and once enough layers have accreted, I'm pretty sure they take up more space too.
Like with many things, it's a balance. If you have to spend too much time thinking about how to create a more reusable or deduplicated version of something, it's probably not ready to be deduplicated. On the other hand, I've watched several projects flounder because of useless duplication:
The duplication means that issues present in the original are scattered throughout every copy, but hard to find once the copies have been modified. Going through the exercise of finding all the duplicated issues is frustrating, and people eventually stop doing it. The code gets worse and worse until a refactor or rewrite is forced on them, and then the customer is unhappy with the delay and it's either done hastily (often worse than doing nothing) or abandoned.
I’ve heard this argument against deduplication many times before. A wrong abstraction is not a solution to duplicate code; it’s a different problem. If developers are not able to perform the simplest of abstractions, the problem is more serious.
Totally agree the wrong abstraction is the problem, but I think (having watched this happen many times in practice) this is way too dismissive of how easily it can happen to reasonable people just trying to write DRY code. If it seems ridiculous that this could happen, or like the people it happens to must be idiots, then it’s possible you’re prone to it yourself and won’t be able to recognize it.
Far more often than writing an obvious and easy-to-see wrong abstraction on the spot, the usual case in reality is that someone reasonably factors multiple parts of the code to depend on a single piece of reused code, and the multiple parts have very subtle and hard-to-recognize differences in their goals. Often these differences don’t manifest until more code accumulates and the goals begin to drift further apart. By the time it becomes clear, the dependency is harder to undo.
The problem is more serious precisely because developers generally are able to perform the simplest of abstractions and more.
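To make the drift concrete, here’s a hypothetical Java sketch (invented names, not from any real codebase) of how a perfectly reasonable shared helper ends up as the wrong abstraction:

    // Hypothetical example: a helper that two features shared while their
    // goals were still identical.
    public class SharedFormatting {
        // Day one: the invoice screen and the report export want the same thing,
        // so extracting this is the obviously reasonable move.
        static String formatAmount(double amount) {
            return String.format("%.2f", amount);
        }

        // Months later the goals have drifted: invoices need a currency symbol,
        // reports need grouping. The "shared" helper grows flags instead of splitting.
        static String formatAmount(double amount, boolean withCurrency, boolean grouped) {
            String formatted = String.format(grouped ? "%,.2f" : "%.2f", amount);
            return withCurrency ? "$" + formatted : formatted;
        }

        public static void main(String[] args) {
            System.out.println(formatAmount(1234.5));              // the original shape
            System.out.println(formatAmount(1234.5, true, false)); // invoice, after drift
        }
    }

No single step is a dumb decision; the dependency just quietly becomes harder to undo than the duplication would have been.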
Making good abstractions, and not making bad abstractions, is the craft of programming. Of course it’s hard and people will get it wrong a lot of the time. The solution to that is to get better, not give up.
FWIW, nobody is suggesting giving up. Making abstractions while coding is indeed incredibly common and part of the job, but choosing to avoid abstraction is sometimes called for. It’s very important to recognize that abstractions come with an engineering tradeoff and have negative impacts on clarity, simplicity, specificity, ease of reading and understanding, etc. It’s good if the abstraction’s benefits overcome these downsides over time, and it’s bad if the abstraction adds more weight than it ever lifts.
I somewhat agree, but not entirely, that the craft of programming is to make abstractions. There is some truth to that, but your statement implies that coding is only about making abstractions and nothing else. I also see programming as a means to an end, and I’m suggesting the end is usually more important than the means. It is possible to walk into existing code and reduce the amount of abstraction, and it is possible to make new features in existing software without adding any abstractions.
I’ve been writing software for a long time and watched quite a few people over-engineer their abstractions and cause real problems. I think there’s a human tendency to try to meet larger needs than we have at the moment; after all, software is about automating and scaling. There is a reason why most sizeable software projects globally have been late and/or over budget. We are taught in CS school how to make abstractions, but we are never taught how and when to avoid them. Learning the art and craft of software includes learning to see when abstracting something will leave things worse than you found them.
My tendency to watch out for over-abstraction has led me to fail to abstract things when I should. I have definitely made the mistake of staying too specific and allowing duplication to linger longer than it should. In a couple of cases it’s cost me weeks of time to fix. But, despite that, I still think erring away from abstraction, when it’s a real choice, is often the right call. On the flip side, I’ve witnessed over-engineering cost whole teams more than one or two years, and millions of dollars.
> Often these differences don’t manifest until more code accumulates and the goals begin to drift further apart.
If an abstraction simplifies the code when it is introduced, but turns out to be wrong in hindsight, then this is a problem of software evolution rather than wrong abstraction. In such a case, I would argue that abstracted code is better than duplicate code, because detecting that the duplicate code instances are the same is more difficult than finding all instances of the abstraction and correcting them.
All of the kinds of software waste noted in the article are about what happens as software evolves over time. It’s a mistake to evaluate any code as static. The idea of a “wrong abstraction” was never about whether the code works or doesn’t; the point of the abstraction being wrong is that it complicates evolution. Any abstraction can be fine at a single point in time if all tests pass and the software functions without bugs. So, I don’t see a distinction between evolution and the wrong abstraction. If you’re never going to change it again, there’s no such thing as the wrong abstraction.
We can’t debate or say anything useful about whether an abstraction is better than duplication without looking at specific cases; I would not presume to claim abstraction is better than duplication under any generic rule whatsoever. It simply depends on the code in question. That said, I don’t really understand what you mean about detecting duplicate instances being harder, because the premise of the abstraction being wrong is that you don’t want to find duplicate instances. If the abstraction becomes more wrong over time, then it doesn’t matter which task is harder, one of the two you mentioned would be going the wrong way. If the abstraction is wrong, then there’s by definition only one of them, no?
Definitely agreed. By software evolution, I was referring to your mention of "goals drifting apart". Code is always dynamic and constantly maintained, but goals don't always drift apart; when they do, it's because of unexpected circumstances. Getting an abstraction right for unexpected cases is almost impossible by definition. (It is possible, but in that case one falls into the more dangerous trap of premature abstraction.)
> We can’t debate or say anything useful about whether an abstraction is better than duplication without looking at specific cases
Agreed again. I'm aware that I'm overgeneralizing at the cost of neglecting nuance. Once, I encountered nested loops iterating over a data structure and performing simple operations. I set out to abstract these loops into a static method. Unfortunately, not all of the loops were exact copies. Some loops defined new variables while others reused existing ones from the surrounding code; some had different loop bounds while others achieved the same bound by breaking out of the loop inside an if. Some loops did a different job but looked the same.
I had to be sure that each piece of code was doing the same thing by manually inspecting and testing it (there were no automated tests, and the code wasn't structured well for testing purposes). If these nested loops had been abstracted in the first place, there wouldn't have been any doubt about whether they were doing the same thing or were slightly different: all of them would be the same function call. When the goals drifted apart, it would be easy to identify the points of evolution.
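For illustration, a hypothetical Java reconstruction of the kind of near-duplicates I mean (invented code, not the actual project):

    import java.util.List;

    // Hypothetical near-duplicates: same-looking nested loops, one with a
    // different bound, one reaching that bound by breaking inside an if.
    public class LoopVariants {
        // Copy A: adds up the product of every pair of elements.
        static int crossTotalA(List<Integer> xs) {
            int total = 0;
            for (int i = 0; i < xs.size(); i++) {
                for (int j = 0; j < xs.size(); j++) {
                    total += xs.get(i) * xs.get(j);
                }
            }
            return total;
        }

        // Copy B: looks the same, but the inner bound quietly skips the last element.
        static int crossTotalB(List<Integer> xs) {
            int total = 0;
            for (int i = 0; i < xs.size(); i++) {
                for (int j = 0; j < xs.size() - 1; j++) {
                    total += xs.get(i) * xs.get(j);
                }
            }
            return total;
        }

        // Copy C: same behaviour as B, expressed as a break inside an if, so only
        // careful reading (or a test) shows that B and C match and A does not.
        static int crossTotalC(List<Integer> xs) {
            int total = 0;
            for (int i = 0; i < xs.size(); i++) {
                for (int j = 0; j < xs.size(); j++) {
                    if (j == xs.size() - 1) break;
                    total += xs.get(i) * xs.get(j);
                }
            }
            return total;
        }

        public static void main(String[] args) {
            List<Integer> xs = List.of(1, 2, 3);
            System.out.println(crossTotalA(xs)); // 36
            System.out.println(crossTotalB(xs)); // 18
            System.out.println(crossTotalC(xs)); // 18
        }
    }

With a single extracted method, B and C would have been one call with an explicit bound, and A's difference would have been visible right at the call site.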
Thanks for your replies. They broadened my perspective.
Part of the problem in my experience is expecting programmers to work solo, only interacting with other programmers during standup meetings or when they are asking for help.
I'm not sure how I feel specifically about pair programming, but I do think that even "local" code changes should be accompanied by at least some architectural discussion with other developers who work on that part of the system. Usually the discussion can be as simple as "does XYZ design seem like a good idea? yeah, go ahead". But in my opinion it's important to encourage, incorporate, and expect collaboration as part of the basic workflow.
Perhaps this is also part of the problem with code reviews. Reviewing code is kind of difficult. But it's a lot easier if you've already discussed the design beforehand with the person whose code you're reviewing, so at least you already know why they did what they did, and you aren't going to be surprised, and then have to spend time writing up your disagreement and entering a back-and-forth process that sucks up time.
Pair programming is really hard to make work well, but when it does, I’ve rarely had more fun coding. I’ve had a couple of amazing experiences with pairing, and it’s made me crave repeating the experience. Unfortunately, it hasn’t worked most of the time.
But I think you’re right. I think what you’re saying is that code reviews should start before the change is proposed for commit; it should be an architecturally collaborative process at some level. Code should be guided by reviewers in a positive feedback loop rather than left to critique at the last second. Reviewing code well is difficult, and also important to take seriously.
Hehe, I like that quote! Yeah, I tend to lean towards duplication too when it’s a choice between duplication and introducing some new abstraction. My personal rule is that I need to have 3 instances of working duplicate code before abstracting. Abstracting before having more than 1 real use case is nearly always the wrong thing to do; the wrong abstraction will be chosen relative to the future needs. (And this is the most common abstraction accident in my experience.) Two use cases still isn’t enough to warrant changing and complicating an interface, and is often premature, and one duplicate with minor changes won’t do much damage. As soon as three real use cases appear to be near-duplicates, it starts to become more clear what the abstraction should be, and it’s justifiable to consolidate at the cost of a little added complexity.
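A minimal Java sketch of that progression, with hypothetical names and numbers, just to show the shape of it:

    // Hypothetical names and numbers, illustrating the "rule of three" only.
    public class RuleOfThree {
        // First and second use: near-duplicates, deliberately left alone. Each
        // stays specific, and a wrong guess about the future can't spread.
        static double invoiceTotal(double net) { return net * 1.19; }
        static double quoteTotal(double net)   { return net * 1.19; }

        // Third real use case (say, refunds) makes the shared shape clear enough
        // to justify one parameterized function replacing all three.
        static double grossTotal(double net, double taxRate) {
            return net * (1.0 + taxRate);
        }

        public static void main(String[] args) {
            System.out.println(grossTotal(100.0, 0.19)); // the consolidated version
        }
    }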
> Often these differences don’t manifest until more code accumulates and the goals begin to drift further apart. By the time it becomes clear, the dependency is harder to undo.
I think correct abstractions reduce dependency issues.
It’s easy to forget how much most of us have advanced.
20 years ago I got hired to work on CNC machines.
The torch on/off code was 50 lines long and cut and pasted in hundreds of places.
It blew their minds when I explained what a function was.
TorchOn()/TorchOff() completely blew their minds.
This was for a highly successful product.
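For anyone who hasn't seen code like that, a hypothetical sketch (in Java, with placeholder commands rather than the real machine's protocol, and lower-cased to fit Java conventions) of roughly what the extraction amounts to:

    // Hypothetical reconstruction of the extraction; "M03"/"M05" are placeholders.
    public class TorchControl {
        // The ~50 pasted lines (gas purge, ignition, pierce delay, height check)
        // now live in exactly one place.
        static void torchOn() {
            send("M03");
        }

        static void torchOff() {
            send("M05");
        }

        static void send(String command) {
            System.out.println("-> " + command); // stand-in for the real controller I/O
        }

        public static void main(String[] args) {
            torchOn();   // every former copy-paste site becomes a single call
            torchOff();
        }
    }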
I don't think it's fair to characterise this as an "argument against deduplication". It is rather an argument against using deduplication as an excuse for having complex solutions.
The key is how good the abstraction is. This has technical and social aspects.
Technically, the issue is whether you can think about the abstraction without worrying about its implementation, i.e. can the hidden remain hidden? All abstractions leak; perfection is impossible here (consider computational arithmetic addition), but an abstraction can be pretty good for the specific task at hand.
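To make that parenthetical concrete, a toy Java example of how even "+" leaks its implementation at the edges:

    // "+" abstracts addition well enough for everyday use, but the
    // implementation shows through at the edges.
    public class LeakyAddition {
        public static void main(String[] args) {
            // Floating-point addition is not associative...
            System.out.println((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)); // false

            // ...and integer addition silently wraps around.
            System.out.println(Integer.MAX_VALUE + 1); // -2147483648
        }
    }

For most tasks the leak never matters, which is what "pretty good for the specific task at hand" means in practice.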
Socially, abstractions need to be known, learnt, and understood, so it's better to directly use concepts that are already known by users, whether present, future, or potential (here, the users are developers).
That can definitely be a problem. But clearly you haven't worked with a codebase that decided that the best way to implement two (let's be honest: two if you're lucky, it could easily be 10) similar components was to duplicate the entire few thousand LoC. And then, when similar changes need to be implemented in both, to implement them in subtly different ways.
Organizations trying to avoid duplication is the source of so many big corp ills. "Oh no. We can't have two people paying separately for the same $10 a month SaaS app. Better make a painful, time expensive procurement process for people who want to use tools"
If the waste in your software development process is basically on topic software development activity, you’re in pretty good shape already.
In my world it’s things like meeting load, calendar fragmentation, 3+ interviews a week, distributed sites, lurching from planning cycle overhead to perf/promo cycle overhead and back again, promotion oriented development, constant deprecation and migration churn among perfectly good dependencies, shitty oncall loads, etc. When you get an afternoon free to actually address the software development tasks ahead of you, that itself is cause for celebration, no matter whether they are the optimal tasks to do or done in the optimal way.
Nice at a high level but I have some gripes about the details:
1. It says "asynchronous communication" is categorically a cause of ineffective communication, but only "inefficient" meetings are a cause of ineffective communication. Glaring old-fogey double standard.
2. I was curious what "backlog inversion" is and found a definition here: http://sedano.org/software-development-wastes/mismanaging-th.... Apparently it's when you work on things that haven't been prioritized. No idea why they thought the name "backlog inversion", reminiscent of obscure OOP jargon, made sense, but real phenomenon no doubt.
Haven't been prioritized recently, not "not prioritized". A big difference. The inversion (I think it's a strange use of the word, but I understand it) is that people will work on something that looks like it's high priority (because it's marked that way) when it's actually low priority or deprecated.
The pattern I see repeatedly is when a solution is built for some problem, and then that "solution" breaks somehow, and then begins the iteration of solving the solution vs rethinking how to solve the original problem.
I like the Lean Software books by Mary and Tom Poppendieck. There are a few of them, they cover similar ground to each other but from different perspectives (talking about what it is, versus how to apply it as a manager, etc.). "Waste" is a critical concept to Lean in general and they've done a good job of applying it to software. All their books (I think) are available through O'Reilly.
I would recommend this free chapter from the book Rethinking Productivity in Software Engineering https://www.researchgate.net/publication/332916053_Removing_... which describes a workshop to identify and remove wastes. The whole book is open and fun to read.
Seems to be a priority management issue in the backlog. Low priority items becoming high priority, high priority becoming low priority, but neither being updated to reflect that. So team members who check the backlog for work take on tasks that look important, but aren't (maybe even deprecated).
As humans, our natural pattern-matching abilities sometimes cause us to see things that aren't really there. It's a blind spot for us.
In this particular case, you now have research that backs up our understanding of software development waste. If you read the paper, you'll see examples of how our previous understanding of software development waste was flawed: https://www.researchgate.net/publication/313360479_Software_...
It might be "common sense", but that doesn't even mean that it's not possible for competent and well-meaning people to fall into these traps when working in larger groups.
Many things are much easier said than done, "create good architecture", "eliminate technical debt" etc. You can't just schedule X hours and then be done after X hours, and often the time you spent just made things worse. People have to realize that writing good code is an artform and not something you can just decide to do.
That, e.g., doing the same thing over and over wastes time is obvious. I don't need a taxonomy to tell me that. In my experience the cause of those things "happening" is that most people don't put much thought into what they are doing and why they are doing it. Programmers feel an "itch" to start coding when it's not even clear what should be done. A gamble on one's intuition rather than a structured working procedure.