I don't know who to give credit for this concept, but I'm a fan of "tombstoning" functions that are being removed. You basically replace (or prepend) the body of the function with a call to a method that logs the timestamp, calling function, etc.
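A minimal Ruby sketch of the idea (the helper and method names are illustrative, not from any particular library):

```ruby
require "logger"
require "time"

# Minimal tombstone helper: instead of deleting a suspect method outright,
# replace (or prepend) its body with a call that logs who still reaches it.
TOMBSTONE_LOG = Logger.new($stdout)

def tombstone(label)
  # caller_locations(2, 1) skips the tombstone frame itself, so we report
  # the caller of the tombstoned method, not the tombstoned method.
  site = caller_locations(2, 1)&.first
  TOMBSTONE_LOG.warn(
    "TOMBSTONE #{label} hit at #{Time.now.utc.iso8601} " \
    "from #{site ? "#{site.path}:#{site.lineno}" : "unknown"}"
  )
end

# A method we suspect is dead: keep its behavior, but record every call.
def legacy_discount(price)
  tombstone("legacy_discount")
  price * 0.9
end
```

After the tombstones have run in production for a while, grep the logs for `TOMBSTONE`: labels that never appear are your deletion candidates.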
I once had to lead a large refactor of a project written in an unfamiliar, dynamically typed language, and tombstoning was a necessity given how large and poorly designed the project was.
I've used tombstoning as well. The biggest problem is that you have to let it run long enough to determine if anyone is actually using that method - and with sections of an application that are not oft-used, you could be waiting a very long time to find out.
Well, that depends. With 100% coverage, nothing is dead in the sense of unexecuted. But just because code is tested doesn't mean that it's actually in use outside of the test suite.
There's lots of ways that code could become dead, even if it's still accessible via a test suite.
I may be being stupid here, but shouldn't you really be using a comprehensive test suite and a code coverage tool like simplecov [0] to automatically show you your dead code? Or if you don't do test coverage (!) something like coverband [1] in production? That way you get to see straight away which code is genuinely dead rather than having to go spelunking with `git-bisect`?
I don't think it would work as well as we'd hope. Our projects aim for ever-increasing code coverage; it can feel silly, esp as we get closer to 100%, but it's pertinent here.
If I were to refactor some function to no longer be used, one would hope/expect that code coverage for that function would be missing, and thus we'd realize (or at least be able to notice more easily) that it was dead code.
However, unless your code is all tested via end-to-end tests of features and capabilities, it's _very_ possible that I (or a teammate) wrote some test of our dead helper function, verifying that it handles its edge cases correctly, as a unit test. In that case, we'd still see our code as "covered".
I think we could work around that by making the function deliberately raise an exception, and see which tests break, or commenting out the unit test of that function and see if it's still used. If the code is being exercised in any way other than its own unit tests, it ought to raise an error (or show coverage, depending on tactic). However, this requires us to _already_ suspect that function is dead.
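The "deliberately raise" tactic might look like this in Ruby (class and method names are hypothetical):

```ruby
# Tactic: temporarily make the suspect method blow up, then run the full
# test suite. If only this method's own unit tests fail, nothing else
# exercises it and it's a strong dead-code candidate.
class ReportHelper
  def self.format_total(amount)
    raise NotImplementedError, "suspected dead code: ReportHelper.format_total"
    # Original body kept below (commented out) for easy reverting:
    # "$%.2f" % amount
  end
end
```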
I think the poster was saying that proper unit testing would give false positives.
Example: If I have a "dead" add() function with a unit test that asserts add(2,2) == 4, then code coverage would report that function being covered even if no other part of the codebase uses it anymore.
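A tiny Ruby illustration of that false positive (hypothetical code):

```ruby
# Nothing in the application calls this any more...
def add(a, b)
  a + b
end

# ...but the assertion below, the only caller left in the codebase, still
# executes it, so a line-coverage tool reports add() as covered anyway.
raise "add is broken" unless add(2, 2) == 4
```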
That's exactly what I was trying to say -- thanks!
I still _want_ that kind of test coverage, esp when there's weird edge cases, but when looking for dead code, it's probably good to look at only the acceptance / integration tests (or whatever you want to call the tests that do the holistic "I click here and everything works" tests).
100% coverage in testing doesn't indicate 100% usage in production. For example - if an API endpoint is no longer in use (say, it's a v1, and your company has now entirely moved to v4), the code in question will still be covered by testing - it's not dead according to the test suite.
Static analysis is your friend here. TypeScript, Rust, and C# can all identify dead code and report errors or warnings from that, and can perform "find all usage" searches across a workspace.
This is my main criticism of the Ruby community: code that defeats static analysis is not only accepted, it's welcomed.
Python's merely lukewarm acceptance of such code is already enough to make big projects impractical there, because every large codebase accumulates some undecidable metaprogramming spread through it. But Ruby has it everywhere.
I think he means that you seem to be speaking for all of the Ruby community (that is, people who write Ruby code). I've personally seen that not all of the Ruby community writes tests, because I've seen Ruby code that doesn't have tests. In fact, I've had coworkers who work with Ruby in multiple jobs in different industries, and in all cases the Ruby application had very few or no tests.
If your code base is only used within a single workspace (and you know this with certainty) then I completely agree. The issue arises when your code is used outside of your workspace as a shared code base or a public facing API.
This is an excellent point, but it should be noted that once you delete the unused API endpoint, finding the rest of the dead code is almost trivial with static analysis.
PHPStorm/IntelliJ has this feature where it marks any unused functions as gray. If you want to find where used functions are used, simply hit Ctrl-G and it'll show you all the places.
Well, privates that are unused, or that are only reached via something like

    $method = 'get' . $command;
    $this->$method();

are marked in grey. Which means some methods that you need to keep are grey, and other methods you want to delete are not. Not necessarily helpful.
(The solution to that is "write code that doesn't suck", but alas, dead code tends to be a bigger problem in code that you didn't write.)
But as others have said, the unused code that keeps me busy is the stuff behind an option with a default value.
Sometimes it turns out that everyone uses it in the non-default state. Then there's the cases where everyone wants to use it in the non-default state, they just don't know the option can be toggled.
At least in PHP it depends on making sure the rest of your code ecosystem has good annotations/type hints, especially if you have any code doing dynamic dispatch through variable method names.
Neat idea, might look into this for .NET; the [Obsolete] attribute doesn't really cut it currently. Some of the comments here suggest that unit tests, static analysis, and a good all-round IDE make this a non-issue, but those are the scenarios I find most dangerous.
A unit test does not prove beyond doubt that a class/method is not in use. Static analysis and good IDEs are bound to the immediate code base; they can't determine usage when your code is shared across multiple workspaces or when the class/method is only called externally.
Tombstoning isn't perfect, but given a sufficient amount of time to collect logs, plus a bit of case-by-case analysis, it's safer than what I typically do. (E.g. AddTwoNumbers() not being called for 6 months may be okay to remove, whereas EndOfYearReportGenerator() not being called for 6 months shouldn't blindly be removed.)
People are saying "shouldn't tests/code coverage fix this" or throwing around "static analysis" like it solves this issue.
Yes, static analysis is useful for finding internally orphaned code (though a simple search, checking whether the only reference outside of tests is the definition itself, is just as effective), but that's not all dead code. The article is just using that as a simple example.
If you have a feature with non-orphaned methods, your testing may cover it and your static analysis will say it's good, but if the feature is never used, then the code supporting it is effectively dead.
This code is dead weight to your code base. It's costing you time and effort to maintain or work around. Having the ability to track it down and remove it is important.
I think the other problem is that static analysis is a great tool for a new code base. But a new code base has no dead code.
Or to put it another way, when you're designing a system, please think about the benefit static analysis will have in identifying dead code three years hence. But when you're trying to find dead code in that ten year old PHP code no amount of woulda's and coulda's will get rid of it.
The intelligent person solves the problem they've got. The unintelligent person shoulda's coulda's and woulda's you into oblivion.
(NB. I would probably choose a tool with a good type system, and I would try to write to its strengths, in any system I was designing today. But that still doesn't help me identify the dead code I've got.)
Because Unused can't guarantee non-use (hooray metaprogramming and inheritance chains), it can't guarantee accuracy, but it gets close. `git bisect` and a good test suite are your friends (Unused also flags methods/functions that are only ever tested and don't appear to be used anywhere else).
The underlying premise is that, by leveraging ctags and the ability to search for the presence of tokens within a directory, it can estimate what's used based on occurrence frequency and location. Because of this, it's language-agnostic.
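A toy sketch of that occurrence-counting premise (this is the idea only, not Unused's actual implementation):

```ruby
# Toy version of the premise: a token (e.g. a ctags-extracted method name)
# that occurs exactly once across the project is likely just its own
# definition with no callers, i.e. a dead-code candidate. Works on raw
# text, so it's language-agnostic.
def dead_code_candidates(tokens, sources)
  tokens.select do |token|
    occurrences = sources.sum do |text|
      text.scan(/\b#{Regexp.escape(token)}\b/).size
    end
    occurrences <= 1
  end
end
```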
Biggest removal I've worked on is ~1500 LOC (a large chunk being JSON); however, I've seen the results from some of our client work at thoughtbot, which have surpassed 3k-4kLOC.
What is the point of finding the commit that removed the last usage of your dead code? You should only care about the state of master. There is no point trying to prove where the removal happened.
Bisect is great to find the commit that caused a bug, because the commit narrows down the code to inspect to fix the bug.
Maybe I'm missing something here, but git bisect doesn't seem to apply in this case.
Generally you'll want to understand why the code is dead before removing. Otherwise it might've been left for a good reason. Chesterton's fence[1] and all that.
Git bisect also has a `run` subcommand where you supply a test command, so you can automate grepping for usage. You can use something like `wc -l` to determine whether you matched one result or more than one. This automates the bisect and lets you take a nice break from the computer while it does its magic.
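For example, a small checker script (hypothetical; the identifier and paths are placeholders for your project) can drive `git bisect run` to find the commit where the last caller of a method disappeared:

```ruby
#!/usr/bin/env ruby
# check_usage.rb -- for `git bisect run`: exit 0 ("good") while the
# identifier still has callers outside tests, non-zero ("bad") once only
# the definition remains.

# Count occurrences of +identifier+ across the given source texts.
def usage_count(identifier, sources)
  sources.sum { |text| text.scan(identifier).size }
end

if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  texts = ARGV.reject { |f| f.start_with?("test/", "spec/") }
              .map { |f| File.read(f) }
  # One hit is the definition itself; more than one means callers remain.
  exit(usage_count("legacy_discount", texts) > 1 ? 0 : 1)
end
```

Then something like `git bisect start <bad-sha> <good-sha>` followed by `git bisect run ruby check_usage.rb app/**/*.rb lib/**/*.rb` walks the history for you.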
> Removing dead code from your codebase is a simple process at its core: find the dead parts, prove that they’re dead, and delete them. The tricky part is in that second step, but after a little trial and error, we discovered a couple tricks to speed up the process.
The tricky part shouldn't be the second step. The second step shouldn't even exist. How is it that in this day and age, your compiler can't tell you definitively if a block of code is dead?
You can prove that code is referred to by other code in the same project. However, you can't prove if your exposed API is being used. For example, let's say you have an HTTP API endpoint and you're not sure if your front end is using it.
Then my first step would be to check the logs, or start logging if there are none currently.
You get a nice structured logging format set up, log every request, and feed that into something you can graph and query. Leave that on long enough (long enough being a length of time that depends on what the endpoint does and how critical it is, as well as how much traffic you see regularly) and you can figure out what's being used pretty quickly.
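In a Rack-based app, the per-request logging could be as small as a middleware like this (a hedged sketch; the class name and log format are made up):

```ruby
require "logger"

# Bare-bones Rack middleware that records every request's method and path,
# so you can later query which endpoints are actually being hit.
class EndpointUsageLogger
  def initialize(app, logger: Logger.new($stdout))
    @app = app
    @logger = logger
  end

  def call(env)
    # One structured, ingest-friendly line per request.
    @logger.info(
      { method: env["REQUEST_METHOD"],
        path: env["PATH_INFO"],
        at: Time.now.utc }.inspect
    )
    @app.call(env)
  end
end
```

Feed those lines into whatever you graph and query with, let it run for an appropriate window, and endpoints that never show up are your candidates.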
Finding all dead code would mean solving the halting problem, even in the simplified case when you're not writing a library whose consumers may no longer use an exposed function.
Now we're discussing the meaning of "dead code". I'm pretty certain that the OP was referring to methods that have no callsites. That doesn't require much in the way of proof, the git bisect seems to be more about identifying when it became dead. Now if you have methods that have existing callsites, but aren't being called, then yes, you may need to solve the halting problem to prove that no basis path that is actually followed calls the code in question.
Of course, in a dynamic language, or one with reflection, the distinctions are blurry.
Reading through the discussion here: most of the comments are assuming interpreted languages. The article itself was talking about Ruby.
And I'd imagine that in any form of client-server system, especially if there are multiple client implementations, it would be hard to tell if specific server endpoints are definitively "dead".
Here's an example in PHP from a quick google:
https://github.com/scheb/tombstone