I don't know who to give credit for this concept, but I'm a fan of "tombstoning" functions that are being removed. You basically replace (or prepend) the body of the function with a call to a method that logs the timestamp, calling function, etc.
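A minimal Ruby sketch of the idea (the helper and method names are illustrative, not from any particular library):

```ruby
require "logger"
require "time"

# Minimal tombstone helper: instead of deleting a suspect method outright,
# replace (or prepend) its body with a call that logs who still reaches it.
TOMBSTONE_LOG = Logger.new($stdout)

def tombstone(label)
  # caller_locations(2, 1) skips the tombstone frame itself, so we report
  # the caller of the tombstoned method, not the tombstoned method.
  site = caller_locations(2, 1)&.first
  TOMBSTONE_LOG.warn(
    "TOMBSTONE #{label} hit at #{Time.now.utc.iso8601} " \
    "from #{site ? "#{site.path}:#{site.lineno}" : "unknown"}"
  )
end

# A method we suspect is dead: keep its behavior, but record every call.
def legacy_discount(price)
  tombstone("legacy_discount")
  price * 0.9
end
```

After the tombstones have run in production for a while, grep the logs for `TOMBSTONE`: labels that never appear are your deletion candidates.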
I once had to lead a large refactor of a project written in an unfamiliar, dynamically typed language, and tombstoning was a necessity given how large and poorly designed the project was.
I've used tombstoning as well. The biggest problem is that you have to let it run long enough to determine if anyone is actually using that method - and with sections of an application that are not oft-used, you could be waiting a very long time to find out.
Well, that depends. With 100% coverage, nothing is dead in the sense of unexecuted. But just because code is tested doesn't mean that it's actually in use outside of the test suite.
There's lots of ways that code could become dead, even if it's still accessible via a test suite.
I may be being stupid here, but shouldn't you really be using a comprehensive test suite and a code coverage tool like simplecov [0] to automatically show you your dead code? Or if you don't do test coverage (!) something like coverband [1] in production? That way you get to see straight away which code is genuinely dead rather than having to go spelunking with `git-bisect`?
I don't think it would work as well as we'd hope. Our projects aim for ever-increasing code coverage; it can feel silly, esp as we get closer to 100%, but it's pertinent here.
If I were to refactor some function to no longer be used, one would hope/expect that code coverage for that function would be missing, and thus we'd realize (or at least be able to notice more easily) that it was dead code.
However, unless your code is all tested via end-to-end tests of features and capabilities, it's _very_ possible that I (or a teammate) wrote some test of our dead helper function, verifying that it handles its edge cases correctly, as a unit test. In that case, we'd still see our code as "covered".
I think we could work around that by making the function deliberately raise an exception, and see which tests break, or commenting out the unit test of that function and see if it's still used. If the code is being exercised in any way other than its own unit tests, it ought to raise an error (or show coverage, depending on tactic). However, this requires us to _already_ suspect that function is dead.
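The "deliberately raise" tactic might look like this in Ruby (class and method names are hypothetical):

```ruby
# Tactic: temporarily make the suspect method blow up, then run the full
# test suite. If only this method's own unit tests fail, nothing else
# exercises it and it's a strong dead-code candidate.
class ReportHelper
  def self.format_total(amount)
    raise NotImplementedError, "suspected dead code: ReportHelper.format_total"
    # Original body kept below (commented out) for easy reverting:
    # "$%.2f" % amount
  end
end
```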
I think the poster was saying that proper unit testing would give false positives.
Example: If I have a "dead" add() function with a unit test that asserts add(2,2) == 4, then code coverage would report that function being covered even if no other part of the codebase uses it anymore.
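A tiny Ruby illustration of that false positive (hypothetical code):

```ruby
# Nothing in the application calls this any more...
def add(a, b)
  a + b
end

# ...but the assertion below, the only caller left in the codebase, still
# executes it, so a line-coverage tool reports add() as covered anyway.
raise "add is broken" unless add(2, 2) == 4
```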
That's exactly what I was trying to say -- thanks!
I still _want_ that kind of test coverage, esp when there's weird edge cases, but when looking for dead code, it's probably good to look at only the acceptance / integration tests (or whatever you want to call the tests that do the holistic "I click here and everything works" tests).
100% coverage in testing doesn't indicate 100% usage in production. For example - if an API endpoint is no longer in use (say, it's a v1, and your company has now entirely moved to v4), the code in question will still be covered by testing - it's not dead according to the test suite.
Static analysis is your friend here. TypeScript, Rust, and C# can all identify dead code and report errors or warnings from that, and can perform "find all usage" searches across a workspace.
This is my main criticism of the Ruby community: code that defeats static analysis is not only accepted, it's welcomed.
Python's merely lukewarm acceptance of such code is already enough to make big projects impractical there, because every large codebase accumulates some undecidable metaprogramming spread through it. But Ruby has it everywhere.
I think he means that you seem to be speaking for all of the Ruby community (that is, people who write Ruby code). I've personally seen that not all of the Ruby community writes tests, because I've seen Ruby code that doesn't have tests. In fact, I've had coworkers who work with Ruby in multiple jobs in different industries, and in all cases the Ruby application had very few or no tests.
If your code base is only used within a single workspace (and you know this with certainty) then I completely agree. The issue arises when your code is used outside of your workspace as a shared code base or a public facing API.
This is an excellent point, but it should be noted that once you delete the unused API endpoint, finding the rest of the dead code is almost trivial with static analysis.
PHPStorm/IntelliJ has this feature where it marks any unused functions as gray. If you want to find where used functions are used, simply hit Ctrl-G and it'll show you all the places.
Well, privates that are unused, or that are only reached via something like

    $method = 'get' . $command;
    $this->$method();

are marked in grey. Which means some methods that you need to keep are grey, and other methods you want to delete are not. Not necessarily helpful.
(The solution to that is "write code that doesn't suck", but alas, dead code tends to be a bigger problem in code that you didn't write.)
But as others have said, the unused code that keeps me busy is the stuff behind an option with a default value.
Sometimes it turns out that everyone uses it in the non-default state. Then there's the cases where everyone wants to use it in the non-default state, they just don't know the option can be toggled.
At least in PHP it depends on making sure the rest of your code ecosystem has good annotations/type hints, especially if you have any code doing dynamic dispatch through variable method names.
Neat idea, might look into this for .NET; the [Obsolete] attribute doesn't really cut it currently. Some of the comments here suggest that unit tests, static analysis, and a good all-round IDE make this a non-issue, but those are the scenarios I find most dangerous.
A unit test does not prove beyond doubt that a class/method is not in use. Static analysis and good IDEs are bound to the immediate code base; they can't determine usage when your code is shared across multiple workspaces or when the class/method is only called externally.
Tombstoning isn't perfect, but given a sufficient amount of time to collect logs, plus a bit of case-by-case analysis, it's safer than what I typically do. (E.g. AddTwoNumbers() not being called for 6 months may be okay to remove, whereas EndOfYearReportGenerator() not being called for 6 months shouldn't blindly be removed.)
People are saying "shouldn't tests/code coverage fix this" or throwing around "static analysis" like it solves this issue.
Yes, static analysis is useful for finding internally orphaned code (though a simple search, checking whether the only reference outside of tests is the definition itself, is just as effective), but that's not all dead code. The article is just using that as a simple example.
If you have a feature with non-orphaned methods, your testing may cover it and your static analysis will say it's good, but if the feature is never used, then the code supporting it is effectively dead.
This code is dead weight to your code base. It's costing you time and effort to maintain or work around. Having the ability to track it down and remove it is important.
I think the other problem is that static analysis is a great tool for a new code base. But a new code base has no dead code.
Or to put it another way, when you're designing a system, please think about the benefit static analysis will have in identifying dead code three years hence. But when you're trying to find dead code in that ten year old PHP code no amount of woulda's and coulda's will get rid of it.
The intelligent person solves the problem they've got. The unintelligent person shoulda's coulda's and woulda's you into oblivion.
(NB. I would probably choose a tool with a good type system, and I would try to write to its strengths, in any system I was designing today. But that still doesn't help me identify the dead code I've got.)
Because Unused can't guarantee non-use (hooray metaprogramming and inheritance chains), it can't guarantee accuracy, but it gets close. `git bisect` and a good test suite are your friends (Unused also flags methods/functions that are only ever tested and don't appear to be used anywhere else).
The underlying premise is that, by leveraging ctags and the ability to search for the presence of tokens within a directory, it can estimate what's used based on occurrence frequency and location. Because of this, it's language-agnostic.
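A toy sketch of that occurrence-counting premise (this is the idea only, not Unused's actual implementation):

```ruby
# Toy version of the premise: a token (e.g. a ctags-extracted method name)
# that occurs exactly once across the project is likely just its own
# definition with no callers, i.e. a dead-code candidate. Works on raw
# text, so it's language-agnostic.
def dead_code_candidates(tokens, sources)
  tokens.select do |token|
    occurrences = sources.sum do |text|
      text.scan(/\b#{Regexp.escape(token)}\b/).size
    end
    occurrences <= 1
  end
end
```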
Biggest removal I've worked on is ~1500 LOC (a large chunk being JSON); however, I've seen the results from some of our client work at thoughtbot, which have surpassed 3k-4kLOC.
What is the point of finding the commit that removed the last usage of your dead code? You should only care about the state of master. There is no point trying to prove where the removal happened.
Bisect is great to find the commit that caused a bug, because the commit narrows down the code to inspect to fix the bug.
Maybe I'm missing something here, but git bisect doesn't seem to apply in this case.
Generally you'll want to understand why the code is dead before removing. Otherwise it might've been left for a good reason. Chesterton's fence[1] and all that.
Git bisect also has a `run` subcommand where you supply a test command, so you can automate grepping for usage. You can use something like `wc -l` to determine whether you matched one result or more than one. This automates the bisect and lets you take a nice break from the computer while it does its magic.
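For example, a small checker script (hypothetical; the identifier and paths are placeholders for your project) can drive `git bisect run` to find the commit where the last caller of a method disappeared:

```ruby
#!/usr/bin/env ruby
# check_usage.rb -- for `git bisect run`: exit 0 ("good") while the
# identifier still has callers outside tests, non-zero ("bad") once only
# the definition remains.

# Count occurrences of +identifier+ across the given source texts.
def usage_count(identifier, sources)
  sources.sum { |text| text.scan(identifier).size }
end

if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  texts = ARGV.reject { |f| f.start_with?("test/", "spec/") }
              .map { |f| File.read(f) }
  # One hit is the definition itself; more than one means callers remain.
  exit(usage_count("legacy_discount", texts) > 1 ? 0 : 1)
end
```

Then something like `git bisect start <bad-sha> <good-sha>` followed by `git bisect run ruby check_usage.rb app/**/*.rb lib/**/*.rb` walks the history for you.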
> Removing dead code from your codebase is a simple process at its core: find the dead parts, prove that they’re dead, and delete them. The tricky part is in that second step, but after a little trial and error, we discovered a couple tricks to speed up the process.
The tricky part shouldn't be the second step. The second step shouldn't even exist. How is it that in this day and age, your compiler can't tell you definitively if a block of code is dead?
You can prove that code is referred to by other code in the same project. However, you can't prove if your exposed API is being used. For example, let's say you have an HTTP API endpoint and you're not sure if your front end is using it.
Then my first step would be to check the logs, or start logging if there are none currently.
You get a nice structured logging format set up, log every request, and feed that into something you can graph and query. Leave that on long enough (long enough being a length of time that depends on what the endpoint does and how critical it is, as well as how much traffic you see regularly) and you can figure out what's being used pretty quickly.
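In a Rack-based app, the per-request logging could be as small as a middleware like this (a hedged sketch; the class name and log format are made up):

```ruby
require "logger"

# Bare-bones Rack middleware that records every request's method and path,
# so you can later query which endpoints are actually being hit.
class EndpointUsageLogger
  def initialize(app, logger: Logger.new($stdout))
    @app = app
    @logger = logger
  end

  def call(env)
    # One structured, ingest-friendly line per request.
    @logger.info(
      { method: env["REQUEST_METHOD"],
        path: env["PATH_INFO"],
        at: Time.now.utc }.inspect
    )
    @app.call(env)
  end
end
```

Feed those lines into whatever you graph and query with, let it run for an appropriate window, and endpoints that never show up are your candidates.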
Finding all dead code would mean solving the halting problem, even in the simplified case when you're not writing a library whose consumers may no longer use an exposed function.
Now we're discussing the meaning of "dead code". I'm pretty certain that the OP was referring to methods that have no callsites. That doesn't require much in the way of proof, the git bisect seems to be more about identifying when it became dead. Now if you have methods that have existing callsites, but aren't being called, then yes, you may need to solve the halting problem to prove that no basis path that is actually followed calls the code in question.
Of course, in a dynamic language, or one with reflection, the distinctions are blurry.
Reading through the discussion here: most of the comments are assuming interpreted languages. The article itself was talking about Ruby.
And I'd imagine that in any form of client-server system, especially if there are multiple client implementations, it would be hard to tell if specific server endpoints are definitively "dead".
Here's an example in PHP from a quick google:
https://github.com/scheb/tombstone