Hard agree. And I used to belong to the other camp.
The basic tension here is between locality [0], on the one hand, and the desire to clearly show the high-level "table of contents" view on the other. Locality is more important for readable code. As the article notes, the TOC view can be made clear enough with section comments.
There is another, even more important, reason to prefer the linear code: It is much easier to navigate a codebase writ large when the "chunks" (functions / classes / whatever your language mandates) roughly correspond to business use-cases. Otherwise your search space gets too big, and you have to "reconstruct" the whole from the pieces yourself. The code's structure should do that for you.
If a bunch of "stuff" is all related to one thing (signup, or purchase, or whatever), let it be one thing in the code. It will be much easier to find and change things. Only break it down into sub-functions when re-use requires it. Don't do it solely for the sake of organization.
I went the opposite direction: I used to be in the linear code camp, and now I'm in the "more functions" camp.
For me the biggest reason is state. The longer the function, the wider the scope of the local variables. Any code anywhere in the function can mutate any of the variables, and it's not immediately clear what the data flow is. More functions help scopes stay small, and data flow is more explicit.
A side benefit is that "more functions" helps keep indentation down.
At the same time, I don't like functions that are too small; otherwise it's hard to find where any actual work gets done.
> Any code anywhere in the function can mutate any of the variables
Regardless of the language I'm using, I never mutate values. Counters in loops or some other hyper-local variables (for performance) might be the inconsequential exceptions to this rule.
> More functions help scopes stay small, and data flow is more explicit.
Just write your big function with local scope sections, if needed (another local exception to the rule above). Eg, in JS:
let sectionReturnVal
{
  // stuff that sets sectionReturnVal
}
or even use IIFE to return the value and then you can use a const. "A function, you're cheating!" you might say, but my goal is not to avoid a particular language construct, but to maintain locality, and avoid unnecessary names and jumping around.
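To make that concrete, here is a minimal sketch of the IIFE version (the values and names are just illustrative):

```javascript
// Compute a value in its own scope, then freeze the result with const.
// None of the intermediate variables leak into the enclosing function.
const total = (() => {
  const subtotal = 100; // illustrative values
  const tax = 20;
  return subtotal + tax;
})();
// `subtotal` and `tax` are not visible here; only `total` is,
// and it can never be reassigned.
```

You keep the computation inline, right where the value is used, without introducing a named top-level function anyone could start calling.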
> A side benefit is that "more functions" helps keep indentation down.
It is also worth noting that solving this problem with function extraction is often a merely aesthetic improvement. That is, you still need to hold the surrounding context (if not the state) in your head when reading the extracted function to understand the whole picture, and the extraction actually makes that harder.
Using early returns correctly, by contrast, can actually alleviate working memory issues, since you can dismiss everything above as "handling validation and errors". That is, even though technically, no matter what you do, you are spidering down the branches of control flow, and therefore in some very specific context, the code organization can affect how much attention you need to pay to that context.
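A rough sketch of the shape I mean (the validation rules here are made up):

```javascript
function processOrder(order) {
  // Guard clauses: once past these lines, the happy path can assume
  // a valid order, so the reader can drop that context entirely.
  if (!order) return { ok: false, error: "missing order" };
  if (!order.items || order.items.length === 0) {
    return { ok: false, error: "empty order" };
  }

  // Happy path: no nesting, no "are we still valid?" bookkeeping.
  const total = order.items.reduce((sum, item) => sum + item.price, 0);
  return { ok: true, total };
}
```

Everything above the happy path can be mentally filed as "rejects bad input" and forgotten.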
> I don't like functions that are too small, otherwise it's hard to find out where any actual work gets done.
Precisely, just take this thinking to its logical conclusion. You can (mostly) have your cake and eat it too.
The better solution to this is to use nested functions that are immediately called, rather than top level functions. That lets you cordon off chunks of state while still keeping a linear order of definition and execution. And you don't have to worry about inadvertently increasing your API maintenance burden because people started to depend on those top level functions later.
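A minimal sketch of what that looks like (the steps and names are illustrative):

```javascript
function signup(request) {
  // Each step is a nested function called immediately, in order.
  // State stays fenced inside each step, reading order matches
  // execution order, and nothing here is exported for other code
  // to start depending on.
  const user = (function validate() {
    const name = request.name.trim();
    return { name };
  })();

  const account = (function createAccount() {
    return { owner: user.name, balance: 0 };
  })();

  return account;
}
```

The step names still act as a table of contents, but they live inside the one use-case function rather than scattered across the module.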
> Only break it down into sub-functions when re-use requires it. Don't do it solely for the sake of organization
What about for testing? What about for reducing state you need to keep in mind? What about releasing resources? What about understanding the impact of a change? Etc.
Consider an end of day process with 10 non-reusable steps that must run in order and each step is 100 lines. Each step uses similar data to the step before it so variables are similar but not the same. You would really choose a 1000 line single function?
For "use-case" code like this with many steps, you are typically testing how things wire together, so you will either be injecting mocks for unit tests, in which case this is not a problem, or running integration or e2e tests, in which case it is also not a problem.
If complex, purely logical computation is part of the larger function, and you can pull that part out into a pure function which can be easily unit tested without mocks, that is indeed a valid factoring which I support, and an exception to the general rule.
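As a hypothetical sketch of the kind of extraction I mean (the pricing rule is made up):

```javascript
// Pure: no I/O, no shared state, so it unit tests without mocks.
function applyDiscount(totalCents, loyaltyYears) {
  const rate = loyaltyYears >= 5 ? 0.1 : 0; // illustrative rule
  return Math.round(totalCents * (1 - rate));
}

// The surrounding use-case function stays linear and just calls it:
// const due = applyDiscount(cartTotal, user.loyaltyYears);
```

The wiring stays in one place; only the genuinely logic-heavy part moves out.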
> What about for reducing state you need to keep in mind?
Typically not a problem, because if the function corresponds to a business use-case, you and everybody else are already thinking about it as "one thing".
> What about releasing resources?
Not a problem I have ever once run into with backend programming in garbage collected languages. Obviously if you are in a different situation, YMMV.
> Consider an end of day process with 10 non-reusable steps that must run in order and each step is 100 lines.
I would use my judgement and might break it down. Again, I have never encountered such a situation in many years of programming.
You seem to be hunting for the (ime) rare exceptions as if they disproved the general rule. But in practice, "explode your holistic function unnecessarily into 10 parts" is a much more common error than taking "don't break it down" too far.
const DebugFlags = {StepOne: false, StepTwo: false, StepThree: true};
if (DebugFlags.StepOne) { ... }
if (DebugFlags.StepTwo) { ... }
if (DebugFlags.StepThree) { ... }
Your training in structured, DRY and OOP will recoil at this: More branches! Impossible. But your spec says "must run in order". It does this by design. Every resource can be tracked by reading it top to bottom, and the only way in which you can miss it is through a loop, which you can also aim to minimize usage of. The spec also says "uses similar data to the step before it". If variables are similar-not-same, enclose them in curly braces so that you get some scope guarding. The debug flags contain the information needed to generate whatever test data is necessary. They can alternately be organized as enumerated state instead of booleans: {All, TestOne, TestTwo, TestThree}.
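A sketch of that enumerated-state variant (names follow the flags above and are illustrative):

```javascript
// One enum-like mode instead of independent booleans: the valid
// combinations are spelled out, so a nonsensical mix of steps
// can't even be expressed.
const RunMode = Object.freeze({
  All: "All",
  TestOne: "TestOne",
  TestTwo: "TestTwo",
  TestThree: "TestThree",
});

function runs(mode, step) {
  // A step runs in "All" mode, or when it is the one selected.
  return mode === RunMode.All || mode === step;
}

const mode = RunMode.TestTwo;
if (runs(mode, RunMode.TestOne))   { /* step one */ }
if (runs(mode, RunMode.TestTwo))   { /* step two */ }
if (runs(mode, RunMode.TestThree)) { /* step three */ }
```

The top-to-bottom reading order is preserved; only the gating gets tidier.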
Long, bespoke linear sequences can be hairy, but the tools to deal with them are present in current production languages without atomizing the code into tiny functions. Occasionally you can find a useful pattern that does call for a new function, and do a "harvest" on the code and get its size down. But you have to be patient with it before you have a good sense of where a new parameterized function gets the right effect, and where inlining and flagging an existing one will do better.
[0] https://htmx.org/essays/locality-of-behaviour/